Professional Machine Learning Engineer
Job role description
A Professional Machine Learning Engineer designs, builds, and productionizes ML models to solve business challenges using Google Cloud technologies and knowledge of proven ML models and techniques. The ML Engineer collaborates closely with other job roles to ensure long-term success of models. The ML Engineer should be proficient in all aspects of model architecture, data pipeline interaction, and metrics interpretation. The ML Engineer needs familiarity with application development, infrastructure management, data engineering, and security. Through an understanding of training, retraining, deploying, scheduling, monitoring, and improving models, they design and create scalable solutions for optimal performance.
Certification Exam Guide
Section 1: ML Problem Framing
1.1 Translate business challenge into ML use case. Considerations include:
- Defining business problems
- Identifying non-ML solutions
- Defining output use
- Managing incorrect results
- Identifying data sources
1.2 Define ML problem. Considerations include:
- Defining problem type (classification, regression, clustering, etc.)
- Defining outcome of model predictions
- Defining the input (features) and predicted output format
1.3 Define business success criteria. Considerations include:
- Success metrics
- Key results
- Determination of when a model is deemed unsuccessful
1.4 Identify risks to feasibility and implementation of ML solution. Considerations include:
- Assessing and communicating business impact
- Assessing ML solution readiness
- Assessing data readiness
- Aligning with Google AI principles and practices (e.g., different biases)
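To make the success-criteria topics above concrete, a minimal sketch in Python of defining metrics and a pass/fail threshold for a binary classifier. The choice of precision/recall and the 0.80 targets are illustrative assumptions, not values prescribed by this guide.

```python
# Sketch: business success criteria for a binary classifier.
# Metric choice and thresholds below are hypothetical examples.

def precision_recall(y_true, y_pred):
    """Compute precision and recall for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def is_successful(y_true, y_pred, min_precision=0.80, min_recall=0.80):
    """Deem the model unsuccessful if either metric misses its target."""
    p, r = precision_recall(y_true, y_pred)
    return p >= min_precision and r >= min_recall

y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
p, r = precision_recall(y_true, y_pred)  # 2/3 precision, 2/3 recall
```

Writing the "unsuccessful" condition as executable code forces the team to agree on exact metrics and thresholds before training begins.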
Section 2: ML Solution Architecture
2.1 Design reliable, scalable, highly available ML solutions. Considerations include:
- Optimizing data use and storage
- Data connections
- Automation of data preparation and model training/deployment
- SDLC best practices
2.2 Choose appropriate Google Cloud software components. Considerations include:
- A variety of component types: data collection and data management
- Exploration/analysis
- Feature engineering
- Logging/management
- Automation
- Monitoring
- Serving
2.3 Choose appropriate Google Cloud hardware components. Considerations include:
- Selection of quotas and compute/accelerators with components
2.4 Design architecture that complies with regulatory and security concerns. Considerations include:
- Building secure ML systems
- Privacy implications of data usage
- Identifying potential regulatory issues
Section 3: Data Preparation and Processing
3.1 Data ingestion. Considerations include:
- Ingestion of various file types (e.g., CSV, JSON, image, Parquet) and sources (databases, Hadoop/Spark)
- Database migration
- Streaming data (e.g., from IoT devices)
3.2 Data exploration (EDA). Considerations include:
- Visualization
- Statistical fundamentals at scale
- Evaluation of data quality and feasibility
3.3 Design data pipelines. Considerations include:
- Batching and streaming data pipelines at scale
- Data privacy and compliance
- Monitoring/changing deployed pipelines
3.4 Build data pipelines. Considerations include:
- Data validation
- Handling missing data
- Handling outliers
- Managing large samples (TFRecords)
- Transformations (TensorFlow Transform)
3.5 Feature engineering. Considerations include:
- Data leakage and augmentation
- Encoding structured data types
- Feature selection
- Class imbalance
- Feature crosses
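Two of the data-preparation topics above, handling missing data and handling outliers, can be sketched in a few lines of pure Python (a stand-in for what a production pipeline would do at scale with, e.g., TensorFlow Transform; the median/IQR choices are illustrative, not the only valid policies):

```python
# Sketch: median imputation for missing values and Tukey (IQR) outlier
# clipping. Pure-Python illustration of the 3.4 topics, not a scalable
# pipeline implementation.
import statistics

def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    med = statistics.median(observed)
    return [med if v is None else v for v in values]

def clip_outliers(values, k=1.5):
    """Clip values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, lo), hi) for v in values]

raw = [1.0, None, 2.0, 3.0, 100.0]
clean = clip_outliers(impute_median(raw))  # 100.0 is clipped to the IQR fence
```

Note the ordering: imputing before clipping means the imputed median is computed from raw values, outliers included; a real pipeline would make this ordering decision explicitly and apply the *same* statistics at serving time to avoid training/serving skew.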
Section 4: ML Model Development
4.1 Build a model. Considerations include:
- Choice of framework and model
- Modeling techniques given interpretability requirements
- Transfer learning
- Model generalization
- Overfitting
4.2 Train a model. Considerations include:
- Productionizing
- Training a model as a job in different environments
- Tracking metrics during training
- Retraining/redeployment evaluation
4.3 Test a model. Considerations include:
- Unit tests for model training and serving
- Model performance against baselines, simpler models, and across the time dimension
- Model explainability on Cloud AI Platform
4.4 Scale model training and serving. Considerations include:
- Distributed training
- Hardware accelerators
- Scalable model analysis (e.g., Cloud Storage output files, Dataflow, BigQuery, Google Data Studio)
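"Model performance against baselines" from 4.3 can be illustrated with a minimal check comparing a model's accuracy to a trivial majority-class baseline (the labels and "model" predictions below are hypothetical test data, not from any real system):

```python
# Sketch: a unit-test-style check that a model beats a trivial baseline.
# All data here is hypothetical, for illustration only.
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def majority_baseline(train_labels, n):
    """Predict the most frequent training label for every example."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return [majority] * n

train_labels = [0, 0, 0, 1, 1]
y_true  = [0, 1, 0, 1, 0, 1]
y_model = [0, 1, 0, 1, 0, 0]   # hypothetical model output

baseline_acc = accuracy(y_true, majority_baseline(train_labels, len(y_true)))
model_acc = accuracy(y_true, y_model)
assert model_acc > baseline_acc  # fail the build if the model can't beat this
```

A check like this is cheap to run on every training job, and extending it across time windows covers the "across the time dimension" consideration.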
Section 5: ML Pipeline Automation & Orchestration
5.1 Design pipeline. Considerations include:
- Identification of components, parameters, triggers, and compute needs
- Orchestration framework
- Hybrid or multi-cloud strategies
5.2 Implement training pipeline. Considerations include:
- Decoupling components with Cloud Build
- Constructing and testing of parameterized pipeline definition in SDK
- Tuning compute performance
- Performing data validation
- Storing data and generated artifacts
5.3 Implement serving pipeline. Considerations include:
- Model binary options
- Google Cloud serving options
- Testing for target performance
- Setup of trigger and pipeline schedule
5.4 Track and audit metadata. Considerations include:
- Organization and tracking of experiments and pipeline runs
- Hooking into model and dataset versioning
- Model/dataset lineage
5.5 Use CI/CD to test and deploy models. Considerations include:
- Hooking models into existing CI/CD deployment system
- A/B and canary testing
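The canary-testing idea from 5.5 reduces to deterministic traffic splitting: a fixed fraction of requests is routed to the candidate model, and the same request always lands on the same variant. A minimal sketch (the 5% fraction and the variant names are illustrative assumptions; managed serving platforms provide this as configuration rather than code):

```python
# Sketch: deterministic canary routing via hashing. The canary fraction
# and variant names are hypothetical.
import hashlib

def route(request_id, canary_fraction=0.05):
    """Deterministically assign a request to 'canary' or 'stable'."""
    digest = hashlib.sha256(request_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return "canary" if bucket < canary_fraction else "stable"

counts = {"canary": 0, "stable": 0}
for i in range(10_000):
    counts[route(f"user-{i}")] += 1
# counts["canary"] lands near 5% of requests
```

Hashing the request (or user) ID rather than drawing a random number keeps assignments sticky across retries and sessions, which is what makes A/B comparisons of the two variants statistically clean.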
Section 6: ML Solution Monitoring, Optimization, and Maintenance
6.1 Monitor ML solutions. Considerations include:
- Performance and business quality of ML model predictions
- Logging strategies
- Establishing continuous evaluation metrics
6.2 Troubleshoot ML solutions. Considerations include:
- Permission issues (IAM)
- Common training and serving errors (TensorFlow)
- ML system failure and biases
6.3 Tune performance of ML solutions for training & serving in production. Considerations include:
- Optimization and simplification of input pipeline for training
- Simplification techniques
- Identification of appropriate retraining policy
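The continuous-evaluation topic from 6.1 can be sketched as a sliding window of labeled production predictions with an alert threshold (window size, metric, and threshold below are illustrative assumptions):

```python
# Sketch: continuous evaluation over a sliding window of production
# traffic. Window size and accuracy threshold are hypothetical.
from collections import deque

class ContinuousEvaluator:
    def __init__(self, window=100, min_accuracy=0.9):
        self.results = deque(maxlen=window)   # 1 = correct, 0 = incorrect
        self.min_accuracy = min_accuracy

    def record(self, prediction, label):
        """Record one labeled prediction from production traffic."""
        self.results.append(1 if prediction == label else 0)

    def accuracy(self):
        return sum(self.results) / len(self.results) if self.results else None

    def needs_attention(self):
        """True once the window is full and accuracy falls below threshold."""
        full = len(self.results) == self.results.maxlen
        return full and self.accuracy() < self.min_accuracy
```

A retraining policy (6.3) could then be event-driven, triggered when `needs_attention()` flips to True, rather than running on a fixed schedule regardless of model health.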