This document in the Architecture Framework: AI and ML perspective provides an overview of principles and recommendations to help you optimize the performance of your AI and ML workloads on Google Cloud. The recommendations in this document align with the performance optimization pillar of the Architecture Framework.
AI and ML systems enable new automation and decision-making capabilities for your organization. The performance of these systems can directly affect your business drivers like revenue, costs, and customer satisfaction. To realize the full potential of your AI and ML systems, you need to optimize their performance based on your business goals and technical requirements. The performance optimization process often involves certain trade-offs. For example, a design choice that provides the required performance might lead to higher costs. The recommendations in this document prioritize performance over other considerations like costs.
To optimize AI and ML performance, you need to make decisions regarding factors like the model architecture, parameters, and training strategy. When you make these decisions, consider the entire lifecycle of the AI and ML systems and their deployment environment. For example, very large LLMs can be highly performant on massive training infrastructure, but they might not perform well in capacity-constrained environments like mobile devices.
Translate business goals to performance objectives
To make architectural decisions that optimize performance, start with a clear set of business goals. Design AI and ML systems that provide the technical performance that's required to support your business goals and priorities. Your technical teams must understand the mapping between performance objectives and business goals.
Consider the following recommendations:
- Translate business objectives into technical requirements: Translate the business objectives of your AI and ML systems into specific technical performance requirements and assess the effects of not meeting the requirements. For example, for an application that predicts customer churn, the ML model should perform well on standard metrics, like accuracy and recall, and the application should meet operational requirements like low latency.
- Monitor performance at all stages of the model lifecycle: During experimentation and training, and after model deployment, monitor your key performance indicators (KPIs) and observe any deviations from business objectives.
- Automate evaluation to make it reproducible and standardized: With a standardized and comparable platform and methodology for experiment evaluation, your engineers can increase the pace of performance improvement. A minimal evaluation sketch follows this list.
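The following sketch shows one way to standardize an evaluation step so that every experiment is scored against the same business-derived thresholds. It assumes scikit-learn; the `evaluate` function and the threshold values are hypothetical examples, not part of any Google Cloud API.

```python
# A minimal sketch of a reproducible evaluation step for a binary
# churn-prediction model. The thresholds are hypothetical examples of
# business-derived performance requirements.
from sklearn.metrics import accuracy_score, recall_score

REQUIRED_ACCURACY = 0.90  # Hypothetical business requirement.
REQUIRED_RECALL = 0.85    # Missing churners is costly, so recall matters most.

def evaluate(y_true, y_pred):
    """Compute standard metrics and check them against the requirements."""
    metrics = {
        "accuracy": accuracy_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
    }
    metrics["meets_requirements"] = (
        metrics["accuracy"] >= REQUIRED_ACCURACY
        and metrics["recall"] >= REQUIRED_RECALL
    )
    return metrics

# Example with synthetic labels and predictions.
print(evaluate([1, 0, 1, 1, 0, 1], [1, 0, 1, 0, 0, 1]))
```

Running the same check in every experiment and training pipeline makes results directly comparable across runs and model versions.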
Run and track frequent experiments
To transform innovation and creativity into performance improvements, you need a culture and a platform that supports experimentation. Performance improvement is an ongoing process because AI and ML technologies are developing continuously and quickly. To maintain a fast-paced, iterative process, you need to separate the experimentation space from your training and serving platforms. A standardized and robust experimentation process is important.
Consider the following recommendations:
- Build an experimentation environment: Performance improvements require a dedicated, powerful, and interactive environment that supports experimentation and the collaborative development of ML pipelines.
- Embed experimentation as a culture: Run experiments before any production deployment. Release new versions iteratively and always collect performance data. Experiment with different data types, feature transformations, algorithms, and hyperparameters, and track every run, as shown in the sketch after this list.
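As one example of a standardized experimentation workflow, the following sketch records a run's parameters and metrics with Vertex AI Experiments through the google-cloud-aiplatform SDK. The project, location, experiment, and run names, and all logged values, are placeholders.

```python
# A minimal sketch of experiment tracking with Vertex AI Experiments.
# All names and values below are placeholders.
from google.cloud import aiplatform

aiplatform.init(
    project="your-project-id",
    location="us-central1",
    experiment="churn-model-experiments",
)

aiplatform.start_run("run-gbdt-lr01")
# Record the configuration that produced this run so results stay comparable.
aiplatform.log_params({"algorithm": "gbdt", "learning_rate": 0.1, "max_depth": 6})
# ... train and evaluate the candidate model here ...
aiplatform.log_metrics({"accuracy": 0.91, "recall": 0.87})
aiplatform.end_run()
```

Because every run is logged against the same experiment, your engineers can compare configurations side by side instead of reconstructing results from ad hoc notes.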
Build and automate training and serving services
Training and serving AI models are core components of your AI services. You need robust platforms and practices that support fast and reliable creation, deployment, and serving of AI models. Invest time and effort to create foundational platforms for your core AI training and serving tasks. These foundational platforms help to reduce time and effort for your teams and improve the quality of outputs in the medium and long term.
Consider the following recommendations:
- Use AI-specialized components of a training service: Such components include high-performance compute and MLOps components like feature stores, model registries, metadata stores, and model performance-evaluation services.
- Use AI-specialized components of a prediction service: Such components provide high-performance and scalable resources, support feature monitoring, and enable model performance monitoring. To prevent and manage performance degradation, implement reliable deployment and rollback strategies. A deployment sketch follows this list.
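The following sketch illustrates two of these components with Vertex AI: it registers a model artifact in the Model Registry and deploys it to an autoscaling endpoint. The bucket path, display name, machine type, and container image are placeholders; adapt them to your environment.

```python
# A minimal sketch of registering a trained model and deploying it to an
# autoscaling Vertex AI endpoint. All names and URIs are placeholders.
from google.cloud import aiplatform

aiplatform.init(project="your-project-id", location="us-central1")

# Register the model artifact in the Vertex AI Model Registry.
model = aiplatform.Model.upload(
    display_name="churn-model-v2",
    artifact_uri="gs://your-bucket/models/churn-v2/",
    serving_container_image_uri=(
        "us-docker.pkg.dev/vertex-ai/prediction/sklearn-cpu.1-3:latest"
    ),
)

# Deploy with autoscaling so that the endpoint can absorb traffic spikes.
# Keeping the previous model version deployed enables fast rollback if the
# new version's performance degrades.
endpoint = model.deploy(
    machine_type="n1-standard-4",
    min_replica_count=1,
    max_replica_count=5,
)
```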
Match design choices to performance requirements
When you make design choices to improve performance, carefully assess whether the choices support your business requirements or are wasteful and counterproductive. To choose the appropriate infrastructure, models, or configurations, identify performance bottlenecks and assess how they're linked to your performance measures. For example, even on very powerful GPU accelerators, your training tasks can experience performance bottlenecks due to data I/O issues from the storage layer or due to performance limitations of the model itself.
Consider the following recommendations:
- Optimize hardware consumption based on performance goals: To train and serve ML models that meet your performance requirements, you need to optimize infrastructure at the compute, storage, and network layers. You must measure and understand the variables that affect your performance goals. These variables are different for training and inference. A profiling sketch that separates data I/O time from compute time follows this list.
- Focus on workload-specific requirements: Focus your performance optimization efforts on the unique requirements of your AI and ML workloads. Rely on managed services for the performance of the underlying infrastructure.
- Choose appropriate training strategies: Many pre-trained models and foundation models are available, and new models are released frequently. Choose a training strategy that can deliver optimal performance for your task. Decide whether you should build your own model, tune a pre-trained model on your data, or use a pre-trained model API.
- Recognize that performance-optimization strategies can have diminishing returns: When a particular performance-optimization strategy doesn't provide measurable incremental business value, stop pursuing that strategy.
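To make the bottleneck analysis concrete, the following framework-agnostic sketch times the input pipeline separately from the compute step. The `load_batch` and `train_step` callables are hypothetical stand-ins for your own data pipeline and training functions.

```python
# A minimal sketch for locating a training bottleneck by timing the input
# pipeline separately from the compute step. `load_batch` and `train_step`
# are hypothetical stand-ins for your own pipeline and training functions.
import time

def profile_training_loop(load_batch, train_step, num_batches=100):
    """Return the fraction of wall time spent on data I/O versus compute."""
    io_time = compute_time = 0.0
    for _ in range(num_batches):
        t0 = time.perf_counter()
        batch = load_batch()   # Storage and input-pipeline work.
        t1 = time.perf_counter()
        train_step(batch)      # Accelerator compute.
        t2 = time.perf_counter()
        io_time += t1 - t0
        compute_time += t2 - t1
    total = io_time + compute_time
    return {"io_fraction": io_time / total,
            "compute_fraction": compute_time / total}

# Demo with dummy functions that simulate a slow input pipeline. If
# io_fraction dominates, faster accelerators won't help; fix the storage
# layer (prefetching, caching, a higher-throughput format) first.
print(profile_training_loop(
    load_batch=lambda: time.sleep(0.02),
    train_step=lambda batch: time.sleep(0.005),
    num_batches=20,
))
```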
Link performance metrics to design and configuration choices
To innovate, troubleshoot, and investigate performance issues, establish a clear link between design choices and performance outcomes. In addition to experimentation, you must reliably record the lineage of your assets, deployments, model outputs, and the configurations and inputs that produced the outputs.
Consider the following recommendations:
- Build a data and model lineage system: All of your deployed assets and their performance metrics must be linked back to the data, configurations, code, and the choices that resulted in the deployed systems. In addition, model outputs must be linked to specific model versions and how the outputs were produced.
- Use explainability tools to improve model performance: Adopt and standardize tools and benchmarks for model exploration and explainability. These tools help your ML engineers understand model behavior and improve performance or remove biases. An explainability sketch follows this list.
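As a minimal illustration of an explainability workflow, the following sketch uses the open-source SHAP library with a small scikit-learn model on synthetic data. In practice, you would run this analysis against your own models and feature sets.

```python
# A minimal sketch of model explainability with the SHAP library, using a
# small scikit-learn regression model on synthetic data.
import shap
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=200, n_features=4, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X, y)

# Attribute each prediction to the input features. A large attribution on an
# unexpected feature can reveal data leakage or bias that hurts real-world
# performance.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)  # One attribution per feature per row.
shap.summary_plot(shap_values, X)       # Visual overview of feature influence.
```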
Contributors
Authors:
- Benjamin Sadik | AI and ML Specialist Customer Engineer
- Filipe Gracio, PhD | Customer Engineer
Other contributors:
- Kumar Dhanagopal | Cross-Product Solution Developer
- Marwan Al Shawi | Partner Customer Engineer
- Zach Seils | Networking Specialist