Implement machine learning

Last reviewed 2023-08-23 UTC

This document in the Google Cloud Architecture Framework explains some of the core principles and best practices for AI and machine learning (ML) in Google Cloud. You learn about some of the key AI and ML services, and how they can help during the various stages of the AI and ML lifecycle. These best practices help you meet your AI and ML needs and create your system design. This document assumes that you're familiar with basic AI and ML concepts.

To simplify the development process and minimize overhead when you build ML models on Google Cloud, consider the highest level of abstraction that makes sense for your use case. The level of abstraction describes how much of a system's underlying complexity you see when you view or program it. The higher the level of abstraction, the less of that detail you have to deal with.

To select Google AI and ML services based on your business needs, use the following table:

Persona | Google services
Business users | Standard solutions such as Contact Center AI Insights, Document AI, Discovery AI, and Cloud Healthcare API.
Developers with minimal ML experience | Pretrained APIs address common perceptual tasks such as vision, video, and natural language. These APIs are backed by pretrained models and provide default detectors. They are ready to use without any ML expertise or model development effort. Pretrained APIs include the Vision API, Video Intelligence API, Natural Language API, Speech-to-Text API, Text-to-Speech API, and Cloud Translation API.
Generative AI for developers | Vertex AI Search and Conversation lets developers use its out-of-the-box capabilities to build and deploy chatbots in minutes and search engines in hours. Developers who want to combine multiple capabilities into enterprise workflows can use the Gen App Builder API for direct integration.
Developers and data scientists | AutoML enables custom model development with your own image, video, text, or tabular data. AutoML accelerates model development by automatically searching the Google model zoo for the best-performing model architecture, so you don't need to build the model yourself. AutoML handles common tasks for you, such as choosing a model architecture, tuning hyperparameters, and provisioning machines for training and serving.
Data scientists and ML engineers | Vertex AI custom model tooling lets you train and serve custom models and operationalize the ML workflow. You can also run your ML workload on self-managed compute such as Compute Engine VMs.
Data scientists and ML engineers | Generative AI support on Vertex AI (also known as genai) provides access to Google's large generative AI models so you can test, tune, and deploy the models in your AI-powered applications.
Data engineers, data scientists, and data analysts familiar with SQL interfaces | BigQuery ML lets you develop SQL-based models on top of data that's stored in BigQuery.

Key services

The following table provides a high-level overview of AI and ML services:

Google service | Description
Cloud Storage and BigQuery | Provide flexible storage options for machine learning data and artifacts.
BigQuery ML | Lets you build machine learning models directly from data stored in BigQuery.
Pub/Sub, Dataflow, Cloud Data Fusion, and Dataproc | Support batch and real-time data ingestion and processing. For more information, see Data Analytics.
Vertex AI | Offers data scientists and machine learning engineers a single platform to create, train, test, monitor, tune, and deploy ML models for everything from generative AI to MLOps. Tooling includes the following:
  Vertex AI Search and Conversation | Lets you build chatbots and search engines for websites and for use across enterprise data.
    • Conversational AI on Vertex AI Search and Conversation can help reinvent customer and employee interactions with generative-AI-powered chatbots and digital assistants. For example, these tools let you go beyond providing information by enabling transactions from within the chat experience.
    • Enterprise Search on Vertex AI Search and Conversation lets enterprises build search experiences for customers and employees on their public or private websites. In addition to providing high-quality multimodal search results, Enterprise Search can use generative AI to summarize results and provide corresponding citations.
  Generative AI on Vertex AI | Gives you access to Google's large generative AI models so you can test, tune, and deploy them for use in your AI-powered applications. Generative AI on Vertex AI is also known as genai.
    • Generative AI models, which are also known as foundation models, are categorized by the type of content they're designed to generate: text and chat, image, code, and text embeddings.
    • Vertex AI Studio lets you rapidly prototype and test generative AI models in the Google Cloud console. You can test sample prompts, design your own prompts, and customize foundation models to handle tasks that meet your application's needs.
    • Model tuning lets you customize foundation models for specific use cases by tuning them with a dataset of input-output examples.
    • Model Garden provides enterprise-ready foundation models, task-specific models, and APIs.
  Pretrained APIs | Address common perceptual tasks such as vision, video, and natural language by using pretrained models, without requiring ML expertise or model development effort.
  AutoML | Provides custom model tooling to build, deploy, and scale ML models. Developers can upload their own data and use the applicable AutoML service to build a custom model.
    • AutoML Image: Performs image classification and object detection on image data.
    • AutoML Video: Performs object detection, classification, and action recognition on video data.
    • AutoML Text: Performs text classification, entity extraction, and sentiment analysis on text data.
    • AutoML Translation: Detects and translates between language pairs.
    • AutoML Tabular: Lets you build a regression, classification, or forecasting model for structured data.
  AI infrastructure | Lets you use AI accelerators to process large-scale ML workloads. These accelerators let you train deep learning and other machine learning models, and get inferences from them, in a cost-effective way. GPUs can help with cost-effective inference and with scale-up or scale-out training for deep learning models. Tensor Processing Units (TPUs) are custom-built ASICs to train and run deep neural networks.
Dialogflow | Delivers virtual agents that provide a conversational experience.
Contact Center AI | Delivers an automated, insights-rich contact-center experience with Agent Assist functionality for human agents.
Document AI | Provides document understanding at scale for documents in general, and for specific document types like lending-related and procurement-related documents.
Lending DocAI | Automates mortgage document processing. Reduces processing time and streamlines data capture while supporting regulatory and compliance requirements.
Procurement DocAI | Automates procurement data capture at scale by turning unstructured documents (like invoices and receipts) into structured data to increase operational efficiency, improve customer experience, and inform decision-making.
Recommendations | Delivers personalized product recommendations.
Healthcare Natural Language AI | Lets you review and analyze medical documents.
Media Translation API | Enables real-time speech translation from audio data.

Data processing

Apply the following data processing best practices to your own environment.

Ensure that your data meets ML requirements

The data that you use for ML should meet certain basic requirements, regardless of data type. These requirements include the data's ability to predict the target, consistency in granularity between the data used for training and the data used for prediction, and accurately labeled data for training. Your data should also be sufficient in volume. For more information, see Data processing.
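The following Python sketch shows how you might check some of these requirements before training. It assumes rows are plain dictionaries, that one column holds the label, and that a hypothetical "grain" key (for example, one row per customer per day) defines the granularity of both training and prediction data. The function names and example columns are illustrative, not part of any Google Cloud API. The sketch checks volume, missing labels, features that are unavailable at prediction time, and duplicated grain keys; it does not measure whether the features actually predict the target.

```python
from collections import Counter
from typing import Any, Dict, List, Sequence

Record = Dict[str, Any]


def count_duplicate_grains(rows: List[Record], grain: Sequence[str]) -> int:
    """Count grain keys (for example, customer_id + day) that occur more than once."""
    counts = Counter(tuple(row.get(key) for key in grain) for row in rows)
    return sum(1 for n in counts.values() if n > 1)


def check_ml_data(train: List[Record], serving: List[Record], label: str,
                  grain: Sequence[str], min_rows: int) -> List[str]:
    """Return human-readable problems; an empty list means no check failed."""
    problems: List[str] = []

    # Volume: is there enough training data at all?
    if len(train) < min_rows:
        problems.append(f"{len(train)} training rows, fewer than {min_rows}")

    # Labels: every training example needs a target value.
    unlabeled = sum(1 for row in train if row.get(label) is None)
    if unlabeled:
        problems.append(f"training rows without a '{label}' value: {unlabeled}")

    # Features used for training must also exist when you predict.
    train_features = {k for row in train for k in row if k != label}
    serving_features = {k for row in serving for k in row}
    missing = sorted(train_features - serving_features)
    if missing:
        problems.append(f"features missing at prediction time: {missing}")

    # Granularity: both datasets should contain one row per grain key.
    for name, rows in (("training", train), ("prediction", serving)):
        dupes = count_duplicate_grains(rows, grain)
        if dupes:
            problems.append(f"duplicated grain keys in {name} data: {dupes}")
    return problems


if __name__ == "__main__":
    # Hypothetical example data.
    train_rows = [
        {"customer_id": 1, "day": "2023-08-01", "spend": 10.0, "churned": 0},
        {"customer_id": 2, "day": "2023-08-01", "spend": 5.0, "churned": None},
    ]
    serving_rows = [
        {"customer_id": 1, "day": "2023-08-02"},
        {"customer_id": 1, "day": "2023-08-02"},
    ]
    for problem in check_ml_data(train_rows, serving_rows, "churned",
                                 ("customer_id", "day"), min_rows=10):
        print(problem)
```

In this example, all four checks fail: too few rows, one unlabeled row, a `spend` feature that isn't available at prediction time, and a duplicated customer-day in the prediction data.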

Store tabular data in BigQuery

If you use tabular data, consider storing all data in BigQuery and using the BigQuery Storage API to read data from it. To simplify interaction with the API, use one of the following additional tooling options, depending on where you want to read the data:

The input data type also determines the available model development tooling. Pretrained APIs, AutoML, and BigQuery ML can provide more cost-effective and time-efficient development environments for certain image, video, text, and structured data use cases.

Ensure you have enough data to develop an ML model

To develop a useful ML model, you need to have enough data. To predict a category, the recommended number of examples for each category is 10 times the number of features. The more categories you want to predict, the more data you need. Imbalanced datasets require even more data. If you don't have enough labeled data available, consider semi-supervised learning.
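As a worked example of this rule of thumb: with 20 features, each category needs about 10 × 20 = 200 examples, so a three-category problem needs at least 600 labeled examples, and more if the categories are imbalanced. The following sketch, with hypothetical helper names, reports which categories fall short of that target.

```python
from collections import Counter
from typing import Dict, Iterable


def min_examples_per_category(num_features: int, multiplier: int = 10) -> int:
    """Rule of thumb: about 10 examples per category for each feature."""
    return multiplier * num_features


def category_shortfalls(labels: Iterable[str], num_features: int,
                        expected: Iterable[str] = ()) -> Dict[str, int]:
    """Map each category below the rule of thumb to how many examples it lacks.

    Pass `expected` so that categories with zero examples are reported too.
    """
    needed = min_examples_per_category(num_features)
    counts = Counter(labels)
    categories = sorted(set(counts) | set(expected))
    return {c: needed - counts[c] for c in categories if counts[c] < needed}


if __name__ == "__main__":
    labels = ["a"] * 250 + ["b"] * 120 + ["c"] * 40
    print(category_shortfalls(labels, num_features=20, expected=["a", "b", "c", "d"]))
```

With these counts the script prints `{'b': 80, 'c': 160, 'd': 200}`: category `a` already exceeds 200 examples, while `d` has none. Large shortfalls in some categories are also a sign of the class imbalance mentioned above.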

Dataset size also has training and serving implications. If you have a small dataset, you can train your model directly within a Notebooks instance. If you have a larger dataset that requires distributed training, use the Vertex AI custom training service. If you want Google to train the model on your data, use AutoML.
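To get a rough sense of whether a dataset is small enough to train in a notebook, you can estimate its in-memory size. The sketch below assumes dense numeric features stored as 8-byte floats, and a hypothetical budget of half the instance's memory for the data; both numbers are illustrative assumptions, not Vertex AI limits. For example, 10 million rows × 100 features × 8 bytes is 8 × 10^9 bytes, about 7.5 GiB, which fits within half of a 16 GiB instance.

```python
GIB = 1024 ** 3


def dataset_bytes(rows: int, features: int, bytes_per_value: int = 8) -> int:
    """Approximate in-memory size of a dense numeric feature matrix."""
    return rows * features * bytes_per_value


def fits_in_notebook(rows: int, features: int, memory_gib: float,
                     budget_fraction: float = 0.5, bytes_per_value: int = 8) -> bool:
    """True if the data uses at most budget_fraction of the instance's memory.

    budget_fraction is an assumed safety margin that leaves room for the
    framework, intermediate copies, and the model itself.
    """
    budget = memory_gib * GIB * budget_fraction
    return dataset_bytes(rows, features, bytes_per_value) <= budget


if __name__ == "__main__":
    print(round(dataset_bytes(10_000_000, 100) / GIB, 2))
    print(fits_in_notebook(10_000_000, 100, memory_gib=16))
    print(fits_in_notebook(100_000_000, 100, memory_gib=16))
```

If the estimate doesn't fit, that's one signal that the dataset is large enough to justify distributed training with the Vertex AI custom training service.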

Prepare data for consumption

Well-prepared data can accelerate model development. When you configure your data pipeline, make sure that it can process both batch and stream data so that you get consistent results from both types of data.
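One common way to keep batch and streaming results consistent is to put all feature logic in a single function and call it from both paths. The following sketch uses hypothetical field names (`amount`, `country`) and plain Python iterables as stand-ins for a bounded batch source and an unbounded stream.

```python
import math
from typing import Any, Dict, Iterable, Iterator, List

Raw = Dict[str, Any]
Features = Dict[str, float]


def preprocess(record: Raw) -> Features:
    """Single source of truth for feature engineering (hypothetical features)."""
    amount = float(record.get("amount") or 0.0)
    country = str(record.get("country") or "unknown").lower()
    return {
        # Compress a long-tailed numeric value; negative amounts are clipped to 0.
        "log_amount": math.log1p(max(amount, 0.0)),
        # Hypothetical binary feature derived from a categorical field.
        "is_domestic": 1.0 if country == "us" else 0.0,
    }


def preprocess_batch(records: Iterable[Raw]) -> List[Features]:
    """Batch path: transform a complete, bounded dataset at once."""
    return [preprocess(r) for r in records]


def preprocess_stream(records: Iterable[Raw]) -> Iterator[Features]:
    """Streaming path: transform records one at a time as they arrive."""
    for r in records:
        yield preprocess(r)


if __name__ == "__main__":
    events = [{"amount": 12.5, "country": "US"}, {"amount": None, "country": "fr"}]
    print(preprocess_batch(events) == list(preprocess_stream(iter(events))))
```

In an Apache Beam pipeline on Dataflow, you could call the same function from a transform such as `beam.Map` in both the batch and the streaming pipeline, so that the two paths can't drift apart.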

Model development and training

Apply the following model development and training best practices to your own environment.

Choose managed or custom-trained model development

When you build your model, consider the highest level of abstraction possible. Use AutoML when possible so that the development and training tasks are handled for you. For custom-trained models, choose managed options for scalability and flexibility, instead of self-managed options. To learn more about model development options, see Use recommended tools and products.

Consider the Vertex AI training service instead of self-managed training on Compute Engine VMs or Deep Learning VM containers. For a JupyterLab environment, consider Vertex AI Workbench, which provides both managed and user-managed JupyterLab environments. For more information, see Machine learning development and Operationalized training.

Use pre-built or custom containers for custom-trained models

For custom-trained models on Vertex AI, you can use pre-built or custom containers depending on your machine learning framework and framework version. Pre-built containers are available for Python training applications that are created for specific TensorFlow, scikit-learn, PyTorch, and XGBoost versions.

Otherwise, you can choose to build a custom container for your training job. For example, use a custom container if you want to train your model using a Python ML framework that isn't available in a pre-built container, or if you want to train using a programming language other than Python. In your custom container, pre-install your training application and all its dependencies onto an image that runs your training job.
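The following sketch shows the kind of training application you might pre-install in a custom container image and set as the image's entrypoint (for example, with a Dockerfile line such as `ENTRYPOINT ["python", "trainer.py"]`). The file name, the command-line flags, and the toy linear model are hypothetical stand-ins for your own framework and data. Because hyperparameters and the output location are flags, you can pass different values as container arguments in the job's container spec without rebuilding the image.

```python
import argparse
import json
import os
import tempfile
from typing import Optional, Sequence, Tuple


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Toy trainer for a custom container")
    parser.add_argument("--epochs", type=int, default=500)
    parser.add_argument("--learning-rate", type=float, default=0.05)
    # Hypothetical output location; in a real job, point this at storage
    # that the training job can write to.
    parser.add_argument("--model-dir", default=os.path.join(tempfile.gettempdir(), "model"))
    return parser.parse_args(argv)


def train(data: Sequence[Tuple[float, float]], epochs: int, lr: float) -> Tuple[float, float]:
    """Fit y = w*x + b with plain gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def main(argv: Optional[Sequence[str]] = None) -> str:
    args = parse_args(argv)
    data = [(float(x), 2.0 * x + 1.0) for x in range(5)]  # hypothetical training data
    w, b = train(data, args.epochs, args.learning_rate)
    os.makedirs(args.model_dir, exist_ok=True)
    path = os.path.join(args.model_dir, "model.json")
    with open(path, "w") as f:
        json.dump({"w": w, "b": b}, f)  # save the trained parameters as the artifact
    return path


if __name__ == "__main__":
    print(main())
```

The same pattern applies to a container that runs a non-Python training program: the image bundles the program and its dependencies, and the job supplies arguments at run time.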

Consider distributed training requirements

Consider your distributed training requirements. Some ML frameworks, like TensorFlow and PyTorch, let you run identical training code on multiple machines. These frameworks automatically coordinate the division of work based on environment variables that are set on each machine. Other frameworks might require additional customization.
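To make the environment-variable mechanism concrete, the following sketch derives a machine's rank and the total number of machines, then gives each machine its own slice of the data. It reads TensorFlow's `TF_CONFIG` variable (a JSON document with `cluster` and `task` keys) or, if that's absent, the `RANK` and `WORLD_SIZE` variables that PyTorch's `torch.distributed` reads for `env://` initialization. Which variables your training service sets is an assumption to verify for your setup. In practice, strategies such as `tf.distribute.MultiWorkerMirroredStrategy` parse `TF_CONFIG` for you; this sketch only illustrates what the variables convey.

```python
import json
import os
from typing import List, Mapping, Sequence, Tuple, TypeVar

T = TypeVar("T")


def rank_and_world_size(env: Mapping[str, str]) -> Tuple[int, int]:
    """Derive this machine's global rank and the number of training machines."""
    if "TF_CONFIG" in env:
        config = json.loads(env["TF_CONFIG"])
        cluster, task = config["cluster"], config["task"]
        # Fixed role order so every machine computes the same global numbering.
        roles = [r for r in ("chief", "worker") if r in cluster]
        world_size = sum(len(cluster[r]) for r in roles)
        offset = 0
        for role in roles:
            if role == task["type"]:
                return offset + int(task["index"]), world_size
            offset += len(cluster[role])
        # Roles such as "ps" or "evaluator" don't process training shards here.
        raise ValueError(f"task type {task['type']!r} does not train on a data shard")
    # PyTorch-style variables; default to a single machine.
    return int(env.get("RANK", "0")), int(env.get("WORLD_SIZE", "1"))


def shard(items: Sequence[T], rank: int, world_size: int) -> List[T]:
    """Give each machine every world_size-th item, starting at its rank."""
    return list(items[rank::world_size])


if __name__ == "__main__":
    rank, world_size = rank_and_world_size(os.environ)
    print(rank, world_size, shard(list(range(10)), rank, world_size))
```

For example, a `TF_CONFIG` with one chief and two workers gives the second worker (index 1) rank 2 of 3, and `shard(list(range(10)), 2, 3)` assigns it items 2, 5, and 8.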

What's next

For more information about AI and machine learning, see the following:

Explore other categories in the Architecture Framework such as reliability, operational excellence, and security, privacy, and compliance.