The legacy versions of AI Platform Training, AI Platform Prediction, AI Platform Pipelines, and AI Platform Data Labeling Service are deprecated and will no longer be available on Google Cloud after their shutdown date. All the functionality of legacy AI Platform and new features are available on the Vertex AI platform. See Migrate to Vertex AI to learn how to migrate your resources.

Machine learning workflow

AI Platform enables many parts of the machine learning (ML) workflow. This document provides an introductory description of the overall ML process and explains where each AI Platform service fits into the process.

For an introduction to the services, see the technical overview of AI Platform.

A brief description of machine learning

Machine learning (ML) is a subfield of artificial intelligence (AI). The goal of ML is to make computers learn from the data that you give them. Instead of writing code that describes the action the computer should take, your code provides an algorithm that adapts based on examples of intended behavior. The resulting program, consisting of the algorithm and associated learned parameters, is called a trained model.

The ML workflow

The diagram below gives a high-level overview of the stages in an ML workflow. The blue-filled boxes indicate where AI Platform provides managed services and APIs:

To develop and manage a production-ready model, you must work through the following stages:

1. Source and prepare your data.
2. Develop your model.
3. Train an ML model on your data:
   - Train model
   - Evaluate model accuracy
   - Tune hyperparameters
4. Deploy your trained model.
5. Send prediction requests to your model:
   - Online prediction
   - Batch prediction
6. Monitor the predictions on an ongoing basis.
7. Manage your models and model versions.

These stages are iterative. You may need to reevaluate and go back to a previous step at any point in the process.

The rest of this page discusses the stages in detail.

Before you start, evaluate the problem

Before you start thinking about how to solve a problem with ML, take some time to think about the problem you are trying to solve. Ask yourself the following questions:

Do you have a well-defined problem to solve?

Many different approaches are possible when using ML to recognize patterns in data. It's important to define the information you are trying to get out of the model and why you need that information.

Is ML the best solution for the problem?

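Supervised ML (the style of ML described in this documentation) is well suited to problems for which you have many examples whose outcomes are already known, so that a model can learn to predict the outcome for new examples.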

You should only consider using ML for your problem if you have access to a sizable set of data from which to train your model. There are no absolutes about how much data is enough. Every feature (data attribute) that you include in your model increases the number of instances (data records) you need to properly train the model. See the ML best practices for some guidance on feature engineering.

You must also account for splitting your dataset into three subsets: one for training, one for evaluation (or validation), and one for testing.
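The following sketch shows one way to produce the three subsets in plain Python. The function name `split_dataset` and the 80/10/10 proportions are illustrative assumptions, not AI Platform requirements. Shuffling with a fixed seed keeps the split reproducible across runs, so the test set stays unseen during development.

```python
import random

def split_dataset(records, train_frac=0.8, eval_frac=0.1, seed=42):
    """Shuffle records and split them into training, evaluation, and test subsets.

    The fractions are illustrative; whatever remains after the training and
    evaluation slices becomes the test set.
    """
    if train_frac + eval_frac >= 1.0:
        raise ValueError("train_frac + eval_frac must leave room for a test set")
    shuffled = list(records)               # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split reproducible
    n = len(shuffled)
    train_end = int(n * train_frac)
    eval_end = train_end + int(n * eval_frac)
    return shuffled[:train_end], shuffled[train_end:eval_end], shuffled[eval_end:]

train, evaluation, test = split_dataset(range(100))
print(len(train), len(evaluation), len(test))  # 80 10 10
```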

Investigate alternatives that may provide an easier and more concrete way to solve the problem.

How can you measure the model's success?

One of the biggest challenges of creating an ML model is knowing when the model development phase is complete. It's tempting to continue refining the model forever, extracting increasingly small improvements in accuracy. You should know what success means before you begin the process. Consider the level of accuracy that is sufficient for your needs. Consider the consequences of the corresponding level of error.

Source and prepare your data

You must have access to a large set of training data that includes the attribute you want to be able to infer (predict) based on the other attributes, which are called features in ML. The attribute you want to predict is often called the target or label.

For example, assume you want your model to predict the sale price of a house. Begin with a large set of data describing the characteristics of houses in a given area, including the sale price of each house.

Data analysis

Having sourced your data, you must analyze and understand the data and prepare it to be the input to the training process. For example, you may need to perform the following steps:

Join data from multiple sources and rationalize it into one dataset.
Visualize the data to look for trends.
Use data-centric languages and tools to find patterns in the data.
Identify features in your data. Features comprise the subset of data attributes that you use in your model.
Clean the data to find and correct (or remove) any anomalous values caused by errors in data entry or measurement.
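As a minimal sketch of the join, feature-identification, and cleaning steps, the following Python code uses hypothetical house listing and sale records; all field names and values are invented for illustration. It performs an inner join on a shared key and drops rows whose values can't be valid. In practice you might do this work in BigQuery or a notebook instead.

```python
# Hypothetical records from two sources, keyed by a house ID.
listings = [
    {"house_id": 1, "sqft": 1400, "bedrooms": 3},
    {"house_id": 2, "sqft": 2100, "bedrooms": 4},
    {"house_id": 3, "sqft": -50, "bedrooms": 2},   # data-entry error
    {"house_id": 4, "sqft": 950, "bedrooms": 2},   # no matching sale
]
sales = [
    {"house_id": 1, "sale_price": 310000},
    {"house_id": 2, "sale_price": 455000},
    {"house_id": 3, "sale_price": 180000},
    {"house_id": 5, "sale_price": 250000},         # no matching listing
]

def join_on_key(left, right, key):
    """Inner-join two lists of dicts on `key`, merging each matching pair of rows."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left if row[key] in right_by_key]

def is_plausible(row):
    """Reject rows with values that cannot be physically valid."""
    return row["sqft"] > 0 and row["bedrooms"] > 0 and row["sale_price"] > 0

joined = join_on_key(listings, sales, "house_id")
clean = [row for row in joined if is_plausible(row)]

FEATURES = ["sqft", "bedrooms"]  # attributes used as model inputs
LABEL = "sale_price"             # attribute the model should predict
print([row["house_id"] for row in clean])  # [1, 2]
```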

Data preprocessing

In the preprocessing step, you transform valid, clean data into the format that best suits the needs of your model. Here are some examples of data preprocessing:

Normalizing numeric data to a common scale.
Applying formatting rules to data. For example, removing the HTML tagging from a text feature.
Reducing data redundancy through simplification. For example, converting a text feature to a bag of words representation.
Representing text numerically. For example, assigning values to each possible value in a categorical feature.
Assigning key values to data instances.
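The following sketch implements simple versions of the preprocessing examples above in plain Python: min-max normalization, HTML stripping, a bag-of-words representation, an integer index for a categorical feature, and per-instance keys. The helper names and sample data are hypothetical. The regular-expression tag stripper is a rough approximation; a real HTML parser is more robust.

```python
import re
from collections import Counter

def min_max_scale(values):
    """Rescale numbers linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - lo) / (hi - lo) for v in values]

def strip_html(text):
    """Remove HTML tags with a simple regex (rough; not a full HTML parser)."""
    return re.sub(r"<[^>]+>", "", text)

def bag_of_words(text):
    """Count lowercase word occurrences, discarding word order."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))

def build_category_index(values):
    """Assign each distinct category a stable integer ID."""
    return {cat: i for i, cat in enumerate(sorted(set(values)))}

scaled = min_max_scale([1000, 1500, 2000])             # [0.0, 0.5, 1.0]
text = strip_html("<p>Sunny house, <b>big</b> garden. Big kitchen.</p>")
words = bag_of_words(text)                             # words["big"] == 2
types = ["condo", "house", "condo", "townhouse"]
type_index = build_category_index(types)               # {"condo": 0, "house": 1, "townhouse": 2}
encoded = [type_index[t] for t in types]               # [0, 1, 0, 2]
keyed = [{"key": f"house-{i}", "type": t} for i, t in enumerate(types)]
```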

Google Cloud support for data exploration and preparation

TensorFlow has several preprocessing libraries that you can use with AI Platform, such as TensorFlow Transform (tf.Transform).

You can deploy and serve scikit-learn pipelines on AI Platform to apply built-in transforms for training and online prediction. Applying custom transformations is in beta.

You can deploy a custom prediction routine (beta) to make sure AI Platform preprocesses input at prediction time in the same way that you preprocessed data during training.
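A custom prediction routine is a Python class that implements a `predict(self, instances, **kwargs)` method and a `from_path(cls, model_dir)` class method. The sketch below assumes a hypothetical preprocessor that learns per-feature means and standard deviations during training; the predictor loads that same fitted object and applies it before calling the model, so training-time and prediction-time preprocessing can't drift apart. The class names and the `model.pkl` and `preprocessor.pkl` file names are assumptions for illustration.

```python
import os
import pickle
import tempfile

class ZScorePreprocessor:
    """Hypothetical preprocessor: fits per-feature mean/std, then standardizes."""
    def fit(self, rows):
        columns = list(zip(*rows))
        self.means = [sum(col) / len(col) for col in columns]
        self.stds = []
        for col, mean in zip(columns, self.means):
            std = (sum((x - mean) ** 2 for x in col) / len(col)) ** 0.5
            self.stds.append(std if std > 0 else 1.0)  # avoid dividing by zero
        return self

    def transform(self, rows):
        return [[(x - m) / s for x, m, s in zip(row, self.means, self.stds)] for row in rows]

class LinearModel:
    """Hypothetical stand-in for a trained model."""
    def __init__(self, weights, bias):
        self.weights, self.bias = weights, bias

    def predict(self, rows):
        return [sum(w * x for w, x in zip(self.weights, row)) + self.bias for row in rows]

class HousePricePredictor:
    """Implements the Predictor interface: predict() and from_path()."""
    def __init__(self, model, preprocessor):
        self._model = model
        self._preprocessor = preprocessor

    def predict(self, instances, **kwargs):
        # Apply exactly the transformation that was fitted at training time.
        return self._model.predict(self._preprocessor.transform(instances))

    @classmethod
    def from_path(cls, model_dir):
        # File names are assumptions; they must match what you saved after training.
        with open(os.path.join(model_dir, "model.pkl"), "rb") as f:
            model = pickle.load(f)
        with open(os.path.join(model_dir, "preprocessor.pkl"), "rb") as f:
            preprocessor = pickle.load(f)
        return cls(model, preprocessor)

# Simulate the saved training artifacts, then load them the way the routine would.
preprocessor = ZScorePreprocessor().fit([[1.0], [3.0]])  # mean 2.0, std 1.0
model = LinearModel(weights=[10.0], bias=5.0)
with tempfile.TemporaryDirectory() as model_dir:
    for name, obj in [("model.pkl", model), ("preprocessor.pkl", preprocessor)]:
        with open(os.path.join(model_dir, name), "wb") as f:
            pickle.dump(obj, f)
    predictor = HousePricePredictor.from_path(model_dir)
print(predictor.predict([[3.0]]))  # [15.0]
```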

In addition, consider the following Google Cloud services:

Vertex AI Workbench user-managed notebooks are Deep Learning VM Images instances pre-packaged with JupyterLab notebooks and optimized for deep learning data science tasks, from data preparation and exploration to quick prototype development.
BigQuery is a fully managed data warehouse service that allows ad hoc analysis on real-time data with standard SQL.
Dataproc is a fully managed cloud service for running Apache Spark and Apache Hadoop clusters.
Dataflow is a fully managed service for transforming and enriching data in stream (real time) and batch (historical) modes with equal reliability and expressiveness.
Dataprep is an intelligent, serverless data service for visually exploring, cleaning, and preparing structured and unstructured data.

Code your model

Develop your model using established ML techniques or by defining new operations and approaches.

Start learning by working through TensorFlow's getting started guide. You can also follow the scikit-learn documentation or the XGBoost documentation to create your model. Then examine some code samples designed to work with AI Platform.

Train, evaluate, and tune your model

AI Platform provides the services you need to train and evaluate your model in the cloud. In addition, AI Platform offers hyperparameter tuning functionality to optimize the training process.

When training your model, you feed it data for which you already know the value of your target data attribute (the label). The model predicts those target values for the training data and compares its predictions with the known values, so that it can adjust its parameters to fit the data better and thus predict the target value more accurately.
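To make the training loop concrete, here is a minimal sketch that fits a one-feature linear model, y = w * x + b, by batch gradient descent on mean squared error. It's a stand-in for what a framework such as TensorFlow or scikit-learn does for you, not the algorithm AI Platform uses; the function name, learning rate, and step count are assumptions.

```python
def train_linear_model(xs, ys, learning_rate=0.05, steps=1000):
    """Fit y = w * x + b by batch gradient descent on mean squared error (MSE)."""
    w, b = 0.0, 0.0  # start from arbitrary parameter values
    n = len(xs)
    for _ in range(steps):
        # Predict the known target values with the current parameters.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of MSE = mean(error ** 2) with respect to w and b.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Adjust the parameters to reduce the error on the training data.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Invented training data that follows y = 2x + 1 exactly.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = train_linear_model(xs, ys)
print(round(w, 3), round(b, 3))  # 2.0 1.0
```

The learning rate and the number of steps are hyperparameters: they control the training process rather than being learned from the data.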

Similarly, when evaluating your trained model, you feed it data that includes the target values. You compare the results of your model's predictions to the actual values for the evaluation data and use statistical techniques appropriate to your model to gauge its success.
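For a regression model such as the house-price example, two common evaluation statistics are mean absolute error (MAE) and root mean squared error (RMSE). The sketch below computes both on a small, invented evaluation set and compares MAE with a hypothetical success threshold that was chosen before training.

```python
import math

def mean_absolute_error(actual, predicted):
    """Average size of the prediction errors, in the label's units."""
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)

def root_mean_squared_error(actual, predicted):
    """Square root of the average squared error; weights large errors more heavily."""
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Invented evaluation data: known sale prices (in thousands) and model predictions.
actual = [100, 200, 300, 400]
predicted = [100, 200, 306, 392]

mae = mean_absolute_error(actual, predicted)
rmse = root_mean_squared_error(actual, predicted)
TARGET_MAE = 5.0  # hypothetical success criterion, decided before training
print(mae, rmse, mae <= TARGET_MAE)  # 3.5 5.0 True
```

RMSE exceeds MAE here because squaring gives the single 8-unit miss most of the weight.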

You can also tune the model by changing the operations or settings that you use to control the training process, such as the number of training steps to run. This technique is known as hyperparameter tuning.
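Hyperparameter tuning can be illustrated locally with a grid search: train once per combination of hyperparameter values and keep the combination with the lowest validation error. This sketch reuses a minimal gradient-descent trainer and tunes the learning rate and the number of training steps on invented data. It illustrates the idea only; AI Platform's managed hyperparameter tuning runs the trials as part of a training job and chooses values with its own search algorithm.

```python
import itertools

def train(xs, ys, learning_rate, steps):
    """Minimal gradient-descent trainer for y = w * x + b (see the training sketch)."""
    w = b = 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        w -= learning_rate * 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        b -= learning_rate * 2.0 / n * sum(errors)
    return w, b

def mse(params, xs, ys):
    """Mean squared error of the fitted parameters on a dataset."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Invented data following y = 2x + 1, split into training and validation sets.
train_x, train_y = [0, 1, 2, 3, 4], [1, 3, 5, 7, 9]
val_x, val_y = [5, 6], [11, 13]

grid = {"learning_rate": [0.001, 0.01, 0.05], "steps": [10, 100, 1000]}
results = []
for lr, steps in itertools.product(grid["learning_rate"], grid["steps"]):
    params = train(train_x, train_y, lr, steps)             # one "trial"
    results.append((mse(params, val_x, val_y), lr, steps))  # score on validation data
best_loss, best_lr, best_steps = min(results)               # lowest validation error wins
print(best_lr, best_steps)
```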

Testing your model

During training, you apply the model to known data to adjust the settings to improve the results. When your results are good enough for the needs of your application, you should deploy the model to whatever system your application uses and test it.

To test your model, run data through it in a context as close as possible to your final application and your production infrastructure.

Use a different dataset from those used for training and evaluation. Ideally, you should use a separate set of data each time you test, so that your model is tested with data that it has never processed before.

You may also want to create different sets of test data depending on the nature of your model. For example, you may use different datasets for particular locations or points in time, or you may divide the instances to mimic different demographics.
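One way to use separate test sets is to compute the same metric for each slice of the data. The sketch below groups hypothetical test records by a `region` field and reports the mean absolute error for each region; the field names and values are invented.

```python
from collections import defaultdict

def mae_by_slice(records, slice_key):
    """Mean absolute error for each distinct value of `slice_key`."""
    errors = defaultdict(list)
    for r in records:
        errors[r[slice_key]].append(abs(r["predicted"] - r["actual"]))
    return {k: sum(v) / len(v) for k, v in errors.items()}

test_records = [
    {"region": "north", "actual": 300, "predicted": 310},
    {"region": "north", "actual": 250, "predicted": 240},
    {"region": "south", "actual": 400, "predicted": 460},
    {"region": "south", "actual": 350, "predicted": 330},
]
print(mae_by_slice(test_records, "region"))  # {'north': 10.0, 'south': 40.0}
```

Here the overall MAE is 25.0, which hides that the model's error on the `south` slice is four times its error on the `north` slice.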

During the testing process, you make adjustments to the model parameters and hyperparameters based on the results of the testing. You may uncover problems in the model or in its interaction with the rest of your application.

Host your model in the cloud

AI Platform provides tools to upload your trained ML model to the cloud, so that you can send prediction requests to the model.

To deploy your trained model on AI Platform, you must save your trained model using the tools provided by your machine learning framework. This involves serializing the information that represents your trained model into a file that you can deploy for prediction in the cloud.
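For a scikit-learn model, you typically serialize with `joblib` or `pickle`, and AI Platform Prediction expects the file to be named `model.joblib` or `model.pkl`; TensorFlow models are exported as SavedModel directories instead, so check the exporting guide for your framework. The sketch below pickles a hypothetical stand-in model class to `model.pkl` and reads it back to confirm the round trip before upload.

```python
import os
import pickle
import tempfile

class LinearModel:
    """Hypothetical trained model: y = weight * x + bias."""
    def __init__(self, weight, bias):
        self.weight, self.bias = weight, bias

    def predict(self, instances):
        return [self.weight * x + self.bias for x in instances]

def export_model(model, export_dir):
    """Serialize the model to <export_dir>/model.pkl and return the file path."""
    os.makedirs(export_dir, exist_ok=True)
    path = os.path.join(export_dir, "model.pkl")
    with open(path, "wb") as f:
        pickle.dump(model, f)
    return path

with tempfile.TemporaryDirectory() as tmp:
    path = export_model(LinearModel(2.0, 1.0), tmp)
    with open(path, "rb") as f:
        restored = pickle.load(f)  # round-trip check before uploading
print(restored.predict([3.0]))  # [7.0]
```

Loading a pickled model requires compatible library versions, so train with the versions that match the AI Platform runtime version you plan to deploy on.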

Then you upload the saved model to a Cloud Storage bucket, create a model resource on AI Platform, and create a version of that model, specifying the Cloud Storage path to your saved model when you create the version.
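If you use the REST API (`ml` v1) rather than the gcloud tool, a model version is described by a JSON body. The sketch below builds such a body as a Python dict with the `name`, `deploymentUri`, `runtimeVersion`, `framework`, and `pythonVersion` fields. The project, model, bucket, and version values are placeholders, so check the runtime version list for supported combinations. The commented-out calls show how the Google API Python client could send the requests, assuming `google-api-python-client` is installed and authenticated.

```python
def build_version_body(version, deployment_uri, runtime_version,
                       framework="SCIKIT_LEARN", python_version="3.7"):
    """Return a request body for creating a model version (placeholder values)."""
    if not deployment_uri.startswith("gs://"):
        raise ValueError("deployment_uri must be a Cloud Storage path (gs://...)")
    return {
        "name": version,
        "deploymentUri": deployment_uri,  # Cloud Storage directory holding the saved model
        "runtimeVersion": runtime_version,
        "framework": framework,
        "pythonVersion": python_version,
    }

PROJECT, MODEL = "my-project", "house_prices"  # hypothetical names
parent = f"projects/{PROJECT}/models/{MODEL}"
body = build_version_body("v1", "gs://my-bucket/house_prices/v1/", "2.11")
print(parent)  # projects/my-project/models/house_prices

# Sending the requests (not run here):
# from googleapiclient import discovery
# ml = discovery.build("ml", "v1")
# ml.projects().models().create(parent=f"projects/{PROJECT}", body={"name": MODEL}).execute()
# ml.projects().models().versions().create(parent=parent, body=body).execute()
```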

When you deploy your model, you can also provide custom code (beta) to customize how it handles prediction requests.

Send prediction requests to your model

AI Platform provides the services you need to request predictions from your model in the cloud.

There are two ways to get predictions from trained models: online prediction (sometimes called HTTP prediction) and batch prediction. In both cases, you pass input data to a cloud-hosted machine-learning model and get inferences for each data instance.
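Both modes take instances in JSON form. For online prediction, the request body is a JSON object with an `instances` list, sent to the model's `:predict` REST method; for batch prediction, the input is typically a file in Cloud Storage with one JSON instance per line. The sketch below builds both formats for hypothetical `[square_feet, bedrooms]` feature vectors; what each instance must look like depends on your model's input.

```python
import json

# Hypothetical feature vectors: [square_feet, bedrooms].
instances = [[1400, 3], [2100, 4]]

def online_request_body(instances):
    """Serialize instances as an online prediction request body."""
    return json.dumps({"instances": instances})

def batch_input_file_contents(instances):
    """Serialize instances as newline-delimited JSON, one instance per line."""
    return "".join(json.dumps(instance) + "\n" for instance in instances)

print(online_request_body(instances))
# {"instances": [[1400, 3], [2100, 4]]}
print(batch_input_file_contents(instances), end="")
# [1400, 3]
# [2100, 4]
```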

Monitor your prediction service

Monitor the predictions on an ongoing basis. AI Platform provides APIs to examine running jobs. In addition, various Google Cloud tools support the operation of your deployed model, such as Cloud Logging and Cloud Monitoring.
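As an illustration of what ongoing monitoring might check, the sketch below flags a shift in the average prediction relative to a baseline, such as the predictions made on your evaluation data. This heuristic, its threshold, and the sample values are assumptions for illustration; it is not an AI Platform feature, and you would need to collect recent predictions yourself, for example from logs.

```python
import statistics

def prediction_shift_alert(baseline, recent, max_shift_in_stds=3.0):
    """Flag when the mean of recent predictions drifts away from a baseline.

    Returns True if the recent mean differs from the baseline mean by more than
    `max_shift_in_stds` baseline (population) standard deviations.
    """
    base_mean = statistics.mean(baseline)
    base_std = statistics.pstdev(baseline)
    shift = abs(statistics.mean(recent) - base_mean)
    if base_std == 0:
        return shift > 0  # any change from a constant baseline counts as a shift
    return shift / base_std > max_shift_in_stds

# Invented predicted prices (in thousands).
baseline = [300, 310, 290, 305, 295]  # mean 300, population std about 7.07
print(prediction_shift_alert(baseline, [302, 298]))  # False
print(prediction_shift_alert(baseline, [360, 370]))  # True
```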

Manage your models and model versions

AI Platform provides various interfaces for managing your models and model versions, including a REST API, the gcloud ai-platform command-line tool, and the Google Cloud console.
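Through the REST API, models and versions are identified by resource names of the form `projects/PROJECT/models/MODEL/versions/VERSION`. The sketch below builds and parses these names with hypothetical project, model, and version values. The commented-out calls show how the Google API Python client could list versions and set the default version, assuming the client library is installed and authenticated.

```python
def version_resource_name(project, model, version):
    """Build the REST resource name for a model version."""
    return f"projects/{project}/models/{model}/versions/{version}"

def parse_version_resource_name(name):
    """Split a version resource name into (project, model, version)."""
    parts = name.split("/")
    if len(parts) != 6 or parts[0::2] != ["projects", "models", "versions"]:
        raise ValueError(f"not a version resource name: {name!r}")
    return parts[1], parts[3], parts[5]

name = version_resource_name("my-project", "house_prices", "v2")
print(name)                               # projects/my-project/models/house_prices/versions/v2
print(parse_version_resource_name(name))  # ('my-project', 'house_prices', 'v2')

# Managing versions through the API (not run here):
# from googleapiclient import discovery
# ml = discovery.build("ml", "v1")
# ml.projects().models().versions().list(parent="projects/my-project/models/house_prices").execute()
# ml.projects().models().versions().setDefault(name=name, body={}).execute()
```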

What's next

Get started with AI Platform Training and AI Platform Prediction using Keras.
Learn how to train with custom containers.
Learn how to train TensorFlow and XGBoost models without writing code by using AI Platform built-in algorithms.
Learn how to use custom prediction routines to add preprocessing and postprocessing for your online prediction requests.
Add custom code and custom scikit-learn transformations to your online prediction pipeline.
Learn more about AI Platform Training and AI Platform Prediction.