AI Platform enables many parts of the machine learning (ML) workflow. This document describes the overall ML process to provide context for AI Platform services.
The ML workflow
The diagram below gives a high-level overview of the stages in an ML workflow. The blue-filled boxes indicate where AI Platform provides managed services and APIs:
To develop and manage a production-ready model, you must work through the following stages:
Source and prepare your data.
Develop your model.
Train an ML model on your data:
- Train model
- Evaluate model accuracy
- Tune hyperparameters
Deploy your trained model.
Send prediction requests to your model:
- Online prediction
- Batch prediction
Monitor the predictions on an ongoing basis.
Manage your models and model versions.
These stages are iterative. You may need to reevaluate and go back to a previous step at any point in the process.
The rest of this page discusses the stages in detail.
Before you start, evaluate the problem
Before you start thinking about how to solve a problem with ML, take some time to think about the problem you are trying to solve. Ask yourself the following questions:
Do you have a well-defined problem to solve?
Many different approaches are possible when using ML to recognize patterns in data. It's important to define the information you are trying to get out of the model and why you need that information.
Is ML the best solution for the problem?
Supervised ML (the style of ML described in this documentation) is well suited to certain kinds of problems.
You should only consider using ML for your problem if you have access to a sizable set of data from which to train your model. There are no absolutes about how much data is enough. Every feature (data attribute) that you include in your model increases the number of instances (data records) you need to properly train the model. See the ML best practices for some guidance on feature engineering.
You must also account for splitting your dataset into three subsets: one for training, one for evaluation (or validation), and one for testing.
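As a minimal sketch of the three-way split described above, the following shuffles a dataset and divides it into training, evaluation, and test subsets. The function name `split_dataset` and the 80/10/10 proportions are illustrative assumptions, not recommendations from AI Platform.

```python
import random


def split_dataset(records, train_frac=0.8, eval_frac=0.1, seed=42):
    """Shuffle records and split them into training, evaluation, and test subsets."""
    if train_frac <= 0 or eval_frac <= 0 or train_frac + eval_frac >= 1:
        raise ValueError("fractions must be positive and leave room for a test set")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # a fixed seed keeps the split reproducible
    n_train = int(len(shuffled) * train_frac)
    n_eval = int(len(shuffled) * eval_frac)
    training = shuffled[:n_train]
    evaluation = shuffled[n_train:n_train + n_eval]
    testing = shuffled[n_train + n_eval:]  # whatever remains is held out for testing
    return training, evaluation, testing


training, evaluation, testing = split_dataset(range(100))
print(len(training), len(evaluation), len(testing))
```

Because every record lands in exactly one subset, the data available for training is smaller than the full dataset, which is why the split affects how much data you need overall.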
Investigate alternatives that may provide an easier and more concrete way to solve the problem.
How can you measure the model's success?
One of the biggest challenges of creating an ML model is knowing when the model development phase is complete. It's tempting to continue refining the model forever, extracting increasingly small improvements in accuracy. You should know what success means before you begin the process. Consider the level of accuracy that is sufficient for your needs. Consider the consequences of the corresponding level of error.
Source and prepare your data
You must have access to a large set of training data that includes the attribute (called a feature in ML) that you want to be able to infer (predict) based on the other features.
For example, assume you want your model to predict the sale price of a house. Begin with a large set of data describing the characteristics of houses in a given area, including the sale price of each house.
Having sourced your data, you must analyze and understand the data and prepare it to be the input to the training process. For example, you may need to perform the following steps:
- Join data from multiple sources and rationalize it into one dataset.
- Visualize the data to look for trends.
- Use data-centric languages and tools to find patterns in the data.
- Identify features in your data. Features comprise the subset of data attributes that you use in your model.
- Clean the data to find and correct any anomalous values caused by errors in data entry or measurement.
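Two of the steps above, joining sources and looking for anomalous values, can be sketched with plain Python. The records, field names, and the median-based rule below are hypothetical and chosen only for illustration; a real project would pick an anomaly check suited to its data.

```python
from statistics import median

# Hypothetical records from two sources that share a house_id key.
listings = [
    {"house_id": 1, "area_sqft": 1400},
    {"house_id": 2, "area_sqft": 1600},
    {"house_id": 3, "area_sqft": 1500},
    {"house_id": 4, "area_sqft": 150000},  # likely a data-entry error
    {"house_id": 5, "area_sqft": 1700},    # has no matching sale record
]
sales = [
    {"house_id": 1, "sale_price": 250000},
    {"house_id": 2, "sale_price": 290000},
    {"house_id": 3, "sale_price": 270000},
    {"house_id": 4, "sale_price": 280000},
    {"house_id": 6, "sale_price": 310000},
]


def join_on_key(left, right, key):
    """Inner join: keep left rows that have a match in right, merging the fields."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left if row[key] in right_by_key]


def flag_anomalies(rows, field, max_ratio=5.0):
    """Return rows whose value is non-positive or far from the median of the field."""
    typical = median(row[field] for row in rows)
    return [
        row for row in rows
        if row[field] <= 0
        or row[field] > typical * max_ratio
        or row[field] < typical / max_ratio
    ]


joined = join_on_key(listings, sales, "house_id")
suspicious = flag_anomalies(joined, "area_sqft")
print([row["house_id"] for row in suspicious])
```

Flagged rows still need a human decision: correct the value, drop the record, or keep it if it turns out to be genuine.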
In the preprocessing step, you transform valid, clean data into the format that best suits the needs of your model. Here are some examples of data preprocessing:
- Normalizing numeric data to a common scale.
- Applying formatting rules to data. For example, removing the HTML tagging from a text feature.
- Reducing data redundancy through simplification. For example, converting a text feature to a bag of words representation.
- Representing text numerically. For example, assigning a numeric value to each possible value in a categorical feature.
- Assigning key values to data instances.
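The preprocessing examples above can each be sketched in a few lines of standard-library Python. All function names here are illustrative assumptions. The regular-expression tag stripper is deliberately naive and not a substitute for a real HTML parser, and libraries such as tf.transform provide their own equivalents.

```python
import re
from collections import Counter


def min_max_scale(values):
    """Normalize numeric values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def strip_html(text):
    """Remove anything that looks like an HTML tag (naive, for illustration only)."""
    return re.sub(r"<[^>]+>", "", text)


def bag_of_words(text):
    """Count how often each lowercase word occurs, ignoring word order."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def encode_categories(values):
    """Map each distinct category to an integer index."""
    vocabulary = {category: i for i, category in enumerate(sorted(set(values)))}
    return [vocabulary[v] for v in values], vocabulary


def assign_keys(instances):
    """Attach a unique key to each data instance."""
    return [{"key": i, **instance} for i, instance in enumerate(instances)]


scaled = min_max_scale([1000, 1500, 2000])
clean_text = strip_html("<p>Sunny <b>garden</b></p>")
words = bag_of_words("Sunny garden, sunny kitchen")
codes, vocabulary = encode_categories(["condo", "house", "condo"])
keyed = assign_keys([{"area": 1400}, {"area": 1600}])
```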
GCP support for data exploration and preparation
TensorFlow has several preprocessing libraries that you can use with AI Platform, such as tf.transform.
In addition, consider the following GCP services:
Cloud Datalab supports many of the tasks that make up data exploration and preparation. For example, you can visually analyze your data with dynamic graphs embedded in a Cloud Datalab notebook.
BigQuery is a fully managed data warehouse service that allows ad hoc analysis on real-time data with standard SQL.
Cloud Dataflow is a fully managed service for transforming and enriching data in stream (real time) and batch (historical) modes with equal reliability and expressiveness.
Cloud Dataprep is an intelligent, serverless data service for visually exploring, cleaning, and preparing structured and unstructured data.
Code your model
Develop your model using established ML techniques or by defining new operations and approaches.
Start learning by working through TensorFlow's getting started guide. Then examine the samples provided with the AI Platform documentation. These samples have been developed specifically to work well with AI Platform.
Train, evaluate, and tune your model
AI Platform provides the services you need to train and evaluate your model in the cloud. In addition, AI Platform offers hyperparameter tuning functionality to optimize the training process.
When training your model, you feed it data for which you already know the value for your target data attribute (feature). You run the model to predict those target values for your training data, so that the model can adjust its settings to better fit the data and thus to predict the target value more accurately.
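To make the training loop concrete, here is a minimal sketch that fits a one-feature linear model by gradient descent: the model predicts the known target values, measures its errors, and adjusts its settings (a weight and a bias) to fit better. The toy data, the function name `train_linear_model`, and the default learning rate and step count are assumptions for illustration. This is not how AI Platform trains models internally.

```python
def train_linear_model(features, targets, learning_rate=0.05, steps=5000):
    """Fit target = weight * feature + bias by gradient descent on mean squared error."""
    weight, bias = 0.0, 0.0
    n = len(features)
    for _ in range(steps):
        # Predict the known targets and measure how far off each prediction is.
        errors = [weight * x + bias - y for x, y in zip(features, targets)]
        # Gradients of the mean squared error with respect to weight and bias.
        grad_weight = 2 * sum(e * x for e, x in zip(errors, features)) / n
        grad_bias = 2 * sum(errors) / n
        # Adjust the settings in the direction that reduces the error.
        weight -= learning_rate * grad_weight
        bias -= learning_rate * grad_bias
    return weight, bias


# Toy data: area in thousands of square feet, price in hundreds of thousands.
# The prices were generated as 2 * area + 0.5, so training should recover those values.
areas = [1.0, 1.5, 2.0, 2.5, 3.0]
prices = [2.5, 3.5, 4.5, 5.5, 6.5]
weight, bias = train_linear_model(areas, prices)
print(round(weight, 3), round(bias, 3))
```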
Similarly, when evaluating your trained model, you feed it data that includes the target values. You compare the results of your model's predictions to the actual values for the evaluation data and use statistical techniques appropriate to your model to gauge its success.
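For a regression model such as the house-price example, two common statistics for comparing predictions to actual values are the mean absolute error and the root mean squared error. The sketch below assumes these metrics suit your model; classification models would use different ones, such as accuracy or precision and recall. The values are made up.

```python
import math


def mean_absolute_error(actual, predicted):
    """Average size of the prediction errors, in the units of the target."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Like mean absolute error, but penalizes large errors more heavily."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical evaluation data: actual sale prices and the model's predictions.
actual_prices = [3.0, 5.0, 7.0]
predicted_prices = [2.5, 5.0, 8.0]
mae = mean_absolute_error(actual_prices, predicted_prices)
rmse = root_mean_squared_error(actual_prices, predicted_prices)
print(mae, rmse)
```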
You can also tune the model by changing the operations or settings that you use to control the training process, such as the number of training steps to run. This technique is known as hyperparameter tuning.
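One simple way to picture hyperparameter tuning is a grid search: train the same model with several combinations of settings and keep the combination that scores best on the evaluation data. The sketch below is a local illustration under that assumption. The toy model, the candidate values, and the function names are hypothetical, and AI Platform's hyperparameter tuning service may use a different search strategy.

```python
from itertools import product


def train(features, targets, learning_rate, steps):
    """Fit target = weight * feature + bias by gradient descent."""
    weight, bias = 0.0, 0.0
    n = len(features)
    for _ in range(steps):
        errors = [weight * x + bias - y for x, y in zip(features, targets)]
        weight -= learning_rate * 2 * sum(e * x for e, x in zip(errors, features)) / n
        bias -= learning_rate * 2 * sum(errors) / n
    return weight, bias


def mean_squared_error(features, targets, weight, bias):
    return sum((weight * x + bias - y) ** 2 for x, y in zip(features, targets)) / len(features)


def grid_search(train_set, eval_set, learning_rates, step_counts):
    """Try every hyperparameter combination and return the best one plus all results."""
    train_x, train_y = train_set
    eval_x, eval_y = eval_set
    results = []
    for learning_rate, steps in product(learning_rates, step_counts):
        weight, bias = train(train_x, train_y, learning_rate, steps)
        results.append({
            "learning_rate": learning_rate,
            "steps": steps,
            "eval_mse": mean_squared_error(eval_x, eval_y, weight, bias),
        })
    return min(results, key=lambda r: r["eval_mse"]), results


# Toy data generated as target = 2 * feature + 0.5.
train_set = ([1.0, 1.5, 2.0, 2.5, 3.0], [2.5, 3.5, 4.5, 5.5, 6.5])
eval_set = ([1.2, 2.8], [2.9, 6.1])
best, results = grid_search(train_set, eval_set, [0.001, 0.01, 0.1], [10, 100, 1000])
print(best)
```

The combinations are scored on the evaluation set rather than the training set, so the chosen hyperparameters reflect how the model handles data it was not fitted to.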
Testing your model
During training, you apply the model to known data and adjust its settings to improve the results. When your results are good enough for the needs of your application, you should deploy the model to whatever system your application uses and test it.
To test your model, run data through it in a context as close as possible to your final application and your production infrastructure.
Use a different dataset from those used for training and evaluation. Ideally, you should use a separate set of data each time you test, so that your model is tested with data that it has never processed before.
You may also want to create different sets of test data depending on the nature of your model. For example, you may use different data sets for particular locations or points in time, or you may divide the instances to mimic different demographics.
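As a rough sketch of testing against different slices of data, the following groups test instances by one attribute and reports the error for each group separately. The `predict_price` function is a hypothetical stand-in for a trained model, and the field names and values are invented for illustration.

```python
from collections import defaultdict


def error_by_slice(instances, predict, slice_field, target_field):
    """Mean absolute error of predict() for each distinct value of slice_field."""
    errors = defaultdict(list)
    for instance in instances:
        error = abs(predict(instance) - instance[target_field])
        errors[instance[slice_field]].append(error)
    return {name: sum(values) / len(values) for name, values in errors.items()}


def predict_price(instance):
    # Hypothetical stand-in for a trained model: price = 2 * area + 0.5.
    return 2.0 * instance["area"] + 0.5


test_instances = [
    {"city": "north", "area": 1.0, "price": 2.5},
    {"city": "north", "area": 2.0, "price": 4.5},
    {"city": "south", "area": 1.0, "price": 3.5},
    {"city": "south", "area": 2.0, "price": 6.5},
]
slice_errors = error_by_slice(test_instances, predict_price, "city", "price")
print(slice_errors)
```

A model that looks accurate overall can still perform poorly on one slice, which an aggregate metric alone would hide.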
During the testing process, you make adjustments to the model parameters and hyperparameters based on the results of the testing. You may uncover problems in the model or in its interaction with the rest of your application.
Host your model in the cloud
AI Platform provides tools to upload your trained ML model to the cloud, so that you can send prediction requests to the model.
To deploy your trained model on AI Platform, you must save your trained model using the tools provided by your machine learning framework. This involves serializing the information that represents your trained model into a file that you can deploy for prediction in the cloud.
Then you upload the saved model to a Cloud Storage bucket, and create a model resource on AI Platform, specifying the Cloud Storage path to your saved model.
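The serialization step can be illustrated with a toy model whose entire state is two numbers. This sketch writes those parameters to a JSON file and reads them back. It is only an analogy: real ML frameworks provide their own export tools and file formats, and the file name `model.json` and the helper functions here are hypothetical. The upload to Cloud Storage is not shown.

```python
import json
import os
import tempfile


def save_model(parameters, path):
    """Serialize the trained parameters to a file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(parameters, f)


def load_model(path):
    """Restore the parameters from a previously saved file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


parameters = {"weight": 2.0, "bias": 0.5}
with tempfile.TemporaryDirectory() as export_dir:
    model_path = os.path.join(export_dir, "model.json")
    save_model(parameters, model_path)
    restored = load_model(model_path)
print(restored == parameters)
```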
Send prediction requests to your model
AI Platform provides the services you need to request predictions from your model in the cloud.
There are two ways to get predictions from trained models: online prediction (sometimes called HTTP prediction) and batch prediction. In both cases, you pass input data to a cloud-hosted machine-learning model and get inferences for each data instance.
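The difference between the two modes shows up in how you package the input data. The sketch below assumes that an online request carries a JSON body with a top-level `instances` list and that batch input is a file of newline-delimited JSON, one instance per line. Treat both layouts and the feature names as assumptions and confirm the exact format for your model in the AI Platform prediction documentation. Nothing is sent over the network here.

```python
import json


def online_request_body(instances):
    """Build a single JSON request body containing all instances."""
    return json.dumps({"instances": instances})


def batch_input_lines(instances):
    """Build newline-delimited JSON: one data instance per line."""
    return "\n".join(json.dumps(instance) for instance in instances)


# Hypothetical feature names for the house-price example.
instances = [
    {"area": 1.4, "bedrooms": 3},
    {"area": 2.1, "bedrooms": 4},
]
online_body = online_request_body(instances)
batch_file_contents = batch_input_lines(instances)
print(online_body)
print(batch_file_contents)
```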
Monitor your prediction service
Monitor the predictions on an ongoing basis. AI Platform provides APIs to examine running jobs. In addition, various GCP tools support the operation of your deployed model, such as the Stackdriver tools.
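One way to think about ongoing monitoring is to track recent prediction errors as actual outcomes become known and to raise a flag when the average drifts above a threshold. The class below is a hypothetical, application-side sketch of that idea. It does not use the AI Platform job APIs or the Stackdriver tools, and the window size and threshold are arbitrary.

```python
from collections import deque


class PredictionMonitor:
    """Track the mean absolute error over the most recent predictions."""

    def __init__(self, window_size=100, max_mean_error=0.5):
        self.errors = deque(maxlen=window_size)  # old errors fall out automatically
        self.max_mean_error = max_mean_error

    def record(self, predicted, actual):
        self.errors.append(abs(predicted - actual))

    def mean_error(self):
        return sum(self.errors) / len(self.errors) if self.errors else 0.0

    def needs_attention(self):
        return self.mean_error() > self.max_mean_error


monitor = PredictionMonitor(window_size=3, max_mean_error=0.5)
for predicted, actual in [(2.5, 2.5), (4.5, 4.4), (6.5, 6.6)]:
    monitor.record(predicted, actual)
healthy = not monitor.needs_attention()

# Later predictions are consistently too low, so the recent error climbs.
for predicted, actual in [(3.0, 4.5), (5.0, 6.5), (2.0, 3.5)]:
    monitor.record(predicted, actual)
degraded = monitor.needs_attention()
print(healthy, degraded)
```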
Manage your models and model versions
AI Platform provides various interfaces for managing your models and versions, including a REST API, the gcloud ml-engine command-line tool, and the GCP Console.