Cloud Machine Learning Engine enables model training, model deployment, and prediction, three important parts of the larger machine learning workflow. This document describes the overall machine learning process to provide some context for the Cloud ML Engine services.
While this description explains some fundamentals at a high level, it is by no means a primer on machine learning. If you are new to machine learning, you can find some excellent places to start in the list of machine learning resources included with this documentation.
The machine learning workflow
For the purposes of this document, a complete machine learning scenario starts with a problem that you want to solve and ends with a mechanism by which your customers can make predictions about new instances of similar data. Your problem must be one that can be solved by finding patterns in data (even if the patterns aren't obvious or intuitive to a human analyst). You must also have access to a large set of existing data that includes the attribute (called a feature in machine learning) that you want to be able to infer from the other features. You select a machine learning algorithm (or a set of them) to use and then train it with your input data to arrive at the right settings and configuration. The values that the model learns from the data during training are its parameters. The settings that you choose yourself to control the algorithm and the training process are called hyperparameters.
A classic example is to begin with a large set of data describing the characteristics of houses in a given area and end with the means to predict the sale price for a house given a defined set of data about it. You can think of your model as a recipe: follow it to input the right ingredients (data features) and it yields the desired results.
To get to a production-ready model, you must work through four phases: data exploration and preparation, model development and training, model testing and deployment, and operational development and management. These phases, and the steps within them, are iterative. Don't expect to finish one phase and then move on to the next, never to return. You may need to reevaluate and go back to a previous step at any point in the process.
Step 0: Evaluating the problem
Before you even start thinking about how you might solve a problem with machine learning, you should take some time to think carefully about the problem you are trying to solve. Machine learning is never perfect, and is always subject to adjustment and iteration. If you don't enter the process with clarity of purpose and solid boundaries based on your business or use case, your project can be hard to evaluate.
Ask yourself the following questions before you begin engineering a machine learning approach.
Do you have a well-defined problem to solve?
Many different approaches and answers will present themselves to you once you start using machine learning to recognize patterns in data. Without a clear purpose, you can end up developing the wrong model. It's important to know what information you are trying to get out of the model and why.
Is machine learning the best solution for this problem?
Supervised machine learning (the style of machine learning described in this documentation) is well suited to certain kinds of problems. You should only consider using it for your problem if:
- You have access to a sizeable set of data from which to train your model. There are no absolutes about how much data is enough, but every feature (data attribute) that you include in your model increases the number of instances (data records) you'll need to properly train it. Also account for splitting your dataset into three subsets: one for training, one for evaluation, and one for testing.
- You can't use an easier and more concrete mathematical model to solve your problem.
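The three-way split mentioned above can be sketched as follows. This is a minimal illustration, not a prescribed method: the function name `split_dataset`, the 80/10/10 proportions, and the fixed seed are all assumptions chosen for the example.

```python
import random


def split_dataset(instances, eval_fraction=0.1, test_fraction=0.1, seed=0):
    """Shuffle instances and split them into training, evaluation, and test subsets."""
    if eval_fraction + test_fraction >= 1:
        raise ValueError("eval and test fractions must leave room for training data")
    shuffled = list(instances)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_eval = int(len(shuffled) * eval_fraction)
    n_test = int(len(shuffled) * test_fraction)
    eval_set = shuffled[:n_eval]
    test_set = shuffled[n_eval:n_eval + n_test]
    train_set = shuffled[n_eval + n_test:]
    return train_set, eval_set, test_set


records = list(range(1000))  # stand-in for real data records
train, evaluation, test = split_dataset(records)
print(len(train), len(evaluation), len(test))
```

Because only the training subset is used to fit the model, the number of instances you need grows with the share you hold back for evaluation and testing.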
How will you measure the model's success?
One of the biggest challenges of creating a machine learning model is knowing when you are done. It is always tempting to continue refining the model forever, extracting increasingly small improvements in accuracy. You should know going into the process what success means: how much accuracy is enough accuracy for your needs? Consider the consequences of that level of error.
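One way to make "how much accuracy is enough" concrete is to pick an error metric and a threshold before you start. The sketch below uses mean absolute error and a hypothetical `max_error` of 10,000 on made-up house prices. The metric, the threshold, and the function names are assumptions for illustration; your own use case determines the right ones.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference between predictions and known values."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("predicted and actual must be non-empty and the same length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def meets_target(predicted, actual, max_error):
    """True when the model's average error is within the agreed threshold."""
    return mean_absolute_error(predicted, actual) <= max_error


# Made-up predictions and sale prices, for illustration only.
predicted_prices = [310_000, 198_000, 455_000]
actual_prices = [300_000, 205_000, 450_000]

error = mean_absolute_error(predicted_prices, actual_prices)
print(error, meets_target(predicted_prices, actual_prices, max_error=10_000))
```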
Data exploration and preparation
Machine learning is all about data. To get a good model, you must start with good data. Having good data is only the beginning, however. You must analyze and understand your data. During this phase, you'll get your data ready to use as input to the training. That might include:
- Joining data from multiple sources and rationalizing it into one dataset.
- Visualizing the data to look for trends.
- Using data-centric languages and tools to find patterns in the data.
- Identifying features in your data—the subset of data attributes in your raw data that you use in your model.
- Cleaning the data to find and handle anomalous values caused by errors in data entry or measurement.
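As an illustration of the cleaning step, the sketch below flags numeric values that sit far from the rest of a feature. It uses the modified z-score (based on the median absolute deviation) with the commonly cited 3.5 cutoff. This is one heuristic among many and is not something this documentation prescribes; the function name and sample values are made up.

```python
import statistics


def flag_anomalies(values, threshold=3.5):
    """Return the indexes of values whose modified z-score exceeds the threshold."""
    median = statistics.median(values)
    # Median absolute deviation: a spread measure that outliers barely affect.
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        return []  # no spread to measure against
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - median) / mad > threshold
    ]


# Made-up living-area values; 15000 looks like a data entry error.
square_feet = [1400, 1550, 1620, 1480, 15000, 1710, 1390]
print(flag_anomalies(square_feet))
```

A flagged value is only a candidate for review. Whether to correct it, drop it, or keep it depends on what you know about how the data was collected.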
Most of the work of preparing your data for use in your machine learning model is about getting a consistent dataset and deciding how it might best be used to solve your problem. The final step in getting your data ready to use is preprocessing. In this step you transform valid, clean data into the format that best suits the needs of your model. Here are some examples of data preprocessing:
- Normalizing numeric data to a common scale.
- Applying formatting rules to data, like removing the HTML tagging from a text feature.
- Reducing data redundancy through simplification, as when converting a text feature to a bag of words representation.
- Representing text numerically, as when assigning a numeric value to each possible value in a categorical feature.
- Assigning key values to data instances.
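The preprocessing examples above can be sketched with the Python standard library. The helper names, the regular expressions, and the sample values are assumptions made for this illustration. In a real training application you would more likely use the preprocessing tools of your data or machine learning framework.

```python
import re
from collections import Counter


def min_max_normalize(values):
    """Scale numeric values to the range 0..1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def strip_html(text):
    """Remove anything that looks like an HTML tag (a rough regex, not a parser)."""
    return re.sub(r"<[^>]+>", "", text)


def bag_of_words(text):
    """Count lowercase word occurrences, discarding order and punctuation."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def encode_categories(values):
    """Map each distinct category to an integer index, assigned in sorted order."""
    index = {category: i for i, category in enumerate(sorted(set(values)))}
    return [index[v] for v in values], index


def assign_keys(instances):
    """Attach a unique key to each instance so results can be matched back later."""
    return [{"key": i, **instance} for i, instance in enumerate(instances)]


print(min_max_normalize([1200, 1800, 2400]))
print(strip_html("<p>Sunny <b>corner</b> lot</p>"))
print(bag_of_words("Sunny corner lot, sunny yard"))
print(encode_categories(["brick", "wood", "brick", "stone"]))
print(assign_keys([{"sqft": 1200}, {"sqft": 1800}]))
```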
Cloud ML Engine support for data exploration and preparation
The Cloud ML Engine services do not expose any data wrangling functionality directly. However, you can use Google Cloud Datalab for many of the tasks that make up data exploration and preparation. For example, you can visually analyze your data with dynamic graphs embedded in your Cloud Datalab notebook. You can find out more in the Cloud Datalab documentation. Other Google Cloud Platform services can help with this step as well.
Model development and training
Once you understand your data, you can begin to design models that use it to predict target values in new data. A given dataset can be used to create many models, predicting different aspects of similar data. Even when you want to solve a specific problem, you can use many different approaches. For each model you develop with your data, you'll go through four steps in order, though you'll frequently go back to previous steps to make corrections and refinements:
- Develop your model using established machine learning techniques or (much more rarely) by defining new operations and approaches.
- Train your model by feeding it data for which you already know the target values. You have the model predict target values for your training data so that it can adjust its settings to better fit the data (meaning it more accurately predicts the targets).
- Evaluate your trained model by using data that, like your training data, includes target values. You compare the results of your model's predictions to the actual values for the evaluation data and use statistical techniques appropriate to your model to gauge your success.
- Adjust the model by changing the operations or settings that you use.
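The develop, train, evaluate, and adjust cycle above can be sketched in miniature. The example below is plain Python rather than TensorFlow, and everything in it is an assumption chosen for illustration: the model is a one-feature linear function fitted by gradient descent, the data is made up, and the only setting being adjusted is the learning rate.

```python
def train(instances, learning_rate, epochs):
    """Fit y = w * x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(instances)
    for _ in range(epochs):
        # Predict with the current settings, then move them against the error gradient.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in instances) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in instances) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


def evaluate(model, instances):
    """Mean squared error of the model on data with known target values."""
    w, b = model
    return sum((w * x + b - y) ** 2 for x, y in instances) / len(instances)


# Made-up data: x is size in thousands of square feet, y is price in $100,000s.
training_data = [(1.0, 2.1), (1.5, 3.0), (2.0, 4.1), (2.5, 5.0), (3.0, 6.1)]
evaluation_data = [(1.2, 2.5), (2.2, 4.4), (2.8, 5.7)]

# Adjust: try several learning rates and keep the one with the lowest evaluation error.
results = {}
for learning_rate in (0.001, 0.01, 0.1):
    model = train(training_data, learning_rate, epochs=500)
    results[learning_rate] = evaluate(model, evaluation_data)

best_rate = min(results, key=results.get)
print(results, best_rate)
```

Note that the evaluation data is never used inside `train`. It exists only to compare candidate settings against each other.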
Cloud ML Engine support for model development and training
In a Cloud ML Engine solution, you develop your model using TensorFlow. You create a Python training application as you would when running TensorFlow locally. Cloud ML Engine comes into the solution at the time of training. You can use it to:
- Train your model locally, mimicking the cloud-based process, in order to test your model and quickly iterate your design.
- Run your trainer in the cloud with managed, scalable computing resources.
- Run your trainer with distributed processing in the cloud to get results faster.
- Take advantage of cloud resources to automatically optimize your model's settings using hyperparameter tuning.
- Accelerate your training jobs for models with computationally intensive operations by accessing graphics processing units (GPUs) in the cloud.
Model testing and deployment
During training, you apply the model to known data to find the right settings to get the best results. When your results are good enough for the needs of your application, you should deploy the model to whatever system your application uses and test it.
To test your model, run data through it in a context as close as possible to how it will be used in your final application. You should use a different dataset for testing than you use for training and validation. Ideally, you'll use a separate set of test data each time you test, so that your model gets tested using data that it has never processed before. You might also want to create different sets of test data depending on the nature of your model. For example, you might use different datasets for particular locations or points in time, or you might divide the instances to mimic different demographics. This is another point in the machine learning process where fully understanding the problem you are solving is vitally important; you can't make smart choices about how to test your model unless you know the data domain that you're working with.
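Testing against different slices of data can be sketched as follows. The function `error_by_segment`, the stand-in `toy_model`, the `location` field, and the sample instances are all hypothetical. The point is only that a single overall error figure can hide a segment where the model performs poorly.

```python
from collections import defaultdict


def error_by_segment(test_instances, predict, segment_key):
    """Return the mean absolute error of predict() for each value of segment_key."""
    errors = defaultdict(list)
    for instance in test_instances:
        predicted = predict(instance["features"])
        errors[instance[segment_key]].append(abs(predicted - instance["target"]))
    return {segment: sum(errs) / len(errs) for segment, errs in errors.items()}


def toy_model(features):
    """Stand-in for a trained model: a flat price per square foot."""
    return 200 * features["sqft"]


# Made-up test instances that the model has not seen before.
test_instances = [
    {"features": {"sqft": 1000}, "target": 210_000, "location": "north"},
    {"features": {"sqft": 1500}, "target": 290_000, "location": "north"},
    {"features": {"sqft": 1200}, "target": 300_000, "location": "south"},
]

segment_errors = error_by_segment(test_instances, toy_model, "location")
print(segment_errors)
```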
As you test your model, you should connect it to your application's infrastructure. It may be that you will always use your model by itself to get predictions for batches of new data, but usually there is another application in which your model plays one part of many.
You make adjustments as a result of your testing. At this stage, you can uncover problems in your model or in its interaction with the rest of your application.
Cloud ML Engine support for model testing and deployment
Operational development and management
In initial deployment, your primary concern was to test your model in its real-world context. When you are satisfied that the model is working well and meshing with the rest of your solution, you put the model into production use (which means different things depending on your application). At this point, you need to be able to monitor the model's operation and manage the jobs and resources that it uses. From a practical perspective, this is the phase where development activity gives way to operations.
Cloud ML Engine support for operational development and management
You can support the operation of your deployed model by using Google Cloud Platform tools, such as the various Stackdriver tools, and the model, version, and job management functions of Cloud ML Engine. This management functionality is exposed in the JSON API, through the gcloud command-line tool, and in the Google Cloud Platform Console.
What's next
- Learn about the features and interfaces of Cloud ML Engine.
- Experience a complete Cloud ML Engine workflow by working through the introductory walkthrough.