Cloud ML Engine Overview

Cloud Machine Learning Engine combines the managed infrastructure of Google Cloud Platform with the power and flexibility of TensorFlow. You can use it to train your machine learning models at scale, and to host trained models to make predictions about new data in the cloud. This page describes the purpose of Cloud ML Engine and introduces its components and high-level concepts.

Cloud ML Engine doesn't abstract away the essentials of understanding your data and modeling its features. However, you don't need a deep understanding of either of these to start working with the sample applications and tutorials. You can use the samples as a foundation for your own machine learning applications.

What it does

Cloud ML Engine mainly does two things:

  • Enables you to train machine learning models at scale by running TensorFlow training applications in the cloud.
  • Hosts those trained models for you in the cloud so that you can use them to get predictions about new data.

Cloud ML Engine manages the computing resources that your training job needs to run, so you can focus more on your model than on hardware configuration or resource management. Your use of the training service is bounded by pricing and resource quota policy, but within those bounds, you have access to powerful scalable computing.

Cloud ML Engine is designed to run your TensorFlow trainer with minimal alteration. However, there are a few required architectural choices and TensorFlow best practices that help your trainer work well with the Cloud ML Engine services.

End-to-end overview

Cloud ML Engine is only one piece in the set of tools you need to make a complete machine learning solution. This section describes the end-to-end Cloud ML Engine experience.

Prepare your trainer and data for the cloud

The key to getting started with Cloud ML Engine is your training application, written in TensorFlow. This application is responsible for defining your computation graph, managing the training and validation process, and exporting your model. You can develop your trainer as you would any other TensorFlow application, but you need to follow a few guidelines for it to work well with cloud training. If you have a TensorFlow application that you have been running locally on a single computer, the biggest change you are likely to need is adding support for running it with distributed TensorFlow.
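As an illustration of what supporting distributed training can involve, the sketch below reads the TF_CONFIG environment variable, which distributed TensorFlow setups commonly use to describe the cluster and the current task as JSON. This is a minimal sketch under assumptions: TF_CONFIG is not described on this page, the helper name read_task_info and the sample cluster addresses are hypothetical, and the exact contents of the variable should be checked against the training documentation.

```python
import json
import os


def read_task_info(environ=None):
    """Return (task_type, task_index, cluster) parsed from TF_CONFIG.

    Assumption: TF_CONFIG holds a JSON document with "cluster" and "task"
    keys. When the variable is missing or empty, the process is treated as
    a single-machine job run by the master.
    """
    environ = os.environ if environ is None else environ
    config = json.loads(environ.get("TF_CONFIG") or "{}")
    task = config.get("task", {})
    cluster = config.get("cluster", {})
    return task.get("type", "master"), int(task.get("index", 0)), cluster


# Hypothetical TF_CONFIG value for the first worker in a small cluster.
sample_env = {
    "TF_CONFIG": json.dumps({
        "cluster": {
            "master": ["master-0:2222"],
            "worker": ["worker-0:2222", "worker-1:2222"],
            "ps": ["ps-0:2222"],
        },
        "task": {"type": "worker", "index": 0},
    })
}

task_type, task_index, cluster = read_task_info(sample_env)

# Only the master replica would typically export the final model.
is_master = task_type == "master"
```

A trainer written this way behaves the same locally (no TF_CONFIG, so it acts as the master) and in a multi-replica job.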

You must make your trainer into a Python package and stage it on Google Cloud Storage where your training job can access it. The gcloud command-line tool performs this step for you as part of your training job request, but you must do it yourself if you call the training service programmatically.

You must also have your training and validation data prepared for running your trainer. Your input data must be in a format that TensorFlow can process; otherwise, your application must perform the necessary transformation itself. As with your application package, your data must be stored where Cloud ML Engine can access it. The easiest solution is to store your data in Google Cloud Storage in a bucket associated with the same project that you use for Cloud ML Engine tasks.

Train your model

With your trainer package and your data prepared, you can begin using Cloud ML Engine to train your model. The training service allocates resources in the cloud according to specifications you include with your job request. It installs your trainer package on each machine it allocates and runs each instance (called a replica). The service doesn't usually interact with your running trainer. It just manages the machines and monitors the application status. When the replica of your trainer that you specify as the controller (the master) either returns successfully or encounters an unrecoverable error, the training service stops the job and releases its resources.
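The specifications included with a job request can be pictured as a small JSON document. The sketch below builds one as a Python dictionary, using field names from the Cloud ML Engine v1 REST API as best understood here (jobId, trainingInput, scaleTier, packageUris, pythonModule, region, jobDir, args). The helper names, bucket paths, and job ID are hypothetical, and submit_training_job assumes the google-api-python-client package and default credentials are available.

```python
def build_training_job(job_id, package_uri, job_dir, region,
                       python_module="trainer.task", scale_tier="BASIC",
                       args=None):
    """Build the request body for a training job (illustrative only)."""
    return {
        "jobId": job_id,
        "trainingInput": {
            # Predefined cluster shape; a single machine for "BASIC".
            "scaleTier": scale_tier,
            # Staged trainer package(s) in Google Cloud Storage.
            "packageUris": [package_uri],
            # Module to run on each replica after the package is installed.
            "pythonModule": python_module,
            "region": region,
            # Location for checkpoints and the exported model.
            "jobDir": job_dir,
            # Command-line arguments passed through to the trainer.
            "args": list(args or []),
        },
    }


def submit_training_job(project_id, body):
    """Send the job request. Assumes google-api-python-client is installed."""
    # Imported lazily so build_training_job works without the library.
    from googleapiclient import discovery

    ml = discovery.build("ml", "v1")
    request = ml.projects().jobs().create(
        parent="projects/{}".format(project_id), body=body)
    return request.execute()


# Hypothetical values for illustration.
job = build_training_job(
    job_id="housing_prices_train_001",
    package_uri="gs://my-bucket/packages/trainer-0.1.tar.gz",
    job_dir="gs://my-bucket/jobs/housing_prices_train_001",
    region="us-central1",
    args=["--train-steps", "1000"],
)
```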

While your trainer runs, it can write output to Google Cloud Storage locations. A trainer typically writes regular checkpoints during training and exports the trained model at the end of the job. Also while your trainer runs, Cloud ML Engine sends logging information to Stackdriver Logging and provides information to other Google Cloud Platform services.

After training, you create a model resource and then deploy your trained model to it as a model version. You can then use the hosted model version to get predictions for new data.

Get predictions

Cloud ML Engine supports two kinds of prediction: online and batch.

Online prediction is optimized for handling a high rate of requests with minimal latency. You give it your data in a JSON request string and it returns predictions for that data in its response message.

Batch prediction is optimized for getting inferences for large collections of data with minimal job duration. The procedure for batch predictions is a little more involved than for online. You put your input data instances in files on Google Cloud Storage and pass their paths to a job request. The service allocates a cluster of machines to run your prediction job and distributes your input data among them. It saves your predictions to files in a Google Cloud Storage location that you specify.

You can also use the batch prediction service to get predictions from a saved model that isn't deployed to Cloud ML Engine, by putting the model in Google Cloud Storage and using its URI in your prediction request.
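To make the difference between the two request styles concrete, the sketch below builds an online prediction request string and a batch prediction job body. The field names (instances, predictionInput, dataFormat, inputPaths, outputPath, modelName, versionName, uri) follow the v1 REST API as best understood here. The helper names, feature names, and paths are hypothetical, and the accepted dataFormat values should be checked against the API reference.

```python
import json


def build_online_request(instances):
    """Serialize data instances into the JSON string sent for online prediction."""
    return json.dumps({"instances": list(instances)})


def build_batch_prediction_job(job_id, input_paths, output_path, region,
                               model_name=None, version_name=None, uri=None,
                               data_format="JSON"):
    """Build a batch prediction job body.

    Exactly one of model_name, version_name, or uri selects the model. The
    uri option covers a saved model that is stored in Google Cloud Storage
    but not deployed.
    """
    targets = {"modelName": model_name, "versionName": version_name, "uri": uri}
    chosen = {key: value for key, value in targets.items() if value}
    if len(chosen) != 1:
        raise ValueError("specify exactly one of model_name, version_name, uri")

    prediction_input = {
        "dataFormat": data_format,        # assumed value; verify before use
        "inputPaths": list(input_paths),  # files holding the input instances
        "outputPath": output_path,        # where prediction files are written
        "region": region,
    }
    prediction_input.update(chosen)
    return {"jobId": job_id, "predictionInput": prediction_input}


# Hypothetical instances and paths for illustration.
online_body = build_online_request([{"sqft": 1200, "bedrooms": 3}])
batch_job = build_batch_prediction_job(
    job_id="housing_prices_batch_001",
    input_paths=["gs://my-bucket/inputs/*"],
    output_path="gs://my-bucket/outputs/",
    region="us-central1",
    uri="gs://my-bucket/exported_model/",
)
```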

This overview touched on all of the major processes in a simple end-to-end scenario. You can find more details about everything you can do with Cloud ML Engine in the basic concepts of training and prediction.


Components

This section describes the pieces that make up Cloud ML Engine and gives the primary purpose of each.


REST API

The core of Cloud ML Engine is the REST API, a set of RESTful services that manages jobs, models, and versions, and makes predictions on hosted models on Google Cloud Platform. You can call the REST API directly with JSON requests, but you will most likely find that the other Cloud ML Engine components provide easier ways to accomplish the same tasks.

You can use the Google Cloud Client Library for Python to access the APIs in your Python code. When you do, you use Python representations of the resources and objects used by the API. This is easier and requires less code than working with the HTTP requests directly.
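As a rough illustration of calling the API from Python, the sketch below creates a model resource. It assumes the google-api-python-client package (imported as googleapiclient) is installed and that application default credentials are configured; neither is stated on this page. The helper names project_path and create_model are hypothetical.

```python
def project_path(project_id):
    """Return the resource name the API uses to identify a project."""
    return "projects/{}".format(project_id)


def create_model(project_id, model_name, description=""):
    """Create a model resource (a container for versions) in a project.

    Assumes google-api-python-client is installed and credentials are set up.
    """
    # Imported lazily so that project_path is usable without the library.
    from googleapiclient import discovery

    ml = discovery.build("ml", "v1")
    body = {"name": model_name, "description": description}
    request = ml.projects().models().create(
        parent=project_path(project_id), body=body)
    return request.execute()
```

With credentials in place, a call such as create_model("my-project", "housing_prices") would return the created model resource as a Python dictionary.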

The gcloud command-line tool

You can accomplish many Cloud ML Engine tasks at the command line by using Google Cloud Platform's gcloud command-line tool. You can manage models and versions, and request predictions with gcloud ml-engine commands.

We recommend using gcloud commands for most Cloud ML Engine tasks. There are even a few commands that add utility beyond encapsulating the REST APIs.

Google Cloud Platform Console

You can manage your models, versions, and jobs from Google Cloud Platform Console, which gives you a graphical interface for working with your machine learning resources. As part of Google Cloud Platform, your Cloud ML Engine resources in Google Cloud Platform Console are connected to useful tools, like Stackdriver Logging and Stackdriver Monitoring.

Cloud Datalab

Cloud Machine Learning Engine functionality has been integrated into Google Cloud Datalab. The interactive computational environment provided by Cloud Datalab notebooks makes your machine learning development experience easier. While the biggest strength of Cloud Datalab is its ability to interactively visualize your data, in some cases you can develop your whole machine learning solution in its notebook environment.


Terminology

Many terms in machine learning are used to mean several different things, or are used casually and imprecisely. This section defines a few potentially confusing terms that are used in this documentation.

Projects, models, versions, and jobs

The core functionality of Cloud Machine Learning Engine involves some generic-sounding words that have very specific meaning in this context. This section defines project, model, version, and job as they are used in Cloud ML Engine and in the rest of this documentation.


Project

Your project is your Google Cloud Platform project. In terms of Cloud ML Engine, your project is the logical container for your deployed models and jobs. Project is the common term for a working area in Google Cloud Platform, and it can have resources and applications associated with it. Each project that you use to develop Cloud ML Engine solutions must have Cloud Machine Learning Engine enabled. Your Google account can be a member of multiple Cloud Platform projects.


Model

Model is the most loosely defined and overloaded term in machine learning. In the broadest sense, a model is the solution to a problem that you're trying to solve with machine learning. It's the recipe that, when you apply the right data to it, results in a predicted value.

Model has a more specific meaning in Cloud ML Engine, in addition to its generic one. A model is a logical container for individual versions of a solution to a problem. For example, a generic problem to solve is predicting the sale price of houses given a set of data about previous sales. When working on a housing price prediction solution, you might create a model in Cloud ML Engine called housing_prices. You might try multiple machine learning techniques to solve your problem. At each stage, you can deploy versions of that model. Each version might be completely different from the others, but you can organize them under the same model if you think it best for your workflow.

This documentation also uses the terms trained model and saved model. A trained model is the state of your computation graph and its settings after training. Most machine learning frameworks can serialize that information and create a file as a saved model. Your trainer exports a trained model, which you can deploy for prediction in the cloud.


Version

A version is an instance of a machine learning solution stored in the Cloud ML Engine model service. You make a version by passing a serialized trained model (as a saved model) to the service.

Cloud ML Engine itself is also versioned. Its versions define the features that its services support.

To avoid confusion in this documentation, these two uses of version are usually clarified as model version and runtime version.


Job

You interact with the services of Cloud ML Engine by initiating requests and jobs. Requests are regular Web API requests that return with a response object as quickly as possible. Jobs are long-running operations that are processed asynchronously. You submit a request to start the job and get a quick response that reports the job's initial status. Then you can request the status periodically to track your job's progress.
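The submit-then-poll pattern for jobs can be sketched as a small loop. The example below is illustrative only: wait_for_job and its parameters are hypothetical names, and the state strings are the job states of the v1 API as best understood here. The status lookup is passed in as a function so the loop can be exercised without network access.

```python
import time

# States after which a job no longer changes (assumed from the v1 API).
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def wait_for_job(get_state, poll_seconds=30.0, max_polls=None, sleep=time.sleep):
    """Poll get_state() until the job reaches a terminal state.

    get_state is any zero-argument callable returning the current state
    string. With the API client it could wrap something like
    ml.projects().jobs().get(name=job_name).execute()["state"].
    """
    polls = 0
    while True:
        state = get_state()
        polls += 1
        if state in TERMINAL_STATES:
            return state
        if max_polls is not None and polls >= max_polls:
            raise TimeoutError("job still {} after {} polls".format(state, polls))
        sleep(poll_seconds)


# Simulated job lifecycle, with sleeping disabled for the demonstration.
fake_states = iter(["QUEUED", "PREPARING", "RUNNING", "SUCCEEDED"])
final_state = wait_for_job(lambda: next(fake_states), sleep=lambda seconds: None)
```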

Packaging, staging, exporting, and deploying models

You move models and data around, especially between your local environment and Google Cloud Storage, and between Google Cloud Storage and the Cloud ML Engine services. These terms are used in this documentation to mean specific operations in the process.


Package

You package your training application, so that the Cloud ML Engine training service can install it on each training instance. You make your application into a standard Python package.

If you use the gcloud command-line tool to configure and run training jobs, the tool can package your application for you automatically.
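If you package the application yourself, a standard Python package needs little more than a setup script next to the code directory. The sketch below writes a minimal skeleton to disk to show one possible layout. The directory name trainer, the module name task.py, and the helper write_trainer_skeleton are assumptions made for illustration, not names required by this page.

```python
from pathlib import Path

# Contents of a minimal setup script that uses setuptools.
SETUP_PY = '''\
from setuptools import find_packages, setup

setup(
    name="trainer",
    version="0.1",
    packages=find_packages(),
    install_requires=[],  # list extra dependencies of the trainer here
)
'''


def write_trainer_skeleton(root):
    """Create setup.py and a trainer/ package under root; return the files."""
    root = Path(root)
    package_dir = root / "trainer"
    package_dir.mkdir(parents=True, exist_ok=True)

    (root / "setup.py").write_text(SETUP_PY)
    # An __init__.py file marks the directory as an importable package.
    (package_dir / "__init__.py").write_text("")

    task = package_dir / "task.py"
    if not task.exists():
        # Placeholder entry point; real training code would go here.
        task.write_text('"""Training entry point (placeholder)."""\n')

    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*.py"))
```

Running python setup.py sdist in the resulting directory would then produce a source distribution that can be staged in Google Cloud Storage.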


Stage

You stage your training application package in a Google Cloud Storage bucket that your project has access to. This enables the training service to access the package and copy it to all of the training instances.

As with packaging, the gcloud command-line tool will stage your application for you as part of requesting a training job.


Export

In the context of machine learning models, this documentation uses export to mean the process of serializing your computation graph and settings to file. You use the TensorFlow SavedModel objects for exporting.


Deploy

You deploy a model version when you create a version resource. You specify an exported model (a saved model file) and a model resource to assign the version to, and Cloud ML Engine hosts it so that you can get predictions with it.
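The relationship between project, model, and version shows up directly in the resource names and in the body of a version-creation request. The sketch below is a minimal illustration. The field names name, deploymentUri, and runtimeVersion follow the v1 REST API as best understood here, while the helper names, the runtime version string, and the paths are hypothetical.

```python
def model_path(project_id, model_name):
    """Resource name of a model within a project."""
    return "projects/{}/models/{}".format(project_id, model_name)


def version_path(project_id, model_name, version):
    """Resource name of a version within a model."""
    return "{}/versions/{}".format(model_path(project_id, model_name), version)


def build_version_body(version, deployment_uri, runtime_version=None):
    """Build the body for creating (deploying) a model version."""
    if not deployment_uri.startswith("gs://"):
        raise ValueError("the exported model must be in Google Cloud Storage")
    body = {
        "name": version,
        # Directory that holds the exported saved model.
        "deploymentUri": deployment_uri,
    }
    if runtime_version is not None:
        body["runtimeVersion"] = runtime_version
    return body


# Hypothetical values. With the API client, the request could look like
# ml.projects().models().versions().create(parent=parent, body=body).execute()
parent = model_path("my-project", "housing_prices")
body = build_version_body("v1", "gs://my-bucket/exported_model/",
                          runtime_version="1.0")
```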

Training instances and prediction nodes

Cloud ML Engine allocates processing resources in the cloud to run your training job when you create it. It is easiest to think of these resources, called training instances, as virtual machines (VMs), even though their implementation is different from that of traditional VMs. The resources that Cloud ML Engine uses to handle online and batch prediction are similar. They are called prediction nodes or just nodes.

The development environment that is configured on training instances and prediction nodes is defined by the Cloud ML Engine runtime version.
