Training Models in the Cloud

You can use Cloud Machine Learning Engine to run your TensorFlow training application in the cloud. This page lists the steps in the training process and points you to other pages that describe each step in detail. For an overview of the training process with more background and context, see the concepts section of this documentation.

Before you begin

Here are the steps you need to take before you train your model in the cloud:

  1. Configure your development environment by working through the setup section of the getting-started guide.

  2. Gather and prepare your training data.

  3. Put your training data in an online source that Cloud ML Engine can access, such as a Google Cloud Storage bucket.
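As a minimal sketch of steps 2 and 3, assuming your raw data is a list of CSV rows and that you upload it to a Cloud Storage bucket you own, you might shuffle and split the data locally and then copy the files with `gsutil cp`. The bucket URI `gs://my-bucket/data`, the 80/20 split, and the helper names are hypothetical choices for illustration, not part of Cloud ML Engine.

```python
import csv
import random
from pathlib import Path


def split_rows(rows, eval_fraction=0.2, seed=42):
    """Shuffle rows deterministically and split them into (train, eval) lists."""
    if not 0.0 < eval_fraction < 1.0:
        raise ValueError("eval_fraction must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_eval = int(len(shuffled) * eval_fraction)
    return shuffled[n_eval:], shuffled[:n_eval]


def write_csv(path, rows):
    """Write rows to a local CSV file, creating parent directories as needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as f:
        csv.writer(f).writerows(rows)
    return path


def upload_command(local_dir, bucket_uri):
    """Build (but do not run) a gsutil command that copies local_dir recursively.

    bucket_uri is a hypothetical destination such as "gs://my-bucket/data".
    """
    return ["gsutil", "-m", "cp", "-r", str(local_dir), bucket_uri]
```

After writing `train.csv` and `eval.csv` with `write_csv`, you could pass the list returned by `upload_command` to `subprocess.run(..., check=True)`; building the command as a list avoids shell-quoting problems.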

Training your model with Cloud ML Engine

The following steps take you from developing your model to managing your training jobs on Google Cloud Platform:

  1. Use TensorFlow to create your computation graph and training application, taking into account the requirements and best practices for working with Cloud ML Engine.

  2. Package your trainer application and its dependencies, and put the package in a Google Cloud Storage location that your Cloud ML Engine project can access. If you start your jobs with the gcloud command-line tool, it can package and upload your application for you, which simplifies this step.

  3. Configure and start a Cloud ML Engine job to run your trainer.

  4. Monitor your job.
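The sketch below illustrates steps 2 through 4 under stated assumptions. It assumes the conventional layout of a `trainer/` package containing `__init__.py` and a `task.py` entry module, and it uses hypothetical job, bucket, region, and trainer-argument names. `parse_trainer_args` shows how a trainer can accept the `--job-dir` argument that gcloud forwards to your application when you pass `--job-dir` at submission time; the other helpers only build `gcloud ml-engine` command lines for submitting, describing, and streaming the logs of a job, without running them. The TensorFlow model code itself is omitted.

```python
import argparse


def parse_trainer_args(argv=None):
    """Parse arguments for a hypothetical trainer/task.py entry point."""
    parser = argparse.ArgumentParser(description="Example trainer entry point")
    # gcloud forwards its --job-dir value to the trainer as --job-dir.
    parser.add_argument("--job-dir", required=True,
                        help="Cloud Storage path for checkpoints and exports")
    # Hypothetical user arguments, supplied after the bare "--" separator.
    parser.add_argument("--train-files", required=True)
    parser.add_argument("--train-steps", type=int, default=1000)
    return parser.parse_args(argv)


def submit_command(job_name, job_dir, staging_bucket, region,
                   package_path="trainer", module_name="trainer.task",
                   user_args=()):
    """Build a `gcloud ml-engine jobs submit training` command line."""
    cmd = [
        "gcloud", "ml-engine", "jobs", "submit", "training", job_name,
        "--package-path", package_path,    # gcloud packages and uploads this dir
        "--module-name", module_name,      # module run as the entry point
        "--staging-bucket", staging_bucket,
        "--job-dir", job_dir,
        "--region", region,
    ]
    if user_args:
        # Everything after "--" is passed through to the trainer itself.
        cmd += ["--", *user_args]
    return cmd


def monitor_commands(job_name):
    """Build commands that check a job's status and stream its logs."""
    return [
        ["gcloud", "ml-engine", "jobs", "describe", job_name],
        ["gcloud", "ml-engine", "jobs", "stream-logs", job_name],
    ]
```

You could run each list with `subprocess.run(cmd, check=True)`. Flags and command groups change between Cloud SDK releases, so check the gcloud reference for the version you have installed before relying on these exact command lines.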

You may also want to use an advanced feature of Cloud ML Engine:

Finally, you may need to troubleshoot your training job if something goes wrong.

What’s next
