Google Cloud Platform

Cloud ML Engine adds Cloud TPU support for training

Starting today, Cloud Machine Learning Engine (ML Engine) offers the option to accelerate training with Cloud TPUs as a beta feature. Getting started is easy, since Cloud TPU quota is now available to all GCP customers.

Cloud ML Engine enables you to train and deploy machine learning models on datasets of many types and sizes, using the flexibility and production-readiness of TensorFlow. As a managed service, ML Engine handles the infrastructure, compute resources, and job scheduling on your behalf, allowing you to focus on data and modeling.

In March 2017, we launched Cloud ML Engine to provide a managed TensorFlow service, with the ability to scale machine learning workloads using distributed training and GPU acceleration. Over the last year, we have continued to release new features, including beta support for NVIDIA V100 GPUs, online prediction as a deployment capability, and improvements to the hyperparameter tuning feature.

Today, we are adding support for Cloud TPUs, enabling you to train a variety of high-performance, open-source reference models with strong performance per dollar. Alternatively, you can accelerate your own models written with high-level TensorFlow APIs.

Now available in beta, Cloud TPUs are a family of Google-designed hardware accelerators built from the ground up for machine learning. Cloud TPUs won the ImageNet Training Cost category of Stanford’s DAWNBench competition, and their performance and cost advantages have since been analyzed in detail.

Getting started with Cloud TPU on ML Engine

ML Engine automatically handles provisioning and management of Cloud TPU nodes, so you can use TPUs just as easily as CPUs and GPUs. Additionally, you can use ML Engine’s hyperparameter tuning feature in your Cloud TPU jobs to optimize your hyperparameters, combining scale, performance, and algorithms to improve your models. Finally, you can deploy the resulting models with ML Engine to serve online prediction requests or to run batch prediction jobs.
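As a rough sketch of what a TPU training job request could look like, the Python function below builds the JSON body one might send to the ML Engine v1 `projects.jobs.create` REST method, optionally with hyperparameter tuning enabled. It assumes the `BASIC_TPU` scale tier and the `TrainingInput` / `HyperparameterSpec` field names of that API as we understand them; the function name, bucket paths, module name, runtime version, region, metric tag and learning-rate range are all hypothetical placeholders, and every field should be checked against the current API reference before use.

```python
import json


def build_tpu_training_job(job_id, package_uri, module_name, region,
                           job_dir, runtime_version, max_trials=None):
    """Return a hypothetical request body for an ML Engine TPU training job.

    Field names are assumed to follow the ML Engine v1 REST Job /
    TrainingInput resources; verify them against the API reference.
    """
    training_input = {
        # Assumed beta scale tier: one worker VM paired with a Cloud TPU.
        "scaleTier": "BASIC_TPU",
        "packageUris": [package_uri],
        "pythonModule": module_name,
        "region": region,
        "jobDir": job_dir,
        "runtimeVersion": runtime_version,
    }
    if max_trials is not None:
        # Optional hyperparameter tuning. The metric tag and the parameter
        # below are hypothetical placeholders for your own model's values.
        training_input["hyperparameters"] = {
            "goal": "MAXIMIZE",
            "hyperparameterMetricTag": "accuracy",
            "maxTrials": max_trials,
            "maxParallelTrials": 1,
            "params": [{
                "parameterName": "learning_rate",
                "type": "DOUBLE",
                "minValue": 0.01,
                "maxValue": 1.0,
                "scaleType": "UNIT_LOG_SCALE",
            }],
        }
    return {"jobId": job_id, "trainingInput": training_input}


if __name__ == "__main__":
    # All values here are illustrative placeholders.
    body = build_tpu_training_job(
        job_id="resnet_tpu_001",
        package_uri="gs://my-bucket/packages/trainer-0.1.tar.gz",
        module_name="trainer.task",
        region="us-central1",
        job_dir="gs://my-bucket/jobs/resnet_tpu_001",
        runtime_version="1.8",
        max_trials=4,
    )
    print(json.dumps(body, indent=2))
```

The function only assembles a dictionary and makes no network calls, so the resulting body could be submitted through the REST API or a client library of your choice. Leaving `max_trials` unset produces a plain training job without hyperparameter tuning.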

Read this guide to learn more about how you can use Cloud TPUs with ML Engine for training jobs.