Google Cloud Big Data and Machine Learning Blog

Innovation in data processing and machine learning technology

Cloud ML Engine adds Cloud TPU support for training

Monday, May 21, 2018

By Nikhil Kothari, Cloud ML Engine Engineering Lead

Starting today, Cloud Machine Learning Engine (ML Engine) offers the option to accelerate training with Cloud TPUs as a beta feature. Getting started is easy, since Cloud TPU quota is now available to all GCP customers.

Cloud ML Engine enables you to train and deploy machine learning models on datasets of many types and sizes, using the flexibility and production-readiness of TensorFlow. As a managed service, ML Engine handles the infrastructure, compute resources, and job scheduling on your behalf, allowing you to focus on data and modeling.

In March 2017, we launched Cloud ML Engine to provide a managed TensorFlow service, with the ability to scale machine learning workloads using distributed training and GPU acceleration. Over the last year, we have continued to release new features and improvements, including beta support for NVIDIA V100 GPUs, online prediction as a deployment capability, and enhancements to the hyperparameter tuning feature.

Today, we are adding support for Cloud TPUs, enabling you to train a variety of high-performance, open-source reference models with differentiated performance per dollar. You can also accelerate your own models written with high-level TensorFlow APIs.

Recently launched in beta, Cloud TPUs are a family of Google-designed hardware accelerators built from the ground up for machine learning. Cloud TPUs won the ImageNet Training Cost category of Stanford’s DAWNBench competition, and their performance and cost advantages have been analyzed in detail.

Getting started with Cloud TPU on ML Engine

ML Engine automatically handles provisioning and management of Cloud TPU nodes, so you can use TPUs just as easily as CPUs and GPUs. Additionally, you can use ML Engine’s hyperparameter tuning feature in your Cloud TPU jobs to optimize your hyperparameters, combining scale, performance, and algorithms to improve your models. Finally, you can deploy the resulting models with ML Engine to serve online prediction requests or run batch prediction jobs.

Read this guide to learn more about how you can use Cloud TPUs with ML Engine for training jobs.
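
As a rough illustration of what a Cloud TPU training job looks like, the sketch below builds a job specification that requests the `BASIC_TPU` scale tier. It only assembles the request body as a Python dictionary and does not call any API. The helper name `build_tpu_training_job`, the bucket path, the module name, the runtime version, and the training arguments are hypothetical placeholders chosen for this example, not values from this post; check the guide for the regions and runtime versions that actually support Cloud TPUs.

```python
import json


def build_tpu_training_job(job_id, package_uri, python_module,
                           region, runtime_version, args=None):
    """Assemble a training job spec that asks ML Engine for a Cloud TPU.

    This is an illustrative helper, not part of any Google library.
    The returned dict follows the shape of an ML Engine training job
    (a job ID plus a trainingInput section).
    """
    return {
        "jobId": job_id,
        "trainingInput": {
            # BASIC_TPU requests a master VM plus one Cloud TPU;
            # ML Engine provisions and manages the TPU node itself.
            "scaleTier": "BASIC_TPU",
            "packageUris": [package_uri],
            "pythonModule": python_module,
            "region": region,
            "runtimeVersion": runtime_version,
            # Arguments passed through to the training module.
            "args": list(args or []),
        },
    }


if __name__ == "__main__":
    # All values below are placeholders (assumptions for this sketch).
    job = build_tpu_training_job(
        job_id="tpu_training_example",
        package_uri="gs://my-bucket/packages/trainer-0.1.tar.gz",
        python_module="trainer.task",
        region="us-central1",
        runtime_version="1.8",
        args=["--train_steps", "1000"],
    )
    print(json.dumps(job, indent=2))
```

Apart from the scale tier, the specification has the same shape as a CPU or GPU training job, which reflects the point above that ML Engine handles the TPU node on your behalf.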
