Choosing a TPU Service

The following sections help you decide which Cloud TPU service (Compute Engine, Google Kubernetes Engine, or AI Platform) best serves your needs.

Compute Engine

  • Cloud TPU on Compute Engine is a good starting place for users new to Cloud TPU and for experienced machine learning users who want to manage their own Cloud TPU services. It includes:
    • The ctpu utility program that sets up your VM, TPU, and Cloud Storage resources.
    • A quickstart that guides you through training your first machine learning model.
    • Tutorials for image classification, object detection, and language translation models.
    • Tools for monitoring performance and resolving bottlenecks in TPU model processing.

Google Kubernetes Engine

  • Cloud TPU on Google Kubernetes Engine offers:
    • Easy setup and management: When you use Cloud TPU, you need a Compute Engine VM to run your workload, and a Classless Inter-Domain Routing (CIDR) block for Cloud TPU. Google Kubernetes Engine sets up and manages the VM and the CIDR block for you.
    • Optimized cost: Google Kubernetes Engine scales your VMs and Cloud TPU nodes automatically based on workloads and traffic. You pay for Cloud TPU and the VM only when you run workloads on them.
    • Flexible usage: Changing your hardware accelerator (CPU, GPU, or TPU) requires only a single line change in your Pod spec.
    • Scalability: Google Kubernetes Engine provides APIs (Job and Deployment) that can easily scale to hundreds of Pods and Cloud TPU nodes.
    • Fault tolerance: The Google Kubernetes Engine Job API, together with the TensorFlow checkpoint mechanism, provides run-to-completion semantics. If a VM instance or Cloud TPU node fails, your training jobs automatically rerun from the latest checkpoint.
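
The "single line change" point above can be illustrated with a minimal sketch. It builds two Pod manifests as plain Python dictionaries that differ only in the accelerator resource limit. The resource names cloud-tpu.google.com/v2 and nvidia.com/gpu, the Pod name, the container name, and the image path are assumptions used for illustration; check the Google Kubernetes Engine documentation for the values that apply to your cluster. A real manifest may also need further fields, which this sketch omits.

```python
import copy
import json


def make_pod_spec(resource_name, count):
    """Build a minimal Pod manifest as a dict (hypothetical helper).

    Only the entry under resources.limits depends on the accelerator.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "training-job"},  # hypothetical Pod name
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "trainer",  # hypothetical container name
                    "image": "gcr.io/my-project/trainer:latest",  # hypothetical image
                    # The accelerator is selected by this single entry.
                    "resources": {"limits": {resource_name: count}},
                }
            ],
        },
    }


def without_limits(pod):
    """Return a copy of the manifest with the accelerator limit removed."""
    stripped = copy.deepcopy(pod)
    stripped["spec"]["containers"][0]["resources"]["limits"] = {}
    return stripped


# Assumed resource names; verify them against your cluster before use.
tpu_pod = make_pod_spec("cloud-tpu.google.com/v2", 8)
gpu_pod = make_pod_spec("nvidia.com/gpu", 1)

# Apart from the limits entry, the two manifests are identical.
same_otherwise = without_limits(tpu_pod) == without_limits(gpu_pod)

if __name__ == "__main__":
    print(json.dumps(tpu_pod, indent=2))
    print("Identical apart from resource limits:", same_otherwise)
```

Keeping the accelerator choice in one place, as in this sketch, means the rest of the manifest can be reused unchanged when you switch between hardware types.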

AI Platform

  • Cloud TPU on AI Platform is a good place to start if you have some ML experience and want to take advantage of the AI Platform managed services and APIs. AI Platform manages the following ML workflow stages:
    • Train an ML model on your data:
      • Training the model
      • Evaluating model accuracy
      • Tuning hyperparameters
    • Deploy your trained model.
    • Send prediction requests to your model:
      • Online prediction
      • Batch prediction
    • Monitor the predictions on an ongoing basis.
    • Manage your models and model versions.