TPUs

This page introduces Cloud TPU and shows you where to find information on using Cloud TPU with Google Kubernetes Engine. Tensor Processing Units (TPUs) are Google’s custom-developed application-specific integrated circuits (ASICs) used to accelerate TensorFlow machine learning workloads.

Overview

Using GKE to manage your Cloud TPU brings the following advantages:

  • Easier setup and management: When you use Cloud TPU, you need a Compute Engine VM to run your workload, and a Classless Inter-Domain Routing (CIDR) block for the Cloud TPU. GKE sets up and manages the VM and the CIDR block for you.

  • Optimized cost: GKE scales your VMs and Cloud TPU nodes automatically based on workloads and traffic. You only pay for Cloud TPU and the VM when you run workloads on them.

  • Flexible usage: It's a one-line change in your Pod spec to request a different hardware accelerator (CPU, GPU, or TPU):

    apiVersion: v1
    kind: Pod
    spec:
      containers:
      - name: example-container
        resources:
          limits:
            cloud-tpus.google.com/v2: 8
            # To request CPU or GPU instead of TPU:
            # cpu: 2
            # nvidia.com/gpu: 1
  • Scalability: GKE provides APIs (Job and Deployment) that can easily scale to hundreds of Pods and Cloud TPU nodes.

  • Fault tolerance: GKE's Job API, together with the TensorFlow checkpoint mechanism, provides run-to-completion semantics. If a failure occurs on a VM instance or Cloud TPU node, your training job automatically reruns, reading the latest state from the checkpoint.
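
    The scalability and fault-tolerance points above can be sketched as a Job manifest. This is a minimal sketch, not a complete tutorial: the Job name, container image, and training command are hypothetical placeholders, while the TPU resource name is the one this page uses.

    ```yaml
    # Sketch of a GKE Job requesting a v2 Cloud TPU.
    # The metadata name, image, and command are hypothetical placeholders.
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: example-training-job    # hypothetical name
    spec:
      backoffLimit: 4               # rerun the Pod on failure (run-to-completion)
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: trainer
            image: gcr.io/my-project/trainer:latest   # hypothetical image
            command: ["python", "train.py"]           # hypothetical command;
                                                      # training state would be
                                                      # checkpointed to storage
            resources:
              limits:
                cloud-tpus.google.com/v2: 8
    ```

    Scaling to many Pods is then a matter of running more such Jobs (or raising a Deployment's replica count); on failure, Kubernetes restarts the Pod and the training script resumes from its latest checkpoint.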

What's next

    • Follow the Cloud TPU ResNet tutorial, which shows you how to train the TensorFlow ResNet-50 model using Cloud TPU and GKE.
    • Alternatively, follow the quick guide to setting up Cloud TPU with GKE.
    • Learn about best practices for using Cloud TPU for your machine learning tasks.