Graphics Processing Units (GPUs) can significantly accelerate training for many deep learning models. Models for tasks such as image classification, video analysis, and natural language processing rely on compute-intensive matrix multiplication and other operations that can take advantage of a GPU's massively parallel architecture, which is well suited to embarrassingly parallel workloads.
Training a deep learning model that involves intensive compute tasks on extremely large datasets can take days to run on a single processor. However, if you design your program to offload those tasks to one or more GPUs, you can reduce training time to hours instead of days.
For general information about accelerated computing using GPUs, go to NVIDIA's page about Accelerated Computing. For detailed information about using GPUs with TensorFlow, go to using GPUs in the TensorFlow documentation.
Requesting GPU-enabled machines
To use GPUs in the cloud, configure your job to access GPU-enabled machines:
- Set the scale tier to CUSTOM.
- Configure each task (master, worker, or parameter server) to use one of the GPU-enabled machine types below, based on the number of GPUs and the type of accelerator required for your task:
  - standard_gpu: A single NVIDIA Tesla K80 GPU
  - complex_model_m_gpu: Four NVIDIA Tesla K80 GPUs
  - complex_model_l_gpu: Eight NVIDIA Tesla K80 GPUs
  - standard_p100: A single NVIDIA Tesla P100 GPU (Alpha)
  - complex_model_m_p100: Four NVIDIA Tesla P100 GPUs (Alpha)
For more information, see specifying machine types for the custom scale tier.
Alternatively, if you are learning how to use Cloud ML Engine or
experimenting with GPU-enabled machines, you can set the scale tier to
BASIC_GPU to get a single worker instance with a single NVIDIA Tesla K80 GPU.
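The configuration above goes in the training input of your job request. The following is a hypothetical config.yaml sketch; the machine types, worker count, and parameter server count are example values you would adjust for your own job:

```yaml
# config.yaml -- example values only; tune machine types and counts for your job.
trainingInput:
  scaleTier: CUSTOM
  masterType: complex_model_m_gpu       # four K80 GPUs for the master
  workerType: complex_model_m_gpu       # four K80 GPUs per worker
  parameterServerType: large_model      # parameter servers do not need GPUs
  workerCount: 2
  parameterServerCount: 1
```

You would pass this file to the job submission command with the `--config` flag.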
In addition, you need to run your job in a region that supports GPUs. The following regions currently provide access to GPUs:
Assigning ops to GPUs
To make use of the GPUs on a machine, make the appropriate changes to your TensorFlow trainer application:
- High-level Estimator API: No code changes are necessary as long as your ClusterSpec is configured properly. If a cluster mixes CPUs and GPUs, map the ps job name to the CPU-only machines and the worker job name to the GPU-enabled machines.
- Core TensorFlow API: You must assign ops to run on GPU-enabled machines. This process is the same as using GPUs with TensorFlow locally. You can use tf.train.replica_device_setter to assign ops to devices.
When you assign a GPU-enabled machine to a Cloud ML Engine process, that process has exclusive access to that machine's GPUs; you can't share the GPUs of a single machine in your cluster among multiple processes. The process corresponds to the distributed TensorFlow task in your cluster specification. The distributed TensorFlow documentation describes cluster specifications and tasks.
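The placement policy that tf.train.replica_device_setter applies can be sketched in plain Python (no TensorFlow required): variables are assigned round-robin across the ps tasks, while compute ops stay on the worker device. The helper and names below are hypothetical, for illustration only:

```python
# Illustrative model of replica_device_setter's placement policy.
# This is NOT the real TensorFlow API -- just a sketch of the rule it applies.

def make_device_setter(num_ps_tasks, worker_device="/job:worker"):
    """Return a function mapping an op type to a device string."""
    state = {"next_ps": 0}

    def assign(op_type):
        # Variables (and their initializers) are placed on parameter
        # servers, rotating through the ps tasks round-robin.
        if op_type == "Variable":
            task = state["next_ps"]
            state["next_ps"] = (state["next_ps"] + 1) % num_ps_tasks
            return "/job:ps/task:%d" % task
        # All other ops run on the worker that built the graph.
        return worker_device

    return assign

setter = make_device_setter(num_ps_tasks=2)
print(setter("Variable"))  # /job:ps/task:0
print(setter("Variable"))  # /job:ps/task:1
print(setter("MatMul"))    # /job:worker
```

In real trainer code, you would wrap graph construction in `with tf.device(tf.train.replica_device_setter(cluster=cluster_spec)):` to get this behavior.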
GPU device strings
A standard_gpu machine's single GPU is identified as "/gpu:0". Machines with multiple GPUs use identifiers ranging from "/gpu:0" to "/gpu:n-1". For example, complex_model_m_gpu machines have four GPUs identified as "/gpu:0" through "/gpu:3".
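The numbering scheme can be shown with a small helper that enumerates the device strings for each machine type listed earlier. The helper name and lookup table here are hypothetical, assembled from the GPU counts stated above:

```python
# GPU counts per machine type, as listed in this document.
GPU_COUNTS = {
    "standard_gpu": 1,
    "complex_model_m_gpu": 4,
    "complex_model_l_gpu": 8,
    "standard_p100": 1,
    "complex_model_m_p100": 4,
}

def gpu_device_strings(machine_type):
    """Return the TensorFlow device strings a machine type exposes."""
    # GPUs are numbered /gpu:0 through /gpu:n-1.
    return ["/gpu:%d" % i for i in range(GPU_COUNTS[machine_type])]

print(gpu_device_strings("complex_model_m_gpu"))
# ['/gpu:0', '/gpu:1', '/gpu:2', '/gpu:3']
```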