Graphics Processing Units (GPUs) can significantly accelerate training for many deep learning models, such as those designed for image classification, video analysis, and natural language processing. Training these models involves compute-intensive matrix multiplication and other operations that can take advantage of a GPU's massively parallel architecture, which is well suited to embarrassingly parallel workloads.
Training a deep learning model that involves intensive compute tasks on extremely large datasets can take days to run on a single processor. However, if you design your program to offload those tasks to one or more GPUs, you can reduce training time to hours instead of days.
For general information about accelerated computing using GPUs, go to NVIDIA's page about Accelerated Computing. For detailed information about using GPUs with TensorFlow, go to using GPUs in the TensorFlow documentation.
Requesting GPU-enabled machines
To use GPUs in the cloud, configure your job to access GPU-enabled machines:

- Set the scale tier to CUSTOM.
- Configure each task (master, worker, or parameter server) to use one of the GPU-enabled machine types:
  - standard_gpu to give your task access to a single GPU.
  - complex_model_m_gpu to give your task access to four GPUs.
  - complex_model_l_gpu to give your task access to eight GPUs.

Alternatively, if you are learning how to use Cloud ML Engine or experimenting with GPU-enabled machines, you can set the scale tier to BASIC_GPU to get a single worker instance with a GPU.
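As a sketch, a custom-tier job configuration with GPU-enabled machine types might look like the following config.yaml (the worker and parameter server counts here are illustrative, not recommendations):

```yaml
trainingInput:
  scaleTier: CUSTOM
  # GPU-enabled machine types for the master and workers.
  masterType: complex_model_m_gpu
  workerType: complex_model_m_gpu
  # Parameter servers typically don't need GPUs.
  parameterServerType: large_model
  workerCount: 4
  parameterServerCount: 2
```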
In addition, you need to run your job in a region that supports GPUs.
Assigning ops to GPUs
To make use of the GPUs on a machine, make the appropriate changes to your TensorFlow trainer application:
High-level Estimator API: No code changes are necessary as long as your ClusterSpec is configured properly. If a cluster is a mixture of CPUs and GPUs, map the ps job name to the CPUs and the worker job name to the GPUs.
Core TensorFlow API: You must assign ops to run on GPU-enabled machines. This process is the same as using GPUs with TensorFlow locally. You can use tf.train.replica_device_setter to assign ops to devices.
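As a minimal sketch of device assignment, assuming TensorFlow's compat.v1 graph-mode API and a hypothetical two-node cluster: replica_device_setter pins variables to the ps job, while an explicit tf.device scope pins an op to a GPU.

```python
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()

# Hypothetical cluster with one parameter server and one worker.
cluster = tf.train.ClusterSpec({
    "ps": ["ps0.example.com:2222"],
    "worker": ["worker0.example.com:2222"],
})

# replica_device_setter places variables on the ps job.
with tf.device(tf.train.replica_device_setter(cluster=cluster)):
    weights = tf.get_variable("weights", shape=[10, 10])

# Ops can be pinned to a specific GPU device explicitly.
with tf.device("/gpu:0"):
    product = tf.matmul(weights, weights)

print(weights.device)  # variable placed on the ps job
```

Building the graph this way requires no running cluster or physical GPU; device placement is only resolved when the session executes the ops.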
When you assign a GPU-enabled machine to a Cloud ML Engine process, that process has exclusive access to that machine's GPUs; you can't share the GPUs of a single machine in your cluster among multiple processes. The process corresponds to the distributed TensorFlow task in your cluster specification. The distributed TensorFlow documentation describes cluster specifications and tasks.
GPU device strings
A standard_gpu machine's single GPU is identified as "/gpu:0". Machines with multiple GPUs use identifiers starting at "/gpu:0" and counting up. For example, complex_model_m_gpu machines have four GPUs identified as "/gpu:0" through "/gpu:3".
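The mapping from machine type to device strings can be sketched with a small helper (the function name and dictionary below are illustrative, not part of any Cloud ML Engine or TensorFlow API):

```python
# GPU count for each Cloud ML Engine GPU-enabled machine type.
GPUS_PER_MACHINE = {
    "standard_gpu": 1,
    "complex_model_m_gpu": 4,
    "complex_model_l_gpu": 8,
}

def gpu_device_strings(machine_type):
    """Return the TensorFlow device strings for a machine type's GPUs."""
    count = GPUS_PER_MACHINE[machine_type]
    return ["/gpu:%d" % i for i in range(count)]

print(gpu_device_strings("standard_gpu"))         # ['/gpu:0']
print(gpu_device_strings("complex_model_m_gpu"))  # ['/gpu:0', '/gpu:1', '/gpu:2', '/gpu:3']
```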