Using GPUs

This page explains how to run a Dataflow job with GPUs. Jobs that use GPUs incur charges during Preview, as specified on the Dataflow pricing page.

For more in-depth information about using GPUs with Dataflow, read Dataflow support for GPUs.

Provisioning GPU quota

GPU devices are subject to your Google Cloud project's quota availability. Request GPU quota in the region of your choice.

Installing GPU drivers

You must instruct Dataflow to install NVIDIA drivers onto the workers by appending install-nvidia-driver to the worker_accelerator configuration.

Binaries and libraries provided by the NVIDIA driver installer are mounted into the container running pipeline user code at /usr/local/nvidia/.
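To confirm from pipeline code that the driver volume is actually present at runtime, you might invoke nvidia-smi from the mounted path. The following is a minimal sketch; the helper name is ours, and the binary exists only on a GPU worker where driver installation succeeded:

```python
import subprocess

def nvidia_smi(path="/usr/local/nvidia/bin/nvidia-smi"):
    """Run nvidia-smi from the driver volume mounted into the container.

    Returns the tool's output, or None if the binary is absent (for
    example, when running outside a Dataflow GPU worker).
    """
    try:
        result = subprocess.run([path], capture_output=True, text=True, check=True)
        return result.stdout
    except (OSError, subprocess.CalledProcessError):
        return None
```

Logging the result from a DoFn's setup method is a convenient way to surface driver status in worker logs.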

Configuring your container image

To interact with the GPUs, you might need additional NVIDIA software, such as GPU-accelerated libraries and the CUDA Toolkit. You must supply these libraries in the Docker container running user code.

You can customize the container image by supplying an image that fulfills the Apache Beam SDK container image contract and has the necessary libraries or by building on top of the images published with Apache Beam SDK releases.

To provide a custom container image, you must use Dataflow Runner v2 and supply the container image using the worker_harness_container_image pipeline option.

For more information, see Using custom containers.

Build a custom container image from a pre-existing base image with GPU software

You can build a Docker image that fulfills the Apache Beam SDK container contract from an existing base image. For example, TensorFlow Docker images and AI Platform Deep Learning Containers are preconfigured for GPU usage.

A sample Dockerfile looks like the following:

# Use a GPU-enabled TensorFlow Docker image. The image has Python 3.6.
FROM tensorflow/tensorflow:2.4.0-gpu
RUN pip install --no-cache-dir apache-beam[gcp]==2.26.0

# Copy the Apache Beam worker dependencies from the Beam Python 3.6 SDK image.
COPY --from=apache/beam_python3.6_sdk:2.26.0 /opt/apache/beam /opt/apache/beam

# Set the entrypoint to Apache Beam SDK worker launcher.
ENTRYPOINT [ "/opt/apache/beam/boot" ]

Verify that the Apache Beam version and the Python interpreter minor version in your image match the versions you use to launch the pipeline. For best results, consider using the latest versions of your chosen base image and Apache Beam.
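As a sanity check, you can derive the Beam SDK image tag that matches your launch environment from the running interpreter. This is a sketch; the helper name is illustrative:

```python
import sys

def matching_beam_image(beam_version):
    """Return the Beam SDK container image whose Python minor version
    matches the interpreter used to launch the pipeline."""
    return "apache/beam_python{}.{}_sdk:{}".format(
        sys.version_info.major, sys.version_info.minor, beam_version)

# For example, launching with Python 3.6 and Beam 2.26.0 should use an
# image built from apache/beam_python3.6_sdk:2.26.0.
```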

Build a custom container image from an Apache Beam SDK image

To build on top of the Docker images published from Apache Beam SDK releases, you must install the required GPU libraries into these images.

A sample Dockerfile looks like the following:

FROM apache/beam_python3.7_sdk:2.24.0
ENV INSTALLER_DIR="/tmp/installer_dir"

# The base image has TensorFlow 2.2.0, which requires CUDA 10.1 and cuDNN 7.6.
# You can download cuDNN from the NVIDIA website.
COPY cudnn-10.1-linux-x64-v7.6.0.64.tgz $INSTALLER_DIR/cudnn.tgz
RUN \
    # Download the CUDA toolkit. Replace CUDA_RUNFILE_URL with the URL of
    # the CUDA 10.1 runfile installer from the NVIDIA website.
    wget -q -O $INSTALLER_DIR/cuda.run CUDA_RUNFILE_URL && \
    # Install CUDA toolkit. Print logs upon failure.
    sh $INSTALLER_DIR/cuda.run --toolkit --silent || (egrep '^\[ERROR\]' /var/log/cuda-installer.log && exit 1) && \
    # Install cuDNN.
    mkdir $INSTALLER_DIR/cudnn && \
    tar xvfz $INSTALLER_DIR/cudnn.tgz -C $INSTALLER_DIR/cudnn && \
    cp $INSTALLER_DIR/cudnn/cuda/include/cudnn*.h /usr/local/cuda/include && \
    cp $INSTALLER_DIR/cudnn/cuda/lib64/libcudnn* /usr/local/cuda/lib64 && \
    chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn* && \
    rm -rf $INSTALLER_DIR

# A volume with GPU drivers will be mounted at runtime at /usr/local/nvidia.
ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/nvidia/lib64:/usr/local/cuda/lib64

Configure the LD_LIBRARY_PATH environment variable so that the driver libraries in /usr/local/nvidia/lib64 are discoverable as shared libraries inside the container.
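To check that the libraries are actually discoverable at runtime, you can try resolving them through the dynamic loader. This is a sketch; on a machine without GPU drivers, these loads simply fail:

```python
import ctypes

def can_load(libname):
    """Return True if the dynamic loader can resolve the shared library."""
    try:
        ctypes.CDLL(libname)
        return True
    except OSError:
        return False

# On a correctly configured Dataflow GPU worker, both of these are
# expected to load; elsewhere they typically return False.
for lib in ("libcuda.so.1", "libcudart.so"):
    print(lib, can_load(lib))
```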

If you use TensorFlow, you must choose a compatible combination of CUDA Toolkit and cuDNN versions. For more details, read Software requirements and Tested build configurations.

Selecting type and number of GPUs for Dataflow workers

Dataflow lets you configure the type and number of GPUs to attach to Dataflow workers using the worker_accelerator parameter. Select the type and number of GPUs based on your use case and how your pipeline uses the GPUs.

The following GPU types are supported with Dataflow:

  • NVIDIA® Tesla® T4
  • NVIDIA® Tesla® P4
  • NVIDIA® Tesla® V100
  • NVIDIA® Tesla® P100
  • NVIDIA® Tesla® K80

For more detailed information about each GPU type, including performance data, read the GPU comparison chart.
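When you configure worker_accelerator, these GPU models are referred to by their Compute Engine accelerator type names. A small lookup table, assuming the standard Compute Engine identifiers:

```python
# Compute Engine accelerator type names for the GPU models listed above.
GPU_ACCELERATOR_TYPES = {
    "NVIDIA Tesla T4": "nvidia-tesla-t4",
    "NVIDIA Tesla P4": "nvidia-tesla-p4",
    "NVIDIA Tesla V100": "nvidia-tesla-v100",
    "NVIDIA Tesla P100": "nvidia-tesla-p100",
    "NVIDIA Tesla K80": "nvidia-tesla-k80",
}
```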

Running your job with GPUs

To run a Dataflow job with GPUs, use the following command:


python PIPELINE \
  --runner "DataflowRunner" \
  --project "PROJECT" \
  --temp_location "gs://BUCKET/tmp" \
  --region "REGION" \
  --worker_zone "WORKER_ZONE" \
  --worker_harness_container_image "IMAGE" \
  --experiment "worker_accelerator=type:GPU_TYPE;count:GPU_COUNT;install-nvidia-driver" \
  --experiment "use_runner_v2"

Replace the following:

  • PIPELINE: your pipeline source code file
  • PROJECT: the Google Cloud project name
  • BUCKET: the Cloud Storage bucket
  • REGION: a regional endpoint
  • WORKER_ZONE: a Compute Engine zone for launching worker instances
  • IMAGE: the Container Registry path for your Docker image
  • GPU_TYPE: an available GPU type
  • GPU_COUNT: number of GPUs to attach to each worker VM

The considerations for running a Dataflow job with GPUs include the following:

  • To supply a custom container to your job with GPUs, you must use Dataflow Runner v2.
  • Select a WORKER_ZONE that supports the GPU_TYPE.
  • The container IMAGE URI must include a tag; use :latest rather than omitting the tag.

If you use TensorFlow, consider selecting a machine type with 1 vCPU. If n1-standard-1 does not provide sufficient memory, consider a custom machine type, such as n1-custom-1-NUMBER_OF_MB or n1-custom-1-NUMBER_OF_MB-ext for extended memory. When specifying this machine type, NUMBER_OF_MB must be a multiple of 256.
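Because NUMBER_OF_MB must be a multiple of 256, you might compute the machine type name programmatically. A sketch with a hypothetical helper that rounds memory up to the required multiple:

```python
def custom_machine_type(vcpus, memory_mb, extended=False):
    """Build an n1 custom machine type name, rounding memory up to the
    required multiple of 256 MB."""
    rounded_mb = -(-memory_mb // 256) * 256  # ceiling division
    name = "n1-custom-{}-{}".format(vcpus, rounded_mb)
    return name + "-ext" if extended else name

print(custom_machine_type(1, 6000))         # n1-custom-1-6144
print(custom_machine_type(1, 20000, True))  # n1-custom-1-20224-ext
```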

Verifying your Dataflow job

To confirm that the job uses worker VMs with GPUs, follow these steps:

  1. Verify that Dataflow workers for the job have started.
  2. While a job is running, find a worker VM associated with the job.
    1. Paste the Job ID into the Search products and resources prompt.
    2. Select the Compute Engine VM instance associated with the job.

You can also find a list of all running instances in the Compute Engine console.

  1. In the Google Cloud Console, go to the VM instances page.

    Go to VM instances

  2. Click VM instance details.

  3. Verify that the details page has a GPUs section and that your GPUs are attached.

If your job did not launch with GPUs, check that the --worker_accelerator experiment is configured properly and visible in the Dataflow monitoring UI under experiments. The order of tokens in the accelerator metadata is important.

For example, an 'experiments' pipeline option in the Dataflow monitoring UI might look like the following:

['use_runner_v2','worker_accelerator=type:nvidia-tesla-t4;count:1;install-nvidia-driver', ...]
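Because the order of tokens matters, you might assemble the experiment value with a small helper rather than by hand. A sketch; the helper is ours, and the token format follows the example above:

```python
def worker_accelerator_experiment(gpu_type, count, install_driver=True):
    """Assemble the worker_accelerator experiment value.

    Tokens are emitted in the order Dataflow expects:
    type, count, then install-nvidia-driver.
    """
    value = "worker_accelerator=type:{};count:{}".format(gpu_type, count)
    if install_driver:
        value += ";install-nvidia-driver"
    return value

print(worker_accelerator_experiment("nvidia-tesla-t4", 1))
# worker_accelerator=type:nvidia-tesla-t4;count:1;install-nvidia-driver
```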

Troubleshooting your Dataflow job

If you run into problems running your Dataflow job with GPUs, the following troubleshooting steps might resolve your issue.

Workers don't start

If your job is stuck and the Dataflow workers never start, verify that you are using a compatible machine type. GPUs are supported only with N1 machine types.

If you encounter the ZONE_RESOURCE_POOL_EXHAUSTED or ZONE_RESOURCE_POOL_EXHAUSTED_WITH_DETAILS errors, try launching the pipeline in a different zone or with a different accelerator type.

No GPU usage

If your pipeline runs successfully, but GPUs are not used, verify the following:

  • NVIDIA libraries installed in the worker containers match the requirements of pipeline user code.
  • Installed NVIDIA libraries are accessible as shared libraries.

If you are using frameworks that support GPU devices, such as TensorFlow, make sure these frameworks can access the attached devices. For example, if you use TensorFlow, you can print available devices with the following:

import logging
import tensorflow as tf

gpu_devices = tf.config.list_physical_devices("GPU")
logging.info("GPU devices: {}".format(gpu_devices))
if len(gpu_devices) == 0:
  logging.warning("No GPUs found, defaulting to CPU")

If the devices are not available, you might be using an incompatible software configuration. For example, if you are using TensorFlow, verify that you have a compatible combination of TensorFlow, cuDNN version, and CUDA Toolkit version.

If you have trouble pinpointing the mismatch, start from a known working configuration and iterate. Check out sample tutorials and Docker images preconfigured for GPU usage, such as TensorFlow Docker images and AI Platform Deep Learning Containers.

Debug with a standalone VM

You can debug your custom container on a standalone VM with GPUs by creating a Compute Engine VM running GPUs on Container-Optimized OS, installing drivers, and starting your container. For detailed instructions on these steps, read Getting started: Running GPUs on Container-Optimized OS.

Apache Beam SDK containers use the /opt/apache/beam/boot entrypoint. For debugging purposes you can launch your container manually with a different entrypoint, as shown in the following example:

docker run --rm \
  -it \
  --entrypoint=/bin/bash \
  --volume /var/lib/nvidia/lib64:/usr/local/nvidia/lib64 \
  --volume /var/lib/nvidia/bin:/usr/local/nvidia/bin \
  --privileged \
  IMAGE

Replace IMAGE with the Container Registry path for your Docker image.

Then, verify that the GPU libraries installed in your container can access the GPU devices.

What's next