This page explains how to run an Apache Beam pipeline on Dataflow with GPUs. Jobs that use GPUs incur charges as specified on the Dataflow pricing page.
For more information about using GPUs with Dataflow, see Dataflow support for GPUs. For more information about the developer workflow for building pipelines using GPUs, see Developing with GPUs.
Using Apache Beam notebooks
If you already have a pipeline that you would like to run with GPUs on Dataflow, you can skip this section.
Apache Beam notebooks offer a convenient way to prototype and iteratively develop your pipeline with GPUs without setting up a development environment. To get started, read the Developing with Apache Beam notebooks guide, launch an Apache Beam notebooks instance, and follow the example notebook Use GPUs with Apache Beam.
Provisioning GPU quota
GPU devices are subject to your Google Cloud project's quota availability. Request GPU quota in the region of your choice.
Installing GPU drivers
You must instruct Dataflow to install NVIDIA drivers onto the workers by appending install-nvidia-driver to the worker_accelerator option. When the install-nvidia-driver option is specified, Dataflow installs NVIDIA drivers onto the Dataflow workers using the cos-extensions utility provided by Container-Optimized OS. By specifying install-nvidia-driver, you agree to accept the NVIDIA license agreement.
Binaries and libraries provided by the NVIDIA driver installer are mounted into the container running pipeline user code at /usr/local/nvidia/.
The GPU driver version depends on the Container-Optimized OS version currently used by Dataflow.
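If you set pipeline options in Python code rather than on the command line, the following minimal sketch shows one way to pass the worker_accelerator experiment through PipelineOptions. The GPU type and count are placeholder values; the equivalent command-line flag appears in Running your job with GPUs later on this page.
# A minimal sketch, assuming you build PipelineOptions in code.
# The GPU type and count are example values only.
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    experiments=[
        "worker_accelerator=type:nvidia-tesla-t4;count:1;install-nvidia-driver",
        "use_runner_v2",
    ],
    # Other Dataflow options (project, region, temp_location, and so on) go here.
)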
Building a custom container image
To interact with the GPUs, you might need additional NVIDIA software, such as GPU-accelerated libraries and the CUDA Toolkit. You must supply these libraries in the Docker container running user code.
You can customize the container image by supplying an image that fulfills the Apache Beam SDK container image contract and has the necessary GPU libraries.
To provide a custom container image, you must use Dataflow Runner v2 and supply the container image using the worker_harness_container_image pipeline option. If you are using Apache Beam 2.30.0 or later, you can use the shorter option name sdk_container_image instead for simplicity. For more information, see Using custom containers.
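For reference, the following minimal sketch shows one way to set this option when building PipelineOptions in Python code; the image path is a placeholder for your own Container Registry path.
# A minimal sketch, assuming Apache Beam 2.30.0 or later; with earlier SDKs,
# replace sdk_container_image with worker_harness_container_image.
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    sdk_container_image="gcr.io/PROJECT/IMAGE",  # placeholder image path
    experiments=["use_runner_v2"],
)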
Approach 1. Using an existing image configured for GPU usage
You can build a Docker image that fulfills the Apache Beam SDK container contract from an existing base image that is preconfigured for GPU usage. For example, TensorFlow Docker images and NVIDIA container images are preconfigured for GPU usage.
A sample Dockerfile that builds on a TensorFlow Docker image with Python 3.6 looks like the following:
ARG BASE=tensorflow/tensorflow:2.5.0-gpu
FROM $BASE
# Check that the chosen base image provides the expected version of Python interpreter.
ARG PY_VERSION=3.6
RUN [[ $PY_VERSION == `python -c 'import sys; print("%s.%s" % sys.version_info[0:2])'` ]] \
|| { echo "Could not find Python interpreter or Python version is different from ${PY_VERSION}"; exit 1; }
RUN pip install --no-cache-dir apache-beam[gcp]==2.29.0 \
# Verify that there are no conflicting dependencies.
&& pip check
# Copy the Apache Beam worker dependencies from the Beam Python 3.6 SDK image.
COPY --from=apache/beam_python3.6_sdk:2.29.0 /opt/apache/beam /opt/apache/beam
# Apache Beam worker expects pip at /usr/local/bin/pip by default.
# Some images have pip in a different location. If necessary, make a symlink.
# This can be omitted in Beam 2.30.0 and later versions.
RUN [[ `which pip` == "/usr/local/bin/pip" ]] || ln -s `which pip` /usr/local/bin/pip
# Set the entrypoint to Apache Beam SDK worker launcher.
ENTRYPOINT [ "/opt/apache/beam/boot" ]
When using TensorFlow Docker images, use TensorFlow 2.5.0 or later. Earlier TensorFlow Docker images install the tensorflow-gpu package instead of the tensorflow package. The distinction is not important after the TensorFlow 2.1.0 release, but several downstream packages, such as tfx, require the tensorflow package.
Large container images slow down worker startup time. This can happen when you use containers such as Deep Learning Containers, which are typically large.
Installing a specific Python version
If you have strict requirements for the Python version, you can build your image from an NVIDIA base image that has the necessary GPU libraries, and then install the Python interpreter.
The following example demonstrates selecting an NVIDIA image from the CUDA container image catalog that does not include the Python interpreter. You can adjust the example to install the desired version of Python 3 and pip. Because the example uses TensorFlow, the base image you choose must provide CUDA and cuDNN versions that satisfy the requirements of the TensorFlow version.
A sample Dockerfile looks like the following:
# Select an NVIDIA base image with desired GPU stack from https://ngc.nvidia.com/catalog/containers/nvidia:cuda
FROM nvidia/cuda:11.0.3-cudnn8-runtime-ubuntu20.04
RUN \
# Add Deadsnakes repository that has a variety of Python packages for Ubuntu.
# See: https://launchpad.net/~deadsnakes/+archive/ubuntu/ppa
apt-key adv --keyserver keyserver.ubuntu.com --recv-keys F23C5A6CF475977595C89F51BA6932366A755776 \
&& echo "deb http://ppa.launchpad.net/deadsnakes/ppa/ubuntu focal main" >> /etc/apt/sources.list.d/custom.list \
&& echo "deb-src http://ppa.launchpad.net/deadsnakes/ppa/ubuntu focal main" >> /etc/apt/sources.list.d/custom.list \
&& apt-get update \
&& apt-get install -y curl \
python3.8 \
# With the python3.8 package, distutils needs to be installed separately.
python3-distutils \
&& rm -rf /var/lib/apt/lists/* \
&& update-alternatives --install /usr/bin/python python /usr/bin/python3.8 10 \
&& curl https://bootstrap.pypa.io/get-pip.py | python \
# Install Apache Beam and Python packages that will interact with GPUs.
&& pip install --no-cache-dir apache-beam[gcp]==2.29.0 tensorflow==2.4.0 \
# Verify that there are no conflicting dependencies.
&& pip check
# Copy the Apache Beam worker dependencies from the Beam Python 3.8 SDK image.
COPY --from=apache/beam_python3.8_sdk:2.29.0 /opt/apache/beam /opt/apache/beam
# Set the entrypoint to Apache Beam SDK worker launcher.
ENTRYPOINT [ "/opt/apache/beam/boot" ]
On some OS distributions, it might be difficult to install specific Python versions by using the OS package manager. In this case, you can install the Python interpreter with tools like Miniconda or pyenv.
A sample Dockerfile looks like the following:
FROM nvidia/cuda:11.0.3-cudnn8-runtime-ubuntu20.04
# The Python version of the Dockerfile must match the Python version you use
# to launch the Dataflow job.
ARG PYTHON_VERSION=3.8
# Update PATH so we find our new Conda and Python installations.
ENV PATH=/opt/python/bin:/opt/conda/bin:$PATH
RUN apt-get update \
&& apt-get install -y wget \
&& rm -rf /var/lib/apt/lists/* \
# The NVIDIA image doesn't come with Python pre-installed.
# We use Miniconda to install the Python version of our choice.
&& wget -q https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh \
&& sh Miniconda3-latest-Linux-x86_64.sh -b -p /opt/conda \
&& rm Miniconda3-latest-Linux-x86_64.sh \
# Create a new Python environment with desired version, and install pip.
&& conda create -y -p /opt/python python=$PYTHON_VERSION pip \
# Remove unused Conda packages, install necessary Python packages via pip
# to avoid mixing packages from pip and Conda.
&& conda clean -y --all --force-pkgs-dirs \
# Install Apache Beam and Python packages that will interact with GPUs.
&& pip install --no-cache-dir apache-beam[gcp]==2.29.0 tensorflow==2.4.0 \
# Verify that there are no conflicting dependencies.
&& pip check \
# Apache Beam worker expects pip at /usr/local/bin/pip by default.
# This can be omitted in Beam 2.30.0 and later versions.
&& ln -s $(which pip) /usr/local/bin/pip
# Copy the Apache Beam worker dependencies from the Apache Beam SDK for Python 3.8 image.
COPY --from=apache/beam_python3.8_sdk:2.29.0 /opt/apache/beam /opt/apache/beam
# Set the entrypoint to Apache Beam SDK worker launcher.
ENTRYPOINT [ "/opt/apache/beam/boot" ]
Approach 2. Using Apache Beam container images
You can configure a container image for GPU usage without using preconfigured images. This approach is not recommended unless preconfigured images do not work for you. Setting up your own container image requires selecting compatible libraries and configuring their execution environment.
A sample Dockerfile looks like the following:
FROM apache/beam_python3.7_sdk:2.24.0
ENV INSTALLER_DIR="/tmp/installer_dir"
# The base image has TensorFlow 2.2.0, which requires CUDA 10.1 and cuDNN 7.6.
# You can download cuDNN from the NVIDIA website:
# https://developer.nvidia.com/cudnn
COPY cudnn-10.1-linux-x64-v7.6.0.64.tgz $INSTALLER_DIR/cudnn.tgz
RUN \
# Download CUDA toolkit.
wget -q -O $INSTALLER_DIR/cuda.run https://developer.download.nvidia.com/compute/cuda/10.1/Prod/local_installers/cuda_10.1.243_418.87.00_linux.run && \
# Install CUDA toolkit. Print logs upon failure.
sh $INSTALLER_DIR/cuda.run --toolkit --silent || (egrep '^\[ERROR\]' /var/log/cuda-installer.log && exit 1) && \
# Install cuDNN.
mkdir $INSTALLER_DIR/cudnn && \
tar xvfz $INSTALLER_DIR/cudnn.tgz -C $INSTALLER_DIR/cudnn && \
cp $INSTALLER_DIR/cudnn/cuda/include/cudnn*.h /usr/local/cuda/include && \
cp $INSTALLER_DIR/cudnn/cuda/lib64/libcudnn* /usr/local/cuda/lib64 && \
chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn* && \
rm -rf $INSTALLER_DIR
# A volume with GPU drivers will be mounted at runtime at /usr/local/nvidia.
ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/nvidia/lib64:/usr/local/cuda/lib64
Driver libraries in /usr/local/nvidia/lib64 must be discoverable in the container as shared libraries. To make them discoverable, configure the LD_LIBRARY_PATH environment variable.
If you use TensorFlow, you must choose a compatible combination of CUDA Toolkit and cuDNN versions. For more details, read Software requirements and Tested build configurations.
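As a quick compatibility check, the following sketch prints the CUDA and cuDNN versions that your installed TensorFlow build was compiled against; tf.sysconfig.get_build_info is available in TensorFlow 2.3 and later, and the key names can vary between builds.
# A minimal sketch: inspect the CUDA and cuDNN versions expected by this
# TensorFlow build. Key names can vary between TensorFlow builds.
import tensorflow as tf

build_info = tf.sysconfig.get_build_info()
print(build_info.get("cuda_version"))
print(build_info.get("cudnn_version"))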
Selecting type and number of GPUs for Dataflow workers
Dataflow allows you to configure the type and number of GPUs to
attach to Dataflow workers using the worker_accelerator
parameter. You can
select the type and number of GPUs based on your use case and
how you plan to utilize the GPUs in your pipeline.
The following GPU types are supported with Dataflow:
- NVIDIA® Tesla® T4
- NVIDIA® Tesla® P4
- NVIDIA® Tesla® V100
- NVIDIA® Tesla® P100
- NVIDIA® Tesla® K80
For more detailed information about each GPU type, including performance data, read the GPU comparison chart.
Running your job with GPUs
To run a Dataflow job with GPUs, use the following command:
Python
python PIPELINE \
--runner "DataflowRunner" \
--project "PROJECT" \
--temp_location "gs://BUCKET/tmp" \
--region "REGION" \
--worker_harness_container_image "IMAGE" \
--disk_size_gb "DISK_SIZE_GB" \
--experiments "worker_accelerator=type:GPU_TYPE;count:GPU_COUNT;install-nvidia-driver" \
--experiments "use_runner_v2"
Replace the following:
- PIPELINE: your pipeline source code file
- PROJECT: the Google Cloud project name
- BUCKET: the Cloud Storage bucket
- REGION: a regional endpoint, for example,
us-central1
- IMAGE: the Container Registry path for your Docker image
- DISK_SIZE_GB: the size of the boot disk for each worker VM, for example,
50
- GPU_TYPE: an available
GPU type, for example,
nvidia-tesla-t4
- GPU_COUNT: the number of GPUs to attach to each worker VM, for example,
1
The considerations for running a Dataflow job with GPUs include the following:
- To supply a custom container to your job with GPUs, you must use Dataflow Runner v2.
- Select a REGION that has zones that support the GPU_TYPE. Dataflow automatically assigns workers to a zone with GPUs in this region.
- Because GPU containers are typically large, to avoid running out of disk space, we recommend that you increase the default boot disk size to 50 gigabytes or more.
If you use TensorFlow, consider configuring the workers to use a single process, either by setting the pipeline option --experiments=no_use_multiple_sdk_containers or by using workers with one vCPU. If the n1-standard-1 machine type does not provide sufficient memory, consider a custom machine type, such as n1-custom-1-NUMBER_OF_MB or n1-custom-1-NUMBER_OF_MB-ext for extended memory. For more information, read GPUs and worker parallelism.
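A minimal sketch of these settings as Python pipeline options follows; the custom machine type is only an example, so size it for your own memory needs.
# A minimal sketch, assuming one SDK process per worker and a custom machine
# type with extra memory. The machine type shown is an example value only.
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    experiments=["no_use_multiple_sdk_containers"],
    machine_type="n1-custom-1-6656",  # 1 vCPU with 6.5 GB of memory
)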
Verifying your Dataflow job
To confirm that the job uses worker VMs with GPUs, follow these steps:
- Verify that Dataflow workers for the job have started.
- While a job is running, find a worker VM associated with the job.
- Paste the Job ID in the Search Products and Resources prompt.
- Select the Compute Engine VM instance associated with the job.
You can also find a list of all running instances in the Compute Engine console.
In the Google Cloud console, go to the VM instances page.
Click VM instance details.
Verify that the details page has a GPUs section and that your GPUs are attached.
If your job did not launch with GPUs, check that the worker_accelerator experiment is configured properly and is visible in the Dataflow monitoring UI under experiments. The order of tokens in the accelerator metadata is important.
For example, an 'experiments' pipeline option in the Dataflow monitoring UI might look like the following:
['use_runner_v2','worker_accelerator=type:nvidia-tesla-t4;count:1;install-nvidia-driver', ...]
Viewing GPU utilization
To see GPU utilization on the worker VMs, follow these steps:
In the Google Cloud console, go to Monitoring or use the following button:
Go to Monitoring
In the Monitoring navigation pane, click Metrics Explorer.
Specify Dataflow Job as the Resource Type, and specify GPU utilization or GPU memory utilization as the metric, depending on which metric you want to monitor.
For more information, read the Metrics Explorer guide.
Using GPUs with Dataflow Prime
Dataflow Prime lets
you request accelerators for a specific step of your pipeline.
To use GPUs with Dataflow Prime, do not use the --experiments=worker_accelerator
pipeline option. Instead, request the GPUs with the accelerator
resource hint.
For more information, see Using resource hints.
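As an illustration, the following sketch attaches the accelerator hint to a single transform in a Python pipeline. The transform is a placeholder, the hint value follows the type:GPU_TYPE;count:GPU_COUNT;install-nvidia-driver format used on this page, and the sketch assumes an Apache Beam SDK version that supports resource hints; see Using resource hints for the authoritative syntax.
# A minimal sketch: request a GPU for one step with the accelerator resource hint.
# The transform and GPU type are placeholder values.
import apache_beam as beam

with beam.Pipeline() as pipeline:
    _ = (
        pipeline
        | beam.Create([1, 2, 3])
        | "GpuStep" >> beam.Map(lambda x: x).with_resource_hints(
            accelerator="type:nvidia-tesla-t4;count:1;install-nvidia-driver")
    )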
Troubleshooting your Dataflow job
If you run into problems running your Dataflow job with GPUs, the following troubleshooting steps might resolve your issue.
Workers don't start
If your job is stuck and the Dataflow workers never start processing data, it is likely that you have a problem related to using a custom container with Dataflow. For more details, read the custom containers troubleshooting guide.
If you are a Python user, verify that the following conditions are met:
- The Python interpreter minor version in your container image is the same version as you use when launching your pipeline. If there is a mismatch, you might see errors like SystemError: unknown opcode with a stack trace involving apache_beam/internal/pickler.py.
- If you are using the Apache Beam SDK 2.29.0 or earlier, pip must be accessible on the image in /usr/local/bin/pip.
We recommend that you reduce the customizations to a minimal working configuration the first time you use a custom image. Use the sample custom container images provided in the examples on this page, make sure you can run a simple Dataflow pipeline with this container image without requesting GPUs, and then iterate on the solution.
Verify that workers have sufficient disk space to download your container image, and adjust the disk size if necessary. Large images take longer to download, which increases worker startup time.
Job fails immediately at startup
If you encounter the
ZONE_RESOURCE_POOL_EXHAUSTED
or ZONE_RESOURCE_POOL_EXHAUSTED_WITH_DETAILS
errors, you can take the following steps:
- Don't specify the worker zone, so that Dataflow selects the optimal zone for you.
- Launch the pipeline in a different zone or with a different accelerator type.
Job fails at runtime
If the job fails at runtime, rule out out-of-memory (OOM) errors on the worker machine and on the GPU. GPU OOM errors might manifest as cudaErrorMemoryAllocation out of memory errors in worker logs. If you are using TensorFlow, verify that you use only one TensorFlow process to access one GPU device.
For more information, read GPUs and worker parallelism.
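One way to keep a single TensorFlow process on a single device is to limit which GPUs the process can see, as in the following sketch; it assumes the process should use only the first visible GPU.
# A minimal sketch: make only the first GPU visible to this TensorFlow process.
import tensorflow as tf

gpus = tf.config.list_physical_devices("GPU")
if gpus:
    tf.config.set_visible_devices(gpus[0], "GPU")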
No GPU usage
If your pipeline runs successfully, but GPUs are not used, verify the following:
- The NVIDIA libraries installed in the container image match the requirements of the pipeline user code and the libraries it uses.
- The installed NVIDIA libraries are accessible as shared libraries.
If the devices are not available, you might be using an incompatible software configuration. For example, if you are using TensorFlow, verify that you have a compatible combination of TensorFlow, cuDNN version, and CUDA Toolkit version.
To verify the image configuration, consider running a simple pipeline that just checks that GPUs are available and accessible to the workers.
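For example, the following sketch runs a pipeline whose only step logs nvidia-smi output from a worker; the nvidia-smi path is an assumption based on the /usr/local/nvidia mount described on this page, so adjust it if your container differs.
# A minimal sketch: log nvidia-smi output from a Dataflow worker.
# The nvidia-smi path is an assumption; adjust it for your container.
import logging
import subprocess

import apache_beam as beam

def log_gpu_info(_):
    result = subprocess.run(
        ["/usr/local/nvidia/bin/nvidia-smi"],
        capture_output=True, text=True, check=False)
    logging.info("nvidia-smi output:\n%s", result.stdout or result.stderr)
    return result.returncode

with beam.Pipeline() as pipeline:
    _ = pipeline | beam.Create([None]) | beam.Map(log_gpu_info)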
Debug with a standalone VM
While you are designing and iterating on a container image that works for you, it can be faster to reduce the feedback loop by trying out your container image on a standalone VM.
You can debug your custom container on a standalone VM with GPUs by creating a Compute Engine VM running GPUs on Container-Optimized OS, installing drivers, and starting your container as follows.
Create a VM instance.
gcloud compute instances create INSTANCE_NAME \
    --project "PROJECT" \
    --image-family cos-stable \
    --image-project=cos-cloud \
    --zone=us-central1-f \
    --accelerator type=nvidia-tesla-t4,count=1 \
    --maintenance-policy TERMINATE \
    --restart-on-failure \
    --boot-disk-size=200G \
    --scopes=cloud-platform
Use ssh to connect to the VM.
gcloud compute ssh INSTANCE_NAME --project "PROJECT"
Install the GPU drivers. After connecting to the VM via ssh, run the following commands on the VM:
# Run these commands on the virtual machine
cos-extensions install gpu
sudo mount --bind /var/lib/nvidia /var/lib/nvidia
sudo mount -o remount,exec /var/lib/nvidia
/var/lib/nvidia/bin/nvidia-smi
Launch your custom container.
Apache Beam SDK containers use the /opt/apache/beam/boot entrypoint. For debugging purposes, you can launch your container manually with a different entrypoint, as shown below:
docker-credential-gcr configure-docker
docker run --rm \
    -it \
    --entrypoint=/bin/bash \
    --volume /var/lib/nvidia/lib64:/usr/local/nvidia/lib64 \
    --volume /var/lib/nvidia/bin:/usr/local/nvidia/bin \
    --privileged \
    IMAGE
Replace IMAGE with the Container Registry path for your Docker image.
Verify that the GPU libraries installed in your container can access the GPU devices.
If you are using TensorFlow, you can print the available devices in the Python interpreter with the following:
>>> import tensorflow as tf
>>> print(tf.config.list_physical_devices("GPU"))
If you are using PyTorch, you can inspect the available devices in the Python interpreter with the following:
>>> import torch
>>> print(torch.cuda.is_available())
>>> print(torch.cuda.device_count())
>>> print(torch.cuda.get_device_name(0))
To iterate on your pipeline, you can launch your pipeline on Direct Runner. You can also launch pipelines on Dataflow Runner from this environment.
For additional information, see:
- Getting started: Running GPUs on Container-Optimized OS.
- Container Registry standalone credential helper.
- Container-Optimized OS toolbox.
- Service account access scopes.
What's next
- Learn more about GPU support on Dataflow.
- Work through Processing Landsat satellite images with GPUs.