This page provides background information on how GPUs work with Dataflow, including information about prerequisites and supported GPU types.
Using GPUs in Dataflow jobs lets you accelerate some data processing tasks. GPUs can perform certain computations faster than CPUs. These computations typically involve numeric or linear algebra operations, which are common in image processing and machine learning use cases. The extent of the performance improvement depends on the use case, the type of computation, and the amount of data processed.
Prerequisites for using GPUs in Dataflow
To use GPUs with your Dataflow job, you must use Runner v2.

Dataflow runs user code in worker VMs inside a Docker container. These worker VMs run Container-Optimized OS. For a Dataflow job to use GPUs, you also need the following:
- GPU drivers are installed on worker VMs and accessible to the Docker container. For more information, see Install GPU drivers.
- GPU libraries required by your pipeline, such as NVIDIA CUDA-X libraries or the NVIDIA CUDA Toolkit, are installed in the custom container image. For more information, see Configure your container image.
- Because GPU containers are typically large, increase the default boot disk size to 50 gigabytes or more to avoid running out of disk space.
Jobs that use GPUs incur charges as specified on the Dataflow pricing page.
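As a sketch of how these prerequisites come together, a Python pipeline launch might pass flags like the following. The project, region, bucket, and container image URI are placeholders; the `worker_accelerator` Dataflow service option requests GPUs and automatic driver installation, and the exact GPU type and count should match your needs and regional availability:

```shell
# Hypothetical launch of a Beam Python pipeline on Dataflow with one
# NVIDIA T4 GPU per worker. my-project, us-central1, my-bucket, and the
# image URI are placeholders to replace with your own values.
python my_pipeline.py \
  --runner=DataflowRunner \
  --project=my-project \
  --region=us-central1 \
  --temp_location=gs://my-bucket/tmp \
  --experiments=use_runner_v2 \
  --sdk_container_image=gcr.io/my-project/beam-gpu:latest \
  --disk_size_gb=50 \
  --dataflow_service_options="worker_accelerator=type:nvidia-tesla-t4;count:1;install-nvidia-driver"
```

The `--sdk_container_image` flag points to the custom container image that has the required GPU libraries installed, `--disk_size_gb=50` applies the larger boot disk recommended above, and `install-nvidia-driver` asks Dataflow to install GPU drivers on the worker VMs.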
The following GPU types are supported with Dataflow:
- NVIDIA® A100 40 GB
- NVIDIA® A100 80 GB
- NVIDIA® Tesla® T4
- NVIDIA® Tesla® P4
- NVIDIA® Tesla® V100
- NVIDIA® Tesla® P100
- NVIDIA® Tesla® K80
For more information about each GPU type, including performance data, see Compute Engine GPU platforms.
For information about available regions and zones for GPUs, see GPU regions and zones availability in the Compute Engine documentation.
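To check which accelerator types are offered in a given zone before choosing one, you can query Compute Engine directly. This is a sketch assuming you have the gcloud CLI installed and authenticated; the zone is a placeholder:

```shell
# List the GPU accelerator types available in a specific zone
# (us-central1-a is a placeholder; substitute your target zone).
gcloud compute accelerator-types list --filter="zone:us-central1-a"
```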
What's next

- See an example of a developer workflow for building pipelines that use GPUs.
- Learn how to run an Apache Beam pipeline on Dataflow with GPUs.
- Work through Processing Landsat satellite images with GPUs.