Develop with GPUs

This page describes an example of a developer workflow for building pipelines using GPUs.

For more information about using GPUs with Dataflow, see Dataflow support for GPUs. For information and examples on how to enable GPUs in your Dataflow jobs, see Using GPUs and Processing Landsat satellite images with GPUs.

Using Apache Beam with NVIDIA GPUs lets you create large-scale data processing pipelines that handle preprocessing and inference. When you're using GPUs for local development, consider the following information:

  • Data processing workflows often use additional libraries that you need to install both in the launch environment and in the execution environment on Dataflow workers. This requirement adds steps to the development workflow, either for configuring pipeline requirements or for using custom containers in Dataflow. You might want a local development environment that mimics the production environment as closely as possible.

  • If you're using a library that implicitly uses NVIDIA GPUs, and your code doesn't require any changes to support GPUs, you don't need to change your development workflow to configure pipeline requirements or to build custom containers.

  • Some libraries don't switch transparently between CPU and GPU usage, so they require specific builds and different code paths. Replicating the code-run-code development lifecycle for this scenario requires additional steps.

  • When running local experiments, it's useful to replicate the environment of the Dataflow worker as closely as possible. Depending on the library, you might need a machine with a GPU and the required GPU libraries installed. This type of machine might not be available in your local environment. You can emulate the Dataflow runner environment using a container running on a GPU-equipped Google Cloud virtual machine.

  • A pipeline is unlikely to consist entirely of transforms that require a GPU. A typical pipeline has an ingestion stage that uses one of the many sources provided by Apache Beam. That stage is followed by data manipulation or shaping transforms, which then feed into a GPU transform.
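For libraries that need separate CPU and GPU code paths, one option is to choose the path at run time. The following is a minimal sketch, not a Dataflow or Apache Beam API: `gpu_available` and `choose_impl` are hypothetical helper names, and the check assumes that `nvidia-smi` is on the `PATH` of the process, which might not be true in every container.

```python
import shutil
import subprocess
from typing import Callable, Optional


def gpu_available() -> bool:
    """Best-effort check: True if nvidia-smi runs and lists at least one GPU."""
    smi = shutil.which("nvidia-smi")  # Assumption: the tool is on PATH.
    if smi is None:
        return False  # Tool not found; treat this as "no GPU".
    try:
        # "nvidia-smi -L" prints one line per GPU, for example "GPU 0: ...".
        result = subprocess.run(
            [smi, "-L"], capture_output=True, text=True, timeout=10, check=False
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU " in result.stdout


def choose_impl(
    cpu_impl: Callable, gpu_impl: Callable, use_gpu: Optional[bool] = None
) -> Callable:
    """Return gpu_impl when a GPU is usable (or forced), otherwise cpu_impl."""
    if use_gpu is None:
        use_gpu = gpu_available()
    return gpu_impl if use_gpu else cpu_impl
```

Passing `use_gpu=False` explicitly forces the CPU path, which can be convenient on a development machine without a GPU.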

The following two-stage workflow shows how to build a pipeline using GPUs. This flow handles GPU-related and non-GPU-related issues separately and shortens the feedback loop.

  1. Create a pipeline

    Create a pipeline that can run on Dataflow. Replace the transforms that require GPUs with functionally equivalent transforms that don't use GPUs:

    1. Create all transformations that surround the GPU usage, such as data ingestion and manipulation.

    2. Create a stub for the GPU transform with a simple pass-through or schema change.

  2. Test locally

    Test the GPU portion of the pipeline code in an environment that mimics the Dataflow worker execution environment. The following steps describe one way to run this test:

    1. Create a Docker image with all necessary libraries.

    2. Start development of the GPU code.

    3. Begin the code-run-code cycle using a Google Cloud virtual machine with the Docker image. To rule out library incompatibilities, run the GPU code in a local Python process separately from an Apache Beam pipeline. Then, run the entire pipeline on the direct runner, or launch the pipeline on Dataflow.
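The two stages above can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: `shape_record`, `stub_gpu_step`, and `build_pipeline` are hypothetical names, the `id,value` input format is invented for the example, and the code assumes that the Apache Beam Python SDK is installed when `build_pipeline` is called.

```python
from typing import Callable, Dict, Iterable

Record = Dict[str, object]


def shape_record(line: str) -> Record:
    """Non-GPU shaping step: parse an 'id,value' line into a record."""
    record_id, value = line.split(",", 1)
    return {"id": record_id.strip(), "value": float(value)}


def stub_gpu_step(record: Record) -> Record:
    """Stand-in for the GPU transform.

    Passes the input through and adds the output field ("score", a
    hypothetical name) that the real GPU transform is expected to produce.
    """
    return {**record, "score": 0.0}


def build_pipeline(
    pipeline,
    lines: Iterable[str],
    gpu_step: Callable[[Record], Record] = stub_gpu_step,
):
    """Wire ingestion, shaping, and the (stubbed) GPU step together."""
    # Imported here so that the element-level functions above can be run and
    # tested in a plain Python process without Apache Beam installed.
    import apache_beam as beam

    return (
        pipeline
        | "Ingest" >> beam.Create(list(lines))
        | "Shape" >> beam.Map(shape_record)
        | "GpuStep" >> beam.Map(gpu_step)
    )
```

Calling `build_pipeline(p, ["a,1.5", "b,2"])` inside `with beam.Pipeline() as p:` runs the stubbed pipeline on the direct runner, which is the default for the Python SDK. After the real GPU function works when called directly in a local Python process, it can be passed as `gpu_step` without changing the surrounding transforms.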

Use a VM running Container-Optimized OS

For a minimal environment, use a virtual machine (VM) running Container-Optimized OS. For more information, see Create a VM with attached GPUs.

The general flow is:

  1. Create a VM.

  2. Connect to the VM and run the following commands:

    sudo cos-extensions install gpu
    sudo mount --bind /var/lib/nvidia /var/lib/nvidia
    sudo mount -o remount,exec /var/lib/nvidia

  3. Confirm that GPUs are available:

    /var/lib/nvidia/bin/nvidia-smi

  4. Start a Docker container with GPU drivers from the VM mounted as volumes. For example:

    docker run --rm -it --entrypoint /bin/bash \
      --volume /var/lib/nvidia/lib64:/usr/local/nvidia/lib64 \
      --volume /var/lib/nvidia/bin:/usr/local/nvidia/bin \
      <image>

    Replace <image> with the name of your container image.
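Inside the container, a quick check such as the following can confirm that the driver volumes from the command above are mounted where expected. It is a sketch only: `missing_driver_paths` is a hypothetical helper, and the default paths are the container-side mount targets from the `docker run` example.

```python
import os
from typing import List

# Container-side mount targets taken from the docker run example.
DRIVER_LIB_DIR = "/usr/local/nvidia/lib64"
DRIVER_BIN_DIR = "/usr/local/nvidia/bin"


def missing_driver_paths(
    lib_dir: str = DRIVER_LIB_DIR, bin_dir: str = DRIVER_BIN_DIR
) -> List[str]:
    """Return the expected driver paths that don't exist in this environment."""
    expected = [lib_dir, os.path.join(bin_dir, "nvidia-smi")]
    return [path for path in expected if not os.path.exists(path)]
```

An empty list means that both the library directory and the `nvidia-smi` binary are visible. A non-empty list names the paths to recheck in the `--volume` flags.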

For a sample Dockerfile, see Building a custom container image. Make sure that the Dockerfile includes all the dependencies that your pipeline needs.

For more information about using a Docker image that is pre-configured for GPU usage, see Using an existing image configured for GPU usage.

Useful tools when working with Container-Optimized OS

  • To configure the Docker CLI to use docker-credential-gcr as a credential helper for the default set of Google Container Registries (GCR), use:

    docker-credential-gcr configure-docker

    For more information about setting up Docker credentials, see docker-credential-gcr.

  • To copy files, such as pipeline code, to or from a VM, use toolbox. This technique is useful when using a Container-Optimized OS image. For example:

    toolbox /google-cloud-sdk/bin/gsutil cp gs://bucket/gpu/image_process/* /media/root/home/<userid>/opencv/

    For more information, see Debugging node issues using toolbox.

What's next