Attaching GPUs to clusters

Cloud Dataproc lets you attach graphics processing units (GPUs) to the master and worker Compute Engine nodes in a Cloud Dataproc cluster. You can use these GPUs to accelerate specific workloads on your instances, such as machine learning and data processing.

For more information about what you can do with GPUs and what types of GPU hardware are available, read GPUs on Compute Engine.

Before you begin

  • GPUs require special drivers and software. These items are not pre-installed on Cloud Dataproc clusters.
  • Read about GPU pricing on Compute Engine to understand the cost to use GPUs in your instances.
  • GPUs cannot be attached to preemptible virtual machines in Cloud Dataproc clusters.
  • Read about restrictions for instances with GPUs to learn how these instances function differently from non-GPU instances.
  • Check the quotas page for your project to ensure that you have sufficient GPU quota (NVIDIA_K80_GPUS, NVIDIA_P100_GPUS, or NVIDIA_V100_GPUS) available in your project. If GPUs are not listed on the quotas page or you require additional GPU quota, request a quota increase.
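One way to inspect your current GPU quota from the command line is to describe the region and filter for GPU entries. This is a sketch, assuming the gcloud CLI is installed and us-central1 is a placeholder for your region:

```shell
# Show quota entries mentioning GPUS, with the surrounding limit/usage lines.
gcloud compute regions describe us-central1 | grep -B 1 -A 1 GPUS
```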

Types of GPUs

Cloud Dataproc nodes support the following GPU types. You must specify the GPU type when attaching GPUs to your Cloud Dataproc cluster.

  • nvidia-tesla-k80 - NVIDIA® Tesla® K80
  • nvidia-tesla-p100 - NVIDIA® Tesla® P100
  • nvidia-tesla-v100 - NVIDIA® Tesla® V100

Attaching GPUs to clusters

gcloud

Attach GPUs to the master and worker nodes in a Cloud Dataproc cluster when creating the cluster by using the --master-accelerator and --worker-accelerator flags. These flags take the following two values:

  1. the type of GPU to attach to a node, and
  2. the number of GPUs to attach to the node.

The type of GPU is required, and the number of GPUs is optional (the default is 1 GPU).

gcloud beta dataproc clusters create cluster-name \
  --master-accelerator type=nvidia-tesla-k80 \
  --worker-accelerator type=nvidia-tesla-k80,count=4

To use GPUs on your cluster, you must install GPU drivers.

REST API

Attach GPUs to the master and worker nodes in a Cloud Dataproc cluster by setting the InstanceGroupConfig.AcceleratorConfig acceleratorTypeUri and acceleratorCount fields as part of the clusters.create API request.
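For example, a clusters.create request body might include accelerator settings like the following. This is a sketch; the cluster name and counts are placeholders, and acceleratorTypeUri may also be specified as a full Compute Engine accelerator type URL:

```json
{
  "clusterName": "my-gpu-cluster",
  "config": {
    "masterConfig": {
      "accelerators": [
        { "acceleratorTypeUri": "nvidia-tesla-k80", "acceleratorCount": 1 }
      ]
    },
    "workerConfig": {
      "accelerators": [
        { "acceleratorTypeUri": "nvidia-tesla-k80", "acceleratorCount": 4 }
      ]
    }
  }
}
```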

Console

GCP Console support for creating clusters with GPUs attached will be added in a future Cloud Dataproc release.

Installing GPU drivers

GPU drivers are required to utilize any GPUs attached to Cloud Dataproc nodes. The easiest way to install them is with an initialization action, which runs when you create a cluster. Installing GPU drivers and libraries in an initialization action may require several minutes.

This initialization action installs NVIDIA GPU drivers from the non-free component of the Debian 8 Jessie backports repository. More recent drivers may be available from the NVIDIA driver download site (for "Operating System", select "Show all Operating Systems" → "Linux 64-bit" for Debian 8-compatible drivers). You can save a copy of this initialization action in a Cloud Storage bucket for use with Cloud Dataproc.

#!/bin/bash -ex

# Detect NVIDIA GPU
apt-get update
apt-get install -y pciutils
if ! (lspci | grep -q NVIDIA); then
  echo 'No NVIDIA card detected. Skipping installation.' >&2
  exit 0
fi

# Add non-free Debian 8 Jessie backports packages.
# See https://www.debian.org/distrib/packages#note
sed 's/main/contrib/p;s/contrib/non-free/' \
  /etc/apt/sources.list.d/backports.list \
  > /etc/apt/sources.list.d/backports-non-free.list
apt-get update

# Install proprietary NVIDIA Drivers and CUDA
# See https://wiki.debian.org/NvidiaGraphicsDrivers
export DEBIAN_FRONTEND=noninteractive
apt-get install -y linux-headers-$(uname -r)
# Without --no-install-recommends this takes a very long time.
apt-get install -y -t jessie-backports --no-install-recommends \
  nvidia-cuda-toolkit nvidia-kernel-common nvidia-driver nvidia-smi

# Create a system wide NVBLAS config
# See http://docs.nvidia.com/cuda/nvblas/
NVBLAS_CONFIG_FILE=/etc/nvidia/nvblas.conf
cat << EOF >> ${NVBLAS_CONFIG_FILE}
# Insert here the CPU BLAS fallback library of your choice.
# The standard libblas.so.3 defaults to OpenBLAS, which does not have the
# requisite CBLAS API.
NVBLAS_CPU_BLAS_LIB /usr/lib/libblas/libblas.so

# Use all GPUs
NVBLAS_GPU_LIST ALL

# Add more configuration here.
EOF
echo "NVBLAS_CONFIG_FILE=${NVBLAS_CONFIG_FILE}" >> /etc/environment

# Rebooting during an initialization action is not recommended, so just
# dynamically load kernel modules. If you want to run an X server, it is
# recommended that you schedule a reboot to occur after the initialization
# action finishes.
modprobe -r nouveau
modprobe nvidia-current
modprobe nvidia-drm
modprobe nvidia-uvm
modprobe drm

# Restart any NodeManagers so they pick up the NVBLAS config.
if systemctl status hadoop-yarn-nodemanager; then
  systemctl restart hadoop-yarn-nodemanager
fi
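Once the script is saved to a Cloud Storage bucket, you can reference it at cluster creation time with the --initialization-actions flag. A sketch, where the bucket path gs://my-bucket/install-gpu-drivers.sh is a placeholder:

```shell
gcloud beta dataproc clusters create cluster-name \
  --master-accelerator type=nvidia-tesla-k80 \
  --worker-accelerator type=nvidia-tesla-k80,count=4 \
  --initialization-actions gs://my-bucket/install-gpu-drivers.sh
```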

Verifying GPU driver install

After you have finished installing the GPU driver on your Cloud Dataproc nodes, you can verify that the driver is functioning properly. SSH into the master node of your Cloud Dataproc cluster and run the following command:

nvidia-smi

If the driver is functioning properly, the output displays the driver version and GPU statistics.
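Instead of opening an interactive session, you can also run the check from your workstation in one step. A sketch, assuming a cluster named cluster-name (whose master node is cluster-name-m) in a placeholder zone:

```shell
# SSH into the master node and run nvidia-smi non-interactively.
gcloud compute ssh cluster-name-m --zone=us-central1-a --command='nvidia-smi'
```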

Spark configuration

When submitting jobs to Spark, you can use the following Spark configuration property to load the needed libraries:

spark.executorEnv.LD_PRELOAD=libnvblas.so
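This property can be set when submitting a job with gcloud via the --properties flag. A sketch; the cluster name, main class, and jar path are placeholders:

```shell
gcloud dataproc jobs submit spark --cluster=cluster-name \
  --properties=spark.executorEnv.LD_PRELOAD=libnvblas.so \
  --class=org.example.MyJob --jars=gs://my-bucket/my-job.jar
```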

Example GPU job

You can test GPUs on Cloud Dataproc by running any of the following jobs, which benefit from GPU acceleration:

  1. Run one of the Spark ML examples.
  2. Run the following example with spark-shell to perform a matrix computation:
import org.apache.spark.mllib.linalg._
import org.apache.spark.mllib.linalg.distributed._
import java.util.Random

def makeRandomSquareBlockMatrix(rowsPerBlock: Int, nBlocks: Int): BlockMatrix = {
  val range = sc.parallelize(1 to nBlocks)
  val indices = range.cartesian(range)
  new BlockMatrix(
      indices.map(
          ij => (ij, Matrices.rand(rowsPerBlock, rowsPerBlock, new Random()))),
      rowsPerBlock, rowsPerBlock, 0, 0)
}

val N = 1024 * 5
val n = 2
val mat1 = makeRandomSquareBlockMatrix(N, n)
val mat2 = makeRandomSquareBlockMatrix(N, n)
val mat3 = mat1.multiply(mat2)
mat3.blocks.persist.count
println("Processing complete!")
