Attaching GPUs to clusters

Dataproc lets you attach graphics processing units (GPUs) to the master and worker Compute Engine nodes in a Dataproc cluster. You can use these GPUs to accelerate specific workloads on your instances, such as machine learning and data processing.

For more information about what you can do with GPUs and what types of GPU hardware are available, read GPUs on Compute Engine.

Before you begin

  • GPUs require special drivers and software. These items are not pre-installed on Dataproc clusters.
  • Read about GPU pricing on Compute Engine to understand the cost to use GPUs in your instances.
  • GPUs cannot be attached to preemptible virtual machines in Dataproc clusters.
  • Read about restrictions for instances with GPUs to learn how these instances function differently from non-GPU instances.
  • Check the quotas page for your project to ensure that you have sufficient GPU quota (NVIDIA_K80_GPUS, NVIDIA_P100_GPUS, or NVIDIA_V100_GPUS) available in your project. If GPUs are not listed on the quotas page or you require additional GPU quota, request a quota increase.

Types of GPUs

Dataproc nodes support the following GPU types. You must specify the GPU type when attaching GPUs to your Dataproc cluster.

  • nvidia-tesla-k80 - NVIDIA® Tesla® K80
  • nvidia-tesla-p100 - NVIDIA® Tesla® P100
  • nvidia-tesla-v100 - NVIDIA® Tesla® V100
  • nvidia-tesla-p4 - NVIDIA® Tesla® P4
  • nvidia-tesla-t4 - NVIDIA® Tesla® T4
  • nvidia-tesla-p100-vws - NVIDIA® Tesla® P100 Virtual Workstations
  • nvidia-tesla-p4-vws - NVIDIA® Tesla® P4 Virtual Workstations
  • nvidia-tesla-t4-vws - NVIDIA® Tesla® T4 Virtual Workstations

Attaching GPUs to clusters


Attach GPUs to the master, primary worker, and preemptible worker nodes in a Dataproc cluster when you create the cluster by using the --master-accelerator, --worker-accelerator, and --preemptible-worker-accelerator flags. Each flag takes the following two values:

  1. the type of GPU to attach to a node, and
  2. the number of GPUs to attach to the node.

The type of GPU is required, and the number of GPUs is optional (the default is 1 GPU).


gcloud dataproc clusters create args \
  --master-accelerator type=nvidia-tesla-k80 \
  --worker-accelerator type=nvidia-tesla-k80,count=4 \
  --preemptible-worker-accelerator type=nvidia-tesla-k80,count=4
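
The type and count pair passed to each accelerator flag can also be assembled in a script. The following is a minimal sketch, not part of gcloud: the function name accelerator_flag_value is hypothetical, and it only formats the value, applying the default count of 1 when none is given.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of gcloud): builds the value for an
# accelerator flag such as --worker-accelerator.
accelerator_flag_value() {
  local gpu_type=${1:-}
  local gpu_count=${2:-1}   # the count is optional and defaults to 1

  # The GPU type is required.
  if [[ -z "${gpu_type}" ]]; then
    echo 'GPU type is required' >&2
    return 1
  fi
  echo "type=${gpu_type},count=${gpu_count}"
}
```

For example, --worker-accelerator "$(accelerator_flag_value nvidia-tesla-k80 4)" would expand to the same worker value used in the command above.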

To use GPUs in your cluster, you must install GPU drivers.


With the REST API, attach GPUs to the master, primary worker, and preemptible worker nodes in a Dataproc cluster by setting the InstanceGroupConfig.AcceleratorConfig acceleratorTypeUri and acceleratorCount fields in the cluster.create API request.
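
A request body carrying those fields might look like the output of the sketch below. It makes several assumptions: the function name cluster_create_body is hypothetical, the project and cluster names are placeholders supplied by the caller, and the short GPU type name is used for acceleratorTypeUri. Check the API reference for the exact forms accepted in your setup.

```shell
#!/usr/bin/env bash
# Prints an illustrative cluster.create request body that attaches the same
# GPU type and count to the master and primary worker instance groups.
# All argument values are placeholders chosen by the caller.
cluster_create_body() {
  local project_id=$1 cluster_name=$2 gpu_type=$3 gpu_count=${4:-1}
  cat <<EOF
{
  "projectId": "${project_id}",
  "clusterName": "${cluster_name}",
  "config": {
    "masterConfig": {
      "accelerators": [
        {"acceleratorTypeUri": "${gpu_type}", "acceleratorCount": ${gpu_count}}
      ]
    },
    "workerConfig": {
      "accelerators": [
        {"acceleratorTypeUri": "${gpu_type}", "acceleratorCount": ${gpu_count}}
      ]
    }
  }
}
EOF
}
```

Calling cluster_create_body my-project my-cluster nvidia-tesla-k80 2 prints a body that could be saved to a file and sent with the HTTP client of your choice.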


In the Cloud Console, click Customize in the master and worker node sections of the Create a cluster page to specify the GPU type and the number of GPUs for the nodes.

Installing GPU drivers

GPU drivers are required to use any GPUs attached to Dataproc nodes. You can install GPU drivers by following the instructions for the initialization action listed below.

# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS-IS" BASIS,
# See the License for the specific language governing permissions and
# limitations under the License.
# This script installs NVIDIA GPU drivers and collects GPU utilization metrics.

set -euxo pipefail

function get_metadata_attribute() {
  local attribute_name=$1
  local default_value=$2
  /usr/share/google/get_metadata_value "attributes/${attribute_name}" || echo -n "${default_value}"
}

readonly GPU_AGENT_REPO_URL=''

# Whether to install GPU monitoring agent that sends GPU metrics to StackDriver
INSTALL_GPU_AGENT=$(get_metadata_attribute 'install_gpu_agent' 'false')

OS_NAME=$(lsb_release -is | tr '[:upper:]' '[:lower:]')
readonly OS_NAME
OS_DIST=$(lsb_release -cs)
readonly OS_DIST


function install_gpu_driver() {
  # Detect NVIDIA GPU
  apt-get update
  apt-get install -y pciutils
  if ! (lspci | grep -q NVIDIA); then
    echo 'No NVIDIA card detected. Skipping installation.' >&2
    exit 0
  fi

  local packages=(nvidia-cuda-toolkit)
  local modules=(nvidia-drm nvidia-uvm drm)

  # Add non-free Debian packages.
  # See
  if [[ ${OS_NAME} == debian ]]; then
    for type in deb deb-src; do
      for distro in ${OS_DIST} ${OS_DIST}-backports; do
        echo "${type} ${distro} contrib non-free" \
    packages+=(nvidia-driver nvidia-kernel-common nvidia-smi)
    local nvblas_cpu_blas_lib=/usr/lib/
  elif [[ ${OS_NAME} == ubuntu ]]; then
    # Ubuntu-specific NVIDIA driver packages and modules
    local nvblas_cpu_blas_lib=/usr/lib/x86_64-linux-gnu/
  else
    echo "Unsupported OS: '${OS_NAME}'"
    exit 1
  fi

  apt-get update
  # Install proprietary NVIDIA Drivers and CUDA
  # See
  export DEBIAN_FRONTEND=noninteractive
  apt-get install -y "linux-headers-$(uname -r)"
  # Without --no-install-recommends this takes a very long time.
  apt-get install -y -t "${OS_DIST}-backports" --no-install-recommends "${packages[@]}"

  # Create a system wide NVBLAS config
  # See
  local nvblas_config_file=/etc/nvidia/nvblas.conf
  # Create config file if it does not exist - this file doesn't exist by default in Ubuntu
  mkdir -p "$(dirname ${nvblas_config_file})"
  cat <<EOF >>${nvblas_config_file}
# Insert here the CPU BLAS fallback library of your choice.
# The standard defaults to OpenBLAS, which does not have the
# requisite CBLAS API.
NVBLAS_CPU_BLAS_LIB ${nvblas_cpu_blas_lib}
# Use all GPUs
NVBLAS_GPU_LIST ALL
# Add more configuration here.
EOF
  echo "NVBLAS_CONFIG_FILE=${nvblas_config_file}" >>/etc/environment

  # Rebooting during an initialization action is not recommended, so just
  # dynamically load kernel modules. If you want to run an X server, it is
  # recommended that you schedule a reboot to occur after the initialization
  # action finishes.
  modprobe -r nouveau
  modprobe "${modules[@]}"

  # Restart any NodeManagers, so they pick up the NVBLAS config.
  if systemctl status hadoop-yarn-nodemanager; then
    systemctl restart hadoop-yarn-nodemanager
  fi

  echo 'NVIDIA GPU driver was installed successfully'
}

# Collect gpu_utilization and gpu_memory_utilization
function install_gpu_agent_service() {
  if ! command -v pip; then
    apt-get install -y python-pip
  fi
  local install_dir=/opt/gpu_utilization_agent
  mkdir "${install_dir}"
  wget -nv --timeout=30 --tries=5 --retry-connrefused \
    "${GPU_AGENT_REPO_URL}/requirements.txt" -P "${install_dir}"
  wget -nv --timeout=30 --tries=5 --retry-connrefused \
    "${GPU_AGENT_REPO_URL}/" -P "${install_dir}"
  pip install -r "${install_dir}/requirements.txt"

  # Generate GPU service.
  cat <<EOF >/lib/systemd/system/gpu_utilization_agent.service
[Unit]
Description=GPU Utilization Metric Agent

[Service]
ExecStart=/bin/bash --login -c 'python "${install_dir}/"'

[Install]
WantedBy=multi-user.target
EOF

  # Reload systemd manager configuration
  systemctl daemon-reload
  # Enable gpu_utilization_agent service
  systemctl --now enable gpu_utilization_agent.service
}

function main() {
  # Install GPU NVIDIA Drivers
  install_gpu_driver

  # Install GPU metrics collection in Stackdriver if needed
  if [[ ${INSTALL_GPU_AGENT} == true ]]; then
    install_gpu_agent_service
    echo 'GPU agent successfully deployed.'
  else
    echo 'GPU metrics will not be installed.'
  fi
}

main

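The script reads the install_gpu_agent metadata attribute to decide whether to install the monitoring agent. The sketch below shows how a cluster-creation command that runs the script as an initialization action might be put together. It is illustrative only: the function name is hypothetical, the script location is a placeholder for wherever you store the script, and the function prints the command rather than running it.

```shell
#!/usr/bin/env bash
# Prints (does not run) a cluster-creation command that attaches one GPU to
# each worker and runs the driver script as an initialization action.
print_gpu_cluster_command() {
  local cluster_name=$1
  local script_uri=$2              # placeholder: where the script is stored
  local install_agent=${3:-false}  # mirrors the script's default of 'false'

  local cmd=(
    gcloud dataproc clusters create "${cluster_name}"
    --worker-accelerator type=nvidia-tesla-k80,count=1
    --initialization-actions "${script_uri}"
    --metadata "install_gpu_agent=${install_agent}"
  )
  echo "${cmd[*]}"
}
```

Printing the command first makes it easy to review the flags before running it by hand.
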

Verifying GPU driver install

After you install the GPU driver on your Dataproc nodes, you can verify that the driver is functioning properly. SSH into the master node of your Dataproc cluster and run the following command:

nvidia-smi

If the driver is functioning properly, the output will display the driver version and GPU statistics (see Verifying the GPU driver install).
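
The same check can be wrapped in a small function, for example to run it on several nodes. This is a minimal sketch that assumes the driver installation provides the nvidia-smi tool (the script above installs an nvidia-smi package on Debian); the function name check_gpu_driver is hypothetical.

```shell
#!/usr/bin/env bash
# Reports whether the NVIDIA driver tools are usable on this node.
check_gpu_driver() {
  if ! command -v nvidia-smi >/dev/null 2>&1; then
    echo 'nvidia-smi not found: the GPU driver does not appear to be installed' >&2
    return 1
  fi
  # nvidia-smi exits with a non-zero status when it cannot talk to the
  # driver, so its exit status becomes the status of this function.
  nvidia-smi
}
```

A non-zero exit status from check_gpu_driver indicates that the node needs attention before you submit GPU jobs.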

Spark configuration

When submitting jobs to Spark, you can use the following Spark Configuration to load needed libraries.

Example GPU job

You can test GPUs on Dataproc by running any of the following jobs, which benefit when run with GPUs:

  1. Run one of the Spark ML examples.
  2. Run the following example with spark-shell to run a matrix computation:
import org.apache.spark.mllib.linalg._
import org.apache.spark.mllib.linalg.distributed._
import java.util.Random

def makeRandomSquareBlockMatrix(rowsPerBlock: Int, nBlocks: Int): BlockMatrix = {
  val range = sc.parallelize(1 to nBlocks)
  val indices = range.cartesian(range)
  return new BlockMatrix(
      indices.map(
          ij => (ij, Matrices.rand(rowsPerBlock, rowsPerBlock, new Random()))),
      rowsPerBlock, rowsPerBlock, 0, 0)
}

val N = 1024 * 5
val n = 2
val mat1 = makeRandomSquareBlockMatrix(N, n)
val mat2 = makeRandomSquareBlockMatrix(N, n)
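val mat3 = mat1.multiply(mat2)
// Spark evaluates lazily, so run an action to force the multiplication.
mat3.blocks.persist().count()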
println("Processing complete!")
