Dataproc Persistent Boot Disks

You can select standard, SSD, or balanced persistent disks, or hyperdisk balanced, as boot disks for Dataproc cluster nodes.

Select persistent boot disk types for cluster nodes

You can select the persistent boot disk type when you create a cluster using the Google Cloud console, Google Cloud CLI, or Dataproc API.

Console

You can create a cluster and select a standard, SSD, or balanced persistent boot disk for master, primary worker, and secondary worker cluster nodes from the Configure nodes panel on the Dataproc Create a cluster page of the Google Cloud console.

gcloud CLI

You can create a cluster and select a standard, SSD, or balanced persistent boot disk, or a hyperdisk balanced boot disk, for master, primary worker, and secondary worker cluster nodes using the gcloud dataproc clusters create command with the --master-boot-disk-type, --worker-boot-disk-type, and --secondary-worker-boot-disk-type flags.

The default persistent boot disk type for Dataproc cluster master and primary worker nodes is pd-standard. If the VM machine type supports only hyperdisk as the boot disk, the default boot disk is hyperdisk-balanced. The default persistent boot disk type for cluster secondary worker nodes is the primary worker node persistent boot disk type.

You can pass a value of pd-standard, pd-ssd, pd-balanced, or hyperdisk-balanced to the --master-boot-disk-type, --worker-boot-disk-type, and --secondary-worker-boot-disk-type flags. Any of the valid disk type values can be set on any cluster node type.

Example:
gcloud dataproc clusters create CLUSTER_NAME \
    --region=REGION \
    --master-boot-disk-type=pd-ssd \
    --worker-boot-disk-type=hyperdisk-balanced \
    --secondary-worker-boot-disk-type=pd-standard \
    other args ...
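
After the cluster is created, the boot disk type recorded for each node group can be read back with gcloud dataproc clusters describe. The following is a minimal sketch, not an official procedure: the function name show_boot_disk_types is hypothetical, and the --format paths assume that the describe output mirrors the masterConfig, workerConfig, and secondaryWorkerConfig diskConfig fields of the API. A field may print as empty if it is not populated for your cluster.

```shell
# Hypothetical helper: print the boot disk type of each node group.
# Usage: show_boot_disk_types CLUSTER_NAME REGION
show_boot_disk_types() {
  for group in masterConfig workerConfig secondaryWorkerConfig; do
    printf '%s: ' "$group"
    # Assumes the describe output exposes config.<group>.diskConfig.bootDiskType.
    gcloud dataproc clusters describe "$1" --region="$2" \
        --format="value(config.${group}.diskConfig.bootDiskType)"
  done
}
```

Call it as show_boot_disk_types CLUSTER_NAME REGION. Each output line has the form GROUP: DISK_TYPE.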

REST API

The default boot disk type for Dataproc cluster master and primary worker nodes is pd-standard. If the VM machine type supports only hyperdisk as the boot disk, the default boot disk is hyperdisk-balanced. The default boot disk type for secondary worker nodes is the primary worker node boot disk type.

You can set a value of pd-standard, pd-ssd, pd-balanced, or hyperdisk-balanced in the InstanceGroupConfig.DiskConfig.bootDiskType field in the masterConfig, workerConfig, and secondaryWorkerConfig as part of a clusters.create API request. Any of the valid boot disk type values can be set on any cluster node type.
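
As an illustration of where bootDiskType sits in the request, the following sketch builds a clusters.create request body with a shell heredoc and prints it. The project, region, and cluster name are placeholder assumptions, the disk types repeat the gcloud CLI example, and the commented-out curl call is one possible way to send the request, not the only one.

```shell
# Illustrative placeholder values (assumptions, replace with your own).
PROJECT_ID="my-project"
REGION="us-central1"
CLUSTER_NAME="example-cluster"

# Request body: one diskConfig.bootDiskType per node group.
REQUEST_BODY=$(cat <<EOF
{
  "projectId": "${PROJECT_ID}",
  "clusterName": "${CLUSTER_NAME}",
  "config": {
    "masterConfig": {
      "diskConfig": { "bootDiskType": "pd-ssd" }
    },
    "workerConfig": {
      "diskConfig": { "bootDiskType": "hyperdisk-balanced" }
    },
    "secondaryWorkerConfig": {
      "diskConfig": { "bootDiskType": "pd-standard" }
    }
  }
}
EOF
)

URL="https://dataproc.googleapis.com/v1/projects/${PROJECT_ID}/regions/${REGION}/clusters"
echo "POST ${URL}"
echo "${REQUEST_BODY}"

# To send the request, uncomment the following (requires gcloud credentials):
# curl -X POST \
#     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
#     -H "Content-Type: application/json" \
#     -d "${REQUEST_BODY}" "${URL}"
```

A node group whose diskConfig omits bootDiskType falls back to the defaults described above.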

Hyperdisk settings

When creating a cluster, if you select hyperdisk-balanced as the boot disk for a Dataproc cluster node, you can use the gcloud CLI or the Dataproc API to set the provisioned IOPS and provisioned throughput settings.

gcloud CLI

Set provisioned IOPS and provisioned throughput for cluster nodes with hyperdisk-balanced boot disks using the gcloud dataproc clusters create command with the --master-boot-disk-provisioned-iops, --worker-boot-disk-provisioned-iops, --master-boot-disk-provisioned-throughput, and --worker-boot-disk-provisioned-throughput flags.

Example:
gcloud dataproc clusters create CLUSTER_NAME \
    --region=REGION \
    --master-boot-disk-type=hyperdisk-balanced \
    --master-boot-disk-provisioned-iops=MASTER_BOOT_DISK_IOPS \
    --master-boot-disk-provisioned-throughput=MASTER_BOOT_DISK_THROUGHPUT \
    --worker-boot-disk-type=hyperdisk-balanced \
    --worker-boot-disk-provisioned-iops=WORKER_BOOT_DISK_IOPS \
    --worker-boot-disk-provisioned-throughput=WORKER_BOOT_DISK_THROUGHPUT \
    other args ...

REST API

Set provisioned IOPS and provisioned throughput for cluster nodes with hyperdisk-balanced boot disks using the InstanceGroupConfig.DiskConfig.bootDiskProvisionedIops and InstanceGroupConfig.DiskConfig.bootDiskProvisionedThroughput fields in the masterConfig and workerConfig.
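
The sketch below shows one way these fields might appear in a request body, built with a shell heredoc. The cluster name and the IOPS and throughput numbers are illustrative assumptions only. Check the Hyperdisk limits for your machine type before choosing values.

```shell
# Illustrative placeholder values (assumptions, not recommendations).
CLUSTER_NAME="example-cluster"
MASTER_BOOT_DISK_IOPS=3000
MASTER_BOOT_DISK_THROUGHPUT=140
WORKER_BOOT_DISK_IOPS=3000
WORKER_BOOT_DISK_THROUGHPUT=140

# Request body: provisioned settings sit next to bootDiskType in diskConfig.
HYPERDISK_BODY=$(cat <<EOF
{
  "clusterName": "${CLUSTER_NAME}",
  "config": {
    "masterConfig": {
      "diskConfig": {
        "bootDiskType": "hyperdisk-balanced",
        "bootDiskProvisionedIops": ${MASTER_BOOT_DISK_IOPS},
        "bootDiskProvisionedThroughput": ${MASTER_BOOT_DISK_THROUGHPUT}
      }
    },
    "workerConfig": {
      "diskConfig": {
        "bootDiskType": "hyperdisk-balanced",
        "bootDiskProvisionedIops": ${WORKER_BOOT_DISK_IOPS},
        "bootDiskProvisionedThroughput": ${WORKER_BOOT_DISK_THROUGHPUT}
      }
    }
  }
}
EOF
)

echo "${HYPERDISK_BODY}"
```

The printed body can be sent as the payload of a cluster create request.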