You can select standard, SSD, or balanced persistent disks, or hyperdisk balanced, as boot disks for Dataproc cluster nodes.
Select persistent boot disk types for cluster nodes
You can select the persistent boot disk type when you create a cluster using the Google Cloud console, Google Cloud CLI, or Dataproc API.
Console
You can create a cluster and select a standard, SSD, or balanced persistent boot disk for master, primary worker, and secondary worker cluster nodes from the Configure nodes panel on the Dataproc Create a cluster page of the Google Cloud console.
gcloud CLI
You can create a cluster and select a standard, SSD, or balanced persistent boot disk, or hyperdisk balanced, for master, primary worker, and secondary worker cluster nodes using the gcloud dataproc clusters create command with the --master-boot-disk-type, --worker-boot-disk-type, and --secondary-worker-boot-disk-type flags.
The default persistent boot disk type for Dataproc cluster master and primary worker nodes is pd-standard. If the VM machine type supports only hyperdisk as the boot disk, the default boot disk is hyperdisk-balanced. The default persistent boot disk type for cluster secondary worker nodes is the primary worker node persistent boot disk type.
You can pass a value of pd-standard, pd-ssd, pd-balanced, or hyperdisk-balanced to the --master-boot-disk-type, --worker-boot-disk-type, and --secondary-worker-boot-disk-type flags. Any of the valid disk type values can be set on any cluster node type.
gcloud dataproc clusters create CLUSTER_NAME \
    --region=REGION \
    --master-boot-disk-type=pd-ssd \
    --worker-boot-disk-type=hyperdisk-balanced \
    --secondary-worker-boot-disk-type=pd-standard \
    other args ...
REST API
The default boot disk type for Dataproc cluster master and primary worker nodes is pd-standard. If the VM machine type supports only hyperdisk as the boot disk, the default boot disk is hyperdisk-balanced. The default boot disk type for secondary worker nodes is the primary worker node boot disk type.
You can set a value of pd-standard, pd-ssd, pd-balanced, or hyperdisk-balanced in the InstanceGroupConfig.DiskConfig.bootDiskType field in the masterConfig, workerConfig, and secondaryWorkerConfig as part of a cluster.create API request. Any of the valid boot disk type values can be set on any cluster node type.
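As an illustration of the field placement described above, the following sketch writes a cluster.create request body that sets bootDiskType for each node group. The cluster name and the file name boot-disk-request.json are hypothetical placeholders, and the commented-out curl call assumes that the gcloud CLI is installed and authenticated and that PROJECT_ID and REGION are replaced with your own values. A real request typically needs additional cluster settings.

```shell
# Hypothetical placeholder value; replace with your own cluster name.
CLUSTER_NAME="example-cluster"

# Write a cluster.create request body that sets diskConfig.bootDiskType
# in masterConfig, workerConfig, and secondaryWorkerConfig.
cat > boot-disk-request.json <<EOF
{
  "clusterName": "${CLUSTER_NAME}",
  "config": {
    "masterConfig": { "diskConfig": { "bootDiskType": "pd-ssd" } },
    "workerConfig": { "diskConfig": { "bootDiskType": "hyperdisk-balanced" } },
    "secondaryWorkerConfig": { "diskConfig": { "bootDiskType": "pd-standard" } }
  }
}
EOF

# To send the request (assumes an authenticated gcloud CLI;
# PROJECT_ID and REGION are placeholders):
# curl -X POST \
#   -H "Authorization: Bearer $(gcloud auth print-access-token)" \
#   -H "Content-Type: application/json" \
#   -d @boot-disk-request.json \
#   "https://dataproc.googleapis.com/v1/projects/PROJECT_ID/regions/REGION/clusters"
```

The disk type values mirror the gcloud CLI example, so the two requests are intended to produce the same boot disk configuration.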
Hyperdisk settings
When creating a cluster, if you select hyperdisk-balanced as the boot disk for a Dataproc cluster node, you can use the gcloud CLI or the Dataproc API to set the provisioned IOPS and provisioned throughput settings.
gcloud CLI
Set provisioned IOPS and provisioned throughput for cluster nodes with hyperdisk-balanced boot disks using the gcloud dataproc clusters create command with the --master-boot-disk-provisioned-iops, --worker-boot-disk-provisioned-iops, --master-boot-disk-provisioned-throughput, and --worker-boot-disk-provisioned-throughput flags.
gcloud dataproc clusters create CLUSTER_NAME \
    --region=REGION \
    --master-boot-disk-type=hyperdisk-balanced \
    --master-boot-disk-provisioned-iops=MASTER_BOOT_DISK_IOPS \
    --master-boot-disk-provisioned-throughput=MASTER_BOOT_DISK_THROUGHPUT \
    --worker-boot-disk-type=hyperdisk-balanced \
    --worker-boot-disk-provisioned-iops=WORKER_BOOT_DISK_IOPS \
    --worker-boot-disk-provisioned-throughput=WORKER_BOOT_DISK_THROUGHPUT \
    other args ...
REST API
Set provisioned IOPS and provisioned throughput for cluster nodes with hyperdisk-balanced boot disks using the InstanceGroupConfig.DiskConfig.bootDiskProvisionedIops and InstanceGroupConfig.DiskConfig.bootDiskProvisionedThroughput fields in the masterConfig and workerConfig of a cluster.create API request.
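A minimal sketch of how these fields might appear in a cluster.create request body follows. The cluster name, the file name hyperdisk-request.json, and the IOPS and throughput numbers are illustrative assumptions, not recommended settings; check the hyperdisk limits for your machine type before choosing values. The commented-out curl call assumes an authenticated gcloud CLI and placeholder PROJECT_ID and REGION values.

```shell
# Hypothetical placeholder values; replace with your own.
CLUSTER_NAME="example-cluster"
BOOT_DISK_IOPS=3000          # illustrative value only
BOOT_DISK_THROUGHPUT=140     # illustrative value only

# Write a request body that sets provisioned IOPS and throughput
# for hyperdisk-balanced boot disks on master and primary worker nodes.
cat > hyperdisk-request.json <<EOF
{
  "clusterName": "${CLUSTER_NAME}",
  "config": {
    "masterConfig": {
      "diskConfig": {
        "bootDiskType": "hyperdisk-balanced",
        "bootDiskProvisionedIops": ${BOOT_DISK_IOPS},
        "bootDiskProvisionedThroughput": ${BOOT_DISK_THROUGHPUT}
      }
    },
    "workerConfig": {
      "diskConfig": {
        "bootDiskType": "hyperdisk-balanced",
        "bootDiskProvisionedIops": ${BOOT_DISK_IOPS},
        "bootDiskProvisionedThroughput": ${BOOT_DISK_THROUGHPUT}
      }
    }
  }
}
EOF

# To send the request (assumes an authenticated gcloud CLI;
# PROJECT_ID and REGION are placeholders):
# curl -X POST \
#   -H "Authorization: Bearer $(gcloud auth print-access-token)" \
#   -H "Content-Type: application/json" \
#   -d @hyperdisk-request.json \
#   "https://dataproc.googleapis.com/v1/projects/PROJECT_ID/regions/REGION/clusters"
```

The sketch uses the same values for master and worker nodes for brevity; each node group can carry its own settings, as in the gcloud CLI example.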