To supplement the boot disk, you can attach local Solid State Drives (local
SSDs) to the master, primary worker, and secondary worker nodes in your
cluster. Local SSDs can provide faster read and write times than persistent
disk. The size of each local SSD is fixed at 375 GB, but you can attach
multiple local SSDs to increase SSD storage (see Adding Local SSDs). Each
local SSD is mounted to /mnt/<id> in Dataproc cluster nodes.
When local SSDs are attached to the cluster, both HDFS and scratch data,
such as shuffle outputs, use the local SSDs instead of the boot
persistent disk.
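Because each local SSD is a fixed 375 GB, the local SSD capacity of a node is simply 375 GB times the number of SSDs attached. The following is a minimal sketch of that arithmetic; the SSD count is a hypothetical value chosen for illustration. The last command, which lists whatever is mounted under /mnt, is only meaningful when run on a cluster node and assumes nothing about the form of <id>.

```shell
#!/usr/bin/env bash
# Fixed size of one local SSD, as stated above.
SSD_SIZE_GB=375

# Hypothetical number of local SSDs attached to one node.
NUM_LOCAL_SSDS=4

# Total local SSD capacity for that node.
TOTAL_GB=$(( SSD_SIZE_GB * NUM_LOCAL_SSDS ))
echo "Local SSD capacity per node: ${TOTAL_GB} GB"

# On a cluster node, show the file systems mounted under /mnt (if any).
# Errors are ignored so the script also runs where nothing is mounted there.
df -h /mnt/* 2>/dev/null || true
```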
Using local SSDs
gcloud command
Use the gcloud dataproc clusters create command with the
--num-master-local-ssds, --num-worker-local-ssds, and
--num-secondary-worker-local-ssds flags to attach local SSDs to the
cluster's master, primary worker, and secondary (preemptible) worker
nodes, respectively.
Example:
gcloud dataproc clusters create cluster-name \
    --region=region \
    --num-master-local-ssds=1 \
    --num-worker-local-ssds=1 \
    --num-secondary-worker-local-ssds=1 \
    ... other args ...
REST API
Set the numLocalSsds field (in the diskConfig) of the masterConfig,
workerConfig, and secondaryWorkerConfig InstanceGroupConfig in a
clusters.create API request to attach local SSDs to the cluster's master,
primary worker, and secondary (preemptible) worker nodes, respectively.
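As a rough illustration, the request body for such a call could be assembled as shown here. This is a sketch under assumptions, not a complete cluster specification: the project ID, region, and cluster name are hypothetical placeholders, each node group requests one local SSD, and all other cluster settings (machine types, instance counts, and so on) are omitted. The script only prints the request; the commented curl line indicates one way it might be sent.

```shell
#!/usr/bin/env bash
# Hypothetical placeholder values; replace with your own.
PROJECT_ID="my-project"
REGION="us-central1"
CLUSTER_NAME="my-cluster"

# Request body: one local SSD per node in each node group.
# numLocalSsds sits under the diskConfig of each InstanceGroupConfig.
REQUEST_BODY=$(cat <<EOF
{
  "projectId": "${PROJECT_ID}",
  "clusterName": "${CLUSTER_NAME}",
  "config": {
    "masterConfig": { "diskConfig": { "numLocalSsds": 1 } },
    "workerConfig": { "diskConfig": { "numLocalSsds": 1 } },
    "secondaryWorkerConfig": { "diskConfig": { "numLocalSsds": 1 } }
  }
}
EOF
)

# Regional clusters collection that the create request is posted to.
URL="https://dataproc.googleapis.com/v1/projects/${PROJECT_ID}/regions/${REGION}/clusters"

echo "POST ${URL}"
echo "${REQUEST_BODY}"

# To actually send the request (requires valid credentials), something like:
# curl -X POST \
#   -H "Authorization: Bearer $(gcloud auth print-access-token)" \
#   -H "Content-Type: application/json" \
#   -d "${REQUEST_BODY}" "${URL}"
```

Keeping the body in a variable makes it easy to inspect before sending it, which helps catch a misplaced numLocalSsds field early.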
Console
Create a cluster and attach local SSDs to the master, primary, and secondary worker nodes from the Configure nodes panel of the Dataproc Create a cluster page of the Google Cloud Console.