To supplement the boot disk, you can attach
local Solid State Drives (local SSDs)
to master, primary worker, and secondary worker nodes in your cluster. Local
SSDs can provide faster read and write times than persistent disk. The size of
each local SSD is fixed, but you can attach multiple local SSDs to
increase SSD storage (see
Adding Local SSDs). Each local SSD
is mounted at
/mnt/<id> on Dataproc cluster nodes.
By default, local SSDs are used for writing and reading Apache Hadoop and Apache
Spark scratch files, such as shuffle outputs.
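To check which local SSD mounts a node actually has, you can filter the output of `df` for mount points under /mnt/. The following is a minimal sketch, assuming you run it on a cluster node (for example, over SSH); the helper name `filter_mnt_mounts` is hypothetical and not part of Dataproc, and other filesystems mounted under /mnt/ would also appear in the output.

```shell
#!/bin/sh
# Hypothetical helper: keep the df header row plus every row whose
# mount point (the last column) starts with /mnt/.
filter_mnt_mounts() {
  awk 'NR == 1 || $NF ~ "^/mnt/"'
}

# On a cluster node, list the filesystems mounted under /mnt/.
df -h | filter_mnt_mounts
```

On a node with no local SSDs attached, only the header row is expected.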
Using local SSDs
Use the
gcloud dataproc clusters create
command with the
--num-master-local-ssds, --num-worker-local-ssds, and
--num-preemptible-worker-local-ssds flags to attach local
SSDs to the cluster's master, primary worker, and secondary (preemptible) worker
nodes, respectively.
gcloud dataproc clusters create cluster-name \
    --num-master-local-ssds=1 \
    --num-worker-local-ssds=1 \
    --num-preemptible-worker-local-ssds=1 \
    ... other args ...
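Set the numLocalSsds field in the diskConfig of the masterConfig, workerConfig,
and secondaryWorkerConfig in the clusters.create
API request to attach local SSDs to the cluster's master, primary worker, and
secondary (preemptible) worker nodes, respectively.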
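As an illustration of the API approach, the sketch below writes a request body that asks for one local SSD per node in each instance group. It assumes the Dataproc v1 REST resource layout, in which each instance group config (`masterConfig`, `workerConfig`, `secondaryWorkerConfig`) carries a `diskConfig.numLocalSsds` field. The file location, `PROJECT_ID`, and `REGION` are placeholders, and every other cluster setting (instance counts, machine types, and so on) is omitted here.

```shell
#!/bin/sh
# Placeholder location for the request body; adjust as needed.
body_file="${TMPDIR:-/tmp}/create-cluster-body.json"

# Request one local SSD per node for the master, primary worker, and
# secondary worker groups. All other cluster settings are left out.
cat > "$body_file" <<'EOF'
{
  "clusterName": "cluster-name",
  "config": {
    "masterConfig": { "diskConfig": { "numLocalSsds": 1 } },
    "workerConfig": { "diskConfig": { "numLocalSsds": 1 } },
    "secondaryWorkerConfig": { "diskConfig": { "numLocalSsds": 1 } }
  }
}
EOF

# Show the body that would be sent.
cat "$body_file"

# To send it, replace PROJECT_ID and REGION with your own values:
# curl -X POST \
#   -H "Authorization: Bearer $(gcloud auth print-access-token)" \
#   -H "Content-Type: application/json" \
#   -d @"$body_file" \
#   "https://dataproc.googleapis.com/v1/projects/PROJECT_ID/regions/REGION/clusters"
```

The `curl` call is left commented out so the snippet can be run and inspected without creating a cluster.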
In the Google Cloud Console, create a cluster and attach local SSDs to the primary worker node(s) from the Dataproc "Create a cluster" page.