Preemptible VMs

In addition to using standard Compute Engine virtual machines (VMs), Cloud Dataproc clusters can use preemptible VM instances, also known as preemptible VMs. You may decide to use preemptible instances to lower per-hour compute costs for non-critical data processing or to create very large clusters at a lower total cost. See the Cloud Dataproc pricing documentation for more information.

How preemptibles work with Cloud Dataproc

All preemptible instances added to a cluster use the machine type of the cluster's non-preemptible worker nodes. For example, if you create a cluster with workers that use n1-standard-4 machine types, all preemptible instances added to the cluster will also use n1-standard-4 machines. The addition or removal of preemptible workers from a cluster does not affect the number of non-preemptible workers in the cluster.

Because preemptible instances are reclaimed if they are required for other tasks, Cloud Dataproc adds preemptible instances as secondary workers in a managed instance group, which contains only preemptible workers. The managed group automatically re-adds workers lost due to reclamation as capacity permits. For example, if two preemptible machines are reclaimed and removed from a cluster, these instances will be re-added to the cluster if and when capacity is available to re-add them.

The following rules apply when you use preemptible workers with a Cloud Dataproc cluster:

  • Processing only: Because preemptibles can be reclaimed at any time, preemptible workers do not store data. Preemptibles added to a Cloud Dataproc cluster function only as processing nodes.
  • No preemptible-only clusters: To ensure that a cluster does not lose all of its workers, Cloud Dataproc cannot create preemptible-only clusters. If you use the gcloud dataproc clusters create command with --num-preemptible-workers and do not also specify a number of standard workers with --num-workers, Cloud Dataproc automatically adds two non-preemptible workers to the cluster.
  • Persistent disk size: By default, all preemptible workers are created with the smaller of 100 GB or the primary worker boot disk size. This disk space is used for local caching of data and is not available through HDFS. You can override the default disk size at cluster creation with the --preemptible-worker-boot-disk-size flag of the gcloud dataproc clusters create command. You can specify this flag even if the cluster does not have any preemptible workers at creation time.
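
The two defaults in the rules above can be sketched as small shell functions. This is only an illustration of the stated rules, not Cloud Dataproc's own logic. The function names default_preemptible_disk_gb and effective_standard_workers are hypothetical, and sizes are assumed to be whole gigabytes.

```shell
# Illustrative model of the documented defaults; nothing here calls Cloud Dataproc.

# Default boot disk size (in GB) for preemptible workers:
# the smaller of 100 GB or the primary worker boot disk size ($1).
default_preemptible_disk_gb() {
  if [ "$1" -lt 100 ]; then
    echo "$1"
  else
    echo 100
  fi
}

# Number of non-preemptible workers the cluster ends up with.
# $1 is the value passed to --num-workers, or empty if the flag was omitted,
# in which case two non-preemptible workers are added automatically.
effective_standard_workers() {
  if [ -z "$1" ]; then
    echo 2
  else
    echo "$1"
  fi
}

default_preemptible_disk_gb 500   # prints 100
default_preemptible_disk_gb 50    # prints 50
effective_standard_workers ""     # prints 2
```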

Using preemptibles in a cluster

You can use the Google Cloud Platform Console or the Google Cloud SDK to add, update, and remove preemptible instances used in a Cloud Dataproc cluster.

Using the Google Cloud Platform Console

When creating a Cloud Dataproc cluster from the Cloud Platform Console, you can specify the number of preemptible workers. After a cluster has been created, you can add and remove preemptible workers by editing the cluster from the Cloud Platform Console.

Creating a cluster with preemptible instances

The Create a Cloud Dataproc cluster page has an expandable panel titled "Preemptible workers, bucket, network, version, initialization, & access options."

Open this panel to view the Preemptible worker nodes section. Add preemptible workers to the new cluster by specifying a positive number in the Nodes field.

Updating a cluster with preemptible instances

After a cluster is created, you can edit the number of preemptible workers in a cluster by clicking the Edit button on the Configuration tab on the cluster detail page.

To change the number of preemptible workers, specify a new value in the Preemptible worker nodes field.

Click Save to update the cluster.

Removing preemptible instances from a cluster

To remove all preemptible instances from your cluster, update the cluster as explained above, specifying 0 in the Preemptible worker nodes field.

Using the Google Cloud SDK

Use the gcloud dataproc clusters create command to add preemptible instances to a cluster when the cluster is created. After a cluster is created, you can add or remove preemptibles to or from the cluster with the gcloud dataproc clusters update command. Both commands use the --num-preemptible-workers argument to specify the number of preemptible instances to use in the cluster.
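
Because both commands take the same argument, a small wrapper can build either one. The following is a minimal sketch, assuming a hypothetical helper named preemptible_cmd that validates its inputs and prints the command rather than running it, so you can review the command before executing it yourself.

```shell
# Hypothetical helper: prints the gcloud command line; it does not run gcloud.
# Usage: preemptible_cmd <create|update> <cluster-name> <preemptible-worker-count>
preemptible_cmd() {
  action="$1"
  cluster="$2"
  count="$3"

  # Only the two commands described above are accepted.
  case "$action" in
    create|update) ;;
    *) echo "action must be create or update" >&2; return 1 ;;
  esac

  # The count must be a non-negative integer (0 removes all preemptibles).
  case "$count" in
    ''|*[!0-9]*) echo "count must be a non-negative integer" >&2; return 1 ;;
  esac

  echo "gcloud dataproc clusters $action $cluster --num-preemptible-workers $count"
}

preemptible_cmd create my-test-cluster 2
preemptible_cmd update my-test-cluster 0
```

Printing the command instead of executing it keeps the sketch safe to run without a project or credentials.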

Creating a cluster with preemptible instances

To create a cluster with preemptible instances, use the gcloud dataproc clusters create command with the --num-preemptible-workers argument.

For example, the following command creates a cluster named "my-test-cluster" with two preemptible instances. Because --num-workers is not specified, the cluster also gets two non-preemptible instances.

gcloud dataproc clusters create my-test-cluster --num-preemptible-workers 2
Waiting on operation [operations/projects/project-id/operations/...].
Waiting for cluster creation operation...done.
Created [https://dataproc.googleapis.com/...].
clusterName: my-test-cluster
  ...
secondaryWorkerConfiguration:
    - dataproc-1-sw-2skd
    - dataproc-1-sw-l20p
    isPreemptible: true
...

Updating a cluster with preemptible instances

To update a cluster to add or remove preemptible instances, use the gcloud dataproc clusters update command with the --num-preemptible-workers argument. For example, the following command updates a cluster named "my-test-cluster" to use two preemptible instances.

gcloud dataproc clusters update my-test-cluster --num-preemptible-workers 2
Waiting on operation [operations/projects/project-id/operations/...].
Waiting for cluster update operation...done.
Updated [https://dataproc.googleapis.com/...].
clusterName: my-test-cluster
  ...
secondaryWorkerConfiguration:
    - dataproc-1-sw-2skd
    - dataproc-1-sw-l20p
    isPreemptible: true
...
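
One way to confirm the result is to count the secondary worker names in the command output. The sketch below assumes that secondary worker instance names contain "-sw-", as they do in the sample output above. The function name count_secondary_workers is hypothetical, and the heredoc reuses the abbreviated sample output rather than output from a real cluster.

```shell
# Hypothetical helper: reads cluster output on stdin and counts list items
# whose instance name contains "-sw-" (assumed secondary worker naming).
count_secondary_workers() {
  # grep -c exits non-zero when the count is 0, so "|| true" keeps the status clean.
  grep -c -E '^[[:space:]]*-[[:space:]].*-sw-' || true
}

# Demonstration with the abbreviated sample output shown above.
count_secondary_workers <<'EOF'
clusterName: my-test-cluster
secondaryWorkerConfiguration:
    - dataproc-1-sw-2skd
    - dataproc-1-sw-l20p
    isPreemptible: true
EOF
# prints 2

# Against a real cluster (not run here), you could pipe in the output of:
#   gcloud dataproc clusters describe my-test-cluster
```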

Removing preemptible instances from a cluster

To remove all preemptible workers from a cluster, use the gcloud dataproc clusters update command with --num-preemptible-workers set to 0. For example, the following command removes all preemptible workers from my-test-cluster.

gcloud dataproc clusters update my-test-cluster --num-preemptible-workers 0
