This page provides an overview of preemptible VM support in Google Kubernetes Engine (GKE).
Preemptible VMs are Google Compute Engine VM instances that last a maximum of 24 hours and provide no availability guarantees. Preemptible VMs are priced lower than standard Compute Engine VMs and offer the same machine types and options.
You can use preemptible VMs in your GKE clusters or node pools to run batch or fault-tolerant jobs that are less sensitive to the ephemeral, non-guaranteed nature of preemptible VMs.
To learn more about preemptible VMs, refer to Preemptible VMs in the Compute Engine documentation.
How preemptible VMs work
When GKE clusters or node pools create Compute Engine VMs, the VMs behave like a managed instance group. Preemptible VMs in GKE are subject to the same limitations as preemptible instances in a managed instance group. Preemptible instances terminate 30 seconds after receiving a preemption notice.
Here is an example selector for filtering preemptible VMs:
apiVersion: v1
kind: Pod
spec:
  nodeSelector:
    cloud.google.com/gke-preemptible: "true"
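As an illustration of how the selector fits into a batch workload, the following is a minimal sketch of a Job that runs only on preemptible nodes. The Job name, container name, image, command, and backoffLimit value are hypothetical placeholders, not values taken from this page.

```yaml
# Sketch only: a batch Job pinned to preemptible nodes through nodeSelector.
apiVersion: batch/v1
kind: Job
metadata:
  name: example-batch-job          # hypothetical name
spec:
  backoffLimit: 10                 # assumed value: retries before the Job is marked failed
  template:
    spec:
      restartPolicy: OnFailure     # restart the container if it exits with an error
      nodeSelector:
        cloud.google.com/gke-preemptible: "true"
      containers:
      - name: worker               # hypothetical container name
        image: busybox             # placeholder image
        command: ["sh", "-c", "echo processing; sleep 30"]
```

Because the nodeSelector is a hard requirement, Pods from this Job stay pending when no preemptible node is available.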
Kubernetes constraint violations
Using preemptible VMs on GKE invalidates some Kubernetes guarantees. The following constraints are modified by preemptible VMs:
Node preemption shuts down Pods ungracefully and ignores the configured Pod grace period. This means Pods shut down without notice.
According to the Pod disruption budget documentation, "The budget can only protect against voluntary evictions, not all causes of unavailability." Preemption is not voluntary, so you may experience greater unavailability than what is specified in the Pod disruption budget.
Because preemptible VMs have no availability guarantees, you should design your system under the assumption that any or all of your Compute Engine instances might be preempted and become unavailable. There are no guarantees as to when new instances become available.
Moreover, there is no guarantee that Pods running on preemptible VMs can always shut down gracefully. It may take several minutes for GKE to detect that the node was preempted and that the Pods are no longer running, which delays the rescheduling of the Pods to a new node.
If you want to ensure that your jobs or workloads are processed even if no preemptible VMs are available, you can create both non-preemptible and preemptible node pools in your cluster.
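One way to use such a mixed cluster is to prefer preemptible nodes without requiring them. The following is a minimal sketch that uses standard Kubernetes node affinity, which this page does not itself describe; the Deployment name, labels, replica count, weight, image, and command are hypothetical.

```yaml
# Sketch only: prefer preemptible nodes, fall back to standard nodes.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-worker             # hypothetical name
spec:
  replicas: 3                      # assumed replica count
  selector:
    matchLabels:
      app: example-worker
  template:
    metadata:
      labels:
        app: example-worker
    spec:
      affinity:
        nodeAffinity:
          # A soft preference: the scheduler favors matching nodes
          # but can still place Pods on nodes without the label.
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            preference:
              matchExpressions:
              - key: cloud.google.com/gke-preemptible
                operator: In
                values: ["true"]
      containers:
      - name: worker               # hypothetical container name
        image: busybox             # placeholder image
        command: ["sh", "-c", "while true; do echo working; sleep 10; done"]
```

If the preemptible nodes carry the NoSchedule taint described later on this page, the Pod template also needs the matching toleration, otherwise the preference can never be satisfied.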
Although node names generally stay the same when nodes are replaced after preemption, the internal and external IP addresses of preemptible VMs may change upon preemption.
Do not run stateful Pods on preemptible VMs: preemption could violate the at-most-one semantics inherent to StatefulSets and could lead to data loss.
Using node taints to avoid scheduling to preemptible VM nodes
You should avoid having critical Pods scheduled on a preemptible VM node. You can use a node taint and toleration to avoid scheduling Pods to nodes with preemptible VMs.
Tainting a node for preemptible VMs
To add a node taint for a node with preemptible VMs, run the following command:
kubectl taint nodes [NODE_NAME] cloud.google.com/gke-preemptible="true":NoSchedule
Now, only Pods that tolerate the node taint are scheduled to the node.
Adding toleration to Pods
To add the relevant toleration to your Pods, add the following to your Pod's specification or your object's Pod template specification:
tolerations:
- key: cloud.google.com/gke-preemptible
  operator: Equal
  value: "true"
  effect: NoSchedule
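To show where the toleration sits in a complete manifest, here is a minimal sketch of a Pod that both tolerates the taint and selects preemptible nodes. The Pod name, container name, image, and command are hypothetical placeholders.

```yaml
# Sketch only: a Pod that tolerates the preemptible taint and targets preemptible nodes.
apiVersion: v1
kind: Pod
metadata:
  name: example-preemptible-pod    # hypothetical name
spec:
  # The selector restricts the Pod to preemptible nodes.
  nodeSelector:
    cloud.google.com/gke-preemptible: "true"
  # The toleration lets the Pod be scheduled despite the NoSchedule taint.
  tolerations:
  - key: cloud.google.com/gke-preemptible
    operator: Equal
    value: "true"
    effect: NoSchedule
  containers:
  - name: worker                   # hypothetical container name
    image: busybox                 # placeholder image
    command: ["sh", "-c", "echo running on a preemptible node; sleep 3600"]
```

The toleration alone only permits scheduling onto tainted nodes. The nodeSelector is what restricts the Pod to them, so omit it if the Pod may also run on standard nodes.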
GPU preemptible node taints
You should create the cluster with non-preemptible nodes before adding a preemptible GPU node pool. This ensures that there is always a node pool of standard VMs on which to run system components like DNS.
If a preemptible node pool with GPUs is added to a cluster that has no other
node pools, including when a cluster is initially created with a preemptible
GPU node pool, the new node pool is not assigned the normal
"nvidia.com/gpu":NoSchedule taint. As a result, system Pods are scheduled on
the preemptible nodes, which can be disruptive when those nodes are preempted.
These Pods also consume resources on GPU nodes, which wastes both capacity and
money because GPU nodes are more expensive than non-GPU nodes.
Creating a cluster or node pool with preemptible VMs
You can use the
gcloud command-line tool or GCP Console to create a
cluster or node pool with preemptible VMs.
You can create a cluster or node pool with preemptible VMs by specifying the --preemptible flag.
To create a cluster with preemptible VMs, run the following command:
gcloud container clusters create [CLUSTER_NAME] --preemptible --zone [COMPUTE_ZONE]
[CLUSTER_NAME] is the name of the new cluster and [COMPUTE_ZONE] is the cluster's compute zone.
To create a node pool with preemptible VMs:
gcloud container node-pools create [POOL_NAME] --preemptible \
    --cluster [CLUSTER_NAME]
To use GCP Console instead, visit the Google Kubernetes Engine menu in GCP Console.
Click Create cluster.
Choose the Standard cluster template or choose an appropriate template for your workload.
Configure your cluster as desired. Then, click More options for the node pool you want to configure.
In the Preemptible nodes section, select Enable preemptible nodes.
Click Save to close the node pool modification overlay.