This page provides an overview of preemptible virtual machine (VM) support in Google Kubernetes Engine (GKE).
Preemptible VMs are Compute Engine VM instances that generally last a maximum of 24 hours and provide no availability guarantees. Preemptible VMs are priced lower than standard Compute Engine VMs and offer the same machine types and options.
You can use preemptible VMs in your GKE clusters or node pools to run batch or fault-tolerant jobs that are less sensitive to the ephemeral, non-guaranteed nature of preemptible VMs.
To learn more about preemptible VMs, refer to Preemptible VMs in the Compute Engine documentation.
How preemptible VMs work
When GKE clusters or node pools create Compute Engine VMs, the VMs behave as if they were part of a managed instance group. Preemptible VMs in GKE are subject to the same limitations as preemptible instances in a managed instance group. Preemptible instances are terminated 30 seconds after they receive a preemption notice.
The following example uses a nodeSelector so that a Pod is scheduled only on preemptible nodes:
apiVersion: v1
kind: Pod
spec:
  nodeSelector:
    cloud.google.com/gke-preemptible: "true"
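As a minimal sketch, the following Python builds the same nodeSelector into a complete Pod manifest and prints it as JSON, which kubectl also accepts (for example, piped to `kubectl apply -f -`). The function name, Pod name, container name, and image are hypothetical placeholders; only the `cloud.google.com/gke-preemptible` label comes from this page.

```python
import json


def preemptible_pod(name: str, image: str) -> dict:
    """Return a Pod manifest restricted to preemptible GKE nodes.

    `name` and `image` are hypothetical values supplied by the caller.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # Only nodes labeled cloud.google.com/gke-preemptible=true are eligible.
            # Label values are strings, so "true" is quoted rather than a boolean.
            "nodeSelector": {"cloud.google.com/gke-preemptible": "true"},
            "containers": [{"name": "main", "image": image}],
        },
    }


# Print the manifest as JSON so it can be reviewed or piped to kubectl.
print(json.dumps(preemptible_pod("batch-worker", "example.com/batch-worker:latest"), indent=2))
```

Building the manifest in code is only a convenience; the YAML above is equivalent.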
Kubernetes preemptible nodes
On preemptible GKE nodes running GKE version 1.20 or later, the kubelet graceful node shutdown feature is enabled by default. As a result, the kubelet detects preemption and gracefully terminates Pods.
On a best-effort basis, user Pods are given up to 25 seconds for graceful termination (as requested with terminationGracePeriodSeconds), and then system Pods (those with system-critical priority classes) are given 5 seconds for their graceful termination.
For Pods on preemptible nodes, do not set terminationGracePeriodSeconds higher than 25 seconds, because those Pods receive only 25 seconds during preemption.
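The arithmetic behind this limit can be sketched in Python, assuming the 30-second preemption notice is split into 25 seconds for user Pods and 5 seconds for system Pods as described above. The constant and function names are hypothetical; the helpers report how much of a requested terminationGracePeriodSeconds a Pod can expect during preemption and flag values that exceed the budget.

```python
PREEMPTION_NOTICE_SECONDS = 30  # notice before a preemptible VM is terminated
SYSTEM_POD_SECONDS = 5          # reserved for system Pods at the end of shutdown
USER_POD_SECONDS = PREEMPTION_NOTICE_SECONDS - SYSTEM_POD_SECONDS  # 25


def effective_grace_period(requested_seconds: int) -> int:
    """Seconds a user Pod can expect during preemption (best effort)."""
    if requested_seconds < 0:
        raise ValueError("terminationGracePeriodSeconds must be non-negative")
    return min(requested_seconds, USER_POD_SECONDS)


def check_grace_period(requested_seconds: int) -> list[str]:
    """Return warnings for a Pod that will run on preemptible nodes."""
    warnings = []
    if requested_seconds > USER_POD_SECONDS:
        warnings.append(
            f"terminationGracePeriodSeconds={requested_seconds} exceeds the "
            f"{USER_POD_SECONDS}s available during preemption; the Pod gets "
            f"at most {effective_grace_period(requested_seconds)}s"
        )
    return warnings


print(check_grace_period(60))
```

Note that Kubernetes uses 30 seconds when terminationGracePeriodSeconds is unset, so a Pod that leaves the field at its default already asks for more than the 25 seconds available, and `check_grace_period(30)` returns one warning.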
Kubernetes constraint violations
Using preemptible VMs on GKE invalidates some Kubernetes guarantees. The following constraints are modified by preemptible VMs:
- If graceful node shutdown is not enabled (GKE versions earlier than 1.20), node preemption shuts down Pods without notice and ignores the configured Pod grace period.
- According to the Pod disruption budget documentation, "The budget can only protect against voluntary evictions, not all causes of unavailability." Preemption is not voluntary, so you may experience greater unavailability than what is specified in the Pod disruption budget.
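To make the second point concrete, the sketch below counts how many replicas survive when preemptible nodes disappear at once and compares the result with a PodDisruptionBudget's minAvailable. The placement data, Pod names, and node names are hypothetical. Nothing in the budget can stop the count from dropping below minAvailable, because preemption is not an eviction the budget is able to refuse.

```python
def surviving_replicas(placement: dict[str, str], preempted: set[str]) -> int:
    """Count replicas whose node was not preempted.

    `placement` maps a (hypothetical) Pod name to the node it runs on.
    """
    return sum(1 for node in placement.values() if node not in preempted)


def budget_violated(placement: dict[str, str], preempted: set[str], min_available: int) -> bool:
    """True if involuntary preemption leaves fewer replicas than min_available."""
    return surviving_replicas(placement, preempted) < min_available


placement = {
    "web-1": "preemptible-node-a",
    "web-2": "preemptible-node-a",
    "web-3": "preemptible-node-b",
}
# A budget of minAvailable=2 only blocks voluntary evictions such as node drains;
# preempting both nodes still takes all three replicas down.
print(budget_violated(placement, {"preemptible-node-a", "preemptible-node-b"}, 2))  # prints True
```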
Because preemptible VMs have no availability guarantees, you should design your system under the assumption that any or all of your Compute Engine instances might be preempted and become unavailable. There are no guarantees as to when new instances become available.
Moreover, there is no guarantee that Pods running on preemptible VMs can always shut down gracefully. It may take several minutes for GKE to detect that the node was preempted and that the Pods are no longer running, which delays rescheduling the Pods to a new node.
If you want to ensure that your jobs or workloads are processed even if no preemptible VMs are available, you can create both non-preemptible and preemptible node pools in your cluster.
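One way to use both kinds of pool, offered here as an assumption rather than something this page prescribes, is a Pod that prefers preemptible nodes but can fall back to standard nodes. Instead of the hard nodeSelector, the sketch uses Kubernetes preferred node affinity on the `cloud.google.com/gke-preemptible` label. The function name, Pod name, image, and default weight are hypothetical. If you also taint the preemptible nodes as described later on this page, the Pod additionally needs the matching toleration.

```python
import json


def prefer_preemptible_pod(name: str, image: str, weight: int = 100) -> dict:
    """Pod that prefers preemptible nodes but may run on standard nodes.

    `weight` (1-100) sets how strongly the scheduler favors matching nodes.
    """
    if not 1 <= weight <= 100:
        raise ValueError("preferred node affinity weight must be in 1..100")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "affinity": {
                "nodeAffinity": {
                    # A soft preference: if no preemptible node fits,
                    # the scheduler may still place the Pod on a standard node.
                    "preferredDuringSchedulingIgnoredDuringExecution": [
                        {
                            "weight": weight,
                            "preference": {
                                "matchExpressions": [
                                    {
                                        "key": "cloud.google.com/gke-preemptible",
                                        "operator": "In",
                                        "values": ["true"],
                                    }
                                ]
                            },
                        }
                    ]
                }
            },
            "containers": [{"name": "main", "image": image}],
        },
    }


print(json.dumps(prefer_preemptible_pod("batch-worker", "example.com/batch-worker:latest"), indent=2))
```

The trade-off is that a soft preference gives up the cost guarantee: when preemptible capacity is missing, the work runs on more expensive standard VMs instead of waiting.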
Although node names generally stay the same if and when nodes are replaced after preemption, the internal and external IP addresses of preemptible VMs may change upon preemption.
Do not use preemptible VMs with stateful Pods, because they could violate the at-most-one semantics inherent to StatefulSets and lead to data loss.
Using node taints to avoid scheduling to preemptible VM nodes
To avoid system disruptions, use a node taint to ensure critical Pods are not scheduled on a preemptible VM node.
If you apply node taints, make sure your cluster also has non-preemptible, untainted nodes, so that there is always a node pool of standard VMs on which to run system components like DNS.
Tainting a node for preemptible VMs
To add a node taint for a node with preemptible VMs, run the following command:
kubectl taint nodes node-name cloud.google.com/gke-preemptible="true":NoSchedule
where node-name is the name of the node.
Now, only Pods that tolerate the node taint are scheduled to the node.
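The scheduling rule above can be sketched in Python. This is a simplified model of the upstream Kubernetes taint and toleration matching rules, not the scheduler's actual code: a Pod fits a node only if every NoSchedule or NoExecute taint on the node is matched by at least one of the Pod's tolerations, while PreferNoSchedule is only a soft hint. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str  # "NoSchedule", "PreferNoSchedule", or "NoExecute"


@dataclass(frozen=True)
class Toleration:
    key: str = ""             # empty key (with Exists) matches every key
    operator: str = "Equal"   # "Equal" or "Exists"
    value: str = ""
    effect: str = ""          # empty effect matches every effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Return True if a single toleration matches a single taint."""
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.key and tol.key != taint.key:
        return False
    if tol.operator == "Exists":
        return True
    if tol.operator in ("", "Equal"):
        return tol.value == taint.value
    return False


def can_schedule(taints: list[Taint], tolerations: list[Toleration]) -> bool:
    """Every hard taint must be tolerated; PreferNoSchedule taints are ignored here."""
    hard = [t for t in taints if t.effect in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, t) for tol in tolerations) for t in hard)


# The taint added by the kubectl command above.
PREEMPTIBLE_TAINT = Taint("cloud.google.com/gke-preemptible", "true", "NoSchedule")
```

For example, `can_schedule([PREEMPTIBLE_TAINT], [])` is False, which is why Pods without the toleration stay off the tainted node.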
Adding toleration to Pods
To add the relevant toleration to your Pods, add the following to your Pod's specification or your object's Pod template specification:
tolerations:
- key: cloud.google.com/gke-preemptible
  operator: Equal
  value: "true"
  effect: NoSchedule
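A toleration only allows a Pod onto the tainted nodes and does not force it there, so the Pod could still land on standard nodes. As a sketch, assuming you want a workload to run only on the tainted preemptible nodes, the following Python adds the toleration above to a Pod template spec and pairs it with the `cloud.google.com/gke-preemptible` nodeSelector. The function name and the Deployment (name, labels, image, replica count) are hypothetical.

```python
import copy

PREEMPTIBLE_KEY = "cloud.google.com/gke-preemptible"
PREEMPTIBLE_TOLERATION = {
    "key": PREEMPTIBLE_KEY,
    "operator": "Equal",
    "value": "true",
    "effect": "NoSchedule",
}


def target_preemptible(pod_spec: dict) -> dict:
    """Return a copy of a Pod spec (or Pod template spec) that tolerates the
    preemptible taint and is restricted to preemptible nodes."""
    spec = copy.deepcopy(pod_spec)
    tolerations = spec.setdefault("tolerations", [])
    if PREEMPTIBLE_TOLERATION not in tolerations:  # avoid duplicate entries
        tolerations.append(dict(PREEMPTIBLE_TOLERATION))
    # The toleration only *allows* the tainted nodes; the selector *requires* them.
    spec.setdefault("nodeSelector", {})[PREEMPTIBLE_KEY] = "true"
    return spec


deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "batch-worker"},
    "spec": {
        "replicas": 3,
        "selector": {"matchLabels": {"app": "batch-worker"}},
        "template": {
            "metadata": {"labels": {"app": "batch-worker"}},
            "spec": {"containers": [{"name": "main", "image": "example.com/batch-worker:latest"}]},
        },
    },
}
# Apply the change to the object's Pod template spec.
deployment["spec"]["template"]["spec"] = target_preemptible(deployment["spec"]["template"]["spec"])
```

The helper is idempotent, so running it twice on the same spec does not add a second toleration.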
GPU preemptible node taints
Create the cluster with non-preemptible nodes before you add a preemptible GPU node pool. This ensures that there is always a node pool of standard VMs on which to run system components like DNS.
If there are no other node pools in the cluster when a preemptible node pool with GPUs is added, including when the cluster is initially created with a preemptible GPU node pool, that node pool does not receive the usual "nvidia.com/gpu":NoSchedule taint. As a result, system Pods are scheduled on the preemptible nodes, which can be disruptive when those nodes are preempted. These Pods also consume resources on GPU nodes, which wastes both capacity and money, because GPU nodes are more expensive than non-GPU nodes.
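A partial audit of this situation can be sketched in Python. It assumes node pool descriptions shaped like the JSON that `gcloud container node-pools list --cluster cluster-name --format=json` prints, with `preemptible` and `accelerators` under each pool's `config`; that shape, the sample data, and the function names are assumptions. The check flags GPU pools in a cluster that currently has no non-preemptible pool. It cannot tell whether an existing GPU pool was created before a standard pool was added, so a clean result is not proof that the taint is present.

```python
def _config(pool: dict) -> dict:
    """Node pool config, assumed to sit under a "config" key."""
    return pool.get("config", {})


def gpu_pools_without_standard_pool(node_pools: list[dict]) -> list[str]:
    """Names of GPU pools in a cluster that has no non-preemptible pool.

    In that situation the GPU pools may lack the nvidia.com/gpu NoSchedule
    taint, so system Pods can land on them. Returns [] otherwise.
    """
    if any(not _config(p).get("preemptible", False) for p in node_pools):
        return []  # a standard pool exists to host system Pods
    return [p["name"] for p in node_pools if _config(p).get("accelerators")]


# Hypothetical sample: a cluster created with only a preemptible GPU pool.
pools = [
    {
        "name": "gpu-pool",
        "config": {
            "preemptible": True,
            "accelerators": [{"acceleratorCount": "1", "acceleratorType": "nvidia-tesla-t4"}],
        },
    },
]
print(gpu_pools_without_standard_pool(pools))
```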
Creating a cluster or node pool with preemptible VMs
You can use the gcloud command-line tool or Cloud Console to create a cluster or node pool with preemptible VMs.
With the gcloud tool, you create a cluster or node pool with preemptible VMs by specifying the --preemptible flag.
To create a cluster with preemptible VMs, run the following command:
gcloud container clusters create cluster-name --preemptible
where cluster-name is the name of the cluster to create.
To create a node pool with preemptible VMs, run the following command:
gcloud container node-pools create pool-name --preemptible \
    --cluster cluster-name
where:
- pool-name is the name of the node pool to create.
- cluster-name is the name of the cluster for the node pool.
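If you script these commands, a sketch like the following builds the same argument lists shown above and runs them without going through a shell, which avoids quoting problems with unusual names. Only the subcommands and flags from this page are used; the helper names and the dry-run default are hypothetical.

```python
import shlex
import subprocess


def preemptible_cluster_cmd(cluster: str) -> list[str]:
    """argv for: gcloud container clusters create CLUSTER --preemptible"""
    return ["gcloud", "container", "clusters", "create", cluster, "--preemptible"]


def preemptible_pool_cmd(pool: str, cluster: str) -> list[str]:
    """argv for: gcloud container node-pools create POOL --preemptible --cluster CLUSTER"""
    return ["gcloud", "container", "node-pools", "create", pool,
            "--preemptible", "--cluster", cluster]


def run(argv: list[str], dry_run: bool = True) -> None:
    """Print the shell-quoted command; execute it only when dry_run is False."""
    print(shlex.join(argv))
    if not dry_run:
        subprocess.run(argv, check=True)  # raises CalledProcessError on failure


run(preemptible_pool_cmd("pool-name", "cluster-name"))
```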
To create a cluster with preemptible VMs in Cloud Console:
1. Go to the Google Kubernetes Engine menu in Cloud Console.
2. Click Create.
3. Configure your cluster as desired.
4. From the navigation pane, under Node Pools, for the node pool you want to configure, click Nodes.
5. Select the Enable preemptible nodes checkbox.