Running preemptible VMs

This page provides an overview of preemptible VMs support in Google Kubernetes Engine.

Overview

Preemptible VMs are Google Compute Engine VM instances that last a maximum of 24 hours and provide no availability guarantees. Preemptible VMs are priced lower than standard Compute Engine VMs and offer the same machine types and options.

You can use preemptible VMs in your GKE clusters or node pools to run batch or fault-tolerant jobs that are less sensitive to the ephemeral, non-guaranteed nature of preemptible VMs.

To learn more about preemptible VMs, refer to Preemptible VMs in the Compute Engine documentation.

How preemptible VMs work

When GKE clusters or node pools create Compute Engine VMs, the VMs behave as if they were part of a managed instance group. Preemptible VMs in GKE are subject to the same limitations as preemptible instances in a managed instance group. Preemptible instances terminate 30 seconds after receiving a preemption notice.

Additionally, these preemptible VMs are given a Kubernetes label, cloud.google.com/gke-preemptible=true. Kubernetes labels can be used in the nodeSelector field for scheduling Pods to specific nodes.

Here is an example selector for filtering preemptible VMs:

apiVersion: v1
kind: Pod
spec:
  nodeSelector:
    cloud.google.com/gke-preemptible: "true"
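
Because the label is applied automatically, you can also use it to list the preemptible nodes in a cluster, for example:

kubectl get nodes -l cloud.google.com/gke-preemptible=true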

Kubernetes constraint violations

Using preemptible VMs on GKE invalidates some Kubernetes guarantees. The following constraints are modified by preemptible VMs:

  • Node preemption shuts down Pods ungracefully, ignoring the configured Pod termination grace period. This means Pods are shut down without notice.

  • According to the Pod disruption budget documentation, "The budget can only protect against voluntary evictions, not all causes of unavailability." Preemption is not voluntary, so you may experience greater unavailability than what is specified in the Pod disruption budget.
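
For reference, a minimal PodDisruptionBudget looks like the following sketch; the name and label are placeholders, and on older clusters the apiVersion may be policy/v1beta1. Even with a budget in place, preemption can remove more Pods than the budget allows.

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: example-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: example-app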

Best practices

Because preemptible VMs have no availability guarantees, you should design your system under the assumption that any or all of your Compute Engine instances might be preempted and become unavailable. There are no guarantees as to when new instances become available.

Moreover, there is no guarantee that Pods running on preemptible VMs will always shut down gracefully. It may take several minutes for GKE to detect that the node was preempted and that the Pods are no longer running, which delays rescheduling the Pods to a new node.

If you want to ensure that your jobs or workloads are processed even if no preemptible VMs are available, you can create both non-preemptible and preemptible node pools in your cluster.
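
For example, you could pair a small pool of standard VMs with a larger preemptible pool. The following commands are a sketch; the pool names and node counts are placeholders:

gcloud container node-pools create standard-pool --cluster [CLUSTER_NAME] \
--num-nodes 1
gcloud container node-pools create preemptible-pool --cluster [CLUSTER_NAME] \
--preemptible --num-nodes 3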

Although node names generally stay the same when nodes are replaced after preemption, the internal and external IP addresses of preemptible VMs may change upon preemption.

Do not use preemptible VMs with stateful Pods, because preemption can violate the at-most-one semantics inherent to StatefulSets and can lead to data loss.

Using node taints to avoid scheduling to preemptible VM nodes

You should avoid having critical Pods scheduled on a preemptible VM node. You can use a node taint and toleration to avoid scheduling Pods to nodes with preemptible VMs.

Tainting a node for preemptible VMs

To add a node taint for a node with preemptible VMs, run the following command:

kubectl taint nodes [NODE_NAME] cloud.google.com/gke-preemptible="true":NoSchedule

Now, only Pods that tolerate the node taint are scheduled to the node.
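
Keep in mind that taints applied with kubectl are not automatically re-applied when a preempted node is recreated. If you want the taint applied to every node in a pool, you can instead set it when you create the node pool, for example:

gcloud container node-pools create [POOL_NAME] --preemptible \
--cluster [CLUSTER_NAME] \
--node-taints=cloud.google.com/gke-preemptible=true:NoSchedule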

Adding toleration to Pods

To add the relevant toleration to your Pods, add the following to your Pod's specification or your object's Pod template specification:

tolerations:
- key: cloud.google.com/gke-preemptible
  operator: Equal
  value: "true"
  effect: NoSchedule
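
For example, a Pod that both targets and tolerates preemptible nodes combines the selector and the toleration. The Pod name and image below are placeholders:

apiVersion: v1
kind: Pod
metadata:
  name: example-preemptible-pod
spec:
  nodeSelector:
    cloud.google.com/gke-preemptible: "true"
  tolerations:
  - key: cloud.google.com/gke-preemptible
    operator: Equal
    value: "true"
    effect: NoSchedule
  containers:
  - name: example-container
    image: nginx # placeholder image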

Preemptible GPU node taints

You should create the cluster with non-preemptible nodes before adding a preemptible GPU node pool. This ensures that there is always a node pool of standard VMs on which to run system components like DNS before adding preemptible GPU node pools.
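
For example, after the cluster already has a standard node pool, a preemptible GPU node pool could be added roughly like this; the pool name and accelerator type are placeholders:

gcloud container node-pools create preemptible-gpu-pool --preemptible \
--cluster [CLUSTER_NAME] \
--accelerator type=nvidia-tesla-t4,count=1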

If a preemptible GPU node pool is added when there are no other node pools in the cluster, including when a cluster is initially created with a preemptible GPU node pool, the GPU nodes do not receive the normal "nvidia.com/gpu":NoSchedule taint. This means that system Pods are scheduled on the preemptible GPU nodes, which can be disruptive when those nodes are preempted. These Pods also consume resources on GPU nodes, wasting both capacity and money, because GPU nodes are more expensive than non-GPU nodes.
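
If your cluster does end up in this state, one workaround is to add the taint to the GPU nodes manually. The following is a sketch; the value present matches the taint GKE normally applies to GPU nodes, but verify it against your cluster:

kubectl taint nodes [NODE_NAME] nvidia.com/gpu=present:NoSchedule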

Creating a cluster or node pool with preemptible VMs

You can use the gcloud command-line tool or Cloud Console to create a cluster or node pool with preemptible VMs.

gcloud

You can create a cluster or node pool with preemptible VMs by specifying the --preemptible flag.

To create a cluster with preemptible VMs, run the following command:

gcloud container clusters create [CLUSTER_NAME] --zone [COMPUTE_ZONE] --preemptible

where [CLUSTER_NAME] is the name of your new cluster and [COMPUTE_ZONE] is the cluster's compute zone.

To create a node pool with preemptible VMs:

gcloud container node-pools create [POOL_NAME] --preemptible \
--cluster [CLUSTER_NAME]

Console

  1. Visit the Google Kubernetes Engine menu in Cloud Console.

  2. Click Create cluster.

  3. Choose the Standard cluster template or choose an appropriate template for your workload.

  4. Configure your cluster as desired. Then, click More options for the node pool you want to configure.

  5. In the Preemptible nodes section, select Enable preemptible nodes.

  6. Click Save to close the node pool modification overlay.

  7. Click Create.

What's next
