Run fault-tolerant workloads at lower costs with Spot VMs


This page shows you how to run fault-tolerant, stateless, or batch workloads at lower costs by using Spot VMs in your Google Kubernetes Engine (GKE) clusters and node pools.

Overview

Spot VMs are Compute Engine virtual machines (VMs) that are priced lower than the default standard VMs and provide no guarantee of availability. Spot VMs offer the same machine types and options as standard Compute Engine VMs. Compute Engine can reclaim Spot VMs at any time due to system events, such as when the resources are needed for standard VMs.

To learn more about Spot VMs in GKE, see Spot VMs.

Spot VMs replace the need to use preemptible VMs to run stateless, batch, or fault-tolerant workloads. In contrast to preemptible VMs, which expire after 24 hours, Spot VMs have no expiration time. Spot VMs are terminated when Compute Engine requires the resources to run standard VMs.

Spot VMs are also supported on GKE Autopilot clusters through Spot Pods. With Spot Pods, Autopilot automatically schedules and manages workloads on Spot VMs.

Limitations

Before you begin

Before you start, make sure you have performed the following tasks:

  • Enable the Google Kubernetes Engine API.
  • If you want to use the Google Cloud CLI for this task, install and then initialize the gcloud CLI. If you previously installed the gcloud CLI, get the latest version by running gcloud components update.

Create a cluster with Spot VMs

You can create a new cluster using Spot VMs with the Google Cloud CLI or the Google Cloud console.

gcloud

Create a new cluster that uses Spot VMs in the default node pool instead of standard VMs:

gcloud container clusters create CLUSTER_NAME \
    --spot

Replace CLUSTER_NAME with the name of your new cluster.

Console

To create a new cluster with a node pool using Spot VMs, perform the following steps:

  1. Go to the Google Kubernetes Engine page in the Google Cloud console.

  2. Click Create.

  3. On the Create cluster dialog, next to GKE Standard, click Configure.

  4. From the navigation menu, in the Node pools section, click the name of the node pool you want to configure, and then click Nodes.

  5. Select the Enable Spot VMs checkbox.

  6. Configure the cluster as needed, and then click Create.

Create a node pool with Spot VMs

You can create new node pools using Spot VMs with the gcloud CLI or Google Cloud console. You can only enable Spot VMs on new node pools. You cannot enable or disable Spot VMs on existing node pools.

gcloud

Create a new node pool using Spot VMs:

gcloud container node-pools create POOL_NAME \
    --cluster=CLUSTER_NAME \
    --spot

Replace POOL_NAME with the name of your new node pool and CLUSTER_NAME with the name of the existing cluster.

Console

To create a new node pool using Spot VMs, perform the following steps:

  1. Go to the Google Kubernetes Engine page in the Google Cloud console.

  2. In the cluster list, click the name of the cluster you want to modify.

  3. Click Add node pool.

  4. From the navigation menu, click Nodes.

  5. Select the Enable Spot VMs checkbox.

  6. Configure the node pool as needed, and then click Create.

Schedule workloads on Spot VMs

GKE adds the cloud.google.com/gke-spot=true label to nodes that use Spot VMs. On nodes running GKE version 1.25.5-gke.2500 or later, GKE also adds the cloud.google.com/gke-provisioning=spot label. You can select these nodes in your Pod spec by using either the nodeSelector field or node affinity.
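The walkthrough in this section uses nodeSelector. For comparison, the following is a minimal sketch of the node affinity alternative, written as a shell heredoc so that it can be pasted into a terminal. The file name, Pod name, and busybox image are hypothetical placeholders and are not part of the original example; only the cloud.google.com/gke-spot label key and its "true" value come from this page.

```shell
# Write a hypothetical Pod manifest that selects Spot VM nodes with node
# affinity instead of nodeSelector. File name, Pod name, and image are
# placeholders chosen for illustration.
cat > spot-affinity-pod.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: spot-affinity-example
spec:
  affinity:
    nodeAffinity:
      # Hard requirement: only nodes labeled as Spot VMs are eligible.
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: cloud.google.com/gke-spot
            operator: In
            values:
            - "true"
  containers:
  - name: main
    image: busybox:1.36
    command: ["sh", "-c", "echo running on a Spot VM; sleep 30"]
  restartPolicy: Never
EOF
echo "wrote spot-affinity-pod.yaml"
```

With a required rule like this one, the effect should match the nodeSelector form. Node affinity is mainly worth the extra lines when you need operators such as NotIn or DoesNotExist, or a preferred (soft) rule.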

In the following example, you create a cluster with two node pools, one of which uses Spot VMs. Then, you run a batch Job that computes pi on the Spot VMs, using a nodeSelector to control where GKE places the Pods.

  1. Create a new cluster with the default node pool using standard VMs:

    gcloud container clusters create CLUSTER_NAME
    

    Replace CLUSTER_NAME with the name of your new cluster.

  2. Get credentials for the cluster:

    gcloud container clusters get-credentials CLUSTER_NAME
    
  3. Create a node pool using Spot VMs:

    gcloud container node-pools create POOL_NAME \
        --cluster=CLUSTER_NAME \
        --num-nodes=1 \
        --spot
    

    Replace POOL_NAME with the name of your new node pool.

  4. Save the following manifest as a file named pi-app.yaml:

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: pi
    spec:
      template:
        metadata:
          labels:
            app: pi
        spec:
          nodeSelector:
            cloud.google.com/gke-spot: "true"
          terminationGracePeriodSeconds: 25
          containers:
          - name: pi
            image: perl:5.34.0
            command: ["perl",  "-Mbignum=bpi", "-wle", "print bpi(2000)"]
          restartPolicy: Never
      backoffLimit: 4
    

    In this manifest, the nodeSelector field tells GKE to only schedule Pods on nodes that use Spot VMs.

  5. Apply the manifest to your cluster:

    kubectl apply -f pi-app.yaml
    
  6. Describe the Pod:

    kubectl describe pod pi
    

    The output is similar to the following:

    Name:         pi-kjbr9
    Namespace:    default
    Priority:     0
    Node:         gke-cluster-2-spot-pool-fb434072-44ct
    ...
    Labels:       app=pi
                  job-name=pi
    Status:       Succeeded
    ...
    Controlled By:  Job/pi
    Containers:
    ...
    Conditions:
      Type              Status
      Initialized       True 
      Ready             False 
      ContainersReady   False 
      PodScheduled      True 
    Volumes:
    ...
    Node-Selectors:              cloud.google.com/gke-spot=true
    Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                                node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
    Events:
      Type    Reason     Age    From               Message
      ----    ------     ----   ----               -------
      Normal  Scheduled  4m3s   default-scheduler  Successfully assigned default/pi-kjbr9 to gke-cluster-2-spot-pool-fb434072-44ct
      Normal  Pulling    4m2s   kubelet            Pulling image "perl:5.34.0"
      Normal  Pulled     3m43s  kubelet            Successfully pulled image "perl:5.34.0" in 18.481761978s
      Normal  Created    3m43s  kubelet            Created container pi
      Normal  Started    3m43s  kubelet            Started container pi
    

    The Node field shows that GKE scheduled the Pod on a node in the node pool that uses Spot VMs.
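As an optional cross-check, you can list the labeled nodes and the Job's Pods side by side. The sketch below wraps two standard kubectl queries in small helper functions. The function names are hypothetical; the label selectors come from the node label and the job-name=pi Pod label shown earlier in this section.

```shell
# List the nodes that carry the Spot VM label described in this section.
# Extra arguments (for example "-o name") are passed through to kubectl.
list_spot_nodes() {
  kubectl get nodes -l cloud.google.com/gke-spot=true "$@"
}

# List the Pods created by the "pi" Job, including the node each one runs on.
list_pi_pods() {
  kubectl get pods -l job-name=pi -o wide "$@"
}
```

Run list_spot_nodes and then list_pi_pods. The NODE column of the second command should contain only node names that also appear in the output of the first.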

Use taints and tolerations for Spot VMs

As a best practice, create clusters with at least one node pool without Spot VMs where you can place system workloads like DNS. You can use node taints and the corresponding tolerations to tell GKE to avoid placing certain workloads on Spot VMs.

  1. To create a node pool with nodes that use Spot VMs and have node taints, use the --node-taints flag when creating the node pool:

    gcloud container node-pools create POOL_NAME \
        --cluster=CLUSTER_NAME \
        --node-taints=cloud.google.com/gke-spot="true":NoSchedule \
        --spot
    
  2. To add the corresponding toleration to the Pods that you want to schedule to Spot VMs, modify your deployments and add the following to your Pod specification:

    tolerations:
    - key: cloud.google.com/gke-spot
      operator: Equal
      value: "true"
      effect: NoSchedule
    

    Only Pods with this toleration can be scheduled onto the Spot VM nodes that have the node taint. Pods without the toleration are kept off those nodes.
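A toleration permits scheduling onto tainted nodes but does not require it, so a Pod that tolerates the taint can still land on a standard node. To pin a workload to the tainted Spot VM nodes, one option is to combine the toleration with the nodeSelector from the previous section. The following sketch assumes that approach; the file name and Job name are hypothetical, and the container spec is copied from the pi example above.

```shell
# Write a hypothetical Job manifest that both tolerates the Spot taint and
# selects Spot VM nodes. File name and Job name are placeholders.
cat > spot-tolerant-job.yaml <<'EOF'
apiVersion: batch/v1
kind: Job
metadata:
  name: pi-spot-tolerant
spec:
  template:
    spec:
      # Require nodes that use Spot VMs.
      nodeSelector:
        cloud.google.com/gke-spot: "true"
      # Allow scheduling despite the NoSchedule taint on those nodes.
      tolerations:
      - key: cloud.google.com/gke-spot
        operator: Equal
        value: "true"
        effect: NoSchedule
      containers:
      - name: pi
        image: perl:5.34.0
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never
  backoffLimit: 4
EOF
echo "wrote spot-tolerant-job.yaml"
```

You can then apply the file with kubectl apply -f spot-tolerant-job.yaml, as in the earlier example.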

What's next