This page shows you how to run fault-tolerant workloads at lower costs by using Spot Pods in your Google Kubernetes Engine (GKE) Autopilot clusters.
Overview
In GKE Autopilot clusters, Spot Pods are Pods that run on nodes backed by Compute Engine Spot VMs. Spot Pods are priced lower than standard Autopilot Pods, but can be evicted by GKE whenever compute resources are required to run standard Pods.
Spot Pods are ideal for running stateless, batch, or fault-tolerant workloads at lower costs compared to running those workloads as standard Pods. To use Spot Pods in Autopilot clusters, modify the manifest with your Pod specification to request Spot Pods.
You can run Spot Pods on the default general-purpose Autopilot compute class as well as on specialized compute classes that meet specific hardware requirements. For information about these compute classes, refer to Compute classes in Autopilot.
To learn more about the pricing for Spot Pods in Autopilot clusters, see Google Kubernetes Engine pricing.
Spot Pods are excluded from the Autopilot Service Level Agreement.
Benefits
Using Spot Pods in your Autopilot clusters provides you with the following benefits:
- Lower pricing than running the same workloads on standard Autopilot Pods.
- GKE automatically manages autoscaling and scheduling.
- GKE automatically taints nodes that run Spot Pods to ensure that standard Pods, like your critical workloads, aren't scheduled on those nodes. Your deployments that do use Spot Pods are automatically updated with a corresponding toleration.
Before you begin
Before you start, make sure you have performed the following tasks:
- Enable the Google Kubernetes Engine API.
- If you want to use the Google Cloud CLI for this task, install and then initialize the gcloud CLI. If you previously installed the gcloud CLI, get the latest version by running gcloud components update.
Request Spot Pods in your Autopilot workloads
To request that your Pods run as Spot Pods, use the cloud.google.com/gke-spot=true label in a nodeSelector or node affinity in your Pod specification. GKE automatically provisions nodes that can run Spot Pods.
Spot Pods can be evicted and terminated at any time, for example if the compute resources are required elsewhere in Google Cloud. When a termination occurs, Spot Pods on the terminating node can request a grace period of up to 15 seconds before termination by specifying the terminationGracePeriodSeconds field. The grace period is granted on a best-effort basis.
The maximum grace period given to Spot Pods during preemption is 15 seconds. Requesting more than 15 seconds in terminationGracePeriodSeconds doesn't grant more than 15 seconds during preemption. On eviction, your Pod is sent the SIGTERM signal and should take steps to shut down during the grace period.
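As a rough illustration of shutting down within the grace period, the following sketch runs a container whose shell script traps SIGTERM and exits cleanly. The Pod name spot-graceful, the busybox image, and the script itself are assumptions made for this example and are not part of the original guide. A real workload would typically handle the signal in its own application code.

```yaml
# Hypothetical example: a Spot Pod that reacts to SIGTERM.
apiVersion: v1
kind: Pod
metadata:
  name: spot-graceful            # illustrative name
spec:
  nodeSelector:
    cloud.google.com/gke-spot: "true"
  # Values above 15 don't extend the grace period during preemption.
  terminationGracePeriodSeconds: 15
  containers:
  - name: worker
    image: busybox:1.36          # assumed image, chosen only for its shell
    command:
    - /bin/sh
    - -c
    - |
      # Install a handler so that SIGTERM triggers a clean exit.
      trap 'echo "SIGTERM received, saving state"; exit 0' TERM
      # Stand-in for real work; short sleeps let the trap run promptly.
      while true; do sleep 1; done
```

Keeping the cleanup work well under 15 seconds leaves some margin, because the grace period is granted only on a best-effort basis.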
For Autopilot, GKE also automatically taints the nodes created to run Spot Pods and modifies those workloads with the corresponding toleration. The taint prevents standard Pods from being scheduled on nodes that run Spot Pods.
Use a nodeSelector to require Spot Pods
You can use a nodeSelector to require Spot Pods in a workload such as a Deployment or a Job. Add the cloud.google.com/gke-spot=true label to the nodeSelector in your Pod specification, such as in the following Job example:
apiVersion: batch/v1
kind: Job
metadata:
  name: pi
spec:
  template:
    metadata:
      labels:
        app: pi
    spec:
      nodeSelector:
        cloud.google.com/gke-spot: "true"
      terminationGracePeriodSeconds: 15
      containers:
      - name: pi
        image: perl:5.34.0
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never
  backoffLimit: 4
Use node affinity to request Spot Pods
Alternatively, you can use node affinity to request Spot Pods. Node affinity provides you with a more extensible way to select nodes to run your workloads. For example, you can combine several selection criteria to get finer control over where your Pods run. When you use node affinity to request Spot Pods, you can specify the type of node affinity to use, as follows:
- requiredDuringSchedulingIgnoredDuringExecution: Must use Spot Pods.
- preferredDuringSchedulingIgnoredDuringExecution: Use Spot Pods on a best-effort basis.
To use node affinity to require Spot Pods, add the following nodeAffinity rule to your workload manifest, shown here for a Job:
apiVersion: batch/v1
kind: Job
metadata:
  name: pi
spec:
  template:
    metadata:
      labels:
        app: pi
    spec:
      terminationGracePeriodSeconds: 15
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: cloud.google.com/gke-spot
                operator: In
                values:
                - "true"
      containers:
      - name: pi
        image: perl:5.34.0
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never
  backoffLimit: 4
Request Spot Pods on a best-effort basis
To use node affinity to request Spot Pods on a best-effort basis, use preferredDuringSchedulingIgnoredDuringExecution. When you request Spot Pods on a preferred basis, GKE schedules your Pods based on the following order:
- Existing nodes that can run Spot Pods that have available allocatable capacity.
- Existing standard nodes that have available allocatable capacity.
- New nodes that can run Spot Pods, if the compute resources are available.
- New standard nodes.
Because GKE prefers existing standard nodes that have allocatable capacity over creating new nodes for Spot Pods, you might notice more Pods running as standard Pods than as Spot Pods, which prevents you from taking full advantage of the lower pricing of Spot Pods.
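A best-effort request might look like the following sketch, which uses the standard Kubernetes syntax for preferred node affinity. The Job name pi-preferred and the weight of 1 are illustrative assumptions. The weight can be any value from 1 to 100 and only matters relative to other preferred rules.

```yaml
# Hypothetical example: prefer Spot Pods, but allow standard Pods.
apiVersion: batch/v1
kind: Job
metadata:
  name: pi-preferred             # illustrative name
spec:
  template:
    metadata:
      labels:
        app: pi-preferred
    spec:
      terminationGracePeriodSeconds: 15
      affinity:
        nodeAffinity:
          # A preference, not a requirement: Pods can still be
          # scheduled on standard nodes.
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1            # assumed weight; valid range is 1-100
            preference:
              matchExpressions:
              - key: cloud.google.com/gke-spot
                operator: In
                values:
                - "true"
      containers:
      - name: pi
        image: perl:5.34.0
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never
  backoffLimit: 4
```

If the workload must always run at Spot pricing, the required form of node affinity or a nodeSelector is the safer choice.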
Requests for preemptible Pods
Autopilot clusters support requests for preemptible Pods using the cloud.google.com/gke-preemptible selector. Pods that use this selector are automatically migrated to Spot Pods, and the selector is changed to cloud.google.com/gke-spot.
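For illustration, a manifest that still uses the preemptible selector could look like the following sketch. The Job name legacy-pi is hypothetical, and the "true" value is assumed by analogy with the gke-spot label used elsewhere on this page.

```yaml
# Hypothetical example: a legacy request for preemptible Pods.
apiVersion: batch/v1
kind: Job
metadata:
  name: legacy-pi                # illustrative name
spec:
  template:
    spec:
      nodeSelector:
        # Per this page, Autopilot changes this selector to
        # cloud.google.com/gke-spot and runs the Pods as Spot Pods.
        cloud.google.com/gke-preemptible: "true"   # assumed value
      terminationGracePeriodSeconds: 15
      containers:
      - name: pi
        image: perl:5.34.0
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never
  backoffLimit: 4
```

New manifests can use the cloud.google.com/gke-spot selector directly, which avoids relying on the migration.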
Find and delete terminated Pods
During graceful Pod termination, the kubelet assigns a Failed status and a Shutdown reason to the terminated Pods. When the number of terminated Pods reaches a threshold of 1000, garbage collection cleans up the Pods. You can also delete shutdown Pods manually using the following command:
kubectl get pods --all-namespaces | grep -i shutdown | awk '{print $1, $2}' | xargs -n2 kubectl delete pod -n
What's next
- Learn more about Autopilot cluster architecture.
- Learn about the lifecycle of Pods.
- Read about Spot VMs in GKE Standard clusters.