Extend the run time of Autopilot Pods

This page shows you how to request extended run times for Pods before they're evicted by Google Kubernetes Engine (GKE).

About GKE-initiated Pod eviction

Pod evictions are a normal part of running workloads on Kubernetes. GKE evicts workloads during scheduled events, such as automatic node upgrades and autoscaling scale-downs, to ensure that your nodes are up-to-date and optimized for efficient resource usage. By default, GKE sends a termination signal to the container as soon as the event occurs, after which the container has a grace period to terminate before Kubernetes evicts the Pod. For automatic node upgrades, the grace period can be up to one hour. For scale-down events, the grace period can be up to 10 minutes.

Kubernetes has built-in features that containers can use to gracefully handle evictions, such as PodDisruptionBudgets and graceful termination periods. However, some workloads, such as batch queues or multiplayer game servers, need to run for a longer period of time before being evicted. The default grace period that GKE grants during GKE-initiated evictions might not be enough for these workloads. In these situations, you can tell Autopilot to avoid evicting specific workloads for up to 7 days.
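The built-in features mentioned above can be sketched as follows. This fragment is illustrative only: the names `example-pdb` and `example-app`, the `app: example` label, and the values `minAvailable: 1` and `terminationGracePeriodSeconds: 600` are assumptions chosen for the example, not values from this page.

```yaml
# Hypothetical PodDisruptionBudget: asks Kubernetes to keep at least one
# matching Pod available during voluntary disruptions.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: example-pdb            # hypothetical name
spec:
  minAvailable: 1              # assumed value for illustration
  selector:
    matchLabels:
      app: example             # hypothetical label
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app            # hypothetical name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: example
  template:
    metadata:
      labels:
        app: example
    spec:
      # Time that containers get to shut down after the termination signal.
      # 600 seconds is an assumed value (10 minutes).
      terminationGracePeriodSeconds: 600
      containers:
      - name: example-container
        image: registry.k8s.io/pause
        resources:
          requests:
            cpu: 200m
```

These settings shape how a Pod shuts down, but they don't extend how long GKE waits before it starts an eviction, which is the gap that extended run times address.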

Use cases

Some situations in which you might want to tell GKE to avoid evicting workloads include the following:

You run multiplayer game servers that would kick players out of their sessions if the servers terminated early.
You run audio or video conferencing software that would disrupt in-progress meetings if the servers terminated.
You run tasks that need time to complete, and early termination would cause a loss of in-progress work.
You run a stateful service that is less tolerant to disruption and you want to minimize how often disruptions occur.

Pricing

You can request extended run times for your Pods at no additional charge. However, consider the following behavioral changes that might impact your pricing:

Autopilot clusters enforce higher minimum values for the resource requests of extended duration Pods. Autopilot clusters charge you for the resource requests of your running Pods. You're not charged for system overhead or for unused node capacity.
Using extended duration Pods might increase the number of nodes in your cluster, which might affect IP address usage and scalability. If you have DaemonSets that run on every node, the additional nodes result in more DaemonSet Pods in the cluster.

For pricing details, see Autopilot pricing.

Before you begin

Before you start, make sure that you have performed the following tasks:

Enable the Google Kubernetes Engine API.

If you want to use the Google Cloud CLI for this task, install and then initialize the gcloud CLI. If you previously installed the gcloud CLI, get the latest version by running gcloud components update.
Note: For existing gcloud CLI installations, make sure to set the compute/region property. If you use primarily zonal clusters, set the compute/zone instead. By setting a default location, you can avoid errors in the gcloud CLI like the following: One of [--zone, --region] must be supplied: Please specify location. You might need to specify the location in certain commands if the location of your cluster differs from the default that you set.

Ensure that you have an Autopilot cluster running version 1.27 or later.

Limitations

You can't request extended run times for your Spot Pods.
Image pull times are counted when calculating the extended run time.
You can have a maximum of 50 extended duration workloads (with different CPU requests) in each cluster. This means that up to 50 different sets of CPU request values, after passing Autopilot resource minimums, ratios, and increment size checks, can have extended duration in each cluster.
You can't use Kubernetes inter-Pod affinity in extended duration Pods.
Whenever possible, GKE places each extended run time Pod on its own node. This behavior ensures that nodes can scale down if they're under-utilized.
You can't request extended run times for Pods that target custom compute classes.

Request extended run time

To request extended run time for a Pod, set the Kubernetes cluster-autoscaler.kubernetes.io/safe-to-evict annotation to false in the Pod specification.

Save the following manifest as extended-deployment.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: extended-pods
  labels:
    duration: extended
spec:
  selector:
    matchLabels:
      duration: extended
  template:
    metadata:
      annotations:
        cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
      labels:
        duration: extended
    spec:
      containers:
      - name: example-container
        image: registry.k8s.io/pause
        resources:
          requests:
            cpu: 200m

Create the Deployment:

kubectl create -f extended-deployment.yaml

The Pods continue to run for at least 7 days before a scale-down or a node auto-upgrade can occur.
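The same annotation goes in the Pod template of other workload types. The following sketch applies it to a Job, which fits the use case of tasks that need time to complete. It assumes that the annotation behaves the same way on a Job's Pod template as on the Deployment above. The name `extended-job`, the `busybox` image, and the `sleep` command are hypothetical stand-ins for a real task.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: extended-job           # hypothetical name
spec:
  template:
    metadata:
      annotations:
        # Same annotation as in the Deployment example; the value must be
        # the quoted string "false", not a YAML boolean.
        cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
    spec:
      restartPolicy: Never
      containers:
      - name: example-task
        image: busybox         # stand-in image for illustration
        # Stand-in for a long-running task (one hour).
        command: ["sh", "-c", "sleep 3600"]
        resources:
          requests:
            cpu: 200m
```

Because Autopilot enforces higher resource minimums for extended duration Pods, the requests that are applied to the running Pods might differ from the values in the manifest.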

Considerations and recommendations

When you use this functionality, consider the following:

Extended duration Pods aren't protected from priority-based eviction. If you use Kubernetes PriorityClasses, consider the following methods to minimize the probability of priority-based eviction:
- Ensure that your extended duration Pods use the highest priority PriorityClass, so that other user Pods don't evict your extended duration Pods.
- Use workload separation to run extended duration Pods separately from other Pods.
System Pods run with the highest priority and can always evict extended duration Pods. To minimize the probability of this eviction, GKE schedules system Pods on the node before it schedules the extended duration Pod.
Extended duration Pods can still be evicted early in the following situations:
- Eviction to make space for higher-priority user Pods (using a higher PriorityClass)
- Eviction to make space for Kubernetes system components
- kubelet out-of-memory eviction if the Pod uses more memory than it requested (OOMKill)
- Compute Engine VM maintenance events. Accelerator-optimized machine types are more likely to be affected by these events because those machines don't support live migration.
- Node auto-repairs
- User-initiated events such as draining a node
You can use the cluster-autoscaler.kubernetes.io/safe-to-evict annotation in Standard clusters, but the behavior differs. In Standard clusters, annotated Pods run indefinitely even if a scale-down event occurs, which prevents deletion of underutilized nodes, so you continue to pay for those nodes. The annotation also doesn't protect Pods from evictions caused by node auto-upgrades.
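The two recommendations for reducing priority-based eviction (a high PriorityClass and workload separation) could be combined as in the following sketch. Everything that this page doesn't define is an assumption: the PriorityClass name `extended-duration-priority`, the value `1000000`, and the `group: extended` key-value pair used for workload separation are hypothetical. Choose a priority value that is higher than every other user PriorityClass in your cluster.

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: extended-duration-priority   # hypothetical name
value: 1000000                       # assumed value; must exceed your other user PriorityClasses
globalDefault: false
description: "Priority for extended duration Pods."
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: extended-pods
  labels:
    duration: extended
spec:
  selector:
    matchLabels:
      duration: extended
  template:
    metadata:
      annotations:
        cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
      labels:
        duration: extended
    spec:
      # References the PriorityClass defined above.
      priorityClassName: extended-duration-priority
      # Workload separation (assumed key and value): a node selector plus a
      # matching toleration asks for nodes dedicated to this group.
      nodeSelector:
        group: extended
      tolerations:
      - key: group
        operator: Equal
        value: extended
        effect: NoSchedule
      containers:
      - name: example-container
        image: registry.k8s.io/pause
        resources:
          requests:
            cpu: 200m
```

A high PriorityClass only protects against preemption by other user Pods. System Pods and the other eviction causes listed above still apply.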

What's next

Use PriorityClasses to provision spare capacity in Autopilot for rapid Pod scaling