Vertical Pod autoscaling

Autopilot Standard

This page explains how you can analyze and optimize your resource allocation to improve your workload efficiency in Google Kubernetes Engine (GKE), using vertical Pod autoscaling. By analyzing your workload's resource usage over time, you can get optimization recommendations and automatically adjust CPU and memory requests, and limits for containers within the Pods.

In this page, you learn how vertical Pod autoscaling works, its benefits and limitations, best practices for using it, and access the API references for the VerticalPodAutoscaler custom resource and related types.

This page is for Operators and Developers who provision and configure cloud resources, deploy workloads, and manage application scaling. To learn more about common roles, see Common GKE user roles and tasks.

Before reading this page, ensure that you are familiar with resource requests and limits in Kubernetes.

For rapid scaling needs in response to sudden resource usage, use the Horizontal Pod Autoscaler.

To learn best practices for autoscaling, see Best practices for running cost-optimized Kubernetes applications on GKE.

How vertical Pod autoscaling works

Vertical Pod autoscaling lets you analyze and set CPU and memory resources required by Pods. Instead of having to set up-to-date CPU requests and limits and memory requests and limits for the containers in your Pods, you can configure vertical Pod autoscaling to provide recommended values for CPU and memory requests and limits that you can use to manually update your Pods, or you can configure vertical Pod autoscaling to automatically update the values.

Vertical Pod autoscaling is enabled by default in Autopilot clusters.

Vertical Pod autoscaling modes

You can configure how vertical Pod autoscaling applies resource changes by applying different update modes.

`Auto` (`Recreate`) mode

In Recreate mode, vertical Pod autoscaling evicts a Pod if it needs to change the Pod's resource requests. Eviction is necessary because, due to Kubernetes limitations in versions earlier than 1.33, the only way to modify the resource requests of a running Pod is to re-create it.

To limit the amount of Pod recreations, use a Pod disruption budget. To ensure that your cluster can handle the new sizes of your workloads, use cluster autoscaler and node auto-provisioning.

Vertical Pod autoscaling notifies the cluster autoscaler ahead of the update, and provides the resources needed for the resized workload before re-creating the workload, to minimize the disruption time.

`Initial` mode

With Initial enabled, vertical Pod autoscaling only assigns resource requests on Pod creation and never changes them later.

`InPlaceOrRecreate` mode

The InPlaceOrRecreate mode aims to reduce service disruption by attempting to update Pod resources without recreating the Pod. Although this mode does not guarantee zero disruptions, it can help reduce disruptions while still benefiting from vertical Pod autoscaling.

To use InPlaceOrRecreate mode, set the spec.updatePolicy.updateMode field to "InPlaceOrRecreate" in your VerticalPodAutoscaler object.

If vertical Pod autoscaling determines that an in-place update is not possible, it reverts to the Auto mode behavior, which evicts and re-creates the Pod to apply the changes.

The InPlaceOrRecreate mode is available with Kubernetes version 1.34.0-gke.2011000 and later.

For more details on the behavior and existing limitations of InPlaceOrRecreate mode, see the Kubernetes announcement for in-place updates.

`Off` mode

In Off mode, vertical Pod autoscaling doesn't automatically apply any changes to a Pod. You can still view recommended values for the requests and limits of both CPU and memory based on historical usage, but these recommendations aren't applied for you. You can manually apply the recommended values to your Pods, if needed.

Benefits

Vertical Pod autoscaling provides the following benefits:

Setting the right resource requests and limits for your workloads improves stability and cost efficiency. If your Pod resource sizes are smaller than your workloads require, your application can either be throttled or it can fail due to out-of-memory errors. If your resource sizes are too large, you have waste and, therefore, larger bills.
Cluster nodes are used efficiently because Pods use exactly what they need.
Pods are scheduled onto nodes that have the appropriate resources available.
You don't have to run time-consuming benchmarking tasks to determine the correct values for CPU and memory requests.
Reduced maintenance time because the autoscaler can adjust CPU and memory requests over time without any action on your part.

GKE vertical Pod autoscaling provides the following benefits over the Kubernetes open source autoscaler:

Takes maximum node size and resource quotas into account when determining the recommendation target.
Notifies the cluster autoscaler to adjust cluster capacity.
Uses historical data, providing metrics collected before you enable the Vertical Pod Autoscaler.
Runs Vertical Pod Autoscaler Pods as control plane processes, instead of deployments on your worker nodes.

Limitations

To use vertical Pod autoscaling with horizontal Pod autoscaling, use multidimensional Pod autoscaling. You can also use vertical Pod autoscaling with horizontal Pod autoscaling on custom and external metrics.
Vertical Pod autoscaling is not ready for use with JVM-based workloads due to limited visibility into actual memory usage of the workload.
Vertical Pod autoscaling has a default setting of two minimum replicas for Deployments to replace Pods with revised resource values. In GKE version 1.22 and later, you can override this setting by specifying a value for minReplicas in the PodUpdatePolicy field.
If you use the InPlaceOrRecreate update mode of vertical Pod autoscaling and an in-place update isn't possible (for example, when upscaling the Pod beyond the node capacity), vertical Pod autoscaling evicts and re-creates the Pod to apply the recommendation. Eviction and re-creation occurs even for Pods that have a resizePolicy set in their specification to avoid recreations. This behavior occurs for Autopilot resize requests, including when applying minimum resources and CPU:memory ratio constraints.
Vertical Pod autoscaling requires a workload object that manages the Pods, such as a Deployment, StatefulSet, ReplicaSet, or ReplicationControllers. You can't use vertical Pod autoscaling with standalone Pods because a workload controller is required to manage the Pod re-creation process.

Best practices

To avoid cluster update disruptions, we recommend that you keep the number of VerticalPodAutoscaler objects per cluster under 1,000.
Vertical Pod autoscaling works best with long-running homogenous workloads.

API reference

This is the v1 API reference. We strongly recommend using this version of the API.

VerticalPodAutoscaler v1 autoscaling.k8s.io

Fields
	`TypeMeta` API group, version, and kind.
`metadata`	`ObjectMeta` Standard object metadata.
`spec`	`VerticalPodAutoscalerSpec` The desired behavior of the `VerticalPodAutoscaler`.
`status`	`VerticalPodAutoscalerStatus` The most recently observed status of the `VerticalPodAutoscaler`.

VerticalPodAutoscalerSpec v1 autoscaling.k8s.io

Fields
`targetRef`	`CrossVersionObjectReference` Reference to the controller that manages the set of Pods for the autoscaler to control, for example, a Deployment or a StatefulSet. You can point a `VerticalPodAutoscaler` at any controller that has a Scale subresource. Typically, the `VerticalPodAutoscaler` retrieves the Pod set from the controller's ScaleStatus. For some well known controllers, for example DaemonSet, the `VerticalPodAutoscaler` retrieves the Pod set from the controller's spec.
`updatePolicy`	`PodUpdatePolicy` Specifies whether recommended updates are applied when a Pod is started and whether recommended updates are applied during the life of a Pod.
`resourcePolicy`	`PodResourcePolicy` Specifies policies for how CPU and memory requests are adjusted for individual containers. The resource policy can be used to set constraints on the recommendations for individual containers. If not specified, the autoscaler computes recommended resources for all containers in the Pod, without additional constraints.
`recommenders`	`VerticalPodAutoscalerRecommenderSelector array` Recommender responsible for generating recommendation for this VPA object. Leave empty to use the default recommender provided by GKE. Otherwise the list can contain exactly one entry for a user-provided alternative recommender. Supported since GKE 1.22.

VerticalPodAutoscalerList v1 autoscaling.k8s.io

Fields

TypeMeta

API group, version, and kind.

metadata

ObjectMeta

Standard object metadata.

items

VerticalPodAutoscaler array

A list of VerticalPodAutoscaler objects.

PodUpdatePolicy v1 autoscaling.k8s.io

Fields

Fields
`updateMode`	`string` Specifies whether recommended updates are applied when a Pod is started and whether recommended updates are applied during the life of a Pod. Possible values are the following: `"Off"`: recommended updates are generated, but aren't automatically applied to the Pod. `"Initial"`: recommended updates are applied only when a Pod is first started. Updates that happen while the Pod is already running aren't automatically applied. `"Recreate"`: recommended updates are applied by re-creating the Pod. The existing Pod is terminated, and a new Pod with the updated configuration is created. `"Auto"`: the default value that essentially enforces the `"Recreate"` mode. `"InPlaceOrRecreate"`: recommended updates are applied without re-creating the Pod, if possible.
`minReplicas`	`int32` The minimum number of replicas which need to be alive to attempt Pod eviction (pending other checks like Pod Disruption Budget). Only positive values are allowed. Defaults to `2`. Supported since GKE 1.22.

updateMode

string

Specifies whether recommended updates are applied when a Pod is started and whether recommended updates are applied during the life of a Pod. Possible values are the following:

"Off": recommended updates are generated, but aren't automatically applied to the Pod.
"Initial": recommended updates are applied only when a Pod is first started. Updates that happen while the Pod is already running aren't automatically applied.
"Recreate": recommended updates are applied by re-creating the Pod. The existing Pod is terminated, and a new Pod with the updated configuration is created.
"Auto": the default value that essentially enforces the "Recreate" mode.
"InPlaceOrRecreate": recommended updates are applied without re-creating the Pod, if possible.

minReplicas

int32

The minimum number of replicas which need to be alive to attempt Pod eviction (pending other checks like Pod Disruption Budget). Only positive values are allowed. Defaults to 2. Supported since GKE 1.22.

PodResourcePolicy v1 autoscaling.k8s.io

Fields

Fields
`containerPolicies`	`ContainerResourcePolicy array` An array of resource policies for individual containers. There can be at most one entry for every named container and optionally a single wildcard entry with `containerName = '*'`, which handles all containers that do not have individual policies.

containerPolicies

ContainerResourcePolicy array

An array of resource policies for individual containers. There can be at most one entry for every named container and optionally a single wildcard entry with `containerName = '*'`, which handles all containers that do not have individual policies.

ContainerResourcePolicy v1 autoscaling.k8s.io

Fields
`containerName`	`string` The name of the container that the policy applies to. If not specified, the policy serves as the default policy.
`mode`	`ContainerScalingMode` Specifies whether recommended updates are applied to the container when it is started and whether recommended updates are applied during the life of the container. Possible values are "Off" and "Auto". The default is "Auto" if you don't specify a value.
`minAllowed`	`ResourceList` Specifies the minimum CPU request and memory request allowed for the container. By default, there is no minimum applied.
`maxAllowed`	`ResourceList` Specifies the maximum CPU request and memory request allowed for the container. By default, there is no maximum applied.
`ControlledResources`	`[]ResourceName` Specifies the type of recommendations that will be computed (and possibly applied) by the `VerticalPodAutoscaler`. If empty, the default of `[ResourceCPU, ResourceMemory]` is used.

VerticalPodAutoscalerRecommenderSelector v1 autoscaling.k8s.io

Fields

Fields
`name`	`string` Name of the recommender responsible for generating recommendation for this object.

name

string

Name of the recommender responsible for generating recommendation for this object.

VerticalPodAutoscalerStatus v1 autoscaling.k8s.io

Fields

Fields
`recommendation`	`RecommendedPodResources` The most recently recommended CPU and memory requests.
`conditions`	`VerticalPodAutoscalerCondition array` Describes the current state of the `VerticalPodAutoscaler`.

recommendation

RecommendedPodResources

The most recently recommended CPU and memory requests.

conditions

VerticalPodAutoscalerCondition array

Describes the current state of the VerticalPodAutoscaler.

RecommendedPodResources v1 autoscaling.k8s.io

Fields

Fields
`containerRecommendation`	`RecommendedContainerResources array` An array of resource recommendations for individual containers.

containerRecommendation

RecommendedContainerResources array

An array of resource recommendations for individual containers.

RecommendedContainerResources v1 autoscaling.k8s.io

Fields
`containerName`	`string` The name of the container that the recommendation applies to.
`target`	`ResourceList` The recommended CPU request and memory request for the container.
`lowerBound`	`ResourceList` The minimum recommended CPU request and memory request for the container. This amount is not guaranteed to be sufficient for the application to be stable. Running with smaller CPU and memory requests is likely to have a significant impact on performance or availability.
`upperBound`	`ResourceList` The maximum recommended CPU request and memory request for the container. CPU and memory requests higher than these values are likely to be wasted.
`uncappedTarget`	`ResourceList` The most recent resource recommendation computed by the autoscaler, based on actual resource usage, not taking into account the ContainerResourcePolicy. If actual resource usage causes the target to violate the ContainerResourcePolicy, this might be different from the bounded recommendation. This field does not affect actual resource assignment. It is used only as a status indication.

VerticalPodAutoscalerCondition v1 autoscaling.k8s.io

Fields
`type`	`VerticalPodAutoscalerConditionType` The type of condition being described. Possible values are "RecommendationProvided", "LowConfidence", "NoPodsMatched", and "FetchingHistory".
`status`	`ConditionStatus` The status of the condition. Possible values are True, False, and Unknown.
`lastTransitionTime`	`Time` The last time the condition made a transition from one status to another.
`reason`	`string` The reason for the last transition from one status to another.
`message`	`string` A human-readable string that gives details about the last transition from one status to another.

What's next

Learn how to Scale container resource requests and limits.
Learn best practices for running cost-optimized Kubernetes applications on GKE.
Learn about Cluster autoscaler.

Vertical Pod autoscaling

How vertical Pod autoscaling works

Vertical Pod autoscaling modes

Auto (Recreate) mode

Initial mode

InPlaceOrRecreate mode

Off mode

Benefits

Limitations

Best practices

API reference

VerticalPodAutoscaler v1 autoscaling.k8s.io

VerticalPodAutoscalerSpec v1 autoscaling.k8s.io

VerticalPodAutoscalerList v1 autoscaling.k8s.io

PodUpdatePolicy v1 autoscaling.k8s.io

PodResourcePolicy v1 autoscaling.k8s.io

ContainerResourcePolicy v1 autoscaling.k8s.io

VerticalPodAutoscalerRecommenderSelector v1 autoscaling.k8s.io

VerticalPodAutoscalerStatus v1 autoscaling.k8s.io

RecommendedPodResources v1 autoscaling.k8s.io

RecommendedContainerResources v1 autoscaling.k8s.io

VerticalPodAutoscalerCondition v1 autoscaling.k8s.io

What's next

`Auto` (`Recreate`) mode

`Initial` mode

`InPlaceOrRecreate` mode

`Off` mode