Vertical Pod autoscaling

This page gives an overview of vertical Pod autoscaling and provides reference material for the VerticalPodAutoscaler custom resource and related types.

Overview

Vertical Pod autoscaling helps you size Pods to the CPU and memory resources that they actually need. Instead of keeping the CPU and memory requests and limits for the containers in your Pods up to date yourself, you can configure vertical Pod autoscaling to recommend values for those requests and limits, or to update the values automatically.

Setting the right resource requests and limits for your workloads is important for stability and cost efficiency. If your Pod resource sizes are smaller than your workloads require, your application can be throttled or can fail due to out-of-memory errors. If your resource sizes are larger than your workloads require, you pay for capacity that goes unused. For autoscaling best practices, see Best practices for running cost-optimized Kubernetes applications on GKE.

You configure vertical Pod autoscaling by enabling it in your GKE cluster and by creating a VerticalPodAutoscaler object, which is a custom resource. For Autopilot clusters, vertical Pod autoscaling is enabled by default; however, you must still create VerticalPodAutoscaler objects to configure it. Learn more in Configuring vertical Pod autoscaling.
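As a rough illustration of what such an object looks like, the following Python sketch builds a minimal VerticalPodAutoscaler manifest as a plain dictionary and serializes it to JSON. The object name my-app-vpa, the Deployment name my-app, and the helper build_vpa_manifest are hypothetical placeholders; the field names follow the API reference later on this page.

```python
import json


def build_vpa_manifest(name, target_kind, target_name,
                       target_api_version="apps/v1", update_mode="Off"):
    """Return a minimal VerticalPodAutoscaler manifest as a dict.

    All argument values are supplied by the caller; nothing here talks
    to a cluster.
    """
    return {
        "apiVersion": "autoscaling.k8s.io/v1",
        "kind": "VerticalPodAutoscaler",
        "metadata": {"name": name},
        "spec": {
            # The controller whose Pods the autoscaler should manage.
            "targetRef": {
                "apiVersion": target_api_version,
                "kind": target_kind,
                "name": target_name,
            },
            # "Off" is one of the documented updateMode values.
            "updatePolicy": {"updateMode": update_mode},
        },
    }


# Hypothetical names, for illustration only.
manifest = build_vpa_manifest("my-app-vpa", "Deployment", "my-app")
print(json.dumps(manifest, indent=2))
```

Because kubectl accepts JSON as well as YAML, the printed output could be saved to a file and applied with kubectl apply -f.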

Benefits

Vertical Pod autoscaling provides these benefits:

  • Cluster nodes are used efficiently because Pods use exactly what they need.
  • Pods are scheduled onto nodes that have the appropriate resources available.
  • You don't have to run time-consuming benchmarking tasks to determine the correct values for CPU and memory requests.
  • Maintenance time is reduced because the autoscaler can adjust CPU and memory requests over time without any action on your part.

Google Kubernetes Engine (GKE) vertical Pod autoscaling provides these benefits over the Kubernetes open source autoscaler:

  • Takes maximum node size and resource quotas into account when determining the recommendation target.

  • Notifies the Cluster autoscaler to adjust cluster capacity.

  • Uses historical data, including metrics collected before the Vertical Pod Autoscaler was enabled.

  • Runs Vertical Pod Autoscaler Pods as control plane processes, instead of deployments on your worker nodes.

Limitations

  • Vertical Pod autoscaling does not generate recommendations based on sudden increases in resource usage. Instead, it provides stable recommendations over a longer time period. For sudden increases, use the Horizontal Pod Autoscaler.

  • Vertical Pod autoscaling supports a maximum of 500 VerticalPodAutoscaler objects per cluster.

  • To use vertical Pod autoscaling with horizontal Pod autoscaling, use multidimensional Pod autoscaling. You can also use vertical Pod autoscaling with horizontal Pod autoscaling on custom and external metrics.

  • Vertical Pod autoscaling is not yet ready for use with JVM-based workloads due to limited visibility into actual memory usage of the workload.

API reference

This is the v1 API reference. We strongly recommend using this version of the API.

VerticalPodAutoscaler v1 autoscaling.k8s.io

Fields

TypeMeta

API group, version, and kind.

metadata

ObjectMeta

Standard object metadata.

spec

VerticalPodAutoscalerSpec

The desired behavior of the VerticalPodAutoscaler.

status

VerticalPodAutoscalerStatus

The most recently observed status of the VerticalPodAutoscaler.

VerticalPodAutoscalerSpec v1 autoscaling.k8s.io

Fields

targetRef

CrossVersionObjectReference

Reference to the controller that manages the set of Pods for the autoscaler to control, for example, a Deployment or a StatefulSet. You can point a VerticalPodAutoscaler at any controller that has a Scale subresource. Typically, the VerticalPodAutoscaler retrieves the Pod set from the controller's ScaleStatus. For some well-known controllers, for example DaemonSet, the VerticalPodAutoscaler retrieves the Pod set from the controller's spec.

updatePolicy

PodUpdatePolicy

Specifies whether recommended updates are applied when a Pod is started and whether recommended updates are applied during the life of a Pod.

resourcePolicy

PodResourcePolicy

Specifies policies for how CPU and memory requests are adjusted for individual containers. The resource policy can be used to set constraints on the recommendations for individual containers. If not specified, the autoscaler computes recommended resources for all containers in the Pod, without additional constraints.
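To make the relationship between the three spec fields concrete, here is a minimal Python sketch that assembles a spec dictionary and leaves the optional fields out when they are not set. The helper build_spec, the StatefulSet name my-db, and all resource values are hypothetical.

```python
def build_spec(target_ref, update_mode=None, container_policies=None):
    """Assemble a VerticalPodAutoscalerSpec as a dict.

    updatePolicy and resourcePolicy are omitted when not supplied, so
    the autoscaler's own defaults apply.
    """
    spec = {"targetRef": target_ref}
    if update_mode is not None:
        spec["updatePolicy"] = {"updateMode": update_mode}
    if container_policies:
        spec["resourcePolicy"] = {"containerPolicies": list(container_policies)}
    return spec


# Hypothetical target and bounds, for illustration only.
spec = build_spec(
    target_ref={"apiVersion": "apps/v1", "kind": "StatefulSet", "name": "my-db"},
    update_mode="Initial",
    container_policies=[
        {
            "containerName": "*",
            "minAllowed": {"cpu": "100m", "memory": "128Mi"},
            "maxAllowed": {"cpu": "2", "memory": "4Gi"},
        }
    ],
)
print(sorted(spec))
```

Omitting resourcePolicy corresponds to the unconstrained case described above, where recommendations are computed for all containers in the Pod.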

VerticalPodAutoscalerList v1 autoscaling.k8s.io

Fields

TypeMeta

API group, version, and kind.

metadata

ObjectMeta

Standard object metadata.

items

VerticalPodAutoscaler array

A list of VerticalPodAutoscaler objects.

PodUpdatePolicy v1 autoscaling.k8s.io

Fields

updateMode

string

Specifies whether recommended updates are applied when a Pod is started and whether recommended updates are applied during the life of a Pod. Possible values are "Off", "Initial", "Recreate", and "Auto".
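Because updateMode is a plain string, a typo such as "off" would not match any of the documented values. The following sketch shows one way to check a PodUpdatePolicy dictionary before submitting it; validate_update_mode is a hypothetical client-side helper, not part of any Kubernetes library.

```python
# The four values documented for updateMode (case-sensitive).
VALID_UPDATE_MODES = ("Off", "Initial", "Recreate", "Auto")


def validate_update_mode(update_policy):
    """Return the updateMode from a PodUpdatePolicy dict.

    Returns None when the field is unset, leaving the default to the
    API rather than guessing it here. Raises ValueError for any value
    that is not one of the documented modes.
    """
    mode = update_policy.get("updateMode")
    if mode is None:
        return None
    if mode not in VALID_UPDATE_MODES:
        raise ValueError(
            f"updateMode must be one of {VALID_UPDATE_MODES}, got {mode!r}"
        )
    return mode


print(validate_update_mode({"updateMode": "Initial"}))
```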

PodResourcePolicy v1 autoscaling.k8s.io

Fields

containerPolicies

ContainerResourcePolicy array

An array of resource policies for individual containers. There can be at most one entry for every named container and optionally a single wildcard entry with `containerName = '*'`, which handles all containers that do not have individual policies.
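The lookup rule described above (a named entry wins, otherwise the wildcard entry applies) can be sketched in a few lines of Python. The helpers policy_for and duplicate_names are hypothetical and only model the rule as stated here; they are not the autoscaler's implementation.

```python
def policy_for(container_name, container_policies):
    """Pick the policy for a container.

    An entry with a matching containerName takes precedence; otherwise
    the '*' wildcard entry is used, if present. Returns None when
    neither exists.
    """
    wildcard = None
    for policy in container_policies:
        name = policy.get("containerName")
        if name == container_name:
            return policy
        if name == "*":
            wildcard = policy
    return wildcard


def duplicate_names(container_policies):
    """Return container names that appear in more than one entry."""
    seen, duplicates = set(), set()
    for policy in container_policies:
        name = policy.get("containerName")
        if name in seen:
            duplicates.add(name)
        seen.add(name)
    return sorted(duplicates)


# Hypothetical policies, for illustration only.
policies = [
    {"containerName": "app", "mode": "Auto"},
    {"containerName": "*", "mode": "Off"},
]
print(policy_for("app", policies)["mode"])
print(policy_for("sidecar", policies)["mode"])
```

Here the container named app gets its own policy, while any other container, such as sidecar, falls back to the wildcard entry.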

ContainerResourcePolicy v1 autoscaling.k8s.io

Fields

containerName

string

The name of the container that the policy applies to. If not specified, the policy serves as the default policy.

mode

ContainerScalingMode

Specifies whether recommended updates are applied to the container when it is started and whether recommended updates are applied during the life of the container. Possible values are "Off" and "Auto".

minAllowed

ResourceList

Specifies the minimum CPU request and memory request allowed for the container.

maxAllowed

ResourceList

Specifies the maximum CPU request and memory request allowed for the container.

controlledResources

ResourceName array

Specifies the type of recommendations that will be computed (and possibly applied) by the VerticalPodAutoscaler. If empty, the default of [ResourceCPU, ResourceMemory] is used.
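The effect of minAllowed and maxAllowed on a recommendation can be pictured as a clamp on each resource. The sketch below is only an illustration of that bounding behavior: parse_quantity is a simplified, hypothetical parser that handles a small subset of Kubernetes quantity suffixes, and clamp_recommendation is not the autoscaler's actual code.

```python
from fractions import Fraction

# Simplified subset of Kubernetes quantity suffixes (assumption: enough
# for the examples here, not a complete quantity parser).
_SUFFIXES = {
    "Ki": Fraction(1024),
    "Mi": Fraction(1024 ** 2),
    "Gi": Fraction(1024 ** 3),
    "m": Fraction(1, 1000),
    "k": Fraction(1000),
    "M": Fraction(10 ** 6),
    "G": Fraction(10 ** 9),
}


def parse_quantity(text):
    """Convert a quantity string such as '250m' or '4Gi' to base units."""
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if text.endswith(suffix):
            return Fraction(text[: -len(suffix)]) * _SUFFIXES[suffix]
    return Fraction(text)


def clamp_recommendation(recommended, min_allowed=None, max_allowed=None):
    """Bound each recommended resource by minAllowed and maxAllowed."""
    min_allowed = min_allowed or {}
    max_allowed = max_allowed or {}
    result = {}
    for resource, value in recommended.items():
        chosen = value
        low = min_allowed.get(resource)
        high = max_allowed.get(resource)
        if low is not None and parse_quantity(chosen) < parse_quantity(low):
            chosen = low
        if high is not None and parse_quantity(chosen) > parse_quantity(high):
            chosen = high
        result[resource] = chosen
    return result


# Hypothetical values, for illustration only.
bounded = clamp_recommendation(
    {"cpu": "50m", "memory": "8Gi"},
    min_allowed={"cpu": "100m", "memory": "128Mi"},
    max_allowed={"cpu": "2", "memory": "4Gi"},
)
print(bounded)
```

In this example the CPU value is raised to the minimum and the memory value is lowered to the maximum; a value already inside the bounds is returned unchanged.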

VerticalPodAutoscalerStatus v1 autoscaling.k8s.io

Fields

recommendation

RecommendedPodResources

The most recently recommended CPU and memory requests.

conditions

VerticalPodAutoscalerCondition array

Describes the current state of the VerticalPodAutoscaler.

RecommendedPodResources v1 autoscaling.k8s.io

Fields

containerRecommendations

RecommendedContainerResources array

An array of resource recommendations for individual containers.

RecommendedContainerResources v1 autoscaling.k8s.io

Fields

containerName

string

The name of the container that the recommendation applies to.

target

ResourceList

The recommended CPU request and memory request for the container.

lowerBound

ResourceList

The minimum recommended CPU request and memory request for the container. This amount is not guaranteed to be sufficient for the application to be stable. Running with smaller CPU and memory requests is likely to have a significant impact on performance or availability.

upperBound

ResourceList

The maximum recommended CPU request and memory request for the container. CPU and memory requests higher than these values are likely to be wasted.

uncappedTarget

ResourceList

The most recent resource recommendation computed by the autoscaler, based on actual resource usage, not taking into account the ContainerResourcePolicy. If actual resource usage causes the target to violate the ContainerResourcePolicy, this might be different from the bounded recommendation. This field does not affect actual resource assignment. It is used only as a status indication.
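One practical use of uncappedTarget is spotting containers whose recommendation is being held back by a ContainerResourcePolicy. The sketch below compares target with uncappedTarget in a status dictionary. The helper capped_containers and the sample values are hypothetical, and the key containerRecommendations is assumed to be the serialized name of the recommendation array.

```python
def capped_containers(status):
    """Return names of containers whose target differs from uncappedTarget.

    Quantities are compared as raw strings, which is a simplification:
    two different spellings of the same amount would count as different.
    """
    recommendation = status.get("recommendation") or {}
    capped = []
    # Assumption: the array is serialized as "containerRecommendations".
    for rec in recommendation.get("containerRecommendations", []):
        uncapped = rec.get("uncappedTarget")
        if uncapped is not None and uncapped != rec.get("target"):
            capped.append(rec["containerName"])
    return capped


# Hypothetical status, for illustration only.
status = {
    "recommendation": {
        "containerRecommendations": [
            {
                "containerName": "app",
                "target": {"cpu": "2", "memory": "4Gi"},
                "uncappedTarget": {"cpu": "3", "memory": "4Gi"},
            },
            {
                "containerName": "sidecar",
                "target": {"cpu": "100m", "memory": "128Mi"},
                "uncappedTarget": {"cpu": "100m", "memory": "128Mi"},
            },
        ]
    }
}
print(capped_containers(status))
```

A container that shows up in this list may be a candidate for revisiting its minAllowed or maxAllowed settings.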

VerticalPodAutoscalerCondition v1 autoscaling.k8s.io

Fields

type

VerticalPodAutoscalerConditionType

The type of condition being described. Possible values are "RecommendationProvided", "LowConfidence", "NoPodsMatched", and "FetchingHistory".

status

ConditionStatus

The status of the condition. Possible values are True, False, and Unknown.

lastTransitionTime

Time

The last time the condition made a transition from one status to another.

reason

string

The reason for the last transition from one status to another.

message

string

A human-readable string that gives details about the last transition from one status to another.
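When reading a VerticalPodAutoscaler's status programmatically, a common first step is to check whether a given condition currently holds. The following is a minimal sketch, assuming the status has already been loaded into a dictionary; condition_is_true and the sample data are hypothetical.

```python
def condition_is_true(status, condition_type):
    """Report whether the named condition is present with status "True".

    Condition status is a string ("True", "False", or "Unknown"), not a
    boolean, so it is compared as text. A missing condition counts as
    not true.
    """
    for condition in status.get("conditions", []):
        if condition.get("type") == condition_type:
            return condition.get("status") == "True"
    return False


# Hypothetical status, for illustration only.
status = {
    "conditions": [
        {"type": "RecommendationProvided", "status": "True"},
        {"type": "LowConfidence", "status": "False"},
    ]
}
print(condition_is_true(status, "RecommendationProvided"))
print(condition_is_true(status, "NoPodsMatched"))
```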

What's next