
Using advanced Kubernetes autoscaling with Vertical Pod Autoscaler and Node Auto Provisioning


Editor's note: This is one of many posts on the differentiated capabilities of Google Kubernetes Engine (GKE) Advanced. See the first post in the series for details on GKE Advanced.

Whether you run it on-premises or in the cloud, Kubernetes has emerged as the de facto tool for scheduling and orchestrating containers. But while Kubernetes excels at managing individual containers, you still need to manage both your workloads and the underlying infrastructure so that Kubernetes has enough resources to run them, but not so many that capacity sits idle. To do that, Kubernetes includes two mature autoscaling features: Horizontal Pod Autoscaler for scaling workloads running in pods, and Cluster Autoscaler to autoscale (you guessed it) your clusters. The two work together: Horizontal Pod Autoscaler adds or removes pod replicas as load changes, and when new replicas can’t be scheduled on the existing nodes, Cluster Autoscaler adds nodes. It also removes nodes that stay underused.
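Horizontal Pod Autoscaler’s basic scaling rule is simple enough to sketch. The Python below follows the replica formula from the Kubernetes HPA documentation, desired = ceil(current replicas * current metric / target metric), and skips changes when the ratio is within the controller’s default 10% tolerance. The function name and the sample numbers are illustrative only; this is not GKE code.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, tolerance: float = 0.1) -> int:
    """Replica count from the HPA rule:
    desired = ceil(current_replicas * current_metric / target_metric).
    If the ratio is within `tolerance` of 1.0, the count stays the same."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


# 4 replicas averaging 90% CPU against a 60% target: scale out to 6.
print(desired_replicas(4, 90, 60))
# 4 replicas at 63% against a 60% target: within tolerance, stay at 4.
print(desired_replicas(4, 63, 60))
```

If the extra replicas don’t fit on the current nodes, they stay Pending, which is the signal Cluster Autoscaler reacts to.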


GKE, our cloud-hosted managed service, also supports Horizontal Pod Autoscaler and Cluster Autoscaler. But unlike open-source Kubernetes, where Cluster Autoscaler works with monolithic clusters, GKE uses node pools for its cluster automation. A node pool is a group of nodes within a cluster that all share the same configuration. Administrators can provision multiple node pools with different machine sizes in the same cluster, and the Kubernetes scheduler then places workloads across them. This lets GKE use right-sized instances from the start, avoiding nodes that are too small to run some pods or so large that compute capacity goes to waste.
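To make the node pool trade-off concrete, here is a toy Python model. It is not how the Kubernetes scheduler or GKE actually score nodes: the pool names and per-node allocatable sizes are hypothetical, and the function simply skips pools whose nodes are too small for a pod and, among the rest, prefers the pool that would leave the least CPU idle.

```python
from typing import List, NamedTuple, Optional


class NodePool(NamedTuple):
    name: str
    cpu_m: int      # CPU each node can offer to pods, in millicores
    memory_mi: int  # memory each node can offer to pods, in MiB


def choose_pool(pools: List[NodePool], cpu_m: int, memory_mi: int) -> Optional[NodePool]:
    """Skip pools whose nodes are too small for the pod; among the rest, prefer
    the pool that would leave the least CPU unused on an otherwise empty node."""
    fitting = [p for p in pools if p.cpu_m >= cpu_m and p.memory_mi >= memory_mi]
    if not fitting:
        return None  # no pool fits; without Node Auto Provisioning the pod stays Pending
    return min(fitting, key=lambda p: p.cpu_m - cpu_m)


# Hypothetical pools in one cluster.
POOLS = [
    NodePool("small-pool", cpu_m=940, memory_mi=2700),
    NodePool("large-pool", cpu_m=7910, memory_mi=26000),
]

print(choose_pool(POOLS, 500, 512).name)    # small-pool
print(choose_pool(POOLS, 2000, 1024).name)  # large-pool
print(choose_pool(POOLS, 16000, 1024))      # None
```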

Although Horizontal Pod Autoscaler and Cluster Autoscaler are widely used on GKE, they don’t solve every challenge a DevOps administrator may face: pods that are over- or under-provisioned for CPU and RAM, and clusters that have no node pool with machines suitable for the pods that need to be scheduled.

For those scenarios, GKE includes two advanced features: Vertical Pod Autoscaler, which automatically adjusts a pod’s CPU and memory requests, and Node Auto Provisioning, a feature of Cluster Autoscaler that automatically adds new node pools, in addition to managing their size on the user's behalf. First introduced in alpha last summer, both features are now in beta and ready for you to try as part of the GKE Advanced edition, introduced earlier this week. Once they become generally available, they will be offered only through GKE Advanced, which itself becomes available later this quarter.

Vertical Pod Autoscaler and Node Auto Provisioning in action

To better understand Vertical Pod Autoscaler and Node Auto Provisioning, let’s look at an example. Helen is a DevOps engineer at a medium-sized company. She’s responsible for deploying and managing workloads and infrastructure, and supports a team of around 100 developers who build and deploy around 50 different services for the company’s internet business.

The team deploys each of the services several times a week across dev, staging and production environments. And even though they thoroughly test every single deployment before it hits production, the services are occasionally saturated or run out of memory.

Helen and her team analyze the issues and realize that in many cases the applications run out of memory under heavy load. This worries Helen. Why aren’t these problems caught during testing? She asks her team how the resource requests are estimated and assigned and, to her surprise, finds that no one really knows how much CPU and RAM should be requested in the pod spec to keep the workload stable. In most cases, an administrator set the memory request a long time ago and never changed it...until the application crashed and they were forced to adjust it. Even then, adjusting the memory request isn’t a systematic process: occasionally the admin load-tests the app, but more often they simply add some more memory. How much memory exactly? Nobody knows.

In some ways, the Kubernetes CPU and RAM allocation model is a bit of a trap: Request too much and the underlying cluster is less efficient; request too little and you put the entire service at risk. Helen checks the GKE documentation and discovers Vertical Pod Autoscaler.
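One way to see both sides of the trap is to compare a request with observed usage. The hypothetical helper below reports how many requested millicores sit idle on average (capacity the scheduler can’t give to other pods) and how often usage exceeds the request (capacity the pod was never guaranteed). The usage samples are made up for illustration.

```python
from typing import List, Tuple


def request_tradeoff(request_m: int, usage_m: List[int]) -> Tuple[float, float]:
    """Return (average idle millicores, fraction of samples over the request).

    Idle millicores are reserved for the pod but unused, so the scheduler
    can't hand them to other pods. Samples over the request show the pod
    leaning on node capacity it was never guaranteed."""
    idle = [max(request_m - u, 0) for u in usage_m]
    over = sum(1 for u in usage_m if u > request_m)
    return sum(idle) / len(usage_m), over / len(usage_m)


samples = [200, 250, 300, 900, 1200]  # hypothetical CPU usage samples (millicores)
print(request_tradeoff(1500, samples))  # (930.0, 0.0): safe but wasteful
print(request_tradeoff(300, samples))   # (30.0, 0.4): efficient but risky
```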

Vertical Pod Autoscaler is inspired by a Google Borg service called AutoPilot. It does three things:

1. It observes the service’s resource utilization for the deployment.

2. It recommends resource requests.

3. It automatically updates the pods’ resource requests, both for new pods and for currently running pods, which it re-creates with the new values.

A functional schema of the GKE Vertical Pod Autoscaler
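The observe, recommend, update loop can be sketched in Python. This is a simplified model under stated assumptions, not the actual recommender: the open-source Vertical Pod Autoscaler builds decaying usage histograms and widens its bounds when it has few samples, whereas this sketch takes plain percentiles of CPU samples (50th, 90th and 95th, chosen for illustration) and adds an assumed 15% safety margin. The function names and the container dictionary are hypothetical.

```python
import math
from typing import Dict, List


def percentile(samples: List[int], p: int) -> int:
    """Nearest-rank percentile (p from 1 to 100) of a non-empty list."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def recommend(usage_m: List[int], margin_pct: int = 15) -> Dict[str, float]:
    """Step 2: derive lower bound, target and upper bound (millicores) from
    observed usage. Percentiles and margin are illustrative assumptions."""
    def with_margin(value: int) -> float:
        return value * (100 + margin_pct) / 100

    return {
        "lower": with_margin(percentile(usage_m, 50)),
        "target": with_margin(percentile(usage_m, 90)),
        "upper": with_margin(percentile(usage_m, 95)),
    }


def apply_recommendation(container: Dict, rec: Dict[str, float]) -> Dict:
    """Step 3: return a copy of a simplified container spec whose CPU request
    is the recommended target (VPA applies it when the pod is re-created)."""
    updated = dict(container)
    updated["cpu_request_m"] = round(rec["target"])
    return updated


# Step 1: hypothetical CPU usage samples (millicores) observed for one container.
observed = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
rec = recommend(observed)
print(rec)  # {'lower': 575.0, 'target': 1035.0, 'upper': 1150.0}
print(apply_recommendation({"name": "cpu-demo", "cpu_request_m": 300}, rec))
```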

By turning on Vertical Pod Autoscaler, Helen can keep deployments from running out of memory and crashing, because each pod’s requests are adjusted to its observed usage, independently of what was originally set in the pod spec. Problem solved!

Vertical Pod Autoscaler solves the problem of over- or under-provisioned pods, but what if the adjusted pods need far more resources than the cluster has? Helen returns to the GKE documentation, where she is relieved to learn that Cluster Autoscaler is notified ahead of an update and scales the cluster so that all re-created pods find enough space. But what if none of the node pools has a machine type big enough to fit the adjusted pod? Cluster Autoscaler has a solution for this too: Node Auto Provisioning automatically creates an appropriately sized node pool when one is needed.
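As a rough illustration of the Node Auto Provisioning decision, the sketch below picks the smallest machine type that can hold a pod no existing pool fits. The catalog is a small hand-picked subset of Compute Engine machine types with approximate memory sizes, and the flat 10% reservation for system components is an assumption; GKE’s real allocatable calculation and NAP’s selection logic are more involved, so treat this as a model of the idea rather than of the algorithm.

```python
from typing import List, NamedTuple, Optional


class MachineType(NamedTuple):
    name: str
    cpu_m: int      # vCPUs, in millicores
    memory_mi: int  # memory, in MiB


# A small subset of Compute Engine machine types (approximate memory sizes).
CATALOG = [
    MachineType("n1-standard-1", 1000, 3840),
    MachineType("n1-highcpu-2", 2000, 1843),
    MachineType("n1-standard-2", 2000, 7680),
]


def pick_machine_type(cpu_m: int, memory_mi: int,
                      catalog: List[MachineType] = CATALOG,
                      reserved_fraction: float = 0.1) -> Optional[MachineType]:
    """Return the smallest machine type that still fits the pod after holding
    back `reserved_fraction` of each resource for system components
    (an assumed flat reservation), or None if nothing in the catalog fits."""
    def usable(total: int) -> float:
        return total * (1 - reserved_fraction)

    fits = [m for m in catalog
            if usable(m.cpu_m) >= cpu_m and usable(m.memory_mi) >= memory_mi]
    return min(fits, key=lambda m: (m.cpu_m, m.memory_mi), default=None)


# A pod re-created with a 1168m CPU request no longer fits a one-core machine.
print(pick_machine_type(1168, 250).name)  # n1-highcpu-2
print(pick_machine_type(300, 250).name)   # n1-standard-1
```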

Putting GKE autoscaling to the test

Helen decides to set up a simple workload to familiarize herself with Vertical Pod Autoscaling and Node Auto Provisioning. She creates a new cluster where both are enabled.

  gcloud beta container clusters create test-vpa --project <user_project> --enable-vertical-pod-autoscaling --enable-autoprovisioning --max-cpu 50 --max-memory 1000 --zone <zone> --cluster-version 1.12.5-gke.10

Helen knows that by activating this functionality at cluster creation time, she is making sure that both features are available to that cluster—she won’t need to enable them later.
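The --max-cpu and --max-memory flags in that command cap the total number of cores and gigabytes of memory that auto-provisioning may grow the cluster to. A minimal sketch of that check follows; the function name is hypothetical, and the assumption that Helen’s three default nodes are one-core n1-standard-1 machines with 3.75 GB each is ours.

```python
from typing import List, Tuple

Node = Tuple[float, float]  # (vCPUs, memory in GB)


def within_limits(nodes: List[Node], new_node: Node,
                  max_cpu: float = 50, max_memory_gb: float = 1000) -> bool:
    """True if adding new_node keeps the cluster-wide totals within the
    auto-provisioning limits set by --max-cpu and --max-memory."""
    total_cpu = sum(cpu for cpu, _ in nodes) + new_node[0]
    total_mem = sum(mem for _, mem in nodes) + new_node[1]
    return total_cpu <= max_cpu and total_mem <= max_memory_gb


# Hypothetical: three one-core default nodes (assumed n1-standard-1, 3.75 GB each).
default_pool = [(1, 3.75)] * 3
print(within_limits(default_pool, (2, 1.8)))     # True: 5 cores in total
print(within_limits([(2, 1.8)] * 24, (4, 3.6)))  # False: 48 + 4 cores exceeds 50
```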

Helen deploys a simple shell script that uses a predictable amount of CPU. She writes the script to use 1.3 CPU but sets only cpu: "0.3" in the pod’s resource request.
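Why 1.3 CPU? Each pass of the script’s loop starts a one-second yes process in the background, then runs a 0.3-second yes in the foreground and sleeps 0.7 seconds. The background subshell returns immediately, so each pass lasts about one second while burning about 1.3 CPU-seconds. The arithmetic below assumes each yes fully occupies one core while it runs and ignores loop overhead; the function name is hypothetical.

```python
def script_cpu_demand(background_busy_ms: int, foreground_busy_ms: int,
                      sleep_ms: int) -> float:
    """Average cores used by one iteration of the stress loop.

    The background subshell returns at once, so an iteration lasts as long
    as the foreground part (busy + sleep), while CPU time accrues from both
    yes processes. Assumes each yes keeps one core fully busy."""
    cpu_ms = background_busy_ms + foreground_busy_ms
    wall_ms = foreground_busy_ms + sleep_ms
    return cpu_ms / wall_ms


demand = script_cpu_demand(1000, 300, 700)
print(demand)            # 1.3 cores demanded by the workload itself
print(min(demand, 1.0))  # 1.0: roughly the most a one-core node can supply
```

On one-core nodes, the pods can’t get the full 1.3 cores, which is consistent with the roughly 970m per pod that kubectl top reports later.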

Here is the manifest:


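  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: stress
    namespace: default
  spec:
    replicas: 2
    selector:
      matchLabels: {app: stress}
    template:
      metadata:
        labels: {app: stress}
      spec:
        affinity:  # Prefer one pod per node
          podAntiAffinity:
            preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 1
              podAffinityTerm:
                topologyKey: kubernetes.io/hostname
                labelSelector:
                  matchExpressions:
                  - key: app
                    operator: In
                    values: ["stress"]
        containers:
        - name: cpu-demo
          image: ubuntu  # assumed image; needs sh, timeout, yes and sleep
          resources:
            requests:
              cpu: "0.3"
          command: ["sh"]
          args: ["-c", "while true; do (timeout 1s yes >/dev/null &) && (timeout 0.3s yes >/dev/null; sleep 0.7s); done"] # consume 1.3 CPU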

And here is how she creates the deployment.

  $ kubectl create -f deployment.yaml
  deployment.apps/stress created

Note that at this point no Vertical Pod Autoscaler is active on the deployment. After a couple of minutes, Helen checks on her deployment. Both pods are using far more CPU than they requested, consuming nearly all the processing power of their respective nodes, much like some of the company’s production deployments.

  $ kubectl top pods
  NAME                     CPU(cores)   MEMORY(bytes)
  stress-686b5c67f-tnvkl   970m         1Mi
  stress-686b5c67f-zdcwx   955m         1Mi

Helen decides to explore what happens if she enables Vertical Pod Autoscaler. First, she enables it in recommendation mode, without it taking any action automatically. She constructs a vpa.yaml file and creates a Vertical Pod Autoscaler in “Off” mode.


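  apiVersion: autoscaling.k8s.io/v1beta2  # newer clusters serve autoscaling.k8s.io/v1
  kind: VerticalPodAutoscaler
  metadata:
    name: my-vpa
  spec:
    targetRef:
      apiVersion: "extensions/v1beta1"  # use apps/v1 on Kubernetes 1.16 and later
      kind: Deployment
      name: stress
    updatePolicy:
      updateMode: "Off"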

Create the Vertical Pod Autoscaler:

  $ kubectl create -f vpa.yaml
  verticalpodautoscaler.autoscaling.k8s.io/my-vpa created

She waits a couple of minutes and then asks it for recommendations.

  $ kubectl describe vpa
  Name:         my-vpa
  Namespace:    default
  Labels:       <none>
  Annotations:  <none>
  API Version:
  Kind:         VerticalPodAutoscaler
  Metadata:
    Creation Timestamp:  2019-02-28T10:53:57Z
    Generation:          1
    Resource Version:    1319
    Self Link:           /apis/
    UID:                 25d32046-3b47-11e9-b244-42010af00060
  Spec:
    Target Ref:
      API Version:  extensions/v1beta1
      Kind:         Deployment
      Name:         stress
    Update Policy:
      Update Mode:  Off
  Status:
    Conditions:
      Last Transition Time:  2019-02-28T10:54:07Z
      Message:               Fetching history in progress
      Status:                True
      Type:                  FetchingHistory
      Last Transition Time:  2019-02-28T10:54:07Z
      Message:               Some containers have a small number of samples
      Reason:                cpu-demo
      Status:                True
      Type:                  LowConfidence
      Last Transition Time:  2019-02-28T10:54:07Z
      Status:                True
      Type:                  RecommendationProvided
    Recommendation:
      Container Recommendations:
        Container Name:  cpu-demo
        Lower Bound:
          Cpu:     595m
          Memory:  262144k
        Target:
          Cpu:     1168m
          Memory:  262144k
        Uncapped Target:
          Cpu:     1168m
          Memory:  262144k
        Upper Bound:
          Cpu:     421648m
          Memory:  4151500k
  Events:          <none>

After observing the workload for a short time, Vertical Pod Autoscaler provides some initial low-confidence recommendations for adjusting the pod spec, including the target as well as upper and lower bounds.
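The recommendations are written in Kubernetes quantity notation. A small parser, written here for illustration and covering only the suffixes that appear in this post (the full quantity grammar has more forms), makes them easier to compare: the 1168m target is about 1.17 cores, the 595m lower bound and the 421648m (roughly 422-core) upper bound are far apart while confidence is low, and 262144k of memory works out to exactly 250 MiB. That is far above the roughly 1Mi the pods actually use, so it likely reflects a minimum applied by the recommender rather than measured usage.

```python
from decimal import Decimal

# Suffixes used in this post: binary (Ki, Mi, Gi), decimal (k, M, G) and
# "m" (milli). The full Kubernetes quantity grammar has more forms than this.
SUFFIXES = {
    "Ki": Decimal(1024), "Mi": Decimal(1024) ** 2, "Gi": Decimal(1024) ** 3,
    "k": Decimal(1000), "M": Decimal(1000) ** 2, "G": Decimal(1000) ** 3,
    "m": Decimal("0.001"),
}


def parse_quantity(quantity: str) -> Decimal:
    """Convert a quantity such as '1168m' (CPU cores) or '262144k' (bytes)
    into a plain Decimal."""
    for suffix, factor in SUFFIXES.items():
        if quantity.endswith(suffix):
            return Decimal(quantity[: -len(suffix)]) * factor
    return Decimal(quantity)


MIB = parse_quantity("1Mi")
print(parse_quantity("595m"), parse_quantity("1168m"), parse_quantity("421648m"))
print(parse_quantity("262144k") / MIB)  # memory recommendation in MiB: 250
```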

Next, Helen enables the automatic actuation mode ("Auto"), in which Vertical Pod Autoscaler applies its recommendation by re-creating the pod with an adjusted request. It does this only when the current request is below the lower bound of the recommendation, as it is here (300m against a 595m lower bound), and only if the pod’s disruption budget allows it.
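The actuation rule can be sketched as a small decision function. It is a simplified model, not the updater’s code: the Recommendation type and should_recreate function are hypothetical, disruptions_allowed stands in for the disruptionsAllowed field in a PodDisruptionBudget’s status, and also treating a request above the upper bound as out of range is our assumption about the open-source updater. The real updater has further rules that this sketch leaves out.

```python
from typing import NamedTuple


class Recommendation(NamedTuple):
    lower_m: int
    target_m: int
    upper_m: int


def should_recreate(request_m: int, rec: Recommendation,
                    disruptions_allowed: int) -> bool:
    """Decide whether to re-create a pod in "Auto" mode (simplified).

    Out of range: the current CPU request is below the lower bound, or
    (an assumption about the updater) above the upper bound.
    disruptions_allowed mirrors the PodDisruptionBudget status field of the
    same meaning; 0 means no pod may be voluntarily evicted right now."""
    out_of_range = request_m < rec.lower_m or request_m > rec.upper_m
    return out_of_range and disruptions_allowed > 0


rec = Recommendation(lower_m=595, target_m=1168, upper_m=421648)
print(should_recreate(300, rec, disruptions_allowed=1))   # True: 300m < 595m
print(should_recreate(300, rec, disruptions_allowed=0))   # False: budget exhausted
print(should_recreate(1168, rec, disruptions_allowed=1))  # False: already in range
```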


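  apiVersion: autoscaling.k8s.io/v1beta2
  kind: VerticalPodAutoscaler
  metadata:
    name: my-vpa
  spec:
    targetRef:
      apiVersion: "extensions/v1beta1"
      kind: Deployment
      name: stress
    updatePolicy:
      updateMode: "Auto"

  $ kubectl apply -f vpa_auto.yaml
  verticalpodautoscaler.autoscaling.k8s.io/my-vpa configured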

Note: This could also have been done using kubectl edit vpa and changing updateMode to Auto on the fly.

While Vertical Pod Autoscaler gathers data to generate its recommendations, Helen checks the pods’ status, using custom output columns to show just the data she needs.

  $ kubectl get pod -o=custom-columns=NAME:.metadata.name,PHASE:.status.phase,CPU-REQUEST:.spec.containers\[0\].resources.requests.cpu
  NAME                     PHASE     CPU-REQUEST
  stress-686b5c67f-2kvhr   Running   1168m
  stress-686b5c67f-zwrmj   Running   1168m

To Helen’s surprise, the cluster that had been running only one-core machines is now running pods that each request 1168m (about 1.17 CPUs), more than a one-core node can provide.

  $ kubectl get nodes
  NAME                                                  STATUS    ROLES     AGE       VERSION
  gke-test-vpa-default-pool-a83c9ac3-c58p               Ready     <none>    1h        v1.12.5-gke.10
  gke-test-vpa-default-pool-a83c9ac3-m5cj               Ready     <none>    1h        v1.12.5-gke.10
  gke-test-vpa-default-pool-a83c9ac3-xh2t               Ready     <none>    1h        v1.12.5-gke.10
  gke-test-vpa-nap-n1-highcpu-2-1551354-4454d396-j779   Ready     <none>    27s       v1.12.5-gke.10
  gke-test-vpa-nap-n1-highcpu-2-1551354-4454d396-xphj   Ready     <none>    25s       v1.12.5-gke.10

Using Node Auto Provisioning, Cluster Autoscaler created a new node pool with two high-CPU machines (n1-highcpu-2, with two vCPUs each), and the re-created pods were scheduled onto them. Helen can’t wait to run this in production.

Getting started with Vertical Pod Autoscaling and Node Auto Provisioning

Managing a Kubernetes cluster can be tricky. Luckily, if you use GKE, these sophisticated new tools can take the guesswork out of setting CPU and memory requests for your pods and sizing your clusters. To learn more about Vertical Pod Autoscaler and Node Auto Provisioning, check out the GKE documentation, and be sure to reach out to the team with questions and feedback.

Have questions about GKE? Contact your Google customer representative for more information, and sign up for our upcoming webcast, Your Kubernetes, Your Way Through GKE.