Using advanced Kubernetes autoscaling with Vertical Pod Autoscaler and Node Auto Provisioning
Editor’s note: This is one of many posts on the unique, differentiated capabilities in Google Kubernetes Engine (GKE) Advanced. See the first post in this series for details on GKE Advanced.
Whether you run it on-premises or in the cloud, Kubernetes has emerged as the de facto tool for scheduling and orchestrating containers. But while Kubernetes excels at managing individual containers, you still need to manage both your workloads and the underlying infrastructure to make sure Kubernetes has sufficient resources to operate (but not too many resources). To do that, Kubernetes includes two mature autoscaling features: Horizontal Pod Autoscaler for scaling workloads running in pods, and Cluster Autoscaler to autoscale—you guessed it—your clusters. Here is how they relate to one another:
GKE, our cloud-hosted managed service, also supports Horizontal Pod Autoscaler and Cluster Autoscaler. But unlike open-source Kubernetes, where Cluster Autoscaler works with monolithic clusters, GKE uses node pools for its cluster automation. A node pool is a subset of a cluster’s nodes that all share the same configuration. Administrators can provision multiple node pools with different machine sizes in the same cluster, and the Kubernetes scheduler then uses them to place workloads. This lets GKE use right-sized instances from the start, avoiding nodes that are too small to run some pods or so large that they waste compute capacity.
Although Horizontal Pod Autoscaler and Cluster Autoscaler are widely used on GKE, they don’t solve all the challenges that a DevOps administrator may face—pods that are over- or under-provisioned for CPU and RAM, and clusters that don’t have the appropriate nodes in a node pool with which to scale.
For those scenarios, GKE includes two advanced features: Vertical Pod Autoscaler, which automatically adjusts a pod’s CPU and memory requests, and Node Auto Provisioning, a feature of Cluster Autoscaler that automatically adds new node pools and also manages their size on the user’s behalf. First introduced in alpha last summer, both features are now in beta and ready for you to try as part of the GKE Advanced edition, introduced earlier this week. Once they become generally available, they will be offered only through GKE Advanced, which will be available later this quarter.
Vertical Pod Autoscaler and Node Auto Provisioning in action
To better understand Vertical Pod Autoscaler and Node Auto Provisioning, let’s look at an example. Helen is a DevOps engineer at a medium-sized company. She’s responsible for deploying and managing workloads and infrastructure, and she supports a team of around 100 developers who build and deploy around 50 different services for the company’s internet business.
The team deploys each of the services several times a week across dev, staging and production environments. And even though they thoroughly test every single deployment before it hits production, the services are occasionally saturated or run out of memory.
Helen and her team analyze the issues and realize that in many cases the applications run out of memory under heavy load. This worries Helen. Why aren’t these problems caught during testing? She asks her team how resource requests are estimated and assigned. To her surprise, she finds that no one really knows how much CPU and RAM should be requested in the pod spec to keep the workload stable. In most cases, an administrator set the memory request long ago and never changed it... until the application crashed and they were forced to adjust it. Even then, adjusting the memory request isn’t a systematic process: occasionally the admin tests the app under heavy load, but more often they simply add some more memory. How much memory exactly? Nobody knows.
In some ways, the Kubernetes CPU and RAM allocation model is a bit of a trap: Request too much and the underlying cluster is less efficient; request too little and you put the entire service at risk. Helen checks the GKE documentation and discovers Vertical Pod Autoscaler.
Vertical Pod Autoscaler is inspired by a Google Borg service called AutoPilot. It does three things:
1. It observes the service’s resource utilization for the deployment.
2. It recommends resource requests.
3. It automatically updates the pods’ resource requests, both for new pods and for currently running pods, which it evicts and re-creates with the new requests.
With Vertical Pod Autoscaler turned on, deployments should no longer run out of memory and crash, because each pod’s requests are adjusted to its observed usage regardless of what was set in the pod spec. Problem solved!
Vertical Pod Autoscaler solves the problem of over- and under-provisioned pods, but what if the adjusted requests need far more resources than the cluster has? Helen returns to the GKE documentation and is relieved to learn that Cluster Autoscaler is notified ahead of an update and scales the cluster so that all re-deployed pods find enough space. But what if none of the node pools has a machine type big enough to fit the adjusted pod? Cluster Autoscaler has a solution for this too: Node Auto Provisioning automatically creates an appropriately sized node pool when one is needed.
Putting GKE autoscaling to the test
Helen decides to set up a simple workload to familiarize herself with Vertical Pod Autoscaling and Node Auto Provisioning. She creates a new cluster where both are enabled.
gcloud beta container clusters create test-vpa --project <user_project> \
    --enable-vertical-pod-autoscaling \
    --enable-autoprovisioning --max-cpu 50 --max-memory 1000 \
    --zone <zone> --cluster-version 1.12.5-gke.10
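Helen knows that enabling both features at cluster creation time makes them available to the cluster from the start, so she won’t need to enable them later. The --max-cpu 50 and --max-memory 1000 flags set the maximum total cores and gigabytes of memory for the cluster. These limits bound how far Node Auto Provisioning can grow it.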
Helen deploys a simple shell script that uses a predictable amount of CPU. She sets her script to use 1.3 CPU, but only sets cpu: "0.3" in the pod’s resource request.
Here is the manifest:
affinity:  # One pod per node
  ...
    - weight: 1
      ...
          - key: app
...
containers:
- name: cpu-demo
  ...
  args: ["-c", "while true; do (timeout 1s yes >/dev/null &) && (timeout 0.3s yes >/dev/null; sleep 0.7s); done"] # consume 1.3 CPU
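Only fragments of the manifest above survive. The following is a hedged reconstruction that is consistent with those fragments, not the original file. Several details are grounded in the post. The Deployment name stress comes from the pod names later in the post. The two replicas come from "both of the deployed pods". The extensions/v1beta1 API version matches the VPA target shown later. Other details are assumptions: the label app: cpu-stress, the ubuntu:18.04 image, and the anti-affinity structure around the surviving weight and key lines.

```yaml
# Hypothetical reconstruction of deployment.yaml (label, image, and affinity details are assumed).
apiVersion: extensions/v1beta1   # matches the VPA target in this post; use apps/v1 on Kubernetes 1.16+
kind: Deployment
metadata:
  name: stress                   # inferred from pod names such as stress-686b5c67f-tnvkl
spec:
  replicas: 2
  selector:
    matchLabels:
      app: cpu-stress            # assumed label value
  template:
    metadata:
      labels:
        app: cpu-stress
    spec:
      affinity:                  # One pod per node (preferred, not required)
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values: ["cpu-stress"]
              topologyKey: kubernetes.io/hostname
      containers:
      - name: cpu-demo
        image: ubuntu:18.04      # assumed; any image with GNU coreutils (timeout, yes, sleep) works
        resources:
          requests:
            cpu: "0.3"           # deliberately far below the ~1.3 CPU the loop consumes
        command: ["/bin/sh"]
        args: ["-c", "while true; do (timeout 1s yes >/dev/null &) && (timeout 0.3s yes >/dev/null; sleep 0.7s); done"] # consume 1.3 CPU
```

The weight field exists only on preferred anti-affinity terms, which is why the sketch uses that form. Each loop iteration starts a background yes that runs for a full second, plus a foreground yes that runs for 0.3 seconds before a 0.7-second sleep. Together they keep about 1.3 cores busy.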
And here is how she creates the deployment.
$ kubectl create -f deployment.yaml
Note that at this point no Vertical Pod Autoscaler is active on the deployment. After a couple of minutes, Helen checks on her deployment. Both pods have gone far above their 300m CPU request and are consuming nearly all the processing power of their single-core nodes (970m and 955m), much like some of the company’s production deployments.
$ kubectl top pods
NAME                     CPU(cores)   MEMORY(bytes)
stress-686b5c67f-tnvkl   970m         1Mi
stress-686b5c67f-zdcwx   955m         1Mi
Helen decides to explore what happens if she enables Vertical Pod Autoscaler. First, she enables it in recommendation mode, without it taking any action automatically. She constructs a vpa.yaml file and creates a Vertical Pod Autoscaler in “Off” mode.
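The post does not show vpa.yaml. The sketch below is consistent with the kubectl describe vpa output that follows: the name my-vpa, API version autoscaling.k8s.io/v1beta2, a target with API version extensions/v1beta1, and update mode Off. The target name stress is inferred from the pod names and is an assumption.

```yaml
# Sketch of vpa.yaml in recommendation-only mode; the target name is inferred, not shown in the post.
apiVersion: autoscaling.k8s.io/v1beta2   # as in the describe output; current VPA releases use autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-vpa
spec:
  targetRef:
    apiVersion: extensions/v1beta1        # apps/v1 on newer clusters
    kind: Deployment
    name: stress
  updatePolicy:
    updateMode: "Off"                     # quoted: unquoted Off is parsed as the boolean false
```

In "Off" mode, Vertical Pod Autoscaler only computes recommendations and never evicts or modifies pods. This makes it a low-risk way to see what it would do.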
Create Vertical Pod Autoscaler:
$ kubectl create -f vpa.yaml
She waits a couple of minutes and then asks it for recommendations.
$ kubectl describe vpa
API Version: autoscaling.k8s.io/v1beta2
Creation Timestamp: 2019-02-28T10:53:57Z
Resource Version: 1319
Self Link: /apis/autoscaling.k8s.io/v1beta2/namespaces/default/verticalpodautoscalers/my-vpa
API Version: extensions/v1beta1
Update Mode: Off
Last Transition Time: 2019-02-28T10:54:07Z
Message: Fetching history in progress
Last Transition Time: 2019-02-28T10:54:07Z
Message: Some containers have a small number of samples
Last Transition Time: 2019-02-28T10:54:07Z
Container Name: cpu-demo
After observing the workload for a short time, Vertical Pod Autoscaler provides some initial low-confidence recommendations for adjusting the pod spec, including the target as well as upper and lower bounds.
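The describe output above stops before the actual recommendation values. For reference, the recommendation is stored in the VPA object’s status, which you can print with kubectl get vpa my-vpa -o yaml. It has roughly the following shape. The quantities below are placeholders, not the values Helen saw.

```yaml
# Shape of a VPA recommendation; "<quantity>" values are placeholders, not real results.
status:
  recommendation:
    containerRecommendations:
    - containerName: cpu-demo
      lowerBound:         # lower end of the recommended range
        cpu: "<quantity>"
        memory: "<quantity>"
      target:             # the value VPA writes into the pod's requests
        cpu: "<quantity>"
        memory: "<quantity>"
      uncappedTarget:     # target before any min/max limits in the VPA's resource policy
        cpu: "<quantity>"
        memory: "<quantity>"
      upperBound:         # upper end of the recommended range
        cpu: "<quantity>"
        memory: "<quantity>"
```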
Next, Helen enables the automatic actuation mode. In this mode, Vertical Pod Autoscaler applies its recommendation by evicting each pod and re-creating it with adjusted requests. It does this mainly when a pod’s current request falls outside the recommended range (below the lower bound or above the upper bound), and only if the pod’s disruption budget allows the eviction.
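The disruption budget mentioned above is a regular PodDisruptionBudget object, and the post does not say whether Helen’s demo has one. As a hedged sketch, a budget like the following would let Vertical Pod Autoscaler evict at most one of the two stress pods at a time. The name stress-pdb and the label app: cpu-stress are hypothetical; the selector must match the labels your pod template actually sets.

```yaml
# Hypothetical PodDisruptionBudget; name and label are assumptions, not from the post.
apiVersion: policy/v1beta1     # policy/v1 on Kubernetes 1.21+
kind: PodDisruptionBudget
metadata:
  name: stress-pdb
spec:
  minAvailable: 1              # with 2 replicas, evictions proceed one pod at a time
  selector:
    matchLabels:
      app: cpu-stress
```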
$ kubectl apply -f vpa_auto.yaml
Note: This could also have been done on the fly by running kubectl edit vpa and changing updateMode from "Off" to "Auto".
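The post does not show vpa_auto.yaml either. Presumably it is the same object as vpa.yaml with only the update mode changed. The following is a sketch under that assumption, again with the target name stress inferred from the pod names.

```yaml
# Sketch of vpa_auto.yaml: assumed identical to vpa.yaml except for updateMode.
apiVersion: autoscaling.k8s.io/v1beta2
kind: VerticalPodAutoscaler
metadata:
  name: my-vpa          # same name, so kubectl apply updates the existing object
spec:
  targetRef:
    apiVersion: extensions/v1beta1
    kind: Deployment
    name: stress
  updatePolicy:
    updateMode: "Auto"  # evict pods and re-create them with the recommended requests
```

Because metadata.name is unchanged, kubectl apply modifies the existing my-vpa object instead of creating a second autoscaler that targets the same Deployment.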
While Vertical Pod Autoscaler gathers data to generate its recommendations, Helen checks the pods’ status, using custom columns to show just the data she needs.
$ kubectl get pod -o=custom-columns=NAME:.metadata.name,PHASE:.status.phase,CPU-REQUEST:.spec.containers\[0\].resources.requests.cpu
NAME                     PHASE     CPU-REQUEST
stress-686b5c67f-2kvhr   Running   1168m
stress-686b5c67f-zwrmj   Running   1168m
To Helen’s surprise, the cluster, which had been running only single-core machines, now runs pods that request 1168m CPU (about 1.17 cores). That is more than any of the original nodes can offer.
$ kubectl get nodes
NAME                                                  STATUS   ROLES    AGE   VERSION
gke-test-vpa-default-pool-a83c9ac3-c58p               Ready    <none>   1h    v1.12.5-gke.10
gke-test-vpa-default-pool-a83c9ac3-m5cj               Ready    <none>   1h    v1.12.5-gke.10
gke-test-vpa-default-pool-a83c9ac3-xh2t               Ready    <none>   1h    v1.12.5-gke.10
gke-test-vpa-nap-n1-highcpu-2-1551354-4454d396-j779   Ready    <none>   27s   v1.12.5-gke.10
gke-test-vpa-nap-n1-highcpu-2-1551354-4454d396-xphj   Ready    <none>   25s   v1.12.5-gke.10
Using Node Auto Provisioning, Cluster Autoscaler created a new node pool with two n1-highcpu-2 machines (2 vCPUs each), and the resized pods were scheduled onto them. Helen can’t wait to run this in production.
Getting started with Vertical Pod Autoscaling and Node Auto Provisioning
Managing a Kubernetes cluster can be tricky. Luckily, if you use GKE, these sophisticated new tools can take the guesswork out of setting CPU and memory requests for pods and sizing your clusters. To learn more about Vertical Pod Autoscaler and Node Auto Provisioning, check out the GKE documentation, and be sure to reach out to the team with questions and feedback.
Have questions about GKE? Contact your Google customer representative for more information, and sign up for our upcoming webcast, Your Kubernetes, Your Way Through GKE.