In this tutorial, you can set up autoscaling based on one of four different metrics:
CPU
CPU utilization
Scale based on the percent utilization of CPUs across nodes. This can be cost effective, letting you maximize CPU resource utilization. Because CPU usage is a trailing metric, however, your users might experience latency while a scale-up is in progress.
Pub/Sub
Pub/Sub backlog
Scale based on the number of unacknowledged messages remaining in a Pub/Sub subscription. This can effectively reduce latency before it becomes a problem, but might use relatively more resources than autoscaling based on CPU utilization.
Custom metric
Custom Cloud Monitoring metric
Scale based on a custom user-defined metric exported by the Cloud Monitoring client libraries. To learn more, refer to Creating custom metrics in the Cloud Monitoring documentation.
Custom Prometheus
Custom Prometheus metric
Scale based on a custom user-defined metric exported in the Prometheus format. Your Prometheus metric must be of type Gauge, and must not contain the custom.googleapis.com prefix.
Autoscaling is fundamentally about finding an acceptable balance between cost and latency. You might want to experiment with a combination of these metrics and others to find a policy that works for you.
Objectives
This tutorial covers the following tasks:
- How to deploy the Custom Metrics Adapter.
- How to export metrics from within your application code.
- How to view your metrics on the Cloud Monitoring interface.
- How to deploy a HorizontalPodAutoscaler (HPA) resource to scale your application based on Cloud Monitoring metrics.
Costs
In this document, you use the following billable components of Google Cloud:
To generate a cost estimate based on your projected usage, use the pricing calculator.
When you finish this tutorial, you can avoid continued billing by deleting the resources you created. For more information, see Clean up.
Before you begin
Take the following steps to enable the Kubernetes Engine API:
- Visit the Kubernetes Engine page in the Google Cloud console.
- Create or select a project.
- Wait for the API and related services to be enabled. This can take several minutes.
- Make sure that billing is enabled for your Google Cloud project. Learn how to check if billing is enabled on a project.
You can follow this tutorial using Cloud Shell, which comes preinstalled with the gcloud and kubectl command-line tools used in this tutorial. If you use Cloud Shell, you don't need to install these command-line tools on your workstation.
To use Cloud Shell:
- Go to the Google Cloud console.
- Click the Activate Cloud Shell button at the top of the Google Cloud console window. A Cloud Shell session opens inside a new frame at the bottom of the Google Cloud console and displays a command-line prompt.
Setting up your environment
Set the default zone for the Google Cloud CLI:
gcloud config set compute/zone zone
Replace zone with a zone that's closest to you. For more information, see Regions and Zones.
Set the PROJECT_ID environment variable to your Google Cloud project ID (project-id):
export PROJECT_ID=project-id
Set the default project for the Google Cloud CLI:
gcloud config set project $PROJECT_ID
Create a GKE cluster:
gcloud container clusters create metrics-autoscaling
Deploying the Custom Metrics Adapter
The Custom Metrics Adapter lets your cluster send and receive metrics with Cloud Monitoring.
CPU
Not applicable: Horizontal Pod Autoscalers can scale based on CPU utilization natively, so the Custom Metrics Adapter is not needed.
Pub/Sub
Grant your user the ability to create required authorization roles:
kubectl create clusterrolebinding cluster-admin-binding \
--clusterrole cluster-admin --user "$(gcloud config get-value account)"
Deploy the new resource model adapter on your cluster:
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-stackdriver/master/custom-metrics-stackdriver-adapter/deploy/production/adapter_new_resource_model.yaml
Custom Metric
Grant your user the ability to create required authorization roles:
kubectl create clusterrolebinding cluster-admin-binding \
--clusterrole cluster-admin --user "$(gcloud config get-value account)"
Deploy the resource model adapter on your cluster:
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-stackdriver/master/custom-metrics-stackdriver-adapter/deploy/production/adapter_new_resource_model.yaml
Custom Prometheus
Grant your user the ability to create required authorization roles:
kubectl create clusterrolebinding cluster-admin-binding \
--clusterrole cluster-admin --user "$(gcloud config get-value account)"
Deploy the legacy resource model adapter on your cluster:
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-stackdriver/master/custom-metrics-stackdriver-adapter/deploy/production/adapter.yaml
Deploying an application with metrics
Download the repo containing the application code for this tutorial:
CPU
git clone https://github.com/GoogleCloudPlatform/kubernetes-engine-samples.git
cd kubernetes-engine-samples/hello-app
Pub/Sub
git clone https://github.com/GoogleCloudPlatform/kubernetes-engine-samples.git
cd kubernetes-engine-samples/cloud-pubsub
Custom Metric
git clone https://github.com/GoogleCloudPlatform/kubernetes-engine-samples.git
cd kubernetes-engine-samples/custom-metrics-autoscaling/direct-to-sd
Custom Prometheus
git clone https://github.com/GoogleCloudPlatform/kubernetes-engine-samples.git
cd kubernetes-engine-samples/custom-metrics-autoscaling/prometheus-to-sd
The repo contains code that exports metrics to Cloud Monitoring:
CPU
This application responds "Hello, world!" to any web requests on port 8080. Compute Engine CPU metrics are automatically collected by Cloud Monitoring.
Pub/Sub
This application polls a Pub/Sub subscription for new messages, acknowledging them as they arrive. Pub/Sub subscription metrics are automatically collected by Cloud Monitoring.
Custom Metric
This application exports a constant value metric using the Cloud Monitoring client libraries.
Custom Prometheus
This application exports a constant value metric using the Prometheus format.
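The full sample lives in the repo. As an illustration only (this is not the repo's actual code), a gauge in the Prometheus text exposition format — which, as noted above, must be of type Gauge and must not carry the custom.googleapis.com prefix — can be rendered like this:

```python
def prometheus_gauge(name: str, value: float, help_text: str) -> str:
    """Render a single gauge in the Prometheus text exposition format."""
    return (
        f"# HELP {name} {help_text}\n"
        f"# TYPE {name} gauge\n"
        f"{name} {value}\n"
    )

# A constant-valued gauge, like the one the tutorial's sample exports.
print(prometheus_gauge("custom_prometheus", 40, "A constant custom metric."))
```

A sidecar or scrape target would serve this text on an HTTP endpoint for collection.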
The repo also contains a Kubernetes manifest to deploy the application to your cluster:
CPU
Pub/Sub
Custom Metric
Custom Prometheus
Deploy the application to your cluster:
CPU
kubectl apply -f manifests/helloweb-deployment.yaml
Pub/Sub
Enable the Pub/Sub API on your project:
gcloud services enable cloudresourcemanager.googleapis.com pubsub.googleapis.com
Create a Pub/Sub topic and subscription:
gcloud pubsub topics create echo
gcloud pubsub subscriptions create echo-read --topic=echo
Create a service account with access to Pub/Sub:
gcloud iam service-accounts create autoscaling-pubsub-sa
gcloud projects add-iam-policy-binding $PROJECT_ID \
--member "serviceAccount:autoscaling-pubsub-sa@$PROJECT_ID.iam.gserviceaccount.com" \
--role "roles/pubsub.subscriber"
Download the service account key file:
gcloud iam service-accounts keys create key.json \
--iam-account autoscaling-pubsub-sa@$PROJECT_ID.iam.gserviceaccount.com
Import the service account key to your cluster as a Secret:
kubectl create secret generic pubsub-key --from-file=key.json=./key.json
Deploy the application to your cluster:
kubectl apply -f deployment/pubsub-with-secret.yaml
Custom Metric
kubectl apply -f custom-metrics-sd.yaml
Custom Prometheus
kubectl apply -f custom-metrics-prometheus-sd.yaml
After waiting a moment for the application to deploy, all Pods reach the Ready state:
CPU
kubectl get pods
Output:
NAME READY STATUS RESTARTS AGE
helloweb-7f7f7474fc-hzcdq 1/1 Running 0 10s
Pub/Sub
kubectl get pods
Output:
NAME READY STATUS RESTARTS AGE
pubsub-8cd995d7c-bdhqz 1/1 Running 0 58s
Custom Metric
kubectl get pods
Output:
NAME READY STATUS RESTARTS AGE
custom-metric-sd-58dbf4ffc5-tm62v 1/1 Running 0 33s
Custom Prometheus
kubectl get pods
Output:
NAME READY STATUS RESTARTS AGE
custom-metric-prometheus-sd-697bf7c7d7-ns76p 2/2 Running 0 49s
Viewing metrics on Cloud Monitoring
As your application runs, it writes your metrics to Cloud Monitoring.
To view the metrics for a monitored resource by using Metrics Explorer, do the following:
- In the Google Cloud console, go to Monitoring.
- In the Monitoring navigation pane, click Metrics Explorer.
- In the Select a metric pane, expand the Metric menu, and then use the submenus to select a resource type and metric type. For example, to chart the CPU utilization of a virtual machine, do the following:
  - (Optional) To reduce the menu's options, enter part of the metric name in the Filter bar. For this example, enter utilization.
  - In the Active resources menu, select VM instance.
  - In the Active metric categories menu, select Instance.
  - In the Active metrics menu, select CPU utilization and then click Apply.
- By default, Metrics Explorer averages all time series. To change this behavior, do one of the following:
  - To group time series by resource or metric labels, expand the Labels menu in the Group by section, and then make your selections. You can also change the Grouping function.
  - To view all time series, on the Group by entry, click Delete.
The resource type and metrics are the following:
CPU
Resource type: gce_instance
Metric: compute.googleapis.com/instance/cpu/utilization
Pub/Sub
Resource type: pubsub_subscription
Metric: pubsub.googleapis.com/subscription/num_undelivered_messages
Custom Metric
Resource type: k8s_pod
Metric: custom.googleapis.com/custom-metric
Custom Prometheus
Resource type: gke_container
Metric: custom.googleapis.com/custom_prometheus
Creating a HorizontalPodAutoscaler object
When you see your metric in Cloud Monitoring, you can deploy a HorizontalPodAutoscaler to resize your Deployment based on your metric.
CPU
Pub/Sub
Custom Metric
Custom Prometheus
Deploy the HorizontalPodAutoscaler to your cluster:
CPU
kubectl apply -f manifests/helloweb-hpa.yaml
Pub/Sub
kubectl apply -f deployment/pubsub-hpa.yaml
Custom Metric
kubectl apply -f custom-metrics-sd-hpa.yaml
Custom Prometheus
kubectl apply -f custom-metrics-prometheus-sd-hpa.yaml
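The exact manifest differs per metric type; see the repo for each one. As a rough sketch (field values such as maxReplicas: 5 and averageValue: 2 are illustrative, not necessarily the repo's exact manifest), an HPA keyed to the Pub/Sub backlog external metric for the echo-read subscription created earlier might look like:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: pubsub
spec:
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: External
    external:
      metric:
        name: pubsub.googleapis.com|subscription|num_undelivered_messages
        selector:
          matchLabels:
            resource.labels.subscription_id: echo-read
      target:
        type: AverageValue
        averageValue: 2
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: pubsub
```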
Generating load
For some metrics, you might need to generate load to watch the autoscaling:
CPU
Simulate 10,000 requests to the helloweb server:
kubectl exec -it deployments/helloweb -- /bin/sh -c \
"for i in $(seq -s' ' 1 10000); do wget -q -O- localhost:8080; done"
Pub/Sub
Publish 200 messages to the Pub/Sub topic:
for i in {1..200}; do gcloud pubsub topics publish echo --message="Autoscaling #${i}"; done
Custom Metric
Not applicable: The code used in this sample exports a constant value of 40 for the custom metric. The HorizontalPodAutoscaler is set with a target value of 20, so it attempts to scale up the Deployment automatically.
Custom Prometheus
Not applicable: The code used in this sample exports a constant value of 40 for the custom metric. The HorizontalPodAutoscaler is set with a target value of 20, so it attempts to scale up the Deployment automatically.
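Why a constant 40 against a target of 20 forces a scale-up follows from the HPA's generic replica calculation, desiredReplicas = ceil(currentReplicas × currentMetric / target). A quick sketch (this models the Kubernetes formula, not code from this tutorial's repo):

```python
import math

def desired_replicas(current_replicas: int, current_metric: float, target: float) -> int:
    # HPA core formula: scale in proportion to how far the observed
    # metric is from its target, rounding up.
    return math.ceil(current_replicas * (current_metric / target))

# A constant per-Pod value of 40 against a target of 20 doubles the
# replica count on each sync until maxReplicas (here 5) caps it.
replicas = 1
for _ in range(3):
    replicas = min(desired_replicas(replicas, 40, 20), 5)
print(replicas)  # 5
```

Because the exported value never drops, the ratio stays at 2 and the Deployment grows until it hits maxReplicas.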
Observing HorizontalPodAutoscaler scaling up
You can check the current number of replicas of your Deployment by running:
kubectl get deployments
After giving some time for the metric to propagate, the Deployment creates five Pods to handle the backlog.
You can also inspect the state and recent activity of the HorizontalPodAutoscaler by running:
kubectl describe hpa
Clean up
To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, either delete the project that contains the resources, or keep the project and delete the individual resources.
CPU
Delete your GKE cluster:
gcloud container clusters delete metrics-autoscaling
Pub/Sub
Clean up the Pub/Sub subscription and topic:
gcloud pubsub subscriptions delete echo-read
gcloud pubsub topics delete echo
Delete your GKE cluster:
gcloud container clusters delete metrics-autoscaling
Custom Metric
Delete your GKE cluster:
gcloud container clusters delete metrics-autoscaling
Custom Prometheus
Delete your GKE cluster:
gcloud container clusters delete metrics-autoscaling
What's next
Learn more about custom and external metrics for scaling workloads.
Explore other Kubernetes Engine tutorials.