Autoscaling Deployments with Cloud Monitoring Metrics

This tutorial demonstrates how to automatically scale your GKE workloads based on metrics available in Cloud Monitoring.

In this tutorial, you can set up autoscaling based on one of four different metrics:

CPU

CPU utilization

Scale based on the percent utilization of CPUs across nodes. This can be cost-effective, allowing you to maximize CPU resource utilization. Because CPU usage is a trailing metric, however, your users may experience latency while a scale-up is in progress.

Pub/Sub

Pub/Sub backlog

Scale based on the number of unacknowledged messages remaining in a Pub/Sub subscription. This can effectively reduce latency before it becomes a problem, but may use relatively more resources than autoscaling based on CPU utilization.

Custom metric

Custom Cloud Monitoring metric

Scale based on a custom user-defined metric exported by the Cloud Monitoring client libraries. To learn more, refer to Creating custom metrics in the Cloud Monitoring documentation.

Custom Prometheus

Custom Prometheus metric

Scale based on a custom user-defined metric exported in the Prometheus format. Your Prometheus metric must be of type Gauge, and must not contain the custom.googleapis.com prefix.

Autoscaling is fundamentally about finding an acceptable balance between cost and latency. You may want to experiment with a combination of these metrics and others to find a policy that works for you.

Objectives

This tutorial covers the following tasks:

  1. How to deploy the Custom Metrics Adapter.
  2. How to export metrics from within your application code.
  3. How to view your metrics on the Cloud Monitoring interface.
  4. How to deploy a HorizontalPodAutoscaler (HPA) resource to scale your application based on Cloud Monitoring metrics.

Before you begin

Take the following steps to enable the Kubernetes Engine API:
  1. Visit the Kubernetes Engine page in the Google Cloud Console.
  2. Create or select a project.
  3. Wait for the API and related services to be enabled. This can take several minutes.
  4. Make sure that billing is enabled for your Google Cloud project. Learn how to confirm billing is enabled for your project.

In this tutorial, you will use Cloud Shell, which comes preinstalled with the gcloud and kubectl command-line tools.

Setting up your environment

  1. Set the default zone for the gcloud command-line tool:

    gcloud config set compute/zone zone
    

    Replace the following:

    • zone: Choose a zone that's closest to you. For example: us-west1-a. For more information, see Regions and Zones.
  2. Set the PROJECT_ID environment variable to your Google Cloud project ID (project-id):

    export PROJECT_ID=project-id
    
  3. Set the default project for the gcloud command-line tool:

    gcloud config set project $PROJECT_ID
    
  4. Create a GKE cluster:

    gcloud container clusters create metrics-autoscaling
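
    If kubectl is not already pointing at the new cluster (for example, if you are not running in Cloud Shell), fetch its credentials:

    gcloud container clusters get-credentials metrics-autoscaling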
    

Step 1: Deploy the Custom Metrics Adapter

The Custom Metrics Adapter makes metrics from Cloud Monitoring available to the Horizontal Pod Autoscaler in your cluster.

CPU

Not applicable: Horizontal Pod Autoscalers can scale based on CPU utilization natively, so the Custom Metrics Adapter is not needed.

Pub/Sub

Grant your user the ability to create required authorization roles:

kubectl create clusterrolebinding cluster-admin-binding \
    --clusterrole cluster-admin --user "$(gcloud config get-value account)"

Deploy the new resource model adapter on your cluster:

kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-stackdriver/4ed2ef2a60212f07727370baaecf505d8b2ce678/custom-metrics-stackdriver-adapter/deploy/production/adapter_new_resource_model.yaml

Custom Metric

Grant your user the ability to create required authorization roles:

kubectl create clusterrolebinding cluster-admin-binding \
    --clusterrole cluster-admin --user "$(gcloud config get-value account)"

Deploy the new resource model adapter on your cluster:

kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-stackdriver/4ed2ef2a60212f07727370baaecf505d8b2ce678/custom-metrics-stackdriver-adapter/deploy/production/adapter_new_resource_model.yaml

Custom Prometheus

Grant your user the ability to create required authorization roles:

kubectl create clusterrolebinding cluster-admin-binding \
    --clusterrole cluster-admin --user "$(gcloud config get-value account)"

Deploy the legacy model adapter on your cluster:

kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-stackdriver/4ed2ef2a60212f07727370baaecf505d8b2ce678/custom-metrics-stackdriver-adapter/deploy/production/adapter.yaml
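
If you deployed an adapter, you can confirm it is running before continuing. The adapter is deployed in the custom-metrics namespace:

kubectl get pods -n custom-metrics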

Step 2: Deploy an application with metrics

Download the repo containing the application code for this tutorial:

CPU

git clone https://github.com/GoogleCloudPlatform/kubernetes-engine-samples.git
cd kubernetes-engine-samples/hello-app

Pub/Sub

git clone https://github.com/GoogleCloudPlatform/kubernetes-engine-samples.git
cd kubernetes-engine-samples/cloud-pubsub

Custom Metric

git clone https://github.com/GoogleCloudPlatform/kubernetes-engine-samples.git
cd kubernetes-engine-samples/custom-metrics-autoscaling/direct-to-sd

Custom Prometheus

git clone https://github.com/GoogleCloudPlatform/kubernetes-engine-samples.git
cd kubernetes-engine-samples/custom-metrics-autoscaling/prometheus-to-sd

The repo contains code that exports metrics to Cloud Monitoring:

CPU

This application responds "Hello, world!" to any web requests on port 8080. Compute Engine CPU metrics are automatically collected by Cloud Monitoring.

package main

import (
	"fmt"
	"log"
	"net/http"
	"os"
)

func main() {
	// register hello function to handle all requests
	mux := http.NewServeMux()
	mux.HandleFunc("/", hello)

	// use PORT environment variable, or default to 8080
	port := os.Getenv("PORT")
	if port == "" {
		port = "8080"
	}

	// start the web server on port and accept requests
	log.Printf("Server listening on port %s", port)
	log.Fatal(http.ListenAndServe(":"+port, mux))
}

// hello responds to the request with a plain-text "Hello, world" message.
func hello(w http.ResponseWriter, r *http.Request) {
	log.Printf("Serving request: %s", r.URL.Path)
	host, _ := os.Hostname()
	fmt.Fprintf(w, "Hello, world!\n")
	fmt.Fprintf(w, "Version: 1.0.0\n")
	fmt.Fprintf(w, "Hostname: %s\n", host)
}
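
If you want to try the server before deploying it, you can run it locally and send it a request (this assumes the Go toolchain is installed and you are in the hello-app directory):

go run main.go &
curl localhost:8080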

Pub/Sub

This application polls a Pub/Sub subscription for new messages, acknowledging them as they arrive. Pub/Sub subscription metrics are automatically collected by Cloud Monitoring.

import datetime
import time

# This excerpt uses the legacy (pre-1.0) google-cloud-pubsub client API.
from google.cloud import pubsub

# Topic and subscription names match the resources created later in this
# tutorial.
PUBSUB_TOPIC = 'echo'
PUBSUB_SUBSCRIPTION = 'echo-read'


def main():
    """Continuously pull messages from subscription"""
    client = pubsub.Client()
    subscription = client.topic(PUBSUB_TOPIC).subscription(PUBSUB_SUBSCRIPTION)

    print('Pulling messages from Pub/Sub subscription...')
    while True:
        with pubsub.subscription.AutoAck(subscription, max_messages=10) as ack:
            for _, message in list(ack.items()):
                print("[{0}] Received message: ID={1} Data={2}".format(
                    datetime.datetime.now(),
                    message.message_id,
                    message.data))
                process(message)


def process(message):
    """Process received message"""
    print("[{0}] Processing: {1}".format(datetime.datetime.now(),
                                         message.message_id))
    time.sleep(3)
    print("[{0}] Processed: {1}".format(datetime.datetime.now(),
                                        message.message_id))
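
As noted in the comments above, this excerpt uses the legacy client. If you adapt it to a current google-cloud-pubsub release, the equivalent pull loop is callback-based. Here is a minimal sketch, assuming the same echo-read subscription; PROJECT_ID is a placeholder for your project ID, not part of the sample:

# Sketch using the modern google-cloud-pubsub (>= 1.0) API.
# PROJECT_ID is a placeholder for your Google Cloud project ID.
from google.cloud import pubsub_v1

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path(PROJECT_ID, 'echo-read')

def callback(message):
    # Print and acknowledge each message as it arrives.
    print('Received message: ID={} Data={}'.format(message.message_id, message.data))
    message.ack()

streaming_pull = subscriber.subscribe(subscription_path, callback=callback)
streaming_pull.result()  # block, processing messages until interrupted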

Custom Metric

This application exports a constant value metric using the Cloud Monitoring client libraries.

func exportMetric(stackdriverService *monitoring.Service, metricName string,
	metricValue int64, metricLabels map[string]string, monitoredResource string, resourceLabels map[string]string) error {
	dataPoint := &monitoring.Point{
		Interval: &monitoring.TimeInterval{
			EndTime: time.Now().Format(time.RFC3339),
		},
		Value: &monitoring.TypedValue{
			Int64Value: &metricValue,
		},
	}
	// Write time series data.
	request := &monitoring.CreateTimeSeriesRequest{
		TimeSeries: []*monitoring.TimeSeries{
			{
				Metric: &monitoring.Metric{
					Type:   "custom.googleapis.com/" + metricName,
					Labels: metricLabels,
				},
				Resource: &monitoring.MonitoredResource{
					Type:   monitoredResource,
					Labels: resourceLabels,
				},
				Points: []*monitoring.Point{
					dataPoint,
				},
			},
		},
	}
	projectName := fmt.Sprintf("projects/%s", resourceLabels["project_id"])
	_, err := stackdriverService.Projects.TimeSeries.Create(projectName, request).Do()
	return err
}
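
For context, here is a hedged sketch of how a caller might construct the Monitoring API client and invoke exportMetric for a pod-scoped metric under the new (k8s_pod) resource model. The client construction and label values are illustrative assumptions, not an excerpt from the sample:

// Sketch: create a Monitoring API client (google.golang.org/api/monitoring/v3)
// using Application Default Credentials and export one data point for a
// k8s_pod resource.
ctx := context.Background()
svc, err := monitoring.NewService(ctx)
if err != nil {
	log.Fatalf("Failed to create monitoring service: %v", err)
}
err = exportMetric(svc, "custom-metric", 40, nil, "k8s_pod", map[string]string{
	"project_id":     os.Getenv("PROJECT_ID"),
	"location":       "us-west1-a", // your cluster's zone
	"cluster_name":   "metrics-autoscaling",
	"namespace_name": os.Getenv("NAMESPACE"),
	"pod_name":       os.Getenv("POD_NAME"),
})
if err != nil {
	log.Fatalf("Failed to write time series: %v", err)
}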

Custom Prometheus

This application exports a constant value metric using the Prometheus format.

metric := prometheus.NewGauge(
	prometheus.GaugeOpts{
		Name: *metricName,
		Help: "Custom metric",
	},
)
prometheus.MustRegister(metric)
metric.Set(float64(*metricValue))

http.Handle("/metrics", promhttp.Handler())
log.Printf("Starting to listen on :%d", *port)
err := http.ListenAndServe(fmt.Sprintf(":%d", *port), nil)
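
If you run such an exporter locally, you can check the Prometheus exposition format it serves. The output below is what you would expect for a gauge named custom_prometheus set to 40:

curl -s localhost:8080/metrics | grep custom_prometheus
# HELP custom_prometheus Custom metric
# TYPE custom_prometheus gauge
custom_prometheus 40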

The repo also contains a Kubernetes manifest to deploy the application to your cluster:

CPU

apiVersion: apps/v1
kind: Deployment
metadata:
  name: helloweb
  labels:
    app: hello
spec:
  selector:
    matchLabels:
      app: hello
      tier: web
  template:
    metadata:
      labels:
        app: hello
        tier: web
    spec:
      containers:
      - name: hello-app
        image: gcr.io/google-samples/hello-app:1.0
        ports:
        - containerPort: 8080
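
Note that the Horizontal Pod Autoscaler computes CPU utilization as a percentage of each container's CPU request. This manifest relies on the cluster's default request (GKE applies a default LimitRange in the default namespace); to make the scaling baseline explicit, you could add a request to the container spec yourself, for example (an illustrative addition, not part of the sample manifest):

        resources:
          requests:
            cpu: 100m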

Pub/Sub

apiVersion: apps/v1
kind: Deployment
metadata:
  name: pubsub
spec:
  selector:
    matchLabels:
      app: pubsub
  template:
    metadata:
      labels:
        app: pubsub
    spec:
      volumes:
      - name: google-cloud-key
        secret:
          secretName: pubsub-key
      containers:
      - name: subscriber
        image: gcr.io/google-samples/pubsub-sample:v1
        volumeMounts:
        - name: google-cloud-key
          mountPath: /var/secrets/google
        env:
        - name: GOOGLE_APPLICATION_CREDENTIALS
          value: /var/secrets/google/key.json

Custom Metric

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    run: custom-metric-sd
  name: custom-metric-sd
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      run: custom-metric-sd
  template:
    metadata:
      labels:
        run: custom-metric-sd
    spec:
      containers:
      - command: ["./sd_dummy_exporter"]
        args:
        - --use-new-resource-model=true
        - --use-old-resource-model=false
        - --metric-name=custom-metric
        - --metric-value=40
        - --pod-name=$(POD_NAME)
        - --namespace=$(NAMESPACE)
        image: gcr.io/google-containers/sd-dummy-exporter:v0.2.0
        name: sd-dummy-exporter
        resources:
          requests:
            cpu: 100m
        env:
        # save Kubernetes metadata as environment variables for use in metrics
        - name: POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        - name: NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace

Custom Prometheus

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    run: custom-metric-prometheus-sd
  name: custom-metric-prometheus-sd
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      run: custom-metric-prometheus-sd
  template:
    metadata:
      labels:
        run: custom-metric-prometheus-sd
    spec:
      containers:
      # sample container generating custom metrics
      - name: prometheus-dummy-exporter
        image: gcr.io/google-containers/prometheus-dummy-exporter:v0.1.0
        command: ["./prometheus_dummy_exporter"]
        args:
        - --metric-name=custom_prometheus
        - --metric-value=40
        - --port=8080
      # pre-built 'prometheus-to-sd' sidecar container to export prometheus
      # metrics to Stackdriver
      - name: prometheus-to-sd
        image: gcr.io/google-containers/prometheus-to-sd:v0.5.0
        command: ["/monitor"]
        args:
        - --source=:http://localhost:8080
        - --stackdriver-prefix=custom.googleapis.com
        - --pod-id=$(POD_ID)
        - --namespace-id=$(POD_NAMESPACE)
        env:
        # save Kubernetes metadata as environment variables for use in metrics
        - name: POD_ID
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.uid
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace

Go ahead and deploy the application to your cluster:

CPU

kubectl apply -f manifests/helloweb-deployment.yaml

Pub/Sub

Enable the Pub/Sub API on your project:

gcloud services enable cloudresourcemanager.googleapis.com pubsub.googleapis.com

Create a Pub/Sub topic and subscription:

gcloud pubsub topics create echo
gcloud pubsub subscriptions create echo-read --topic=echo

Create a service account with access to Pub/Sub:

gcloud iam service-accounts create autoscaling-pubsub-sa
gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member "serviceAccount:autoscaling-pubsub-sa@$PROJECT_ID.iam.gserviceaccount.com" \
  --role "roles/pubsub.subscriber"

Download the service account key file:

gcloud iam service-accounts keys create key.json \
  --iam-account autoscaling-pubsub-sa@$PROJECT_ID.iam.gserviceaccount.com

Import the service account key to your cluster as a Secret:

kubectl create secret generic pubsub-key --from-file=key.json=./key.json

Deploy the application to your cluster:

kubectl apply -f deployment/pubsub-with-secret.yaml

Custom Metric

kubectl apply -f custom-metrics-sd.yaml

Custom Prometheus

kubectl apply -f custom-metrics-prometheus-sd.yaml

After waiting a moment for the application to deploy, you should see all Pods reach the Ready state:

CPU

kubectl get pods

Output:

NAME                        READY   STATUS    RESTARTS   AGE
helloweb-7f7f7474fc-hzcdq   1/1     Running   0          10s

Pub/Sub

kubectl get pods

Output:

NAME                     READY   STATUS    RESTARTS   AGE
pubsub-8cd995d7c-bdhqz   1/1     Running   0          58s

Custom Metric

kubectl get pods

Output:

NAME                                READY   STATUS    RESTARTS   AGE
custom-metric-sd-58dbf4ffc5-tm62v   1/1     Running   0          33s

Custom Prometheus

kubectl get pods

Output:

NAME                                           READY   STATUS    RESTARTS   AGE
custom-metric-prometheus-sd-697bf7c7d7-ns76p   2/2     Running   0          49s

Step 3: View metrics on Cloud Monitoring

As your application runs, it writes metrics to Cloud Monitoring.

To view the metrics for a monitored resource using Metrics Explorer, do the following:

  1. In the Google Cloud Console, go to the Monitoring page.
  2. In the Monitoring navigation pane, click Metrics Explorer.
  3. Enter the monitored resource name in the Find resource type and metric text box.

The resource type and metric will be the following:

CPU

Resource type: gce_instance

Metric: compute.googleapis.com/instance/cpu/utilization

Pub/Sub

Resource type: pubsub_subscription

Metric: pubsub.googleapis.com/subscription/num_undelivered_messages

Custom Metric

Resource type: k8s_pod

Metric: custom.googleapis.com/custom-metric

Custom Prometheus

Resource type: gke_container

Metric: custom.googleapis.com/custom_prometheus
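
For the custom metrics, you can also confirm that the metric is visible inside the cluster by querying the custom metrics API directly (a quick sanity check; the exact path can vary with the adapter version and metric name):

kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/custom-metric"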

Step 4: Create a HorizontalPodAutoscaler object

Once you see your metric in Cloud Monitoring, you can deploy a HorizontalPodAutoscaler to resize your Deployment based on your metric.

CPU

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: cpu
spec:
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      targetAverageUtilization: 30
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: helloweb

Pub/Sub

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: pubsub
spec:
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - external:
      metricName: pubsub.googleapis.com|subscription|num_undelivered_messages
      metricSelector:
        matchLabels:
          resource.labels.subscription_id: echo-read
      targetAverageValue: "2"
    type: External
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: pubsub

Custom Metric

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: custom-metric-sd
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: custom-metric-sd
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Pods
    pods:
      metricName: custom-metric
      targetAverageValue: 20

Custom Prometheus

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: custom-prometheus-hpa
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: custom-metric-prometheus-sd
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Pods
    pods:
      metricName: custom_prometheus
      targetAverageValue: 20

Deploy the HorizontalPodAutoscaler to your cluster:

CPU

kubectl apply -f manifests/helloweb-hpa.yaml

Pub/Sub

kubectl apply -f deployment/pubsub-hpa.yaml

Custom Metric

kubectl apply -f custom-metrics-sd-hpa.yaml

Custom Prometheus

kubectl apply -f custom-metrics-prometheus-sd-hpa.yaml
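
Whichever variant you deployed, you can check that the HorizontalPodAutoscaler was created and is reading your metric:

kubectl get hpa

The TARGETS column shows the current metric value against its target; a value of <unknown> usually means the metric has not propagated yet or the adapter is not running.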

Step 5: Generate load

For some metrics, you might need to generate load to watch autoscaling in action:

CPU

Send 10,000 requests to the helloweb server:

kubectl exec -it deployments/helloweb -- /bin/sh -c \
    "for i in $(seq -s' ' 1 10000); do wget -q -O- localhost:8080; done"

Pub/Sub

Publish 200 messages to the Pub/Sub topic:

for i in {1..200}; do gcloud pubsub topics publish echo --message="Autoscaling #${i}"; done

Custom Metric

Not Applicable: The code used in this sample exports a constant value of 40 for the custom metric. The HorizontalPodAutoscaler is set with a target value of 20, so it will attempt to scale up the Deployment automatically.

Custom Prometheus

Not Applicable: The code used in this sample exports a constant value of 40 for the custom metric. The HorizontalPodAutoscaler is set with a target value of 20, so it will attempt to scale up the Deployment automatically.

Step 6: Observe HorizontalPodAutoscaler scaling up

You can check the current number of replicas of your deployment by running:

kubectl get deployments

After the metric has had time to propagate, you should see the Deployment scale up toward its maximum of 5 replicas to work through the load.

You can also inspect the state and recent activity of the HorizontalPodAutoscaler by running:

kubectl describe hpa
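
In the describe output, the Events section records recent scaling decisions; a successful scale-up appears as a SuccessfulRescale event (illustrative):

  Normal  SuccessfulRescale  ...  New size: 5; reason: pods metric custom-metric above target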

Cleaning up

To avoid incurring charges to your Google Cloud Platform account for the resources used in this tutorial:

CPU

Delete your GKE cluster:

 gcloud container clusters delete metrics-autoscaling

Pub/Sub

  1. Clean up the Pub/Sub subscription and topic:

    gcloud pubsub subscriptions delete echo-read
    gcloud pubsub topics delete echo
    
  2. Delete your GKE cluster:

    gcloud container clusters delete metrics-autoscaling
    

Custom Metric

Delete your GKE cluster:

 gcloud container clusters delete metrics-autoscaling

Custom Prometheus

Delete your GKE cluster:

 gcloud container clusters delete metrics-autoscaling

What's next