Monitoring GKE clusters for cost optimization using Cloud Monitoring

Last reviewed 2021-05-20 UTC

This tutorial explains how to monitor your Google Kubernetes Engine (GKE) clusters to optimize resource utilization. This kind of optimization is usually a complex task because you want to reduce costs by reducing resource consumption without compromising the stability and performance of your apps. This tutorial walks you through a process for setting up dashboards and alerting policies for the most common causes of over-provisioning. The tutorial also provides resource recommendations so that you can keep your apps running reliably and optimized for cost.

This tutorial is for developers and operators who want to optimize their GKE clusters and apps to achieve low cost, high performance, and high app stability. The tutorial assumes you are familiar with Docker, Kubernetes, Kubernetes CronJobs, GKE, Cloud Monitoring, and Linux.

Overview

Cost optimization is commonly misinterpreted as a one-time process that focuses on reducing costs. However, as Gartner defines it, cost optimization is a continuous discipline that must also maximize business value. When you apply this discipline to the Kubernetes world, other perspectives also become important.

Balancing 4 different objectives: reducing cost, achieving performance goals, achieving stability, and maximizing business results.

As this diagram shows, cost optimization on Kubernetes requires that you balance four different objectives: reducing cost, achieving performance goals, achieving stability, and maximizing business results. In other words, cost reduction should not come at the expense of your user experience or business performance, unless that impact is well understood and deliberate.

Google Cloud provides the tools needed to balance these objectives. However, not all teams that embrace cloud-based solutions like Kubernetes for their workloads have the expertise to achieve their performance and stability goals. Often, their solution is to over-provision their environments in order to mitigate business impact.

Over-provisioning can provide short-term relief, but at a higher cost. Use it with care and only as part of a continuous cost-optimization process. The following image shows the top four problems found in teams that initiate such a cost-optimization journey.

The top 4 problems found in teams: cultural, bin packing, app right-sizing, and not scaling down off-peak.

  • The first problem is cultural. Many teams that embrace the public cloud aren't used to the pay-as-you-go billing style, and frequently don't fully understand the environment their apps are running on—in this case, Kubernetes. The FinOps movement, which has been getting a lot of attention recently, is all about evolving such a culture. One FinOps best practice is to provide teams with real-time information about their spending and its business impact. Small practices like these have a considerable impact on a company's culture, resulting in a more balanced cost-optimization equation.

  • The second problem is bin packing. Bin packing is the ability to pack apps into GKE nodes. The more efficiently you pack apps into nodes, the more you save.

  • The third problem is app right-sizing. Right-sizing is the ability to configure the appropriate resource requests and workload autoscale targets for the objects that are deployed in the cluster. The more precisely you set resources for your Pods, the more reliably your apps run and, in most cases, the more room you free up in the cluster (see the example after this list).

  • The last problem is not scaling down your cluster during off-peak hours. Ideally, to save money during low-demand periods—for example, at night—your cluster should be able to scale down following the actual demand. However, in some cases, scaling down doesn't happen as expected due to workloads or cluster configurations that block Cluster Autoscaler (CA).
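
To make the right-sizing problem concrete, the following minimal sketch shows the fields that right-sizing tunes on a workload: the resource requests (and limits) of its containers. The Deployment name, image, and values are hypothetical and only illustrate the shape of the configuration; they are not part of the app used later in this tutorial.

    # Create a hypothetical Deployment (illustration only; not part of Online Boutique).
    kubectl create deployment example-app --image=nginx:1.25 --replicas=2

    # Right-sizing in practice: declare how much CPU and memory each Pod asks for.
    kubectl set resources deployment example-app \
        --requests=cpu=100m,memory=128Mi \
        --limits=cpu=200m,memory=256Mi

    # Remove the illustration when you are done.
    kubectl delete deployment example-app

Workload autoscale targets, the other half of right-sizing, are the HPA CPU and memory utilization targets that you create for the example app later in this tutorial.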

In order to effectively cost-optimize your environment, you must continuously work on these problems. To focus on practical items, the remainder of this tutorial skips the cultural problem and guides you on how to use Cloud Monitoring to monitor bin packing and app right-sizing in a GKE cluster. For more information about cost-optimizing your cluster during low-demand periods, see Reducing costs by scaling down GKE clusters during off-peak hours.

Objectives

  • Create a GKE cluster.
  • Deploy an example app.
  • Set up the components for exporting required metrics to Cloud Monitoring.
  • Dynamically generate dashboards for monitoring resource utilization and recommendations.
  • Dynamically generate over- and under-provisioning alerting policies.

Costs

In this document, you use the following billable components of Google Cloud:

  • Compute Engine
  • Google Kubernetes Engine
  • Cloud Monitoring

To generate a cost estimate based on your projected usage, use the pricing calculator. New Google Cloud users might be eligible for a free trial.

Before you begin

  1. In the Google Cloud console, go to the project selector page.

    Go to project selector

  2. Select or create a Google Cloud project.

  3. Make sure that billing is enabled for your Google Cloud project. Learn how to check if billing is enabled on a project.

When you finish this tutorial, you can avoid continued billing by deleting the resources you created. For more information, see Clean up.

Preparing your environment

  1. In the Google Cloud console, open Cloud Shell. Throughout this tutorial, you execute commands in Cloud Shell.

    At the bottom of the Google Cloud console, a Cloud Shell session opens and displays a command-line prompt. Cloud Shell is a shell environment with the Google Cloud CLI already installed, including the gcloud command-line tool, and with values already set for your current project. It can take a few seconds for the session to initialize.

  2. In Cloud Shell, configure your Google Cloud project ID and email address, and enable the Compute Engine, GKE, and Cloud Monitoring APIs:

    PROJECT_ID=YOUR_PROJECT_ID
    ALERT_EMAIL=YOUR_EMAIL_ADDRESS
    CLUSTER=gke-cost-optimization-monitoring
    gcloud config set project $PROJECT_ID
    
    gcloud services enable \
        compute.googleapis.com \
        container.googleapis.com \
        monitoring.googleapis.com
    
    gcloud config set compute/region us-central1
    gcloud config set compute/zone us-central1-f
    

    Replace the following:

    • YOUR_PROJECT_ID: the Google Cloud project ID for the project you're using in this tutorial.
    • YOUR_EMAIL_ADDRESS: the email address to notify when over- and under-provisioning opportunities are found in your cluster.

    You can choose a different region and zone.

  3. Clone the gke-cost-optimization-monitoring GitHub repository:

    git clone https://github.com/GoogleCloudPlatform/gke-cost-optimization-monitoring
    cd gke-cost-optimization-monitoring
    

    The code in this repository is organized into the following folders:

    • Root: Contains the main.go and Dockerfile files that the CronJob uses to export custom metrics to Cloud Monitoring.
    • api/: Contains the golang API for manipulating Kubernetes and Monitoring objects.
    • k8s/templates/: Contains the templates used to create the CronJob, Vertical Pod Autoscaler (VPA), and Horizontal Pod Autoscaler (HPA) objects in your cluster.
    • monitoring/dashboards/templates/: Contains the templates used to dynamically create the bin packing and app right-sizing dashboards.
    • monitoring/policies/templates/: Contains the templates used to dynamically create bin packing and app right-sizing alerting policies.

Creating the GKE cluster

  1. In Cloud Shell, create a GKE cluster:

    gcloud container clusters create $CLUSTER \
        --enable-ip-alias \
        --release-channel=stable \
        --machine-type=e2-standard-2 \
        --enable-autoscaling --min-nodes=1 --max-nodes=5 \
        --enable-vertical-pod-autoscaling
    

    This setup isn't a production configuration, but it's suitable for this tutorial. In this setup, you enable vertical Pod autoscaling, which provides the foundation for app right-sizing.
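
    To confirm that vertical Pod autoscaling is enabled before you continue, you can optionally read the corresponding field from the cluster description (this check is a convenience, not a required tutorial step):

    gcloud container clusters describe $CLUSTER \
        --format="value(verticalPodAutoscaling.enabled)"

    If the feature is enabled, the command prints True.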

Deploying the example app

  1. Deploy the Online Boutique app:

    kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/microservices-demo/main/release/kubernetes-manifests.yaml
    

    The Online Boutique app is a demo retail web store composed of many microservices that are written in different programming languages.

  2. To simulate a more realistic environment, create an HPA for Online Boutique deployments:

    kubectl get deployments --field-selector='metadata.name!=adservice,metadata.name!=cartservice' -o go-template-file=k8s/templates/cpu-hpa.gtpl | kubectl apply -f -
    
    kubectl get deployments --field-selector='metadata.name==cartservice' -o go-template-file=k8s/templates/memory-hpa.gtpl | kubectl apply -f -
    

    Note that you create CPU target HPA objects for most of the Online Boutique deployments, a memory target HPA for the cartservice deployment, and no HPA configuration for adservice. This setup helps demonstrate different dashboard visualizations, as shown in the following sections.
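
    Before moving on, you can optionally verify that the Online Boutique Pods are running and that the HPA objects were created:

    kubectl get pods
    kubectl get hpa

    All Deployments should eventually report their Pods as Running, and the HPA list should contain an entry for every Deployment except adservice (the exact HPA names depend on the templates in the repository).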

Setting up the components for exporting metrics to Cloud Monitoring

  1. Build and push the custom metric exporter code:

    docker build . -t gcr.io/$PROJECT_ID/metrics-exporter
    docker push gcr.io/$PROJECT_ID/metrics-exporter
    

    This code is responsible for querying VPA and HPA objects in your cluster and sending custom metrics based on that data to Cloud Monitoring. This implementation exports VPA target recommendations and HPA resource target utilization—CPU and memory defined in percentage form.

  2. Deploy the CronJob to send workload autoscaler metrics to Cloud Monitoring every minute:

    sed "s/PROJECT_ID/$PROJECT_ID/g" k8s/templates/metrics-exporter.yaml > k8s/metrics-exporter.yaml
    kubectl create ns custom-metrics
    kubectl apply -f k8s/metrics-exporter.yaml
    

    If you are running this tutorial in your own cluster (instead of the one you created previously) and the cluster has Workload Identity enabled, make sure you follow the steps in Using Workload Identity to allow metrics to be exported to Cloud Monitoring.

  3. Create a VPA for all Deployment, StatefulSet, and DaemonSet objects in the cluster:

    rm k8s/vpas.yaml 2> /dev/null
    ALL_NAMESPACES=$(kubectl get namespaces -o jsonpath={.items[*].metadata.name})
    for NAMESPACE in $ALL_NAMESPACES
    do
        kubectl get deployments,statefulset,daemonset -n $NAMESPACE -o go-template-file=k8s/templates/vpa.gtpl >> k8s/vpas.yaml
    done
    kubectl apply -f k8s/vpas.yaml
    

    The preceding snippet creates a VPA in Off mode for all objects in all namespaces, including system objects. This approach offers a more accurate view of recommendations at the cluster level and node pool level. However, to avoid overloading the metrics-server service, we don't recommend that you execute the preceding script as is in large clusters with a few hundred apps deployed. For a large-cluster scenario, we recommend that you run the preceding script only for the namespaces that you are interested in cost-optimizing. In this scenario, recommendations at the cluster and node pool level aren't accurate, so ignore or remove them.
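
    To check that the exporter and the VPA objects are in place, you can optionally inspect them with kubectl (these checks are a convenience, not a required step; the output format can vary with your cluster version):

    # Confirm that the exporter CronJob exists and is scheduling jobs.
    kubectl get cronjobs,jobs -n custom-metrics

    # Confirm that VPA objects exist and, after a few minutes, carry recommendations.
    kubectl get vpa --all-namespaces
    kubectl describe vpa -n default | grep -A 4 'Target:'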

Setting up dashboards for monitoring resource utilization and recommendations

  1. In the Google Cloud console, go to the Monitoring page.

    Go to Monitoring

  2. Create a Workspace for your project.

    1. In the Add your project to a Workspace page, select your Google Cloud project.
    2. Click Add.

    It can take several minutes for the workspace to be created.

  3. In Cloud Shell, dynamically generate your dashboards for cost optimization:

    YOUR_NAMESPACES=$( echo $ALL_NAMESPACES| sed 's/[a-zA-Z0-9_.-]*-system//g; s/gke-[a-zA-Z0-9_.-]*//g; s/kube-public//g; s/kube-node-lease//g; s/custom-metrics//g')
    for NAMESPACE in $YOUR_NAMESPACES
    do
        GTPL_FILE='./monitoring/dashboards/templates/app-rightsizing.gtpl'
        OUTPUT_FILE="./monitoring/dashboards/app-rightsizing-$CLUSTER-$NAMESPACE.yaml"
    
        kubectl get deployments,statefulset,daemonset -n $NAMESPACE -o go-template-file=$GTPL_FILE > $OUTPUT_FILE
    
        sed -i.bkp "s/CLUSTER_TO_REPLACE/$CLUSTER/g" $OUTPUT_FILE
        sed -i.bkp "s/NAMESPACE_TO_REPLACE/$NAMESPACE/g" $OUTPUT_FILE
    
        replace=""
        i=0
        while : ; do
            if grep -q "Y_POS_TO_REPLACE_$i" $OUTPUT_FILE
            then
                ((yPos=12 + (4 * $i)))
                replace="s/Y_POS_TO_REPLACE_$i/$yPos/g; ${replace}"
                ((i=i+1))
            else
                break
            fi
        done
        eval "sed -i.bkp '$replace' $OUTPUT_FILE"
        rm "$OUTPUT_FILE.bkp"
    done
    
    sed "s/CLUSTER_TO_REPLACE/$CLUSTER/g" ./monitoring/dashboards/templates/binpacking.yaml > ./monitoring/dashboards/binpacking.yaml
    

    In addition to creating the Bin Packing dashboard, this script generates an app right-sizing dashboard for each namespace in your cluster, excluding the system namespaces. In this tutorial, the script generates only one dashboard, for the default namespace, because Online Boutique is deployed entirely in that namespace. If you run the same script in your own GKE cluster, it generates one dashboard for each of your namespaces.

  4. Import the generated dashboards into Cloud Monitoring:

    for filename in monitoring/dashboards/*.yaml; do
        echo "Creating dashboard policy based on file: $filename"
        gcloud monitoring dashboards create \
            --config-from-file=$filename
    done
    

    The output is similar to the following:

    Creating dashboard policy based on file: monitoring/dashboards/app-rightsizing-gke-cost-optimization-monitoring-default.yaml
    Created [9c1a6cb6-3424-44a8-b824-f2ec959f6588].
    Creating dashboard policy based on file: monitoring/dashboards/binpacking.yaml
    Created [97f6349d-4880-478d-9da8-ca3c8a433093].
    
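    If you prefer to verify the import from the command line instead of the console, you can also list the dashboards in your project. The displayName filter below assumes that the generated dashboards keep the GKE prefix used by the templates:

    gcloud monitoring dashboards list \
        --filter="displayName:GKE" \
        --format="value(displayName)"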

Viewing the App Right-Sizing dashboard

  1. Go to the Monitoring Dashboards page.

    Go to the Dashboards page

  2. Click the GKE - App Right-Sizing (gke-cost-optimization-monitoring:default) dashboard.

The rest of this section explains how to view and interpret the charts shown in the dashboard.

The installed demo environment has a constant simulated load. In other words, you won't see big changes in the charts over time. However, if you run the same tutorial in your own clusters, you need to wait a few hours (ideally, 24 hours or more) to see the dynamics of scale-ups and scale-downs, and how different load distributions happen during the day and the week.

First row in chart shows namespace over-provisioning; second row shows top 5 over-provisioned apps; and third row shows the top 5 under-provisioned apps.

As the preceding chart shows, the first three rows in the dashboard summarize the following aggregate namespace information:

  • First row: CPU and Memory: Namespace over-provisioning. Provides a short overview of how cost-optimized your apps are in this given namespace.
  • Second row: CPU and Memory: Top 5 over-provisioned apps. Shows you where to find the apps you should work on first when looking for ways to improve the namespace.
  • Third row: CPU and Memory: Top 5 under-provisioned apps. Similar to the second row, this row shows you apps that require special attention. However, in this scenario you're not focusing on saving money but on making the apps run smoothly in your cluster. If you see a "No data available" message, that means no opportunities were found.

The following chart shows the details that the dashboard presents per app with respect to CPU, memory, and replicas. Each row represents an app in the given namespace. This information is useful for right-sizing your apps, because it lets you compare what your developers say the apps need (requested_<cores|memory> lines) with what the apps actually use (used_<cores|memory> lines).

App Right-Sizing dashboard shows details for CPU, memory, and replicas.

The following sections discuss the three rows shown in the preceding chart.

First row: CPU (p/ Pod)

Depending on how your workload is configured, these charts show different hints to help you determine the right size for your app:

  • VPA CPU recommendation (vpa_recommended_cores): This hint is shown when your app has no HPA configured (the CPU: adservice (p/Pod) chart in the dashboard) or when HPA is configured with a metric other than CPU (the CPU: cartservice (p/Pod) chart in the dashboard). When you see these hints, we strongly recommend that you apply them statically (see the sketch after this list), or, if you are comfortable looking at the chart history, enable VPA in Initial or Auto mode.

  • HPA CPU target utilization (hpa_target_utilization): This hint is shown when your app is configured with HPA based on CPU utilization (all other CPU charts). In this scenario, we recommend that you do the following:

    • Over-provisioning cases: If actual utilization (used_cores) is consistently far below the HPA target (hpa_target_utilization), your deployment is running with the number of replicas set in the HPA minReplicas field. The suggested action in this scenario is to decrease the minReplicas value.
    • Under-provisioning cases: If actual utilization (used_cores) is consistently above the HPA target (hpa_target_utilization), your deployment is running with the number of replicas set in the HPA maxReplicas field. The suggested action is to increase the maxReplicas value or to increase the requested resources to make Pods bigger.
    • Understand scale-ups and scale-downs: Watch the CPU and Replicas charts to understand when HPA on CPU is triggering Pod scale-ups and scale-downs.
    • HPA target-utilization fine-tuning: Review our best practices before applying anything to your environment.
    • It's also important that you avoid mixing VPA and HPA on CPU or memory in the same workload. For that scenario, use multidimensional Pod autoscaling (MPA).
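
As a sketch of what applying a recommendation statically can look like, the following command sets the CPU and memory requests of the adservice Deployment to hypothetical values. The container name server and the request values are assumptions; read the actual recommendation from your chart (or from kubectl describe vpa) before applying anything to your own environment:

    kubectl set resources deployment adservice -c server \
        --requests=cpu=200m,memory=220Mi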

Second row: Memory (p/ Pod)

Similar to the CPU row, depending on how your workload is configured, these charts show different hints to help you determine the right size for your app:

  • VPA memory recommendation (vpa_recommended_bytes): This hint is shown when your app has no HPA configured (the Mem: adservice (p/Pod) chart in the dashboard) or when HPA is configured with a metric other than memory (Mem: emailservice (p/Pod), for example). Consider applying such recommendations to avoid resource waste and "Out Of Memory Kill" (OOMKill) events, or, if you are comfortable looking at the chart history, enable VPA in Initial or Auto mode.

  • HPA memory target utilization (hpa_target_utilization): This hint is shown when your app is configured with HPA based on memory utilization (Mem: cartservice (p/Pod) chart in the dashboard). In this scenario, we recommend you do the following:

    • Over-provisioning cases: If the actual utilization (used_bytes) is consistently far below the HPA target (hpa_target_utilization), your deployment is running with the number of replicas set in the HPA minReplicas field. The suggested action in this scenario is to decrease the minReplicas value (see the sketch after this list).
    • Under-provisioning cases: If the actual utilization (used_bytes) is consistently above the HPA target (hpa_target_utilization), your deployment is running with the number of replicas set in the HPA maxReplicas field. The suggested action is to increase the maxReplicas value or to increase the requested resources to make Pods bigger.
    • Understand scale-ups and scale-downs: Watch Mem and Replicas charts to understand when HPA on memory is triggering Pod scale-ups and scale-downs.
    • HPA target-utilization fine-tuning: Review our best practices before applying anything to your environment.
    • It's also important that you avoid mixing VPA and HPA on CPU or memory in the same workload. For that scenario, use MPA.
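
If a chart shows a consistent over-provisioning pattern, the following minimal sketch lowers the floor of an HPA. It assumes the HPA object is named after its Deployment (for example, cartservice), which depends on the templates in the repository, and the value 1 is only an example:

    kubectl patch hpa cartservice --type merge \
        --patch '{"spec": {"minReplicas": 1}}'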

Third row: Replicas

The charts in this row show the number of Pods that a given app has. This information is useful for understanding the volatility of used versus recommended resources for workloads that are deployed with an HPA.

Viewing the Bin Packing dashboard

  1. Go to the Monitoring Dashboards page again.

    Go to Dashboards

  2. Click the GKE - Cluster Bin Packing (gke-cost-optimization-monitoring) dashboard.

As the following charts show, the Bin Packing dashboard presents information aggregated by cluster (first row) and node pools (second row). This information is useful for bin-packing your cluster by comparing the allocable capacity in the cluster and node pools (allocable_<cores|memory> lines) with what your developers say their apps need (requested_<cores|memory> lines).

Bin Packing dashboard presents information aggregated by cluster (first row) and node pools (second row).

One useful analysis you can run on top of these charts is checking whether you are running out of one resource type—for example, memory—and wasting another resource type—for example, CPU. In this scenario, the Cluster Autoscaler triggers scale-ups because there is no memory available to fit scheduled Pods into the cluster. Another common case is related to Pod density, or the number of Pods per node. In the Node Pool: Number of Pods chart, you can see the Pod density in each node pool and compare it with your configured maximum (not provided in the chart). If you have reached your node pool's configured density, the CA spins up new nodes to fit your scheduled Pods, even if you have lots of CPU and memory available.
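
To compare the chart with your configured Pod density and with the allocatable capacity of each node, you can query the node pool and the nodes directly. The node pool name default-pool is the GKE default and is an assumption if you customized your pools:

    # Maximum number of Pods per node configured for the node pool.
    gcloud container node-pools describe default-pool \
        --cluster=$CLUSTER \
        --format="value(maxPodsConstraint.maxPodsPerNode)"

    # Allocatable capacity and currently requested resources per node.
    kubectl describe nodes | grep -A 8 "Allocated resources"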

Another important analysis, not available in the preceding dashboards, is related to your autoscalers' minimum and maximum configuration. For example, your cluster might not be scaling down at night because the configured minimum for either your HPAs or your CA is higher than the actual off-peak demand. Moreover, your CA might not be adding nodes to the expected node pool because that pool has reached its configured maximum number of nodes.
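
A quick way to review those limits is to read the HPA and node pool autoscaling configuration directly (again assuming the default node pool name):

    # HPA floors and ceilings for your workloads.
    kubectl get hpa -o custom-columns=NAME:.metadata.name,MIN:.spec.minReplicas,MAX:.spec.maxReplicas

    # Cluster autoscaler limits for the node pool.
    gcloud container node-pools describe default-pool \
        --cluster=$CLUSTER \
        --format="value(autoscaling.minNodeCount,autoscaling.maxNodeCount)"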

Dashboard limitations

  • Although the App Right-Sizing dashboard presents aggregated information for all apps in a given namespace, it shows CPU, memory, and replica details only for the first eight apps, ordered by name, due to a limit on the number of widgets allowed in a single dashboard. If you find that the apps shown are not the most important ones, edit the dashboard to fit your needs.
  • The Bin Packing dashboard provides aggregated data for clusters and node pools. When you are dealing with lots of clusters or node pools, the visualization is limited because dashboard filters can't be applied to the Monitoring Query Language (MQL) queries used to build the charts.
  • The Bin Packing dashboard can take a long time to load when monitoring big clusters with hundreds of apps. In this case, to reduce the amount of data loaded, we recommend that you avoid viewing the dashboard over a large time range.

Setting up over- and under-provisioning alerting policies

When your company's finance team asks why your cloud bill doubled recently, you probably don't want to tell them you have an over-provisioning problem. To avoid such a situation, we strongly recommend that you create alerting policies that trigger when your environment begins diverging from what you have planned.

  1. In Cloud Shell, create a notification channel:

    gcloud beta monitoring channels create \
        --display-name="Cost Optimization team (Primary)" \
        --description="Primary contact method for the Cost Optimization effort"  \
        --type=email \
        --channel-labels=email_address=${ALERT_EMAIL}
    

    The output is similar to the following:

    Created notification channel [projects/your_project/notificationChannels/13166436832066782447].
    

    The preceding command creates a notification channel of type email to simplify the tutorial steps. For production environments, we recommend that you use a less asynchronous strategy by setting the notification channel to sms or pagerduty.

  2. Retrieve the notification channel ID and store it in the NOTIFICATION_CHANNEL_ID variable:

    NOTIFICATION_CHANNEL_ID=$(gcloud beta monitoring channels list --filter='displayName="Cost Optimization team (Primary)"' | grep 'name:' | sed 's/name: //g')
    
  3. Dynamically create and deploy the alerting policies:

    for NAMESPACE in $YOUR_NAMESPACES
    do
        for templatefile in monitoring/policies/templates/rightsizing/*.yaml; do
            outputfile=monitoring/policies/$(basename $templatefile)
            sed "s/CLUSTER_TO_REPLACE/$CLUSTER/g;s/NAMESPACE_TO_REPLACE/$NAMESPACE/g" $templatefile > $outputfile
            echo "Creating alert policy based on file: $outputfile"
            gcloud alpha monitoring policies create \
                --policy-from-file=$outputfile \
                --notification-channels=$NOTIFICATION_CHANNEL_ID
        done
    done
    
    for templatefile in monitoring/policies/templates/binpacking/*.yaml; do
        outputfile=monitoring/policies/$(basename $templatefile)
        sed "s/CLUSTER_TO_REPLACE/$CLUSTER/g;s/NAMESPACE_TO_REPLACE/$NAMESPACE/g" $templatefile > $outputfile
        echo "Creating alert policy based on file: $outputfile"
        gcloud alpha monitoring policies create \
            --policy-from-file=$outputfile \
            --notification-channels=$NOTIFICATION_CHANNEL_ID
    done
    

    The output is similar to the following:

    Creating alert policy based on file: monitoring/policies/app-rightsizing-cpu-overprovisioning-alert.yaml
    Created alert policy [projects/rubbo-vpa-3-1/alertPolicies/18091138402474167583].
    Creating alert policy based on file: monitoring/policies/app-rightsizing-cpu-underprovisioning-alert.yaml
    Created alert policy [projects/rubbo-vpa-3-1/alertPolicies/8586234469403227589].
    Creating alert policy based on file: monitoring/policies/app-rightsizing-memory-overprovisioning-alert.yaml
    Created alert policy [projects/rubbo-vpa-3-1/alertPolicies/9685822323903723723].
    Creating alert policy based on file: monitoring/policies/app-rightsizing-memory-underprovisioning-alert.yaml
    Created alert policy [projects/rubbo-vpa-3-1/alertPolicies/15705075159352926212].
    Creating alert policy based on file: monitoring/policies/nodepools-binpacking-cpu-overprovisioning-alert.yaml
    Created alert policy [projects/rubbo-vpa-3-1/alertPolicies/14555072091442814207].
    Creating alert policy based on file: monitoring/policies/nodepools-binpacking-memory-overprovisioning-alert.yaml
    Created alert policy [projects/rubbo-vpa-3-1/alertPolicies/1442392910032052087].
    

    By default, the created alerting policies are configured to trigger alerts if apps are over-provisioned by more than 80% and node pools by more than 40%, for a period longer than a day. Make sure that you fine-tune these policies to meet your resource utilization requirements.

  4. Go to the Monitoring Alerting page to view the alert policy.

    Go to the Alerting page

  5. Click any of the created policies to view or edit the details of your alerting configuration.
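
    If you also want to review the created policies from the command line, you can list their display names by using the same alpha command group you used to create them:

    gcloud alpha monitoring policies list \
        --format="value(displayName)"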

Clean up

To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, delete the project that contains the resources.

Delete the project

The easiest way to eliminate billing is to delete the project you created for the tutorial.

  1. In the Google Cloud console, go to the Manage resources page.

    Go to Manage resources

  2. In the project list, select the project that you want to delete, and then click Delete.
  3. In the dialog, type the project ID, and then click Shut down to delete the project.

What's next