Right-size your GKE workloads at scale


This tutorial shows you how to right-size your Google Kubernetes Engine (GKE) workloads by exporting Vertical Pod Autoscaler (VPA) recommendations from Cloud Monitoring to BigQuery, without creating VPA objects for your Deployment workloads.

In BigQuery, you can use SQL queries to view and analyze:

  • Top over-provisioned GKE workloads across all of your projects.
  • Top under-provisioned GKE workloads across all of your projects.
  • GKE workloads at risk of reliability or performance issues.

Understand why resource rightsizing is important

Under-provisioning can starve your containers of the necessary resources to run your applications, making them slow and unreliable. Over-provisioning doesn't impact the performance of your applications but might increase your monthly bill.

The following table describes the implications of under-provisioning and over-provisioning CPU and memory:

Resource  Provisioning status  Risk         Explanation
CPU       Over                 Cost         Increases the cost of your workloads by reserving unnecessary resources.
CPU       Under                Performance  Can cause workloads to slow down or become unresponsive.
CPU       Not set              Reliability  CPU can be throttled to 0, causing your workloads to become unresponsive.
Memory    Over                 Cost         Increases the cost of your workloads by reserving unnecessary resources.
Memory    Under                Reliability  Can cause applications to terminate with an out-of-memory (OOM) error.
Memory    Not set              Reliability  kubelet can stop your Pods at any time and mark them as failed.
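
For example, the following container spec avoids both "Not set" risks by declaring explicit requests and limits; the Deployment name, image, and values are placeholders to adapt to your workload (the memory request matches the limit, in line with the guidance later in this tutorial):

    kubectl apply -f - <<EOF
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: example-app               # placeholder name
    spec:
      replicas: 1
      selector:
        matchLabels: {app: example-app}
      template:
        metadata:
          labels: {app: example-app}
        spec:
          containers:
          - name: app
            image: registry.k8s.io/pause:3.9   # placeholder image
            resources:
              requests:
                cpu: 250m            # CPU request below the limit (Burstable)
                memory: 256Mi        # memory request equal to the limit
              limits:
                cpu: 500m
                memory: 256Mi
    EOF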

Objectives

In this tutorial, you will learn how to:

  • Deploy a sample application.
  • Export GKE recommendations metrics from Monitoring to BigQuery.
  • Use BigQuery and Looker Studio to view GKE container recommendations across projects.

Costs

In this document, you use the following billable components of Google Cloud:

  • Google Kubernetes Engine
  • Cloud Monitoring
  • BigQuery
  • Cloud Run
  • Cloud Scheduler

To generate a cost estimate based on your projected usage, use the pricing calculator. New Google Cloud users might be eligible for a free trial.

When you finish this tutorial, you can avoid continued billing by deleting the resources you created. For more information, see Clean up.

Before you begin

Set up your project

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud console, on the project selector page, click Create project to begin creating a new Google Cloud project.

    Go to project selector

  3. Make sure that billing is enabled for your Google Cloud project. Learn how to check if billing is enabled on a project.

  4. Enable the Google Kubernetes Engine, Cloud Monitoring, BigQuery, Cloud Run, Cloud Build APIs.

    Enable the APIs

  5. Grant roles to your Google Account. Run the following command once for each of the following IAM roles: roles/serviceusage.serviceUsageAdmin, roles/container.clusterAdmin, roles/iam.serviceAccountAdmin, roles/iam.securityAdmin, roles/container.admin:

    gcloud projects add-iam-policy-binding PROJECT_ID --member="user:EMAIL_ADDRESS" --role=ROLE

    Replace the following:
    • PROJECT_ID: your project ID.
    • EMAIL_ADDRESS: your email address.
    • ROLE: each individual role.
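
    Alternatively, you can grant all five roles in one pass with a shell loop; this sketch assumes PROJECT_ID and EMAIL_ADDRESS are set as environment variables:

    # Grant each required role to your Google Account in turn.
    for ROLE in roles/serviceusage.serviceUsageAdmin roles/container.clusterAdmin \
        roles/iam.serviceAccountAdmin roles/iam.securityAdmin roles/container.admin; do
      gcloud projects add-iam-policy-binding "$PROJECT_ID" \
          --member="user:$EMAIL_ADDRESS" --role="$ROLE"
    done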

Set up your environment

In this tutorial, you use Cloud Shell to manage resources hosted on Google Cloud. Cloud Shell is preinstalled with the software you need for this tutorial, including Docker, kubectl, and the gcloud CLI.

To use Cloud Shell to set up your environment:

  1. Launch a Cloud Shell session from the Google Cloud console by clicking Activate Cloud Shell. This launches a session in the bottom pane of the Google Cloud console.

  2. Set environment variables:

    export PROJECT_ID=PROJECT_ID
    export REGION=us-central1
    export ZONE=us-central1-f
    

    Replace PROJECT_ID with your Google Cloud project ID.

  3. Set the default project for the gcloud CLI:

    gcloud config set project $PROJECT_ID
    
  4. Clone the code repository.

    git clone https://github.com/GoogleCloudPlatform/kubernetes-engine-samples
    
  5. Change to the working directory.

    cd kubernetes-engine-samples/gke-vpa-recommendations
    

Set up the sample application

To simulate a realistic environment, you will use a setup script to deploy Online Boutique.

The following steps install the sample application and modify the default configuration. For example, the instructions configure the Horizontal Pod Autoscaler (HPA) for some workloads and change resource requests and limits.

  1. Run the setup script:

    ./scripts/setup.sh
    

    The setup script does the following:

    • Creates a GKE cluster.
    • Deploys the Online Boutique sample application.
    • Updates Pod CPU and memory resource requests.
    • Configures a HorizontalPodAutoscaler resource for the adservice and redis-cart workloads to simulate a realistic environment.

    The setup script might take up to 10 minutes to complete.

  2. Verify that the sample application is ready:

    kubectl get deployment
    

    The output is similar to the following:

    NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
    adservice               2/2     2            2           4m54s
    cartservice             1/1     1            1           4m55s
    checkoutservice         1/1     1            1           4m56s
    currencyservice         1/1     1            1           4m55s
    emailservice            1/1     1            1           4m56s
    frontend                1/1     1            1           4m55s
    loadgenerator           1/1     1            1           4m55s
    paymentservice          1/1     1            1           4m55s
    productcatalogservice   1/1     1            1           4m55s
    recommendationservice   1/1     1            1           4m56s
    redis-cart              1/1     1            1           4m54s
    shippingservice         1/1     1            1           4m54s
    
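Rather than polling, you can optionally block until every Deployment is available by using a standard kubectl wait command (the 10-minute timeout here is an arbitrary choice):

    kubectl wait deployment --all --for=condition=available --timeout=600s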

Export metrics to Cloud Monitoring

This tutorial filters out HPA-enabled workloads. HPA scales the number of Pods, while VPA scales the CPU and memory resources of the containers within existing Pods. Both VPA and HPA scale based on the same resource metrics, such as CPU and memory usage, so when a scaling event happens, both attempt to scale resources, which might result in unexpected behavior.

If you want to see recommendations for HPA-enabled workloads, you can skip this section.

  1. Deploy the metric exporter:

    envsubst < scripts/k8s/templates/hpa-metrics-exporter.yaml > scripts/k8s/metrics-exporter.yaml
    kubectl apply -f scripts/k8s/metrics-exporter.yaml
    

    This CronJob queries the HPA objects in your cluster and sends custom metrics based on that data to Monitoring. The data includes the HPA CPU and memory target utilization, expressed as a percentage.

    If you are running this tutorial in your own cluster and it has Workload Identity enabled, you must follow the steps in Using Workload Identity to export metrics to Monitoring.

  2. Verify that the CronJob ran and completed successfully:

    kubectl describe -n custom-metrics cronjob metrics-exporter
    

    The output is similar to the following:

    Events:
      Type     Reason            Age              From                Message
      ----     ------            ----             ----                -------
      ...
      Normal   SawCompletedJob   72s              cronjob-controller  Saw completed job: metrics-exporter-27772443, status: Complete
    

    Wait until the status is Complete.
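
If the CronJob hasn't fired yet, you can trigger a one-off run manually rather than waiting for the schedule. This uses the CronJob name and namespace from the step above; the job name metrics-exporter-manual is arbitrary:

    kubectl create job metrics-exporter-manual \
        --from=cronjob/metrics-exporter \
        -n custom-metrics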

View container recommendations

GKE provides out-of-the-box recommendations for containers in the Google Cloud console:

  1. Go to the Workloads page in the Google Cloud console.

    Go to Workloads

  2. Select the Cost Optimization tab.

  3. In the workloads list, click the name of the workload that you want to get recommendations for.

  4. Click Actions > Scale > Edit resource requests.

VPA provides recommendations based on usage patterns at a point in time, which means the recommendations can change depending on when you view them. If your workloads have a constant load that spikes throughout the month, setting resources too low can cause reliability or performance issues, and setting resources too high increases cost.

The following steps show you how to address this issue by exporting metrics for the previous 30 days.

Export metrics to BigQuery

In this section, you create a Cloud Run job that builds a VPA container recommendation table, which you can use to view and analyze the top under-provisioned and over-provisioned workloads, as well as workloads at risk of reliability issues.

  1. Create a BigQuery dataset and a table to store the metrics from Monitoring:

    bq mk metric_export
    bq mk --table metric_export.mql_metrics scripts/bigquery_schema.json
    
  2. Create a BigQuery table to store container recommendations:

    bq mk --table metric_export.vpa_container_recommendations scripts/bigquery_recommendation_schema.json
    

    These commands create the following BigQuery tables:

    • mql_metrics: temporarily stores 30 days of VPA recommendations and the last hour of GKE resource metrics from Monitoring.
    • vpa_container_recommendations: stores VPA container recommendations aggregated over a 30-day period.
  3. Create a service account to run the pipeline:

    gcloud iam service-accounts create mql-export-metrics \
        --display-name="MQL export metrics SA" \
        --description="Used for the job that exports monitoring metrics"
    
  4. Assign IAM roles to the service account:

    gcloud projects add-iam-policy-binding $PROJECT_ID --member="serviceAccount:mql-export-metrics@$PROJECT_ID.iam.gserviceaccount.com" --role="roles/monitoring.viewer"
    gcloud projects add-iam-policy-binding $PROJECT_ID --member="serviceAccount:mql-export-metrics@$PROJECT_ID.iam.gserviceaccount.com" --role="roles/bigquery.dataEditor"
    gcloud projects add-iam-policy-binding $PROJECT_ID --member="serviceAccount:mql-export-metrics@$PROJECT_ID.iam.gserviceaccount.com" --role="roles/bigquery.dataOwner"
    gcloud projects add-iam-policy-binding $PROJECT_ID --member="serviceAccount:mql-export-metrics@$PROJECT_ID.iam.gserviceaccount.com" --role="roles/bigquery.jobUser"
    gcloud projects add-iam-policy-binding $PROJECT_ID --member="serviceAccount:mql-export-metrics@$PROJECT_ID.iam.gserviceaccount.com" --role="roles/run.invoker"
    
  5. Deploy the Cloud Run job:

    gcloud beta run jobs deploy metric-exporter \
        --image=us-docker.pkg.dev/google-samples/containers/gke/metrics-exporter:latest \
        --set-env-vars=PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python \
        --set-env-vars=PROJECT_ID=$PROJECT_ID \
        --task-timeout=3600 \
        --execute-now \
        --memory=1Gi \
        --max-retries=1 \
        --parallelism=0 \
        --service-account=mql-export-metrics@$PROJECT_ID.iam.gserviceaccount.com \
        --region=$REGION
    

    To capture spikes, metric points are sampled every 60 seconds. For less spiky traffic, you can allow a larger window by changing the LATEST_WINDOW_SECONDS variable in metrics-exporter/config.py.

  6. Create a schedule to run daily:

    gcloud scheduler jobs create http metric-exporter \
      --location $REGION \
      --schedule="0 23 * * *" \
      --uri="https://$REGION-run.googleapis.com/apis/run.googleapis.com/v1/namespaces/$PROJECT_ID/jobs/metric-exporter:run" \
      --http-method POST \
      --oauth-service-account-email "mql-export-metrics@$PROJECT_ID.iam.gserviceaccount.com"
    
  7. Verify that the metrics are being processed by viewing the Cloud Run job logs:

    Go to Cloud Run

    On the metric-exporter job details page, select the Logs tab. The logs show metrics being written to BigQuery. The output should be similar to the following:

    Query results loaded to the table [PROJECT_ID].metric_export.vpa_container_recommendations
    
  8. In the Cloud Shell terminal, verify that GKE metric data exists in BigQuery:

    bq query \
        --use_legacy_sql=false \
        "SELECT DISTINCT metric_name FROM ${PROJECT_ID}.metric_export.mql_metrics ORDER BY metric_name"
    

    Depending on the number of workloads, it might take a few minutes to write all metrics to BigQuery.

    The output is similar to the following:

    +---------------------------------------------+
    |                 metric_name                 |
    +---------------------------------------------+
    | container_count                             |
    | cpu_limit_cores                             |
    | cpu_request_95th_percentile_recommendations |
    | cpu_request_max_recommendations             |
    | cpu_requested_cores                         |
    | hpa_cpu                                     |
    | hpa_memory                                  |
    | memory_limit_bytes                          |
    | memory_request_recommendations              |
    | memory_requested_bytes                      |
    +---------------------------------------------+
    

    If the output does not match, wait five minutes, and then run the command gcloud scheduler jobs run metric-exporter --location $REGION.
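
Optionally, you can also confirm that the recommendation table from step 2 has been populated. This query counts the rows flagged as the latest recommendations, the same flag that's queried in the next section:

    bq query \
        --use_legacy_sql=false \
        "SELECT COUNT(*) AS total_rows FROM ${PROJECT_ID}.metric_export.vpa_container_recommendations WHERE latest = TRUE"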

View the container recommendation in BigQuery

  1. Go to the BigQuery page in the Google Cloud console:

    Go to BigQuery

  2. In the query editor, select all rows in the recommendation table:

    SELECT * FROM `PROJECT_ID.metric_export.vpa_container_recommendations` WHERE latest = TRUE
    

    Replace PROJECT_ID with your project ID.

The recommendation_date is the date when the recommendation was created. Not all workloads might appear in the recommendation table after the first run. For production environments, wait 24 hours for non-HPA workloads to appear in the VPA recommendation table.

If you don't see recommendations, trigger the scheduler job manually:

gcloud scheduler jobs run metric-exporter --location=$REGION
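
Once recommendations are present, you can rank them to find the most over-provisioned workloads first. This sketch assumes the table exposes the priority column described in the Looker Studio section that follows; verify the column names against your schema:

    bq query \
        --use_legacy_sql=false \
        "SELECT * FROM ${PROJECT_ID}.metric_export.vpa_container_recommendations WHERE latest = TRUE ORDER BY priority DESC LIMIT 10"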

Visualize recommendations in Looker Studio

Looker Studio is a free, self-service business intelligence platform that lets you build and consume data visualizations, dashboards, and reports. With Looker Studio, you can connect to your data, create visualizations, and share your insights with others.

Use Looker Studio to visualize data in the BigQuery vpa_container_recommendation table:

  1. Open the VPA container recommendations dashboard template
  2. Click Use my own data.
  3. Select your project.
  4. For Dataset, select metric_export.
  5. For Table, select vpa_container_recommendations.
  6. Click Add.
  7. Click Add to Report.

Looker Studio template details

The Looker Studio template details page provides the following information:

  • location: the Compute Engine zone or region of the cluster.
  • project ID: the project of the cluster.
  • cluster: the cluster name.
  • controller: the controller name.
  • type: the controller type.
  • count: the total number of replicas.
  • cpu requested cores: the workload's current CPU request.
  • cpu limit cores: the workload's current CPU limit.
  • QoS CPU: the CPU Quality of Service (QoS) class, derived from the resource requests and limits set in each Deployment or StatefulSet. The QoS class determines whether the maximum VPA recommendation or the 95th percentile is used for the CPU recommendation. The following list describes how the QoS CPU field is calculated (see the sketch after this list):
    • If the requested CPU is equal to the CPU limit, QoS CPU is Guaranteed.
    • If both the CPU request and limit are set, and the limit is greater than the request, QoS CPU is Burstable.
    • If both values are unset, QoS CPU is BestEffort.
  • cpu request recommendation: if the QoS column is Guaranteed or BestEffort, this is the maximum VPA recommendation over the 30-day window. If the QoS column is Burstable, this is the 95th percentile over the 30-day window.
  • cpu limit recommendation: if the QoS column is Guaranteed or BestEffort, this is the maximum VPA recommendation over the 30-day window. If the QoS column is Burstable, this is the CPU request recommendation multiplied by the workload's existing limit-to-request ratio.
  • memory requested bytes: the workload's current memory request.
  • memory limit bytes: the workload's current memory limit.
  • QoS memory: set with the same logic as the QoS CPU field. This value does not determine the value of the memory recommendation or limit.
  • memory request recommendation and memory limit recommendation: as a best practice, both use the maximum VPA recommendation.
  • priority: a priority rating for each workload based on the difference between the configured requests and the recommendations for both CPU and memory. The formula used to calculate priority is:

    priority = (CPU requested - CPU recommendation) + ((memory requested - memory recommendation) / (vCPU on-demand price / memory on-demand price))

    Negative values represent under-provisioned workloads, which might have reliability or performance issues. Positive values represent over-provisioned workloads; adjust these workloads to reduce costs.

  • Total Memory (MiB): the difference between the requested memory and the memory recommendation, with positive values indicating over-provisioning and negative values indicating under-provisioning.

  • Total mCPU: the difference between the requested CPU and the CPU VPA recommendation, with positive values indicating over-provisioning and negative values indicating under-provisioning.
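
The following self-contained BigQuery query illustrates the QoS CPU classification logic on three hypothetical request/limit pairs; it runs against inline data, not your exported tables:

    bq query \
        --use_legacy_sql=false \
        "SELECT
           cpu_request, cpu_limit,
           CASE
             WHEN cpu_request = cpu_limit THEN 'Guaranteed'
             WHEN cpu_limit > cpu_request THEN 'Burstable'
             ELSE 'BestEffort'
           END AS qos_cpu
         FROM UNNEST([
           STRUCT(0.5 AS cpu_request, 0.5 AS cpu_limit),
           STRUCT(0.25 AS cpu_request, 1.0 AS cpu_limit),
           STRUCT(CAST(NULL AS FLOAT64) AS cpu_request, CAST(NULL AS FLOAT64) AS cpu_limit)
         ])"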

GKE container recommendations logic

This section describes the logic in recommendation.sql that generates the recommendations for the last 30 days and loads them into the vpa_container_recommendations table.

To customize the recommendation range, update the RECOMMENDATION_WINDOW_SECONDS variable in the config.py file and redeploy.
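
Because the tutorial runs a prebuilt image, changing config.py means building your own image and pointing the Cloud Run job at it. A minimal sketch, assuming the exporter source lives in the metrics-exporter directory of the cloned repository and that Cloud Build can push images in your project:

    # Build a custom image from the exporter source and update the job to use it.
    gcloud builds submit metrics-exporter \
        --tag=gcr.io/$PROJECT_ID/metrics-exporter:custom
    gcloud beta run jobs update metric-exporter \
        --image=gcr.io/$PROJECT_ID/metrics-exporter:custom \
        --region=$REGION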

Filtering out HPA workloads

Monitoring is unaware of HPA-enabled workloads and provides VPA recommendations for all of them. To omit HPA-enabled workloads, the SQL query in recommendation.sql uses the custom metrics exported earlier to identify and exclude any workload that has hpa in its metric names.
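
To see which HPA metrics the exporter recorded, which are the rows this filter keys on, you can list the metric names that start with hpa:

    bq query \
        --use_legacy_sql=false \
        "SELECT DISTINCT metric_name FROM ${PROJECT_ID}.metric_export.mql_metrics WHERE metric_name LIKE 'hpa%'"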

CPU requested and limit container recommendation

If the workload's CPU request and limit values are equal, the QoS is considered Guaranteed, and the CPU recommendation is set to the maximum within the 30-day window. Otherwise, the 95th percentile of the VPA CPU request recommendation within 30 days is used.

When the CPU request and limit values are equal, the CPU limit recommendation is also set to the maximum CPU VPA recommendation. If the workload's request and limit are not identical, the existing limit-to-request ratio is used.
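
As a worked example with illustrative numbers: a Burstable workload requesting 500 mCPU with a 1000 mCPU limit has a limit-to-request ratio of 2, so a recommended request of 800 mCPU yields a recommended limit of 1600 mCPU:

    # 800 mCPU recommended request * (1000 / 500) existing ratio = 1600 mCPU
    echo "800 * 1000 / 500" | bc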

Memory requested and limit container recommendation

Memory recommendations use the maximum VPA recommendation to ensure the workload's reliability. Using the same amount of memory for requests and limits is a best practice because memory is an incompressible resource: when memory is exhausted, the Pod must be terminated. To avoid having Pods terminated and destabilizing your environment, set the requested memory equal to the memory limit.

Prioritizing recommendations

A priority value is assigned to each row to surface workloads that require immediate attention based on the VPA recommendations. Because CPU and memory are measured in different units, they must be normalized: the E2 machine type's on-demand price ratio between predefined CPU and memory is used as an approximation to convert memory units to CPU units.

The priority is calculated using the following formula:

priority = (CPU requested - CPU recommendation) + ((memory requested - memory recommendation) / (vCPU on-demand price / memory on-demand price))
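
As a worked example with illustrative numbers (the price ratio of 7 is hypothetical; check current E2 on-demand pricing for your region): a workload requesting 2 vCPUs against a 0.5 vCPU recommendation, and 1 GiB of memory against a 4 GiB recommendation, scores positive overall, so it is over-provisioned:

    # (2.0 - 0.5) + ((1 - 4) / 7) = 1.5 - 0.43 ≈ 1.07 (bc prints 1.08 at scale=2);
    # the CPU savings outweigh the memory shortfall.
    echo "scale=2; (2.0 - 0.5) + ((1 - 4) / 7)" | bc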

For Autopilot, the total resources requested by your deployment configuration should be within the supported minimum and maximum values.

View VPA recommendations for multiple projects

To view VPA container recommendations across multiple projects, use a new project as a scoping project.

When deploying this project in your production environment, add all projects you want to analyze to the new project's metrics scope.

Clean up

To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, either delete the project that contains the resources, or keep the project and delete the individual resources.

Delete the project

The easiest way to avoid billing is to delete the project you created for the tutorial.

Delete a Google Cloud project:

gcloud projects delete PROJECT_ID

What's next