Scale container resource requests and limits

This page explains how to analyze and adjust the CPU and memory requests of containers in a Google Kubernetes Engine (GKE) cluster by using vertical Pod autoscaling.

You can scale container resources manually through the Google Cloud console, apply recommendations from a VerticalPodAutoscaler object, or configure automatic scaling by using vertical Pod autoscaling.

To learn more about best practices for resource requests, see Kubernetes best practices: Resource requests and limits.

For Autopilot clusters, vertical Pod autoscaling is enabled by default.

Before you begin

Before you start, make sure you have performed the following tasks:

  • Ensure that you have enabled the Google Kubernetes Engine API.

    Enable Google Kubernetes Engine API
  • Ensure that you have installed the Google Cloud CLI.
  • Set up default Google Cloud CLI settings for your project by using one of the following methods:
    • Use gcloud init, if you want to be walked through setting project defaults.
    • Use gcloud config, to individually set your project ID, zone, and region.

    gcloud init

    1. Run gcloud init and follow the directions:

      gcloud init

      If you are using SSH on a remote server, use the --console-only flag to prevent the command from launching a browser:

      gcloud init --console-only
    2. Follow the instructions to authorize the gcloud CLI to use your Google Cloud account.
    3. Create a new configuration or select an existing one.
    4. Choose a Google Cloud project.
    5. Choose a default Compute Engine zone.
    6. Choose a default Compute Engine region.

    gcloud config

    1. Set your default project ID:
      gcloud config set project PROJECT_ID
    2. Set your default Compute Engine region (for example, us-central1):
      gcloud config set compute/region COMPUTE_REGION
    3. Set your default Compute Engine zone (for example, us-central1-c):
      gcloud config set compute/zone COMPUTE_ZONE
    4. Update gcloud to the latest version:
      gcloud components update

    By setting default locations, you can avoid errors in the gcloud CLI like the following: One of [--zone, --region] must be supplied: Please specify location.
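
    To confirm which defaults are set in your active configuration, you can list them:

      gcloud config list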

Analyze resource requests

The Vertical Pod Autoscaler automatically analyzes your containers and provides suggested resource requests. You can view these resource requests using the Google Cloud console, Cloud Monitoring, or the Google Cloud CLI.

Console

To view suggested resource requests in the console, you must have an existing deployed workload that is at least 24 hours old.

  1. Go to the Workloads page in the Google Cloud console.

    Go to Workloads

  2. In the workloads list, click the name of the workload you want to scale.

  3. Click Actions > Scale > Edit resource requests.

    The Analyze resource utilization data section shows historic usage data that the Vertical Pod Autoscaler controller analyzed to create the suggested resource requests in the Adjust resource requests and limits section.

Cloud Monitoring

To view suggested resource requests in Cloud Monitoring, you must have an existing workload deployed.

  1. Go to the Metrics Explorer page in the Google Cloud console.

    Go to Metrics Explorer

  2. Click Configuration.

  3. Expand the Select a Metric menu.

  4. In the Resource menu, select Kubernetes Scale.

  5. In the Metric category menu, select Autoscaler.

  6. In the Metric menu, select Recommended per replica request bytes.

  7. Click Apply.
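
If you prefer to query the recommendation data directly, you can use an MQL query in the Metrics Explorer query editor. The following sketch assumes that the Recommended per replica request bytes chart is backed by the metric kubernetes.io/autoscaler/container/memory/per_replica_recommended_request_bytes on the k8s_scale monitored resource:

    fetch k8s_scale
    | metric 'kubernetes.io/autoscaler/container/memory/per_replica_recommended_request_bytes'
    | group_by [resource.controller_name], [mean(val())]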

gcloud CLI

To view suggested resource requests, you must create a VerticalPodAutoscaler object and a Deployment.

  1. Enable vertical Pod autoscaling for your cluster:

    gcloud container clusters update CLUSTER_NAME --enable-vertical-pod-autoscaling
    

    Replace CLUSTER_NAME with the name of your cluster.

  2. Save the following manifest as my-rec-deployment.yaml:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: my-rec-deployment
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: my-rec-deployment
      template:
        metadata:
          labels:
            app: my-rec-deployment
        spec:
          containers:
          - name: my-rec-container
            image: nginx
    

    This manifest describes a Deployment that does not have CPU or memory requests. The VerticalPodAutoscaler that you create in step 4 references this Deployment by its name, my-rec-deployment, so all Pods in the Deployment belong to that VerticalPodAutoscaler.

  3. Apply the manifest to the cluster:

    kubectl create -f my-rec-deployment.yaml
    
  4. Save the following manifest as my-rec-vpa.yaml:

    apiVersion: autoscaling.k8s.io/v1
    kind: VerticalPodAutoscaler
    metadata:
      name: my-rec-vpa
    spec:
      targetRef:
        apiVersion: "apps/v1"
        kind:       Deployment
        name:       my-rec-deployment
      updatePolicy:
        updateMode: "Off"
    

    This manifest describes a VerticalPodAutoscaler. The updateMode value of Off means that when Pods are created, the Vertical Pod Autoscaler controller analyzes the CPU and memory needs of the containers and records those recommendations in the status field of the resource. The Vertical Pod Autoscaler controller does not automatically update the resource requests for running containers.

  5. Apply the manifest to the cluster:

    kubectl create -f my-rec-vpa.yaml
    
  6. After some time, view the VerticalPodAutoscaler:

    kubectl get vpa my-rec-vpa --output yaml
    

    The output is similar to the following:

    ...
      recommendation:
        containerRecommendations:
        - containerName: my-rec-container
          lowerBound:
            cpu: 25m
            memory: 262144k
          target:
            cpu: 25m
            memory: 262144k
          upperBound:
            cpu: 7931m
            memory: 8291500k
    ...
    

    This output shows recommendations for CPU and memory requests.
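
    To pull out only the target recommendation from this output, you can use a jsonpath expression; for example:

    kubectl get vpa my-rec-vpa \
        --output jsonpath='{.status.recommendation.containerRecommendations[0].target}'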

Set Pod resource requests manually

You can set Pod resource requests manually using the Google Cloud CLI or the console.

Console

  1. Go to the Workloads page in the Google Cloud console.

    Go to Workloads

  2. In the workloads list, click the name of the workload you want to scale.

  3. Click Actions > Scale > Edit resource requests.

    The Adjust resource requests and limits section shows the current CPU and memory requests for each container, as well as the suggested CPU and memory requests.
  4. Click Apply Latest Suggestions to view suggested requests for each container.

  5. Click Save Changes.

  6. Click Confirm.

gcloud

To set resource requests for a Pod, set the resources.requests.cpu and resources.requests.memory values in your Deployment manifest. In this example, you manually modify the Deployment created in Analyze resource requests to use the suggested resource requests.

  1. Save the following example manifest as my-adjusted-deployment.yaml:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: my-rec-deployment
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: my-rec-deployment
      template:
        metadata:
          labels:
            app: my-rec-deployment
        spec:
          containers:
          - name: my-rec-container
            image: nginx
            resources:
              requests:
                cpu: 25m
                memory: 256Mi
    

    This manifest describes a Deployment that has two Pods. Each Pod has one container that requests 25 milliCPU and 256 MiB of memory.

  2. Apply the manifest to the cluster:

    kubectl apply -f my-adjusted-deployment.yaml
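
Alternatively, you can apply the same requests to the live Deployment without editing the manifest by using kubectl set resources, which triggers the same rolling update:

    kubectl set resources deployment my-rec-deployment \
        --containers=my-rec-container \
        --requests=cpu=25m,memory=256Mi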
    

You can also generate the updated manifest from the Google Cloud console by performing the following steps:

  1. Go to the Workloads page in the Google Cloud console.

    Go to Workloads

  2. In the workloads list, click the name of the workload you want to scale.

  3. Configure your container requests.

  4. Click Get Equivalent YAML.

  5. Select the containers you want to apply the resource requests to.

  6. Click Download Deployment or copy and paste the manifest into a file named resource-adjusted.yaml.

  7. Apply the manifest to your cluster:

    kubectl create -f resource-adjusted.yaml
    

Set Pod resource requests automatically

Vertical Pod autoscaling uses the VerticalPodAutoscaler object to automatically set resource requests on Pods when the updateMode is "Auto".

  1. Enable vertical Pod autoscaling for your cluster:

    gcloud container clusters update CLUSTER_NAME --enable-vertical-pod-autoscaling
    

    Replace CLUSTER_NAME with the name of your cluster.

  2. Save the following manifest as my-auto-deployment.yaml:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: my-auto-deployment
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: my-auto-deployment
      template:
        metadata:
          labels:
            app: my-auto-deployment
        spec:
          containers:
          - name: my-container
            image: k8s.gcr.io/ubuntu-slim:0.1
            resources:
              requests:
                cpu: 100m
                memory: 50Mi
            command: ["/bin/sh"]
            args: ["-c", "while true; do timeout 0.5s yes >/dev/null; sleep 0.5s; done"]
    

    This manifest describes a Deployment that has two Pods. Each Pod has one container that requests 100 milliCPU and 50 MiB of memory.

  3. Apply the manifest to the cluster:

    kubectl create -f my-auto-deployment.yaml
    
  4. List the running Pods:

    kubectl get pods
    

    The output shows the names of the Pods in my-auto-deployment:

    NAME                                 READY     STATUS             RESTARTS   AGE
    my-auto-deployment-cbcdd49fb-d6bf9   1/1       Running            0          8s
    my-auto-deployment-cbcdd49fb-th288   1/1       Running            0          8s
    
  5. Save the following manifest as my-vpa.yaml:

    apiVersion: autoscaling.k8s.io/v1
    kind: VerticalPodAutoscaler
    metadata:
      name: my-vpa
    spec:
      targetRef:
        apiVersion: "apps/v1"
        kind:       Deployment
        name:       my-auto-deployment
      updatePolicy:
        updateMode: "Auto"
    

    This manifest describes a VerticalPodAutoscaler. The targetRef.name value specifies that any Pod that is controlled by the Deployment named my-auto-deployment belongs to this VerticalPodAutoscaler. The updateMode value of Auto means that the Vertical Pod Autoscaler controller can delete a Pod, adjust the CPU and memory requests, and then start a new Pod.

    You can also configure vertical Pod autoscaling to assign resource requests only at Pod creation time by using updateMode: "Initial", as shown in the example manifest after these steps.

  6. Apply the manifest to the cluster:

    kubectl create -f my-vpa.yaml
    
  7. Wait a few minutes, and view the running Pods again:

    kubectl get pods
    

    The output shows that the Pod names have changed:

    NAME                                 READY     STATUS             RESTARTS   AGE
    my-auto-deployment-89dc45f48-5bzqp   1/1       Running            0          8s
    my-auto-deployment-89dc45f48-scm66   1/1       Running            0          8s
    

    If the Pod names have not changed, wait a bit longer, and then view the running Pods again.

  8. Get detailed information about one of your running Pods:

    kubectl get pod POD_NAME --output yaml
    

    Replace POD_NAME with the name of one of your Pods that you retrieved in the previous step.

    The output is similar to the following:

    apiVersion: v1
    kind: Pod
    metadata:
      annotations:
        vpaUpdates: 'Pod resources updated by my-vpa: container 0: cpu capped to node
          capacity, memory capped to node capacity, cpu request, memory request'
    ...
    spec:
      containers:
      ...
        resources:
          requests:
            cpu: 510m
            memory: 262144k
       ...
    

    This output shows that the Vertical Pod Autoscaler controller has increased the memory request to 262144k and CPU request to 510 milliCPU.

  9. Get detailed information about the VerticalPodAutoscaler:

    kubectl get vpa my-vpa --output yaml
    

    The output is similar to the following:

    ...
      recommendation:
        containerRecommendations:
        - containerName: my-container
          lowerBound:
            cpu: 536m
            memory: 262144k
          target:
            cpu: 587m
            memory: 262144k
          upperBound:
            cpu: 27854m
            memory: "545693548"
    

    This output shows recommendations for CPU and memory requests. The target attribute specifies that for the container to run optimally, it should request 587 milliCPU and 262144 kilobytes of memory.

    The Vertical Pod Autoscaler uses the lowerBound and upperBound attributes to decide whether to delete a Pod and replace it with a new Pod. If a Pod has requests less than the lower bound or greater than the upper bound, the Vertical Pod Autoscaler deletes the Pod and replaces it with a Pod that meets the target attribute.
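
If you want recommendations applied only when Pods are first created, the updateMode: "Initial" option mentioned in step 5 needs only a different updatePolicy value. A minimal sketch, using the hypothetical name my-initial-vpa:

    apiVersion: autoscaling.k8s.io/v1
    kind: VerticalPodAutoscaler
    metadata:
      name: my-initial-vpa  # hypothetical name for this example
    spec:
      targetRef:
        apiVersion: "apps/v1"
        kind:       Deployment
        name:       my-auto-deployment
      updatePolicy:
        updateMode: "Initial"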

Opt out specific containers

In this exercise, you create a VerticalPodAutoscaler object that has a specific container opted out. Then you create a Deployment that has one Pod with two containers. When the Pod is created, the Vertical Pod Autoscaler creates and applies a recommendation only for a single container, ignoring the one that was opted out.

  1. Save the following manifest as my-opt-vpa.yaml:

    apiVersion: autoscaling.k8s.io/v1
    kind: VerticalPodAutoscaler
    metadata:
      name: my-opt-vpa
    spec:
      targetRef:
        apiVersion: "apps/v1"
        kind:       Deployment
        name:       my-opt-deployment
      updatePolicy:
        updateMode: "Auto"
      resourcePolicy:
        containerPolicies:
        - containerName: my-opt-sidecar
          mode: "Off"
    

    This manifest describes a VerticalPodAutoscaler. The mode: "Off" value turns off recommendations for the container my-opt-sidecar.

  2. Apply the manifest to the cluster:

    kubectl apply -f my-opt-vpa.yaml
    
  3. Save the following manifest as my-opt-deployment.yaml:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: my-opt-deployment
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: my-opt-deployment
      template:
        metadata:
          labels:
            app: my-opt-deployment
        spec:
          containers:
          - name: my-opt-container
            image: nginx
          - name: my-opt-sidecar
            image: busybox
            command: ["sh","-c","while true; do echo Doing sidecar stuff!; sleep 60; done"]
    
  4. Apply the manifest to the cluster:

    kubectl apply -f my-opt-deployment.yaml
    
  5. After some time, view the Vertical Pod Autoscaler:

    kubectl get vpa my-opt-vpa --output yaml
    

    The output shows recommendations for CPU and memory requests:

    ...
      recommendation:
        containerRecommendations:
        - containerName: my-opt-container
    ...
    

    Notice that there are recommendations only for one container. There are no recommendations for my-opt-sidecar.

    The Vertical Pod Autoscaler never updates resources on opted-out containers. If you wait a few minutes, the Pod is recreated, but only one container has updated resource requests.
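
    To verify, you can print the requests of each container in the recreated Pod by using a jsonpath expression, replacing POD_NAME with the name of the new Pod:

    kubectl get pod POD_NAME --output \
        jsonpath='{range .spec.containers[*]}{.name}: {.resources.requests}{"\n"}{end}'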

Clean up

To avoid incurring charges to your Google Cloud account for the resources used on this page, perform the following tasks:

  1. Disable vertical Pod autoscaling in your cluster:

    gcloud container clusters update CLUSTER_NAME --no-enable-vertical-pod-autoscaling
    

    Replace CLUSTER_NAME with the name of the cluster.

  2. Optionally, you can delete the cluster.
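
    For example, the following command deletes the cluster and all of its workloads:

    gcloud container clusters delete CLUSTER_NAME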

What's next