Scaling based on CPU utilization

The simplest form of autoscaling is to scale a managed instance group (MIG) based on the CPU utilization of its instances.

You can also scale a MIG based on the serving capacity of an external HTTP(S) load balancer or on Monitoring metrics.

Scaling based on CPU utilization

You can autoscale a managed instance group (MIG) based on the average CPU utilization of its instances. With this policy, the autoscaler collects the CPU utilization of the instances in the group and determines whether the group needs to scale. You set a target CPU utilization, and the autoscaler works to maintain that level.

The autoscaler treats the target CPU utilization level as a fraction of the average use of all vCPUs over time in the instance group. If the average utilization of your total vCPUs exceeds the target utilization, the autoscaler adds more VM instances. If the average utilization of your total vCPUs is less than the target utilization, the autoscaler removes instances. For example, setting a 0.75 target utilization tells the autoscaler to maintain an average utilization of 75% among all vCPUs in the instance group.
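The comparison described above can be sketched in Python. This is a simplified illustration and not the autoscaler's actual algorithm: the function names and the list of (utilization, vCPU count) pairs are assumptions made for this example, and the real autoscaler also accounts for settings such as the cool down period and the minimum and maximum group size.

```python
def average_vcpu_utilization(instances):
    """Return the average utilization across all vCPUs in the group.

    `instances` is a list of (utilization, vcpu_count) pairs, where
    utilization is a fraction between 0.0 and 1.0. This input shape is
    an assumption made for this sketch.
    """
    total_vcpus = sum(vcpus for _, vcpus in instances)
    if total_vcpus == 0:
        raise ValueError("the group has no vCPUs")
    # Weight each VM's utilization by its vCPU count.
    busy_vcpus = sum(util * vcpus for util, vcpus in instances)
    return busy_vcpus / total_vcpus


def scaling_direction(instances, target):
    """Compare the group average with the target utilization."""
    average = average_vcpu_utilization(instances)
    if average > target:
        return "add"      # above target: more VM instances are needed
    if average < target:
        return "remove"   # below target: instances can be removed
    return "hold"


# Three hypothetical VMs: two with 4 vCPUs and one with 2 vCPUs.
group = [(0.90, 4), (0.80, 4), (0.70, 2)]
print(scaling_direction(group, target=0.75))  # prints "add"
```

Here the weighted average is 82%, which is above the 75% target, so the sketch reports that instances should be added.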

You can also scale based on forecasted CPU utilization. For more information, and to see if this is suitable for your workload, see Using predictive autoscaling.

Enable autoscaling based on CPU utilization

Console

  1. In the console, go to the Instance groups page.

  2. If you have an instance group, select it and click Edit. If you don't have an instance group, click Create instance group.

  3. If no autoscaling configuration exists, under Autoscaling, click Configure autoscaling.

  4. Under Autoscaling mode, select Autoscale to enable autoscaling.

  5. In the Autoscaling policy section, if a CPU utilization metric does not yet exist, add one:

    1. Click Add new metric.
    2. Under Metric type, select CPU utilization.
    3. Enter the Target CPU utilization that you want. This value is treated as a percentage. For example, for 75% CPU utilization, enter 75.
    4. Click Done.

  6. Under Predictive autoscaling, select Off. To learn more about predictive autoscaling, and whether it is suitable for your workload, see Using predictive autoscaling.

  7. You can use the Cool down period to tell the autoscaler how long it takes for your application to initialize. Specifying an accurate cool down period improves autoscaler decisions. For example, when scaling out, the autoscaler ignores data from VMs that are still initializing because those VMs might not yet represent normal usage of your application. The default cool down period is 60 seconds.

  8. Specify the minimum and maximum numbers of instances that you want the autoscaler to create in this group.

  9. Click Save.

gcloud

Use the set-autoscaling sub-command to enable autoscaling for a managed instance group. For example, the following command creates an autoscaler that has a target CPU utilization of 60%. When you create an autoscaler, you must specify the --max-num-replicas flag in addition to --target-cpu-utilization. Unlike the console, which takes a percentage, the command takes the target as a fraction, so 60% is written as 0.60:

gcloud compute instance-groups managed set-autoscaling example-managed-instance-group \
  --max-num-replicas 20 \
  --target-cpu-utilization 0.60 \
  --cool-down-period 90

You can use the --cool-down-period flag to tell the autoscaler how long it takes for your application to initialize. Specifying an accurate cool down period improves autoscaler decisions. For example, when scaling out, the autoscaler ignores data from VMs that are still initializing because those VMs might not yet represent normal usage of your application. The default cool down period is 60 seconds.
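One way to picture the effect of the cool down period is the following Python sketch. It is a hedged illustration only: the `utilization_samples` function and the VM records with `age_sec` and `utilization` keys are hypothetical, and the autoscaler's real handling of initializing VMs is not specified here.

```python
def utilization_samples(vms, cool_down_period_sec=60):
    """Return utilization values only from VMs that have finished initializing.

    A VM counts as initialized once its age reaches the cool down period.
    The record layout is an assumption made for this sketch.
    """
    return [
        vm["utilization"]
        for vm in vms
        if vm["age_sec"] >= cool_down_period_sec
    ]


vms = [
    {"name": "vm-1", "age_sec": 600, "utilization": 0.70},
    {"name": "vm-2", "age_sec": 75, "utilization": 0.05},  # still warming up
    {"name": "vm-3", "age_sec": 120, "utilization": 0.80},
]

# With a 90-second cool down period, vm-2 is ignored.
print(utilization_samples(vms, cool_down_period_sec=90))  # [0.7, 0.8]

# With the 60-second default, vm-2's low reading is counted as normal usage.
print(utilization_samples(vms))  # [0.7, 0.05, 0.8]
```

If the application in this example really needs about 90 seconds to initialize, the 60-second default would let the low reading from vm-2 pull the average down, which is why an accurate value matters.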

Optionally, you can enable predictive autoscaling to scale out ahead of predicted load. To learn whether predictive autoscaling is suitable for your workload, see Using predictive autoscaling.

You can verify that autoscaling is successfully enabled by using the instance-groups managed describe sub-command, which describes the corresponding managed instance group and provides information about any autoscaling features for that instance group:

gcloud compute instance-groups managed describe example-managed-instance-group

For a list of available gcloud commands and flags, see the gcloud reference.

API

In the API, make a POST request to the autoscalers.insert method:

POST https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/zones/ZONE/autoscalers/

Your request body must contain the name, target, and autoscalingPolicy fields. autoscalingPolicy must define cpuUtilization and maxNumReplicas.

You can use the coolDownPeriodSec field to tell the autoscaler how long it takes for your application to initialize. Specifying an accurate cool down period improves autoscaler decisions. For example, when scaling out, the autoscaler ignores data from VMs that are still initializing because those VMs might not yet represent normal usage of your application. The default cool down period is 60 seconds.

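Optionally, you can enable predictive autoscaling to scale out ahead of predicted load. To learn whether predictive autoscaling is suitable for your workload, see Using predictive autoscaling.

For example, the following request body sets a target CPU utilization of 60%, a maximum of 10 instances, and a cool down period of 90 seconds: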

{
  "name": "example-autoscaler",
  "target": "https://www.googleapis.com/compute/v1/projects/myproject/zones/us-central1-f/instanceGroupManagers/example-managed-instance-group",
  "autoscalingPolicy": {
    "maxNumReplicas": 10,
    "cpuUtilization": {
      "utilizationTarget": 0.6
    },
    "coolDownPeriodSec": 90
  }
}
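If you generate the request body programmatically, a small helper can assemble the fields shown above. The following Python sketch only builds and prints the JSON and does not send a request. The helper name `build_autoscaler_body` and its range check are assumptions made for this example; the check is there to catch a console-style percentage such as 60 being passed where the API expects a fraction.

```python
import json


def build_autoscaler_body(name, target_url, max_num_replicas,
                          utilization_target, cool_down_period_sec=60):
    """Build a request body for autoscalers.insert (hypothetical helper)."""
    # The API example expresses the target as a fraction (0.6), not 60.
    if not 0.0 < utilization_target <= 1.0:
        raise ValueError("utilization_target must be a fraction such as 0.6")
    return {
        "name": name,
        "target": target_url,
        "autoscalingPolicy": {
            "maxNumReplicas": max_num_replicas,
            "cpuUtilization": {"utilizationTarget": utilization_target},
            "coolDownPeriodSec": cool_down_period_sec,
        },
    }


body = build_autoscaler_body(
    name="example-autoscaler",
    target_url=(
        "https://www.googleapis.com/compute/v1/projects/myproject/zones/"
        "us-central1-f/instanceGroupManagers/example-managed-instance-group"
    ),
    max_num_replicas=10,
    utilization_target=0.6,
    cool_down_period_sec=90,
)
print(json.dumps(body, indent=2))
```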

For more information about enabling autoscaling based on CPU utilization, complete the tutorial, Using autoscaling for highly scalable apps.

How the autoscaler handles heavy CPU utilization

When CPU utilization is close to 100%, the autoscaler estimates that the group might already be heavily overloaded. In these cases, the autoscaler increases the number of VM instances by at most 50%.
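The 50% limit can be illustrated with a short Python sketch. It assumes that the limit applies to a single scale-out decision and that fractional results round up so that a one-VM group can still grow. Both points, along with the `capped_scale_out` function itself, are assumptions made for this example rather than documented behavior.

```python
import math


def capped_scale_out(current_size, desired_size, max_increase=0.5):
    """Limit growth to a fraction of the current group size.

    Rounding up is an assumption made for this sketch.
    """
    limit = current_size + math.ceil(current_size * max_increase)
    return min(desired_size, limit)


# A group of 10 VMs that appears to need 25 grows to 15 at most.
print(capped_scale_out(current_size=10, desired_size=25))  # 15

# A smaller request is not affected by the cap.
print(capped_scale_out(current_size=10, desired_size=12))  # 12
```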
