Scaling based on CPU or load balancing serving capacity

The simplest form of autoscaling is to scale based on the CPU utilization of a group of virtual machine (VM) instances. You can also choose to scale based on the HTTP(S) load balancing serving capacity of a group of instances.

This document describes both options.

Scaling based on CPU utilization

You can autoscale based on the average CPU utilization of a managed instance group (MIG). Using this policy tells the autoscaler to collect the CPU utilization of the instances in the group and determine whether it needs to scale. You set the target CPU utilization level, and the autoscaler works to maintain that level.

The autoscaler treats the target CPU utilization level as a fraction of the average use of all vCPUs over time in the instance group. If the average utilization of your total vCPUs exceeds the target utilization, the autoscaler adds more virtual machines. For example, setting a 0.75 target utilization tells the autoscaler to maintain an average utilization of 75% among all vCPUs in the instance group.
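The sizing rule this implies can be sketched in a few lines of Python (an illustrative approximation of the behavior described above, not the autoscaler's actual algorithm; the function name is a stand-in):

```python
import math

def recommended_size(current_instances, observed_utilization, target_utilization):
    """Approximate the group size needed to bring average vCPU
    utilization back down to the target (illustrative only)."""
    if observed_utilization <= target_utilization:
        return current_instances  # already at or below the target
    # Total work is proportional to instances * utilization, so solve
    # for the instance count that yields the target utilization.
    return math.ceil(current_instances * observed_utilization / target_utilization)

# 10 instances averaging 90% utilization with a 0.75 target:
# 10 * 0.90 / 0.75 = 12 instances.
print(recommended_size(10, 0.90, 0.75))  # 12
```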

Enable autoscaling based on CPU utilization

For more information about enabling autoscaling based on CPU utilization, complete the tutorial Using autoscaling for highly scalable apps.

Console


  1. Go to the Instance groups page.
  2. If you have an instance group, select it and click Edit group. If you don't have an instance group, click Create instance group.
  3. Under Autoscaling, select On.
  4. Under Autoscale based on, select CPU utilization.
  5. Enter the Target CPU utilization you want. This value is treated as a percentage. For example, for 60% CPU utilization, enter 60.
  6. Provide the maximum number of instances that you want in this instance group. You can also set the minimum number of instances and the cool-down period. The cool-down period is the number of seconds the autoscaler waits after a virtual machine has started before it begins collecting information from it. This accounts for the amount of time a virtual machine can take to initialize, during which the collected data is not reliable for autoscaling. The default cool-down period is 60 seconds.
  7. Save your changes.

gcloud


Use the set-autoscaling sub-command to enable autoscaling for a managed instance group. For example, the following command creates an autoscaler that has a target CPU utilization of 75%. Along with the --target-cpu-utilization parameter, the --max-num-replicas parameter is also required when creating an autoscaler:

gcloud compute instance-groups managed set-autoscaling example-managed-instance-group \
    --max-num-replicas 20 \
    --target-cpu-utilization 0.75 \
    --cool-down-period 90

Optionally, you can use the --cool-down-period flag, which tells the autoscaler how many seconds to wait after a new instance has started before it considers data from that instance. This cool-down period accounts for the amount of time it might take for the instance to initialize, during which the collected utilization data is not reliable for autoscaling. After the cool-down period passes, the autoscaler begins to include the instance's utilization data to determine whether the group needs to scale. The default cool-down period is 60 seconds.
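The effect of the cool-down period can be modeled as a filter over utilization samples (a hypothetical sketch; the field names and data shapes here are assumptions for illustration, not part of the API):

```python
import time

def usable_samples(instances, cool_down_sec=60, now=None):
    """Keep utilization samples only from instances whose cool-down
    period has elapsed (illustrative; field names are assumptions)."""
    now = time.time() if now is None else now
    return [i["utilization"] for i in instances
            if now - i["started_at"] >= cool_down_sec]

now = 1000.0
instances = [
    {"started_at": 900.0, "utilization": 0.70},  # 100 s old: included
    {"started_at": 970.0, "utilization": 0.05},  # 30 s old: still cooling down
]
print(usable_samples(instances, cool_down_sec=60, now=now))  # [0.7]
```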

You can verify that autoscaling was successfully enabled using the instance-groups managed describe sub-command, which describes the corresponding managed instance group and provides information about any autoscaling features for that instance group:

gcloud compute instance-groups managed describe example-managed-instance-group

For a list of available gcloud commands and flags, see the gcloud reference.

API


Note: Although autoscaling is a feature of managed instance groups, it is a separate API resource. Keep that in mind when you construct API requests for autoscaling.

In the API, make a POST request to the autoscalers.insert method:

POST https://www.googleapis.com/compute/v1/projects/[PROJECT_ID]/zones/[ZONE]/autoscalers/

Your request body must contain the name, target, and autoscalingPolicy fields. autoscalingPolicy must define cpuUtilization and maxNumReplicas.

Optionally, you can use the coolDownPeriodSec field, which tells the autoscaler how many seconds to wait after a new instance has started before it considers data from that instance. This cool-down period accounts for the amount of time it might take for the instance to initialize, during which the collected utilization data is not reliable for autoscaling. After the cool-down period passes, the autoscaler begins to include the instance's utilization data to determine whether the group needs to scale. The default cool-down period is 60 seconds.

{
 "name": "example-autoscaler",
 "target": "https://www.googleapis.com/compute/v1/projects/myproject/zones/us-central1-f/instanceGroupManagers/example-managed-instance-group",
 "autoscalingPolicy": {
    "maxNumReplicas": 10,
    "cpuUtilization": {
       "utilizationTarget": 0.8
     },
    "coolDownPeriodSec": 90
  }
}

How autoscaler handles heavy CPU utilization

During periods of heavy CPU utilization, if utilization reaches close to 100%, the autoscaler estimates that the group might already be heavily overloaded. In these cases, the autoscaler increases the number of virtual machines by at most 50%.
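A minimal sketch of this cap, assuming the 50% limit applies to a single scale-out step (the helper name and rounding are hypothetical):

```python
import math

def capped_scale_out(current_instances, desired_instances):
    """Limit one scale-out step to a 50% increase in group size
    (an illustrative model of the behavior described above)."""
    cap = math.ceil(current_instances * 1.5)
    return min(desired_instances, cap)

# With 10 instances and an estimate calling for 30, one step adds at most 5:
print(capped_scale_out(10, 30))  # 15
```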

Scaling based on HTTP(S) load balancing serving capacity

Compute Engine provides support for load balancing within your instance groups. You can use autoscaling in conjunction with load balancing by setting up an autoscaler that scales based on the load of your instances.

An HTTP(S) load balancer spreads load across backend services, which distribute traffic among instance groups. Within the backend service, you can define the load balancing serving capacity of the instance groups associated with the backend as maximum CPU utilization, maximum requests per second (RPS) per instance, or maximum RPS for the whole group. When an instance group reaches its serving capacity, the backend service starts sending traffic to another instance group.

When you attach an autoscaler to an HTTP(S) load balancer, the autoscaler scales the managed instance group to maintain a fraction of the load balancing serving capacity.

For example, assume the load balancing serving capacity of a managed instance group is defined as 100 RPS per instance. If you create an autoscaler with the HTTP(S) load balancing policy and set it to maintain a target utilization level of 0.8 or 80%, the autoscaler adds or removes instances from the managed instance group to maintain 80% of the serving capacity, or 80 RPS per instance.
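The arithmetic behind this example generalizes to a simple sizing calculation (illustrative only, not an actual API; the function name is a stand-in):

```python
import math

def instances_needed(total_rps, rps_per_instance, target_utilization):
    """Instances needed so each one serves roughly
    target_utilization * rps_per_instance (illustrative only)."""
    # e.g. 100 RPS capacity at a 0.8 target means ~80 RPS per instance.
    effective_rps = rps_per_instance * target_utilization
    return math.ceil(total_rps / effective_rps)

# 2,000 RPS of incoming traffic, 100 RPS capacity per instance, 0.8 target:
print(instances_needed(2000, 100, 0.8))  # 25
```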

The following diagram shows how the autoscaler interacts with a managed instance group and backend service:

[Diagram: the relationships between the autoscaler, managed instance groups, and load balancing backend services.]
The autoscaler watches the serving capacity of the managed instance group, which is defined in the backend service, and scales based on the target utilization. In this example, the serving capacity is measured by the maxRatePerInstance value.

Applicable load balancing configurations

You can set one of three options for your load balancing serving capacity. When you first create the backend, you can choose among maximum CPU utilization, maximum requests per second per instance, or maximum requests per second for the whole group. Autoscaling works only with maximum CPU utilization and maximum requests per second per instance, because the value of these settings can be controlled by adding or removing instances. For example, if you set a backend to handle 10 requests per second per instance and configure the autoscaler to maintain 80% of that rate, then the autoscaler can add or remove instances to keep each instance near 8 requests per second.

Autoscaling does not work with maximum requests per group because this setting is independent of the number of instances in the instance group. The load balancer continuously sends the maximum number of requests per group to the instance group, regardless of how many instances are in the group.

For example, if you set the backend to handle a maximum of 100 requests per second for the group, the load balancer sends 100 requests per second to the group whether the group has two instances or 100 instances. Because this value cannot be adjusted by changing the group's size, autoscaling does not work with a load balancing configuration that uses the maximum number of requests per second for the group.
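A quick illustration of why the per-group setting cannot drive autoscaling: the total rate is fixed, so adding instances only dilutes each instance's share without moving any per-instance metric toward a target (the values are hypothetical):

```python
# With a per-group rate, the load balancer's total stays fixed regardless
# of group size; only the per-instance share changes (illustrative).
GROUP_RPS = 100  # maximum requests per second for the whole group

for instances in (2, 10, 100):
    per_instance = GROUP_RPS / instances
    print(f"{instances} instances -> {per_instance} RPS each, "
          f"{GROUP_RPS} RPS total")
```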

Enable autoscaling based on load balancing serving capacity

For more information about enabling autoscaling based on load balancing serving capacity, complete the tutorial Globally autoscaling a web service on Compute Engine.

Console


  1. Go to the Instance groups page in the Google Cloud Platform Console.
  2. If you have an instance group, select it, and then click Edit group. If you don't have an instance group, click Create instance group.
  3. Under Autoscaling, select On.
  4. Under Autoscale based on, select HTTP load balancing utilization.
  5. Enter the Target load balancing utilization. This value is treated as a percentage. For example, for 60% HTTP load balancing utilization, enter 60.
  6. Provide the maximum number of instances that you want in this instance group. You can also set the minimum number of instances and the cool-down period. The cool-down period is the number of seconds the autoscaler waits after a virtual machine has started before it begins collecting information from it. This accounts for the amount of time it might take for the instance to initialize, during which the collected data is not reliable for autoscaling. The default cool-down period is 60 seconds.
  7. Save your changes.

gcloud


To enable an autoscaler that scales on serving capacity, use the set-autoscaling sub-command. For example, the following command creates an autoscaler that scales the target managed instance group to maintain 60% of the serving capacity. Along with the --target-load-balancing-utilization parameter, the --max-num-replicas parameter is also required when creating an autoscaler:

gcloud compute instance-groups managed set-autoscaling example-managed-instance-group \
    --max-num-replicas 20 \
    --target-load-balancing-utilization 0.6 \
    --cool-down-period 90

Optionally, you can use the --cool-down-period flag, which tells the autoscaler how many seconds to wait after a new instance has started before it considers data from that instance. This cool-down period accounts for the amount of time it might take for the instance to initialize, during which the collected utilization data is not reliable for autoscaling. After the cool-down period passes, the autoscaler begins to include the instance's utilization data to determine whether the group needs to scale. The default cool-down period is 60 seconds.

You can verify that your autoscaler was successfully created by using the describe sub-command:

gcloud compute instance-groups managed describe example-managed-instance-group

For a list of available gcloud commands and flags, see the gcloud reference.

API


Note: Although autoscaling is a feature of managed instance groups, it is a separate API resource. Keep that in mind when you construct API requests for autoscaling.

In the API, make a POST request to the autoscalers.insert method:

POST https://www.googleapis.com/compute/v1/projects/[PROJECT_ID]/zones/[ZONE]/autoscalers/

Your request body must contain the name, target, and autoscalingPolicy fields. autoscalingPolicy must define loadBalancingUtilization and maxNumReplicas.

Optionally, you can use the coolDownPeriodSec field, which tells the autoscaler how many seconds to wait after a new instance has started before it considers data from that instance. This cool-down period accounts for the amount of time it might take for the instance to initialize, during which the collected utilization data is not reliable for autoscaling. After the cool-down period passes, the autoscaler begins to include the instance's utilization data to determine whether the group needs to scale. The default cool-down period is 60 seconds.

{
 "name": "example-autoscaler",
 "target": "zones/us-central1-f/instanceGroupManagers/example-managed-instance-group",
 "autoscalingPolicy": {
    "maxNumReplicas": 20,
    "loadBalancingUtilization": {
       "utilizationTarget": 0.8
     },
    "coolDownPeriodSec": 90
  }
}

Scaling based on network load balancing

A network load balancer distributes load using lower-level protocols such as TCP and UDP. Network load balancing lets you distribute traffic that is not based on HTTP(S), such as SMTP.

You can autoscale a managed instance group that is part of a network load balancer target pool using CPU utilization or custom metrics. For more information, see Scaling based on CPU utilization or Scaling based on Stackdriver Monitoring metrics.
