The simplest form of autoscaling is to scale a managed instance group (MIG) based on the CPU utilization of its instances.
You can also scale a MIG based on the serving capacity of an external HTTP(S) load balancer or on Monitoring metrics.
Before you begin
- If you want to use the command-line examples in this guide:
  - Install or update to the latest version of the gcloud command-line tool.
  - Set a default region and zone, as shown in the example commands after this list.
- If you want to use the API examples in this guide, set up API access.
- Review the autoscaler limitations.
- Read about autoscaler fundamentals.
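One way to set those defaults is with gcloud config set; the region and zone below are placeholders chosen for illustration, so substitute the location you actually deploy in:
gcloud config set compute/region us-central1
gcloud config set compute/zone us-central1-f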
Scaling based on CPU utilization
You can autoscale based on the average CPU utilization of a managed instance group (MIG). Using this policy tells the autoscaler to collect the CPU utilization of the instances in the group and determine whether it needs to scale. You set the target CPU utilization that you want the autoscaler to maintain, and the autoscaler works to keep the group's average utilization at that level.
The autoscaler treats the target CPU utilization level as a fraction of the average use of all vCPUs over time in the instance group. If the average utilization of your total vCPUs exceeds the target utilization, the autoscaler adds more VM instances. If the average utilization of your total vCPUs is less than the target utilization, the autoscaler removes instances. For example, setting a 0.75 target utilization tells the autoscaler to maintain an average utilization of 75% among all vCPUs in the instance group.
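As a rough, back-of-the-envelope illustration (not necessarily the autoscaler's exact algorithm): if a group of 4 VMs averages 90% CPU utilization against a 0.75 target, bringing the average back to the target takes roughly ceil(4 × 0.90 / 0.75) = 5 VMs. The one-liner below just evaluates that arithmetic:
# Illustration only: estimate how many VMs bring 90% average utilization down to a 0.75 target.
awk 'BEGIN { vms = 4; avg = 0.90; target = 0.75; s = vms * avg / target; needed = (s > int(s)) ? int(s) + 1 : int(s); print needed }'
# Prints 5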
Enable autoscaling based on CPU utilization
Console
- Go to the Instance groups page.
- If you have an instance group, select it and click Edit group. If you don't have an instance group, click Create instance group.
- Under Autoscaling, select On.
- Under Autoscale based on, select CPU utilization.
- Enter the Target CPU utilization you want. This value is treated as a percentage. For example, for 60% CPU utilization, enter 60.
- Provide the maximum number of instances that you want in this instance group. You can also set the minimum number of instances and the cool down period. The cool down period is the number of seconds the autoscaler waits after a VM has started before it begins collecting usage information from that VM. This accounts for the time a VM can take to initialize, during which the collected data is not reliable for autoscaling. The default cool down period is 60 seconds.
- Save your changes.
gcloud
Use the set-autoscaling sub-command to enable autoscaling for a managed instance group. For example, the following command creates an autoscaler that has a target CPU utilization of 60%. Along with the --target-cpu-utilization parameter, the --max-num-replicas parameter is also required when creating an autoscaler:
gcloud compute instance-groups managed set-autoscaling example-managed-instance-group \
    --max-num-replicas 20 \
    --target-cpu-utilization 0.60 \
    --cool-down-period 90
Optionally, you can use the --cool-down-period
flag, which tells the
autoscaler how many seconds to wait after a new instance has started before
it considers data from the new instance. This cool down period accounts for
the amount of time it might take for the instance to initialize, during
which the collected utilization data is not reliable for autoscaling. After
the cool-down period passes, the autoscaler begins to include the instance's
utilization data to determine if the group needs to scale. The default cool
down period is 60 seconds.
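If you also want to keep a minimum number of instances running, the set-autoscaling sub-command accepts a --min-num-replicas flag. For example, an illustrative variant of the command above that keeps at least two instances in the group:
gcloud compute instance-groups managed set-autoscaling example-managed-instance-group \
    --min-num-replicas 2 \
    --max-num-replicas 20 \
    --target-cpu-utilization 0.60 \
    --cool-down-period 90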
You can verify that autoscaling is successfully enabled by using the instance-groups managed describe sub-command, which describes the corresponding managed instance group and provides information about any autoscaling features for that instance group:
gcloud compute instance-groups managed describe example-managed-instance-group
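If you haven't set a default zone, pass the zone explicitly with the --zone flag; for example (using the same zone as the API example below):
gcloud compute instance-groups managed describe example-managed-instance-group \
    --zone us-central1-f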
For a list of available gcloud commands and flags, see the gcloud reference.
API
Note: Although autoscaling is a feature of managed instance groups, it is a separate API resource. Keep that in mind when you construct API requests for autoscaling.
In the API, make a POST request to the autoscalers.insert method:
POST https://compute.googleapis.com/compute/v1/projects/[PROJECT_ID]/zones/[ZONE]/autoscalers/
Your request body must contain the name, target, and autoscalingPolicy fields. autoscalingPolicy must define cpuUtilization and maxNumReplicas.
Optionally, you can use the coolDownPeriodSec
field, which tells the
autoscaler how many seconds to wait after a new instance has started before
it considers data from the new instance. This cool down period accounts for
the amount of time it might take for the instance to initialize, during
which the collected utilization data is not reliable for autoscaling. After
the cool-down period passes, the autoscaler begins to include the instance's
utilization data to determine if the group needs to scale. The default cool
down period is 60 seconds.
{
  "name": "example-autoscaler",
  "target": "https://www.googleapis.com/compute/v1/projects/myproject/zones/us-central1-f/instanceGroupManagers/example-managed-instance-group",
  "autoscalingPolicy": {
    "maxNumReplicas": 10,
    "cpuUtilization": {
      "utilizationTarget": 0.6
    },
    "coolDownPeriodSec": 90
  }
}
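For example, one way to send this request from the command line is with curl and an access token from the gcloud tool, assuming you have saved the request body above in a local file named autoscaler.json (a file name chosen here for illustration) and substituted your own project ID and zone:
curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    -d @autoscaler.json \
    "https://compute.googleapis.com/compute/v1/projects/[PROJECT_ID]/zones/[ZONE]/autoscalers/"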
For more information about enabling autoscaling based on CPU utilization, complete the tutorial, Using autoscaling for highly scalable apps.
How the autoscaler handles heavy CPU utilization
During periods of heavy CPU utilization, if utilization is close to 100%, the autoscaler estimates that the group might already be heavily overloaded. In these cases, the autoscaler increases the number of VM instances by at most 50%. For example, a group of 10 instances running near 100% CPU utilization grows to at most 15 instances in a single scaling step.