Autoscaling groups of instances

Managed instance groups offer autoscaling capabilities that let you automatically add instances to or delete instances from a managed instance group based on increases or decreases in load. Autoscaling helps your apps gracefully handle increases in traffic and reduces costs when the need for resources is lower. You define the autoscaling policy, and the autoscaler scales the group automatically based on the measured load.

Autoscaling works by adding more instances to your instance group when there is more load (upscaling), and deleting instances when the need for instances is lowered (downscaling).


Autoscaling uses the following fundamental concepts and services.

Managed instance groups

Autoscaling is a feature of managed instance groups. A managed instance group is a pool of homogeneous instances, created from a common instance template. An autoscaler adds instances to or deletes instances from a managed instance group. Although Compute Engine has both managed and unmanaged instance groups, only managed instance groups can be used with an autoscaler.

To understand the difference between a managed instance group and an unmanaged instance group, see Instance groups.

Autoscaling policy and target utilization

To create an autoscaler, specify the autoscaling policy and a target utilization level that the autoscaler uses to determine when to scale the group. You can choose to scale using the following policies:

  • Average CPU utilization.
  • HTTP load balancing serving capacity, which can be based on either utilization or requests per second.
  • Cloud Monitoring metrics.

The autoscaler continuously collects usage information based on the policy, compares actual utilization to your desired target utilization, and determines whether the group needs to be scaled up or down.

The target utilization level is the level at which you want to maintain your virtual machine (VM) instances. For example, if you scale based on CPU utilization, you can set your target utilization level at 75% and the autoscaler will maintain the CPU utilization of the specified group of instances at or close to 75%. The utilization level for each metric is interpreted differently based on the autoscaling policy.
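As a rough illustration of how a target utilization level translates into a group size, the following sketch uses a simple proportional rule: the total load currently spread across the group is divided by the target utilization per instance. This is an assumption made for illustration only, not the autoscaler's actual algorithm, and the function name `recommended_size` is hypothetical.

```python
import math


def recommended_size(current_size: int,
                     average_utilization: float,
                     target_utilization: float) -> int:
    """Estimate how many instances would bring utilization back to target.

    Illustrative proportional rule (an assumption, not the real autoscaler):
    total load = current_size * average_utilization, and each instance
    should carry at most target_utilization of that load.
    """
    if target_utilization <= 0:
        raise ValueError("target_utilization must be positive")
    total_load = current_size * average_utilization
    # Round up so the group is never undersized, and keep at least one VM.
    return max(1, math.ceil(total_load / target_utilization))


if __name__ == "__main__":
    # Four instances averaging 90% CPU against a 75% target.
    print(recommended_size(4, 0.90, 0.75))
    # The same group averaging 30% CPU against a 75% target.
    print(recommended_size(4, 0.30, 0.75))
```

Under this simplified rule, a group running above its target grows and a group running below its target shrinks, which matches the behavior described above.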

For a brief summary of each policy, see policies. Each policy is also discussed in detail in its own documentation.

Cool down period

While an instance is initializing, its usage data might not reflect normal operation, so that data can be unreliable for autoscaler decisions. Specify a cool down period to allow your instances to finish initializing before the autoscaler begins collecting usage information from them. By default, the cool down period is 60 seconds.

Actual initialization times vary because of numerous factors. We recommend that you test how long your application takes to initialize. To do this, create an instance and time the startup process from when the instance becomes RUNNING until the application is ready.

If you set a cool down period value that is significantly longer than the time it takes for an instance to initialize, then your autoscaler might ignore legitimate utilization data, and it might underestimate the required size of your group, causing a delay in scaling up.
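The effect of the cool down period can be sketched as a filter on which instances contribute usage data. The example below is a minimal model under stated assumptions: the `InstanceSample` type, its field names, and the averaging step are hypothetical and do not represent the autoscaler's internals. Only the 60-second default comes from the text above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class InstanceSample:
    """Hypothetical usage sample for one instance."""
    name: str
    seconds_since_running: float  # time since the instance became RUNNING
    cpu_utilization: float        # fraction between 0.0 and 1.0


def average_utilization(samples: List[InstanceSample],
                        cool_down_seconds: float = 60.0) -> Optional[float]:
    """Average CPU utilization, skipping instances still in cool down."""
    ready = [s.cpu_utilization for s in samples
             if s.seconds_since_running >= cool_down_seconds]
    if not ready:
        return None  # no reliable data yet
    return sum(ready) / len(ready)


if __name__ == "__main__":
    samples = [
        InstanceSample("vm-a", 600, 0.50),
        InstanceSample("vm-b", 600, 0.70),
        InstanceSample("vm-c", 20, 0.05),  # still initializing
    ]
    print(average_utilization(samples))                       # vm-c ignored
    print(average_utilization(samples, cool_down_seconds=0))  # vm-c counted
```

In this model, an overly long cool down period keeps excluding instances that are already serving traffic, which is why the recommendation is to measure real initialization time.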

Stabilization period

For the purposes of scaling down, the autoscaler calculates the group's recommended target size based on peak load over the last 10 minutes. These last 10 minutes are referred to as the stabilization period.

This 10-minute stabilization period might appear as a delay in scaling down, but it is actually a built-in feature of autoscaling. The delay ensures that the smaller group size is enough to support peak load from the last 10 minutes.
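One way to picture the stabilization period is as a sliding window over recent size recommendations, where the group is never sized below the largest recommendation seen in the window. The sketch below is a simplification under that assumption. The class name `StabilizedRecommendation` and its methods are hypothetical, and timestamps are plain seconds.

```python
from collections import deque


class StabilizedRecommendation:
    """Keeps the peak recommended size over a trailing time window."""

    def __init__(self, window_seconds: float = 600.0):
        # 600 seconds models the 10-minute stabilization period.
        self.window_seconds = window_seconds
        self._history = deque()  # (timestamp, recommended_size) pairs

    def observe(self, timestamp: float, recommended_size: int) -> int:
        """Record a new recommendation and return the stabilized size."""
        self._history.append((timestamp, recommended_size))
        cutoff = timestamp - self.window_seconds
        # Drop recommendations that are older than the window.
        while self._history and self._history[0][0] < cutoff:
            self._history.popleft()
        return max(size for _, size in self._history)


if __name__ == "__main__":
    stabilizer = StabilizedRecommendation()
    for t, size in [(0, 8), (300, 3), (600, 3), (601, 3)]:
        print(t, stabilizer.observe(t, size))
```

In this model the group stays at the peak size until the peak ages out of the window, which is the apparent delay in scaling down.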

Autoscaling mode

If you need to investigate or configure your group without interference from autoscaler operations, you can temporarily turn off or restrict autoscaling activities. The autoscaler's configuration persists while it is turned off or restricted, and all autoscaling activities resume when you turn it on again or lift the restriction.

Specifications

  • Autoscaling only works with zonal and regional managed instance groups. Unmanaged instance groups are not supported.
  • Autoscaling does not work with regional MIGs if proactive instance redistribution is disabled.
  • You cannot create instances with specific names while autoscaling is turned on. However, you can turn on autoscaler after instances with specific names are created.
  • Do not use Compute Engine autoscaling with managed instance groups that are owned by Google Kubernetes Engine. For Google Kubernetes Engine groups, use cluster autoscaling instead.

    If you are not sure whether your group is part of a GKE cluster, look for the gke prefix in the managed instance group name. For example, gke-test-1-3-default-pool-eadji9ah.

  • An autoscaler can make scaling decisions based on multiple metrics, but it can handle only one policy per metric type. The exception is Cloud Monitoring metrics: an autoscaler can handle up to five policies based on Monitoring metrics. The autoscaler calculates the recommended number of virtual machines for each policy and then scales based on the policy that recommends the largest number of virtual machines for the group.

  • Autoscaling works independently from autohealing. If you configure autohealing for your group and an instance fails the health check, the autohealer attempts to recreate the instance. Recreating an instance can cause the number of instances in the group to fall below the autoscaling threshold (minNumReplicas) that you specify.
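The multi-policy rule in the list above, where the autoscaler scales based on the policy that recommends the largest number of virtual machines, can be sketched as follows. The per-policy size calculation is a simple proportional estimate chosen for illustration, and the `max_replicas` bound is an assumption added to keep the example bounded. Only `minNumReplicas` is named in the text, and all function and parameter names here are hypothetical.

```python
import math
from typing import Dict, Tuple


def size_for_policy(current_size: int, measured: float, target: float) -> int:
    """Illustrative proportional estimate for a single policy."""
    return math.ceil(current_size * measured / target)


def scale_decision(current_size: int,
                   policies: Dict[str, Tuple[float, float]],
                   min_replicas: int,
                   max_replicas: int) -> Tuple[int, Dict[str, int]]:
    """Pick the largest per-policy recommendation, clamped to the bounds.

    policies maps a policy name to (measured value, target value).
    """
    per_policy = {
        name: size_for_policy(current_size, measured, target)
        for name, (measured, target) in policies.items()
    }
    chosen = max(per_policy.values())
    return min(max(chosen, min_replicas), max_replicas), per_policy


if __name__ == "__main__":
    policies = {
        "cpu": (0.90, 0.75),             # measured 90%, target 75%
        "load_balancing": (0.50, 0.80),  # measured 50%, target 80%
    }
    size, details = scale_decision(4, policies, min_replicas=2, max_replicas=10)
    print(details)
    print(size)
```

Taking the maximum means that the most demanding policy always wins, so no metric is left above its target because another metric is low.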

Before you begin

  1. Learn about managed instance groups

    Because autoscaler is a feature of managed instance groups, learn how managed instance groups work before you use autoscaling.

  2. Get a managed instance group name or URL

    For all autoscaling requests, you must provide either a managed instance group name or a managed instance group URL. In the gcloud command-line tool, you can use a managed instance group name, while the API requires a fully qualified URL.

    To get the URL of an existing managed instance group, you can use either the gcloud compute instance-groups managed list --uri command or the gcloud compute instance-groups managed list [INSTANCE_GROUP] --uri command. For example, the following command provides the URL of a managed instance group in the us-central1-f zone:

    gcloud compute instance-groups managed list example-group --uri --filter="zone:(us-central1-f)"

    The gcloud tool returns the managed instance group's URL.

    If you don't have an existing managed instance group, review how to create a managed instance group.

Next steps

When you are ready, create an autoscaler that scales on CPU utilization, load balancing serving capacity, or a Cloud Monitoring metric. Or try a tutorial to learn how to use regional MIGs and autoscaling to build scalable applications.