Understanding autoscaler decisions

An autoscaler automatically scales a managed instance group (MIG) on your behalf. Use this document to understand some of the decisions that the autoscaler makes when scaling your MIGs.

When you configure an autoscaler for a MIG, the autoscaler constantly monitors the group and sets the group's recommendedSize to the number of instances that are required to serve peak load over the last 10 minutes. These last 10 minutes are referred to as the stabilization period.

The MIG's response to the autoscaler's recommended size depends on how you configure the autoscaler's mode:

  • ON. The MIG sets its targetSize to the recommended size and the group is automatically scaled to meet its target size.
  • ONLY_UP. The MIG's target size can only be increased in response to an increased recommended size.
  • OFF. The target size is unaffected by the recommended size. However, the recommended size is still calculated.

If the autoscaler configuration is deleted, then no recommended size is calculated.
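
The relationship between the recommended size and the target size in each mode can be summarized in a few lines of Python. This is an illustrative sketch, not Compute Engine code: the function name next_target_size and its parameters are hypothetical, and it models only the behavior described in the list above.

```python
def next_target_size(mode: str, current_target: int, recommended: int) -> int:
    """Model how a MIG's target size reacts to a new recommended size.

    Hypothetical helper for illustration only; `mode` is one of the
    autoscaler modes described above ("ON", "ONLY_UP", "OFF").
    """
    if mode == "ON":
        # The target size follows the recommendation in both directions.
        return recommended
    if mode == "ONLY_UP":
        # The target size may grow, but it never shrinks.
        return max(current_target, recommended)
    if mode == "OFF":
        # The recommendation is still calculated, but it is not applied.
        return current_target
    raise ValueError(f"unknown autoscaler mode: {mode!r}")
```

For example, with a current target size of 5 and a recommended size of 3, this model returns 3 for ON and 5 for both ONLY_UP and OFF.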

Gaps between target and actual utilization

During the autoscaling process, you might notice that, for smaller instance groups, the actual utilization of the instance group and the target utilization might seem far apart. This is because an autoscaler always acts conservatively by rounding up or down when it interprets utilization data and determines how many instances to add or remove. This prevents the autoscaler from adding an insufficient number of resources or removing too many resources.

For example, if you set a utilization target of 0.7 and your app exceeds the utilization target, the autoscaler might determine that adding 1.5 virtual machine (VM) instances would decrease the utilization to close to 0.7. Because it is not possible to add 1.5 VM instances, the autoscaler rounds up and adds two instances. This might decrease the average CPU utilization to a value below 0.7, but it ensures that your app has enough resources to support it.

Similarly, if the autoscaler determines that removing 1.5 VM instances would raise your utilization to approximately 0.7, it rounds down and removes just one instance.

For larger groups with more VM instances, the utilization is divided up over a larger number of instances, and adding or removing VM instances causes less of a gap between actual utilization and target utilization.
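
The rounding behavior can be illustrated with a simplified model. The sketch below assumes that the required group size is the group's total utilization divided by the target, rounded up. The autoscaler's actual calculation is not specified in this document, and required_instances is a hypothetical helper.

```python
import math


def required_instances(current_size: int, avg_utilization: float, target: float) -> int:
    """Estimate the group size needed to keep utilization at or below `target`.

    Simplified model (an assumption, not the real autoscaler algorithm):
    total load is spread evenly over however many instances the group has.
    """
    # Total load, measured in units of one fully utilized instance.
    total_load = current_size * avg_utilization
    # Rounding the required size up is conservative in both directions:
    # it rounds up the number of instances to add and rounds down the
    # number of instances to remove.
    return math.ceil(total_load / target)


# Scale up: 5 instances at 0.91 need 6.5 instances at a 0.7 target.
scale_up_size = required_instances(5, 0.91, 0.7)
# Scale down: 10 instances at 0.595 need 8.5 instances at a 0.7 target.
scale_down_size = required_instances(10, 0.595, 0.7)
```

In this model, the first call returns 7 (two instances added rather than 1.5, leaving utilization at about 0.65), and the second returns 9 (one instance removed rather than 1.5).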

Regional MIGs and uneven VM distributions

If a region has an unbalanced number of instances between zones, whether due to recovering from a zonal failure or due to an unevenly distributed workload, autoscaling keeps more instances running in zones that have a higher than average actual utilization. Compute Engine takes this precaution to guarantee high availability across the region as a whole as well as all zones individually, even if some of the zones experience heavier load than others.

Delays in scaling down

For the purposes of scaling down, the autoscaler calculates the group's recommended target size based on peak load over the last 10 minutes. These last 10 minutes are referred to as the stabilization period.

Observing the last 10 minutes of usage helps the autoscaler:

  • Ensure that usage information collected from the instance group is stable.
  • Prevent behavior where an autoscaler continuously adds or removes instances at an excessive rate.
  • Safely remove instances by determining that the smaller group size is enough to support peak load from the last 10 minutes.

This 10-minute stabilization period might appear as a delay in scaling down, but it is actually a built-in feature of autoscaling. The delay also ensures that if a new instance is added to the managed instance group, the instance runs for at least 10 minutes before it is eligible to be terminated.

Cool down periods for new instances are ignored when deciding whether to scale down a group.
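
One way to picture the stabilization period is as a sliding-window maximum over the sizes that recent load would have required. The following is a minimal sketch under that assumption. The class StabilizedRecommendation and its observe method are hypothetical names, timestamps are plain seconds, and samples are assumed to arrive in time order.

```python
from collections import deque


class StabilizedRecommendation:
    """Track the largest required size seen during the stabilization period."""

    def __init__(self, window_seconds: int = 600):
        self.window_seconds = window_seconds  # 10 minutes by default
        self.samples = deque()  # (timestamp, required_size), oldest first

    def observe(self, timestamp: float, required_size: int) -> int:
        """Record a sample and return the recommended size for this moment."""
        self.samples.append((timestamp, required_size))
        # Discard samples that are older than the stabilization period.
        while self.samples[0][0] <= timestamp - self.window_seconds:
            self.samples.popleft()
        # Recommend enough instances for the peak load inside the window.
        return max(size for _, size in self.samples)
```

In this model, a load spike that requires 10 instances at time 0 keeps the recommendation at 10 for the next 10 minutes, even if later samples require only 4. An increase is reflected immediately, because a new maximum enters the window as soon as it is observed.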

Connection draining causing delays

If the group is part of a backend service that has enabled connection draining, it can take up to an additional 60 seconds after the connection draining duration has elapsed before the VM instance is removed or deleted.

Delays in scaling up

The autoscaler ignores usage data from instances while they are initializing, that is, during their cool down period. If you set a cool down period value that is significantly longer than the time it takes for an instance to initialize, then your autoscaler might ignore legitimate utilization data, and it might underestimate the required size of your group.

For example, say you are scaling based on CPU utilization, and an instance's CPU utilization increases after it has initialized. The increase in CPU utilization could warrant a scale up. However, if the instance is still within its cool down period, then that increase in CPU utilization is ignored and the group does not scale up because of it.

To avoid delays in scaling up, set a cool down period that closely matches the amount of time it takes for your instances to initialize.
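
The effect of an overly long cool down period can be sketched as a filter on which instances contribute usage data. This is an illustrative model only: the function average_utilization, its input format, and the use of a simple average are assumptions made for the example.

```python
from typing import List, Optional, Tuple


def average_utilization(
    instances: List[Tuple[float, float]],
    now: float,
    cool_down_seconds: float,
) -> Optional[float]:
    """Average utilization over instances that have finished cooling down.

    `instances` holds (started_at, utilization) pairs, with times in seconds.
    Returns None when every instance is still inside its cool down period.
    """
    eligible = [
        utilization
        for started_at, utilization in instances
        if now - started_at >= cool_down_seconds
    ]
    if not eligible:
        return None
    return sum(eligible) / len(eligible)


# Two older instances plus one that started 100 seconds ago and is already busy.
group = [(0, 0.6), (0, 0.6), (900, 0.99)]
matched = average_utilization(group, now=1000, cool_down_seconds=60)
too_long = average_utilization(group, now=1000, cool_down_seconds=300)
```

With a 60-second cool down period, the busy instance is counted and the average is about 0.73, which is above a 0.7 target. With a 300-second cool down period, the same instance is ignored and the average is 0.6, so this model would see no reason to scale up.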

Preparing for instance terminations

When the autoscaler scales down, it determines the number of VM instances that it needs to shut down and selects VM instances that have low utilization to terminate from the instance group. Before an instance is terminated, you might want to make sure that it performs certain tasks, such as closing any existing connections, gracefully shutting down any apps or app servers, uploading logs, and so on. You can instruct your instance to perform these tasks by using a shutdown script.

A shutdown script runs, on a best-effort basis, in the brief period between when the termination request is made and when the instance is actually terminated. During this period, Compute Engine attempts to run your shutdown script to perform any tasks you provide in the script.

This is particularly useful if you are using load balancing with your managed instance group. If your instance becomes unhealthy, it might take some time for the load balancer to recognize that the instance is unhealthy, causing the load balancer to continue sending new requests to the instance. With a shutdown script, the instance can report that it is unhealthy while it is shutting down so that the load balancer can stop sending traffic to the instance. For more information, see Handling unhealthy instances in the load balancing documentation.

For more information about shutdown scripts, see Shutdown scripts.

For more information about instance shutdown, see the documentation on stopping or deleting an instance.

Viewing autoscaling charts for utilization

If you have a managed instance group that is being autoscaled, Compute Engine provides an autoscaling chart that tracks the total utilization and the number of autoscaled instances at any point in time. You can access this chart in the Google Cloud Console.

  1. In the Cloud Console, go to the Instance groups page.

  2. Click the name of an autoscaled managed instance group you want to view. The group must be using autoscaling based on CPU utilization (other autoscaling metrics are not yet supported).
  3. On the managed instance group details page, select the Monitoring tab, if it is not already selected.

    Monitoring tab.

  4. Make sure that Autoscaled size is selected in the drop-down list.

The charts plot the number of instances and the CPU usage of the group over time. Use the following information to understand these charts.

  • The blue line on the upper chart indicates the number of instances in the managed instance group.
  • The blue line on the lower chart shows the total CPU utilization of the group.
  • The green line on the lower chart shows the remaining available capacity of the managed instance group.
    • If the green line is above the blue line, there is a large amount of capacity available and your VM instances are likely underutilized.
    • If the green line is below the blue line, then there is little, if any, remaining capacity and you should add more instances to the instance group.
  • If your capacity drops, then it likely means that the size of your instance group has decreased, so the blue line on the upper chart will also drop. Similarly, if your capacity increases, the size of your instance group has likely also increased.

For example, the following graph captures an autoscaled managed instance group that reaches capacity, which causes the autoscaler to add more VM instances to the group, increasing capacity of the group.

Screenshot of autoscaling chart

Viewing status messages

When the autoscaler experiences an issue scaling, it returns a warning or error message. You can review these status messages in one of two ways.

View status messages on the Instance groups page

You can view status messages directly on the Instance groups page in the Google Cloud Console.

  1. In the Google Cloud Console, go to the Instance groups page.

  2. Look for any instance groups that have the Caution icon (!). For example:

    Status messages on instance groups page.

  3. Hover over a status icon to get details of the status message.

View status messages on the Instance group details page

You can go directly to the details page of a specific instance group to view relevant status messages.

  1. In the Cloud Console, go to the Instance groups page.

  2. Click the instance group for which you want to view status messages.
  3. On the details page, view the status message on the Members tab. For example:

    Status messages on instance group details page.

Commonly returned status messages

When the autoscaler experiences an issue scaling, it returns a warning or error message. Here are some commonly returned messages and what they mean.

  • All instances in the instance group are unhealthy (not in RUNNING state). If this is an error, check the instances.
    All of the instances in the instance group have a state that is something other than RUNNING. If this is intentional, then you can ignore this message. If this is not intentional, troubleshoot the instance group.

  • The number of instances has reached the maxNumReplicas. The autoscaler cannot add more instances.
    When you created the autoscaler, you indicated the maximum number of instances the instance group can have. The autoscaler is currently attempting to resize the instance group up to meet demand but has reached the maxNumReplicas. For information about how to update maxNumReplicas to a larger number, see Updating an autoscaler.

  • The monitoring metric that was specified does not exist or does not have the required labels. Check the metric.
    You are autoscaling using a Cloud Monitoring metric, but the metric you provided does not exist or lacks the necessary labels. Depending on whether the metric is a standard or custom metric, different labels are required. See the documentation for Scaling based on a Monitoring metric for more information.

  • Quota for some resources is exceeded. Increase the quota or delete resources to free up more quota.
    You can get information about your available quota on the Quota page in the Google Cloud Console.

  • Autoscaling does not work with an HTTP/S load balancer configured for maxRate.
    The instance group is being load balanced using the maxRate configuration, but the autoscaler does not support this mode. Either change the configuration or disable autoscaling. To learn more about maxRate, read the Restrictions and guidelines in the load balancing documentation.

  • The autoscaler is configured to scale based on a load balancing signal but the instance group has not received any queries from the load balancer. Check that the load balancing configuration is working.
    The instance group is being load balanced, but the group has no incoming queries. The service might be experiencing a period of idleness, in which case there is nothing to worry about. However, this message can also be caused by misconfiguration; for example, an autoscaled instance group might be the target of more than one load balancer, which is not supported. For a full list of guidelines, see Restrictions and guidelines in the load balancing documentation.