The Autoscaler automatically scales a managed instance group on your behalf. Use this document to understand some of the decisions that the Autoscaler might make when scaling your managed instance groups.
Gaps between target and actual utilization
During the autoscaling process, you might notice that in smaller instance groups the actual utilization and the target utilization seem far apart. This is because the Autoscaler always acts conservatively. It rounds up or down when it interprets utilization data and decides how many instances to add or remove. This prevents the Autoscaler from adding too few resources or removing too many.
For example, suppose you set a utilization target of 0.7 and your application exceeds it. The Autoscaler might determine that adding 1.5 virtual machines would bring utilization back to about 0.7. Because it cannot add 1.5 virtual machines, the Autoscaler rounds up and adds two. This might push the average CPU utilization below 0.7, but it ensures that your application has enough resources to handle its load.
Similarly, if the Autoscaler determines that removing 1.5 virtual machines would raise your utilization to about 0.7, it rounds down and removes just one virtual machine.
In larger groups, the load is spread over more instances. Adding or removing a single virtual machine therefore changes the average utilization less, and the gap between actual and target utilization is smaller.
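The following is a minimal sketch of this rounding behavior. It uses a simplified model that is an assumption and not a description of how Compute Engine actually calculates group size. In this model, the ideal group size equals the current load, measured as `current_size * observed_util`, divided by the target utilization. Rounding that ideal size up has two effects: a fractional scale-out is rounded up, and a fractional scale-in is rounded down. The helper names `recommended_size` and `utilization_after` are hypothetical, and the utilization numbers are illustrative.

```python
import math


def recommended_size(current_size: int, observed_util: float, target_util: float) -> int:
    """Hypothetical, simplified model of conservative rounding.

    The ideal (fractional) size is the total load divided by the target
    utilization. Rounding it *up* turns "add 1.5 VMs" into "add 2" and
    "remove 1.5 VMs" into "remove 1".
    """
    total_load = current_size * observed_util  # load expressed in fully busy VMs
    ideal = total_load / target_util
    # round() absorbs floating-point noise such as 7.000000000000001
    return math.ceil(round(ideal, 9))


def utilization_after(current_size: int, observed_util: float, new_size: int) -> float:
    """Average utilization once the same load is spread over new_size VMs."""
    return current_size * observed_util / new_size


# Scale out: 5 VMs at 0.91 would need 6.5 VMs to sit at 0.7, so 2 VMs are added.
print(recommended_size(5, 0.91, 0.7))    # 7
# Scale in: 10 VMs at 0.595 would need 8.5 VMs, so only 1 VM is removed.
print(recommended_size(10, 0.595, 0.7))  # 9

# The same overshoot leaves a bigger gap below target in a small group.
for size in (2, 20):
    new_size = recommended_size(size, 0.8, 0.7)
    print(size, new_size, round(utilization_after(size, 0.8, new_size), 3))
# prints "2 3 0.533" and then "20 23 0.696"
```

In the scale-out case, the load of 4.55 VMs spread over 7 VMs gives 0.65, which is below the 0.7 target. This matches the trade-off described above.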
Delays in scaling down
When scaling down, the Autoscaler considers the last 10 minutes of virtual machine instance usage to decide how to scale. Using at least the last 10 minutes of usage helps:
- Ensure that information collected from the instance group is stable.
- Prevent behavior where the Autoscaler continuously adds or removes instances at an excessive rate.
- Allow the Autoscaler to determine that it is safe to remove instances. An Autoscaler will only remove instances once it has determined that the smaller group size will be enough to support load from the last 10 minutes.
This 10-minute window might look like a delay in scaling down, but it is a built-in feature of autoscaling. It also ensures that a newly added virtual machine instance runs for at least 10 minutes before it becomes eligible for termination.
Cool-down periods for new instances are ignored when the Autoscaler decides whether to scale down a group.
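The following is a minimal sketch of the 10-minute scale-in rule. It assumes a simplified model in which the Autoscaler shrinks the group only to the largest size recommended at any point in the window. It also assumes that scale-out is not held back by the window. The class `ScaleInStabilizer` and its methods are hypothetical illustrations, not Compute Engine code.

```python
from collections import deque

WINDOW_SECONDS = 600  # the 10-minute look-back window described above


class ScaleInStabilizer:
    """Hypothetical sketch: scale in only to a size that would have covered
    every recommendation seen during the last 10 minutes."""

    def __init__(self, window_seconds: float = WINDOW_SECONDS) -> None:
        self.window_seconds = window_seconds
        self.history = deque()  # (timestamp_seconds, recommended_size) pairs

    def observe(self, now: float, recommended: int) -> None:
        """Record a new recommendation and forget ones older than the window."""
        self.history.append((now, recommended))
        while self.history and self.history[0][0] < now - self.window_seconds:
            self.history.popleft()

    def target_size(self, current_size: int) -> int:
        if not self.history:
            return current_size
        latest = self.history[-1][1]
        if latest > current_size:
            return latest  # assumption: scale-out is not delayed by the window
        peak = max(size for _, size in self.history)
        # Never shrink below the largest size needed within the window.
        return min(current_size, peak)


stabilizer = ScaleInStabilizer()
stabilizer.observe(0, 10)            # load needed 10 VMs
stabilizer.observe(300, 6)           # load dropped 5 minutes later
print(stabilizer.target_size(10))    # 10: the peak is still inside the window
stabilizer.observe(700, 6)           # the t=0 sample has now aged out
print(stabilizer.target_size(10))    # 6
```

Taking the maximum over the window is one simple way to express "the smaller group size will be enough to support load from the last 10 minutes". A short spike keeps the group at its larger size until the spike ages out of the window.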
Connection draining causing delays
If the group is part of a backend service that has connection draining enabled, it can take up to 60 seconds after the connection draining duration has elapsed before the VM instance is removed or deleted.
Preparing for instance terminations
When the Autoscaler scales down, it determines how many virtual machines it needs to shut down and selects instances with low utilization for termination. Before an instance is terminated, you might want it to perform certain tasks, such as closing existing connections, gracefully shutting down applications or application servers, and uploading logs. You can instruct your instance to perform these tasks using a shutdown script.
A shutdown script will run, on a best-effort basis, in the brief period between when the termination request is made and when the instance is actually terminated. During this period, Compute Engine will attempt to run your shutdown script to perform any tasks you provide in the script.
This is particularly useful if you are using load balancing with your managed instance group. If your instance becomes unhealthy, it might take some time for the load balancer to recognize that the instance is unhealthy, causing the load balancer to continue sending new requests to the instance. With a shutdown script, the instance can report that it is unhealthy while it is shutting down so that the load balancer can stop sending traffic to the instance. For more information, see Handling unhealthy instances in the load balancing documentation.
For more information on shutdown scripts, see Shutdown scripts.
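The following is a minimal sketch of such a shutdown script. It assumes your image runs the script with a Python 3 interpreter. It also assumes a hypothetical convention: your application's health-check endpoint reports unhealthy whenever a drain flag file exists. The paths `/var/run/myapp/draining`, `/var/log/myapp`, and `/tmp/myapp-logs`, along with the function names, are hypothetical. The actual upload step depends on your own tooling and is left out.

```python
#!/usr/bin/env python3
"""Hypothetical shutdown-script sketch for an autoscaled, load-balanced VM."""
import pathlib
import shutil
import time

# Hypothetical convention: the app's health check fails while this file exists.
DRAIN_FLAG = pathlib.Path("/var/run/myapp/draining")
LOG_DIR = pathlib.Path("/var/log/myapp")        # hypothetical log location
ARCHIVE_BASE = pathlib.Path("/tmp/myapp-logs")  # ".tar.gz" is appended


def mark_unhealthy(flag: pathlib.Path) -> None:
    """Create the drain flag so the load balancer's health check starts failing."""
    flag.parent.mkdir(parents=True, exist_ok=True)
    flag.touch()


def archive_logs(log_dir: pathlib.Path, archive_base: pathlib.Path) -> pathlib.Path:
    """Bundle the log directory into <archive_base>.tar.gz, ready for upload."""
    return pathlib.Path(
        shutil.make_archive(str(archive_base), "gztar", root_dir=str(log_dir))
    )


def shutdown_sequence(flag: pathlib.Path, log_dir: pathlib.Path,
                      archive_base: pathlib.Path,
                      grace_seconds: float = 5.0) -> pathlib.Path:
    # 1. Report unhealthy first so the load balancer stops sending new requests.
    mark_unhealthy(flag)
    # 2. Give health checks a moment to notice. Keep this short: the script
    #    runs only on a best-effort basis before the VM is terminated.
    time.sleep(grace_seconds)
    # 3. Stop your application gracefully here (application-specific, omitted).
    # 4. Package logs; uploading them is left to your own tooling.
    return archive_logs(log_dir, archive_base)


# On the VM, the script would end with:
#     shutdown_sequence(DRAIN_FLAG, LOG_DIR, ARCHIVE_BASE)
```

Marking the instance unhealthy before doing anything else gives the load balancer the longest possible time to stop routing new requests to it.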
Viewing autoscaling charts for utilization
If you have a managed instance group that is being autoscaled, Compute Engine provides an autoscaling chart that tracks the total utilization and the number of autoscaled instances at any point in time. You can access this chart in the Google Cloud Platform Console.
- Go to the Instance groups page in the GCP Console.
- Click the name of the autoscaled managed instance group you want to view. The group must use autoscaling based on CPU utilization; other autoscaling metrics are not yet supported.
- On the managed instance group details page, select the Monitoring tab if it is not already selected.
- Make sure that Autoscaled size is selected in the dropdown menu.
The charts plot the number of instances against CPU usage. Use the following information to interpret them.
- The blue line on the upper chart indicates the number of instances in the managed instance group.
- The blue line on the lower chart shows the total CPU utilization of the group.
- The green line on the lower chart shows the remaining available capacity of the managed instance group.
- If the green line is above the blue line, there is a large amount of capacity available and your VM instances are likely underutilized.
- If the green line is below the blue line, there is little, if any, remaining capacity and you should add more instances to the instance group.
- If your capacity drops, the size of your instance group has likely decreased, so the blue line on the upper chart will also drop. Similarly, if your capacity increases, the size of your instance group has likely also increased.
For example, the following graph captures an autoscaled managed instance group that reaches capacity, which causes the Autoscaler to add more VM instances to the group, increasing capacity of the group.
Viewing status messages
When the Autoscaler encounters a problem while scaling, it returns a warning or error message. You can review these status messages in either of two ways.
View status messages on the Instance groups page
You can view status messages directly on the Instance groups page in the Google Cloud Platform Console.
- Go to the Instance groups page in the Google Cloud Platform Console.
- Look for any instance groups that have the caution (!) icon. For example:
- Hover over a status icon to get details of the status message.
View status messages on the Instance group details page
You can go directly to the details page of a specific instance group to view relevant status messages.
- Go to the Instance groups page in the Google Cloud Platform Console.
- Click the instance group for which you want to view status messages.
- On the details page, view the status message on the Members tab. For example:
Commonly returned status messages
Here are some commonly returned warning and error messages and what they mean.
All instances in the instance group are unhealthy (not in RUNNING state). If this is an error, check the instances.
- All of the instances in the instance group have a state other than RUNNING. If this is intentional, you can ignore this message. If it is not, troubleshoot the instance group.
The number of instances has reached the maxNumReplicas. The Autoscaler cannot add more instances.
- When you created the Autoscaler, you specified the maximum number of instances the instance group can have. The Autoscaler is trying to grow the instance group to meet demand but has reached maxNumReplicas. To raise maxNumReplicas, read Updating an Autoscaler.
The monitoring metric that was specified does not exist or does not have the required labels. Check the metric.
- You are autoscaling using a Stackdriver metric but the metric you provided does not exist or lacks the necessary labels. Depending on whether the metric is a standard or custom metric, different labels are required. See the documentation for Scaling based on a Stackdriver Monitoring metric for more information.
Quota for some resources is exceeded. Increase the quota or delete resources to free up more quota.
- You can get information about your available quota on the Quota page in the Google Cloud Platform Console.
Autoscaling does not work with an HTTP/S load balancer configured for maxRate.
- The instance group is being load balanced using the maxRate configuration, but the Autoscaler does not support this mode. Either change the configuration or disable autoscaling. To learn more about maxRate, read the Restrictions and Guidelines in the load balancing documentation.
The autoscaler is configured to scale based on a load balancing signal but the instance group has not received any queries from the load balancer. Check that the load balancing configuration is working.
- The instance group is being load balanced, but the group has no incoming queries. The service might simply be idle, in which case there is nothing to worry about. However, this message can also be caused by misconfiguration. For example, an autoscaled instance group might be the target of more than one load balancer, which is not supported. For a full list of guidelines, see Restrictions and Guidelines in the load balancing documentation.
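If you prefer to check these messages outside the console, the following minimal sketch summarizes them from an autoscaler resource. It assumes you already have the resource's JSON representation from the Compute Engine API. It also assumes that the messages appear in a `statusDetails` list whose entries carry a `message` and a `type`. The `type` names in `GUIDANCE` are our assumption of the API's values and should be verified against the API reference. `summarize_status` and the sample resource are hypothetical.

```python
import json

# Assumed statusDetails[].type values, mapped to the guidance above.
# Verify these names against the Compute Engine API reference.
GUIDANCE = {
    "ALL_INSTANCES_UNHEALTHY": "No instance is RUNNING; troubleshoot the group if unintended.",
    "CAPPED_AT_MAX_NUM_REPLICAS": "Raise maxNumReplicas if the group needs to grow further.",
    "CUSTOM_METRIC_INVALID": "Check that the metric exists and has the required labels.",
    "NOT_ENOUGH_QUOTA_AVAILABLE": "Increase quota or delete resources to free some up.",
    "UNSUPPORTED_MAX_RATE_LOAD_BALANCING_CONFIGURATION": "Change the maxRate setup or disable autoscaling.",
    "MISSING_LOAD_BALANCING_DATA_POINTS": "Check that the load balancer sends traffic to the group.",
}


def summarize_status(autoscaler: dict) -> list[str]:
    """Return one line per status detail, with guidance when the type is known."""
    lines = []
    for detail in autoscaler.get("statusDetails", []):
        kind = detail.get("type", "UNKNOWN")
        hint = GUIDANCE.get(kind, "No guidance recorded for this type.")
        lines.append(f"{kind}: {detail.get('message', '')} -> {hint}")
    return lines


# Hypothetical resource, shaped like the assumed JSON representation.
sample = json.loads("""
{
  "name": "example-autoscaler",
  "statusDetails": [
    {"type": "CAPPED_AT_MAX_NUM_REPLICAS",
     "message": "The number of instances has reached the maxNumReplicas."}
  ]
}
""")
for line in summarize_status(sample):
    print(line)
```

The function passes through any `type` it does not recognize instead of failing. This keeps it usable if the API returns message kinds that are not covered in this list.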