Autoscaling performs automatic scaling of a managed instance group on your behalf. Use this document to understand some of the decisions that autoscaler might make when scaling your managed instance groups.
Gaps between target and actual utilization
During the autoscaling process, you might notice that, for smaller instance groups, the actual utilization of the instance group and the target utilization can seem far apart. This is because an autoscaler always acts conservatively, rounding up or down when it interprets utilization data to determine how many instances to add or remove. This prevents the autoscaler from adding too few resources or removing too many. In a group with few instances, however, each instance carries a large share of the load, so that rounding can leave the group's utilization well away from the target.
For example, if you set a utilization target of 0.7 and your application exceeds that target, the autoscaler might determine that adding 1.5 virtual machines would bring utilization back down to close to 0.7. Since it is not possible to add 1.5 virtual machines, the autoscaler rounds up and adds two virtual machines. This might decrease the average CPU utilization to a value below 0.7, but it ensures that your application has enough resources to support its load.
Similarly, if the autoscaler determines that removing 1.5 virtual machines would increase your utilization to close to 0.7, it will remove just one virtual machine.
For larger groups with more virtual machines, the utilization is divided up over a larger number of instances and adding or removing virtual machines causes less of a gap between actual utilization and target utilization.
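The conservative rounding described above can be sketched as a short calculation. This is a simplified model for illustration, not the actual autoscaler implementation:

```python
import math

def recommended_size(current_size, actual_utilization, target_utilization):
    # Simplified model of the sizing decision: pick a group size at which
    # the same total load, spread across all instances, lands at the
    # target utilization -- then round UP, so the group is never left
    # with too few instances.
    raw = current_size * actual_utilization / target_utilization
    return math.ceil(raw)

# Scaling up: 4 instances at ~0.96 average utilization against a 0.7
# target gives a raw size of 5.5, so the autoscaler adds 2 instances
# rather than 1.5.
print(recommended_size(4, 0.9625, 0.7))   # 6

# Scaling down: at ~0.44 utilization the raw size is 2.5, so only
# 1 instance is removed rather than 1.5.
print(recommended_size(4, 0.4375, 0.7))   # 3
```

In a group with hundreds of instances, the same rounding shifts average utilization by only a fraction of a percent, which is why larger groups track the target more closely.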
Delays in scaling down
For the purposes of scaling down, an autoscaler takes into account the last 10 minutes of virtual machine instance usage to decide how to scale. Using at least the last 10 minutes of usage helps:
- Ensure that information collected from the instance group is stable.
- Prevent behavior where an autoscaler continuously adds or removes instances at an excessive rate.
- Allow the autoscaler to determine that it is safe to remove instances. An autoscaler will only remove instances once it has determined that the smaller group size is enough to support the load from the last 10 minutes.
This 10-minute delay might appear as a delay in scaling down, but it is actually a built-in feature of autoscaling. The delay also ensures that a new virtual machine instance added to the managed instance group runs for at least 10 minutes before it is eligible to be terminated. Since Compute Engine charges a 10-minute minimum for running instances, this prevents instances from being terminated before that minimum has elapsed.
Cool down periods for new instances are ignored when deciding whether to scale down a group.
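The effect of the 10-minute window on scale-down decisions can be illustrated with a simplified model. The per-minute size recommendations below are hypothetical, and the real autoscaler's logic is more involved:

```python
def scale_in_target(per_minute_recommendations):
    # Simplified model: when shrinking, only move to the LARGEST size
    # recommended at any point during the lookback window, so a brief
    # dip in load never triggers an aggressive scale-down.
    return max(per_minute_recommendations)

# Hypothetical recommended sizes computed once per minute over the
# last 10 minutes; load dipped recently but peaked at 8 earlier.
last_10_minutes = [8, 7, 7, 6, 5, 5, 6, 5, 4, 4]
print(scale_in_target(last_10_minutes))  # 8 -- no scale-down yet
```

Only once the whole window supports a smaller size does the group actually shrink, which is the "delay" described above.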
Preparing for instance terminations
When the autoscaler scales down, it determines the number of virtual machines it needs to shut down, and selects virtual machine instances with low utilization to terminate from the instance group. Before an instance is terminated, you might want to make sure it performs certain tasks, such as closing any existing connections, gracefully shutting down any applications or application servers, uploading logs, and so on. You can instruct your instance to perform these tasks using a shutdown script.
A shutdown script will run, on a best-effort basis, in the brief period between when the termination request is made and when the instance is actually terminated. During this period, Compute Engine will attempt to run your shutdown script to perform any tasks you provide in the script.
This is particularly useful if you are using load balancing with your managed instance group. If your instance becomes unhealthy, it might take some time for the load balancer to recognize that the instance is unhealthy, causing the load balancer to continue sending new requests to the instance. With a shutdown script, the instance can report that it is unhealthy while it is shutting down so that the load balancer can stop sending traffic to the instance. For more information, see Handling unhealthy instances in the load balancing documentation.
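One way to sketch this pattern is a "draining" flag that the shutdown script sets and the instance's health-check handler inspects. The flag file and the handler shown here are hypothetical illustrations, not part of any Compute Engine API:

```python
import pathlib
import tempfile

# Hypothetical flag file shared between the shutdown script and the
# instance's HTTP health-check handler.
DRAIN_FLAG = pathlib.Path(tempfile.gettempdir()) / "draining"

def begin_drain():
    # Called from the shutdown script: once the flag exists, health
    # checks fail, so the load balancer stops sending new requests here.
    DRAIN_FLAG.touch()

def health_check_status():
    # Called by the health-check handler: report 503 while draining,
    # 200 otherwise.
    return 503 if DRAIN_FLAG.exists() else 200

begin_drain()
print(health_check_status())  # 503 -- the instance reports unhealthy
```

With this arrangement, in-flight requests can finish while the load balancer routes new traffic elsewhere.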
For more information on shutdown scripts, see Shutdown scripts.
For more information on instance shutdown, see Stopping or deleting an instance.
Viewing autoscaling charts for CPU utilization
If you have a managed instance group that is being autoscaled based on CPU utilization, Compute Engine provides an autoscaling chart that tracks the total CPU utilization and the number of autoscaled instances at any point in time. You can access this chart in the Google Cloud Platform Console.
- Go to the Instance Groups page in the Cloud Platform Console.
- Click on an autoscaled managed instance group you want to view. The group must be using autoscaling based on CPU utilization (other autoscaling metrics are not yet supported).
- On the managed instance group details page, select the Members tab, if it is not already selected.
- Make sure that Autoscaled size is selected in the dropdown menu.
The chart tracks the number of instances and the group's CPU usage over time. Use the following information to understand your chart:
- The blue line indicates the number of instances in the managed instance group.
- The dark green line shows the total CPU utilization of the group.
- The light green shaded area shows the remaining available capacity of the managed instance group. If the light green shaded area is large and its height exceeds the dark green line, there is a large amount of capacity available and your VM instances are likely under-utilized. If the light green shaded area is small or absent, then there is very little, if any, remaining capacity and you should add more instances to the instance group.
If your capacity drops, the size of your instance group has likely decreased, so the blue line also drops. Similarly, if your capacity increases, the size of your instance group has likely increased.
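The relationship between the shaded band and the lines above it can be thought of as a simple difference. This is a rough illustration with made-up numbers; the chart's exact scaling is internal to the Console:

```python
def remaining_capacity(num_instances, per_instance_capacity, total_utilization):
    # Rough illustration: the light green band is the gap between what
    # the group could serve at full utilization and what it is serving
    # now. per_instance_capacity is an assumed, normalized figure.
    return num_instances * per_instance_capacity - total_utilization

# 5 instances, each normalized to a capacity of 1.0, currently serving
# a combined load of 3.2: a comfortable amount of headroom remains.
print(round(remaining_capacity(5, 1.0, 3.2), 2))  # 1.8
```

When the result approaches zero, the group is at capacity and the autoscaler adds instances, as the graph below shows.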
For example, the following graph captures an autoscaled managed instance group that reaches capacity, which causes the autoscaler to add more VM instances to the group, increasing the capacity of the group:
Commonly returned status messages
When the autoscaler experiences an issue scaling, it returns a warning or error message. Here are some commonly returned messages and what they mean.
All instances in the instance group are unhealthy (not in RUNNING state). If this is an error, check the instances.
All of the instances in the instance group have a state other than RUNNING. If this is intentional, you can ignore this message. If it is not, troubleshoot the instance group.
The number of instances has reached the maxNumReplicas. The autoscaler cannot add more instances.
When you created the autoscaler, you indicated the maximum number of instances the instance group can have. The autoscaler is currently attempting to resize the instance group up to meet demand but has reached the maxNumReplicas. To update maxNumReplicas to a larger number, read Updating an autoscaler.
The monitoring metric that was specified does not exist or does not have the required labels. Check the metric.
You are autoscaling using a Stackdriver metric but the metric you provided does not exist or lacks the necessary labels. Depending on whether the metric is a standard or custom metric, different labels are required. See the documentation for Scaling based on a Stackdriver Monitoring metric for more information.
Quota for some resources is exceeded. Increase the quota or delete resources to free up more quota.
You can get information about your available quota on the Quota page in the Google Cloud Platform Console.
Autoscaling does not work with an HTTP/S load balancer configured for maxRate.
The instance group is being load balanced using the maxRate configuration, but the autoscaler does not support this mode. Either change the configuration or disable autoscaling. To learn more about maxRate, read the Restrictions and Guidelines in the load balancing documentation.