This document describes how to scale a managed instance group (MIG) based on Monitoring metrics.
You can also scale a MIG based on its CPU utilization or the serving capacity of an external HTTP(S) load balancer.
When you scale a MIG based on Monitoring metrics, you can scale based on the following metric types:
- Scale using per-instance metrics, where the selected metric provides data for each virtual machine (VM) instance in the MIG, indicating that instance's resource utilization.
- Scale using per-group metrics, where the group scales based on a metric that provides a value related to the whole managed instance group.
These metrics can be either standard metrics provided by the Cloud Monitoring service, or custom Cloud Monitoring metrics that you create.
Limitations
You cannot autoscale based on Cloud Monitoring logs-based metrics.
Regional managed instance groups do not support filtering for per-instance metrics.
Regional managed instance groups do not support autoscaling using per-group metrics.
Before you begin
- If you want to use the command-line examples in this guide:
- Install or update to the latest version of the gcloud command-line tool.
- Set a default region and zone.
- If you want to use the API examples in this guide, set up API access.
- Review the autoscaler limitations.
- Read about autoscaler fundamentals.
Per-instance metrics
Per-instance metrics provide data for each VM in a MIG separately, indicating resource utilization for each instance. When using per-instance metrics, the MIG cannot scale below a size of 1 VM because the autoscaler requires metrics about at least one running VM in order to operate.
If you need to scale using Cloud Monitoring metrics that aren't specific to individual VMs or if you need to scale your MIG down to zero VMs from time to time, you can configure your MIG to scale using per-group metrics instead.
Standard per-instance metrics
Cloud Monitoring has a set of standard metrics that you can use to monitor your VMs. However, not all standard metrics are valid utilization metrics that the autoscaler can use.
A valid utilization metric for scaling meets the following criteria:
- The standard metric must contain data for a gce_instance monitored resource. You can use the timeSeries.list API call to verify whether a specific metric exports data for this resource (a verification sketch appears below).
- The standard metric describes how busy an instance is, and the metric value increases or decreases proportionally to the number of VMs in the group.
The following metric is invalid because the value does not change based on usage, and the autoscaler can't use the value to scale proportionally:
compute.googleapis.com/instance/cpu/reserved_cores
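To check whether a standard metric exports data for the gce_instance monitored resource, you can list the metric's time-series headers. The following is a minimal sketch using the Cloud Monitoring Python client library (google-cloud-monitoring); the project ID and metric type are placeholders that you would replace with your own values:

import time
from google.cloud import monitoring_v3

PROJECT_ID = "my-project"  # placeholder

client = monitoring_v3.MetricServiceClient()

# Look at the last 10 minutes of data.
now = int(time.time())
interval = monitoring_v3.TimeInterval(
    {"end_time": {"seconds": now}, "start_time": {"seconds": now - 600}}
)

# Request only the time-series headers to check whether the metric exports
# data for the gce_instance monitored resource.
results = client.list_time_series(
    request={
        "name": f"projects/{PROJECT_ID}",
        "filter": (
            'metric.type = "compute.googleapis.com/instance/cpu/utilization" '
            'AND resource.type = "gce_instance"'
        ),
        "interval": interval,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.HEADERS,
    }
)

for series in results:
    print(series.metric.type, series.resource.type)

If the call returns time series, the metric exports data for the gce_instance resource.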
After you select a standard metric you want to use for your autoscaler, you can configure autoscaling using that metric.
Custom metrics
You can create custom metrics using Cloud Monitoring and write your own monitoring data to the Monitoring service. This gives you side-by-side access to standard Google Cloud data and your custom monitoring data, with a familiar data structure and consistent query syntax. If you have a custom metric, you can choose to scale based on the data from that metric.
Prerequisites
To use custom metrics, you must first do the following:
- Create a custom metric. For information about creating a custom metric, see Using custom metrics. A minimal creation sketch appears after this list.
- Set up your MIG to export the custom metric from all VMs in the group.
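For illustration, the following is a minimal sketch of creating a custom metric descriptor with the Cloud Monitoring Python client library (google-cloud-monitoring); the metric name and project ID are placeholders:

from google.api import metric_pb2
from google.cloud import monitoring_v3

PROJECT_ID = "my-project"  # placeholder

client = monitoring_v3.MetricServiceClient()

# Describe a per-instance GAUGE metric with double values, which satisfies
# the autoscaler requirements listed in the next section.
descriptor = metric_pb2.MetricDescriptor()
descriptor.type = "custom.googleapis.com/path/to/metric"  # placeholder name
descriptor.metric_kind = metric_pb2.MetricDescriptor.MetricKind.GAUGE
descriptor.value_type = metric_pb2.MetricDescriptor.ValueType.DOUBLE
descriptor.description = "Example per-instance utilization metric."

client.create_metric_descriptor(
    name=f"projects/{PROJECT_ID}", metric_descriptor=descriptor
)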
Choose a valid custom metric
Not all custom metrics can be used by the autoscaler. To choose a valid custom metric, the metric must have all of the following properties:
- The metric must be a per-instance metric. The metric must export data that is relevant to each specific Compute Engine VM instance separately.
- The exported per-instance values must be associated with a gce_instance monitored resource, which contains the following labels:
  - zone: the name of the zone that the instance is in.
  - instance_id: the unique numerical ID assigned to the VM.
- The metric must export data at least every 60 seconds. If you export data more often than every 60 seconds, the autoscaler can respond to load changes more quickly. If you export data less often than every 60 seconds, the autoscaler might not respond to load changes quickly enough.
- The metric must be a valid utilization metric, which means that data from the metric can be used to proportionally scale out or in the number of VMs.
- The metric must export int64 or double data values.
For the autoscaler to work with your custom metric, you must export data for this custom metric from all the VMs in the MIG.
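For illustration, the following is a minimal sketch of exporting one per-instance data point with the Cloud Monitoring Python client library; run equivalent logic on every VM in the group, at least once every 60 seconds. The project ID, zone, instance ID, and metric name are placeholders (on a VM, you can read the zone and instance ID from the metadata server):

import time
from google.cloud import monitoring_v3

PROJECT_ID = "my-project"          # placeholder
ZONE = "us-central1-f"             # placeholder
INSTANCE_ID = "1234567890123456"   # placeholder

client = monitoring_v3.MetricServiceClient()

# Associate the data point with the gce_instance monitored resource for this
# specific VM so that the autoscaler can treat it as a per-instance value.
series = monitoring_v3.TimeSeries()
series.metric.type = "custom.googleapis.com/path/to/metric"  # placeholder name
series.resource.type = "gce_instance"
series.resource.labels["zone"] = ZONE
series.resource.labels["instance_id"] = INSTANCE_ID

point = monitoring_v3.Point(
    {
        "interval": {"end_time": {"seconds": int(time.time())}},
        "value": {"double_value": 0.42},  # current utilization of this VM
    }
)
series.points = [point]

client.create_time_series(name=f"projects/{PROJECT_ID}", time_series=[series])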
Configuring autoscaling using per-instance monitoring metrics
The process of setting up an autoscaler for a standard or custom metric is the same. To create an autoscaler that uses Cloud Monitoring metrics, you must provide the metric identifier, the desired target utilization level, and the utilization target type. Each of these properties is described below:
Metric identifier: The name of the metric to use. If you use a custom metric, you defined this name when you created the metric. The identifier has the following format:
custom.googleapis.com/path/to/metric
See Using custom metrics for more information about creating, browsing, and reading metrics.
Target utilization level: The level that the autoscaler must maintain. This must be a positive number; for example, both 24.5 and 1100 are acceptable values. Note that this is different from CPU utilization and load-balancing targets, which must be float values between 0.0 and 1.0.

Target type: How the autoscaler computes the data collected from the instances. The possible target types are:

- GAUGE: The autoscaler computes the average value of the data collected over the last few minutes and compares that to the target utilization value.
- DELTA_PER_MINUTE: The autoscaler calculates the average rate of growth per minute and compares that to the target utilization.
- DELTA_PER_SECOND: The autoscaler calculates the average rate of growth per second and compares that to the target utilization.
For accurate comparisons, if you set the target utilization in seconds, use DELTA_PER_SECOND as the autoscaler target type. Likewise, use DELTA_PER_MINUTE for a target utilization in minutes.
Console
In the Google Cloud Console, go to the Instance groups page.
If you do not have a managed instance group, create one. Otherwise, click the name of a MIG from the list to open the instance group details page.
On the instance group details page, click the Edit Group button.
If no autoscaling configuration exists, under Autoscaling, click Configure autoscaling.
Under Autoscaling mode, select Autoscale to enable autoscaling.
In the Autoscaling metrics section, if a metric is already configured, click it to edit it, or click Add new metric to add another metric.
Set the Metric type to Stackdriver Monitoring metric.
In the Metric export scope section, select Time series per instance to configure autoscaling using per-instance metrics.
In the Metric identifier section, enter the metric name in the following format:
example.googleapis.com/path/to/metric
In the Additional filter expression section:
- For a zonal MIG, optionally enter a filter to use individual values from metrics with multiple streams or labels. For more information, see Filtering per-instance metrics.
- For a regional MIG, leave this section blank.
In the Utilization target section, specify the target value.
In the Utilization target type section, verify that the target type corresponds to the metric's kind of measurement.
Save your changes when you are finished.
gcloud
For example, in the gcloud
command-line tool, the following command creates an
autoscaler that
uses the GAUGE
target type. Along with the --custom-metric-utilization
parameter, the --max-num-replicas
parameter is also required when creating
an autoscaler:
gcloud compute instance-groups managed set-autoscaling example-managed-instance-group \
    --custom-metric-utilization metric=example.googleapis.com/path/to/metric,utilization-target-type=GAUGE,utilization-target=10 \
    --max-num-replicas 20 \
    --cool-down-period 90 \
    --region us-west1
Optionally, you can use the --cool-down-period
flag, which tells the
autoscaler how many seconds to wait after a new VM has started
before the autoscaler starts collecting usage information from it. This
accounts for the amount of time it might take for the VM to
initialize, during which the collected usage is not reliable for
autoscaling. The default cool-down period is 60 seconds.
To see a full list of available commands and flags for the
gcloud
tool, see the
gcloud
reference.
API
In the API, make a POST
request to the following URL, replacing
myproject
with your own project ID and us-central1-f
with the
zone of your choice:
POST https://compute.googleapis.com/compute/v1/projects/myproject/zones/us-central1-f/autoscalers/
Your request body must contain the name
, target
, and autoscalingPolicy
fields. In autoscalingPolicy
, provide the maxNumReplicas
and the
customMetricUtilizations
properties.
Optionally, you can use the coolDownPeriodSec
parameter, which tells the
autoscaler how many seconds to wait after a new VM has started before
it starts to collect usage data. After the cool-down period passes, the
autoscaler begins to collect usage information from the new VM and
determines whether the MIG requires additional VMs. This accounts
for the amount of time it can take for the VM to initialize, during
which the collected usage data is not reliable for autoscaling. The
default cool-down period is 60 seconds.
POST https://compute.googleapis.com/compute/v1/projects/myproject/zones/us-central1-f/autoscalers

{
  "name": "example-autoscaler",
  "target": "zones/us-central1-f/instanceGroupManagers/example-managed-instance-group",
  "autoscalingPolicy": {
    "maxNumReplicas": 10,
    "coolDownPeriodSec": 90,
    "customMetricUtilizations": [
      {
        "metric": "example.googleapis.com/some/metric/name",
        "utilizationTarget": 10,
        "utilizationTargetType": "GAUGE"
      }
    ]
  }
}
Filtering per-instance metrics (beta)
You can apply filters to per-instance Cloud Monitoring metrics, which lets you scale zonal (single-zone) MIGs using individual values from metrics with multiple streams or labels.
Per-instance metric filtering requirements
Autoscaler filtering is compatible with the Cloud Monitoring filter syntax with some limitations. The filters for per-instance metrics must meet the following requirements:
- You can use only the AND operator for joining selectors.
- You can use only the = direct equality comparison operator, but you cannot use the operator with any functions. For example, you cannot use the startswith() function with the = comparison operator.
- You must wrap the value of a filter in double quotes, for example: metric.label.state = "used".
- You cannot use wildcards.
- You must not set the resource.type or resource.label.* selectors. Per-instance metrics always use all of the instance resources from the group.
- For best results, create a filter that is specific enough to return a single time series for each instance. If the filter returns multiple time series, they are added together.
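For example, a per-instance filter that meets these requirements might look like the following, assuming a custom metric with hypothetical backend_type and state labels:

metric.label.backend_type = "frontend" AND metric.label.state = "used"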
Configuring autoscalers to filter metrics
Use the Google Cloud Console, the
gcloud
command-line tool (beta),
or the
Compute Engine beta API to
add metric filters for autoscaling of a zonal MIG.
Console
The process for creating an autoscaler that filters a per-instance metric is
similar to creating a normal per-instance
autoscaler, but you also specify a metric filter. For example, the
compute.googleapis.com/instance/network/received_bytes_count
metric includes the instance_name
and loadbalanced
labels. To filter
based on the loadbalanced
boolean value:
In the Cloud Console, go to the Instance groups page.
If you do not have a zonal MIG, create one. Otherwise, click the name of a zonal MIG to open the instance group details page.
On the instance group details page, click the Edit Group button.
If no autoscaling configuration exists, under Autoscaling, click Configure autoscaling.
Under Autoscaling mode, select Autoscale to enable autoscaling.
In the Autoscaling metrics section, if a metric is already configured, click it to edit it, or click Add new metric to add another metric.
In the Metric type section, select Stackdriver Monitoring metric.
In the Metric export scope section, select Time series per instance to configure autoscaling using per-instance metrics.
In the Metric identifier section, enter the metric name. For example, compute.googleapis.com/instance/network/received_bytes_count.
In the Additional filter expression section, enter a filter. For example, metric.label.loadbalanced = true.
Save your changes when you are finished.
gcloud
The process for creating an autoscaler that filters a per-instance metric is
similar to creating a normal per-instance
autoscaler, but you must specify a metric filter and individual flags for
the utilization target and target type. For example, the
compute.googleapis.com/instance/network/received_bytes_count
metric includes the instance_name
and loadbalanced
labels. To filter
based on the loadbalanced
boolean, specify the
--stackdriver-metric-filter
filter flag with the
'metric.label.loadbalanced = true'
value. Include the
utilization target and target type flags individually.
gcloud beta compute instance-groups managed set-autoscaling example-managed-instance-group \
    --update-stackdriver-metric=compute.googleapis.com/instance/network/received_bytes_count \
    --stackdriver-metric-utilization-target=10 \
    --stackdriver-metric-utilization-target-type=delta-per-second \
    --stackdriver-metric-filter='metric.label.loadbalanced = true' \
    --max-num-replicas 20 \
    --cool-down-period 90 \
    --zone us-central1-a
This example configures autoscaling to use only the loadbalanced
traffic data as part of the utilization target.
To see a list of available gcloud
commands and flags, see the
gcloud
tool reference (beta).
API
The process for creating an autoscaler that filters a per-instance metric is
similar to creating a normal per-instance
autoscaler, but you must also specify a metric filter. For example, the
compute.googleapis.com/instance/network/received_bytes_count
metric includes the instance_name
and loadbalanced
labels. To filter
based on the loadbalanced
boolean value, specify the filter
parameter
with the "metric.label.loadbalanced = true"
value.
In the API, make a POST
request to the following URL, replacing
myproject
with your own project ID and us-central1-f
with the
zone of your choice. The request body must contain the name
, target
,
and autoscalingPolicy
fields. In autoscalingPolicy
, provide the
maxNumReplicas
and the customMetricUtilizations
properties.
POST https://compute.googleapis.com/compute/beta/projects/myproject/zones/us-central1-f/autoscalers

{
  "name": "example-autoscaler",
  "target": "zones/us-central1-f/instanceGroupManagers/example-managed-instance-group",
  "autoscalingPolicy": {
    "maxNumReplicas": 10,
    "coolDownPeriodSec": 90,
    "customMetricUtilizations": [
      {
        "metric": "compute.googleapis.com/instance/network/received_bytes_count",
        "filter": "metric.label.loadbalanced = true",
        "utilizationTarget": 10,
        "utilizationTargetType": "DELTA_PER_SECOND"
      }
    ]
  }
}
This example configures autoscaling to use only the loadbalanced
traffic data as part of the utilization target.
Per-group metrics
Per-group metrics allow autoscaling with a standard or custom metric that does not export per-instance utilization data. Instead, the group scales based on a value that applies to the whole group and corresponds to how much work is available for the group or how busy the group is. The group scales based on the fluctuation of that group metric value and the configuration that you define.
When you configure autoscaling on per-group metrics, you must indicate how you want the autoscaler to provision instances relative to the metric:
- Instance assignment: Specify an instance assignment to indicate that you want the autoscaler to add or remove VMs depending on how much work is available to assign to each VM. Specify a value for this parameter that represents how much work you expect each VM to handle. For example, specify 2 to assign two units of work to each VM, or specify 0.5 to assign half a unit of work to each VM. The autoscaler scales the MIG to ensure that there are enough VMs to complete the available work as indicated by the metric. If the metric value is 10 and you assigned 0.5 units of work to each VM, the autoscaler creates 20 VMs in the MIG. Scaling with instance assignment allows the group to shrink to 0 VMs when the metric value drops to 0, and back up again when it rises above 0. The following diagram shows the proportional relationship between metric value and number of VMs when scaling with an instance assignment policy.
- Utilization target: Specify a utilization target to indicate that you want the autoscaler to add or remove VMs to try to maintain the metric at a specified value. When the metric is above the specified target, the autoscaler gradually adds VMs until the metric decreases to the target value. When the metric is below the specified target value, the autoscaler gradually removes VMs until the metric increases to the target value. Scaling with a utilization target cannot shrink the group to 0 VMs. The following diagram shows how the autoscaler adds and removes VMs in response to a metric value to maintain a utilization target.
Each option has the following use cases:
- Instance assignment: Scale the size of your MIG based on the number of unacknowledged messages in a Pub/Sub subscription or a total QPS rate of a network endpoint.
- Utilization target: Scale the size of your MIG based on a utilization target for a custom metric that does not come from the standard per-instance CPU or memory use metrics. For example, you might scale the group based on a custom latency metric.
When you configure autoscaling with per-group metrics and you specify an instance assignment, your MIG can scale in to 0 VMs. If your metric indicates that there is no work for your group to complete, the group scales in to 0 VMs until the metric detects that new work is available. In contrast to scaling based on per-group metrics, per-instance autoscaling requires resource utilization metrics from at least one VM, so the group cannot scale below a size of 1.
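For illustration only, the following sketch (in Python, not part of the autoscaler API) approximates the proportional relationship that instance assignment implies; the real autoscaler applies its own smoothing and rounding:

import math

def approx_target_size(metric_value, single_instance_assignment, min_replicas, max_replicas):
    # Enough VMs to cover the available work, clamped to the group's bounds.
    needed = math.ceil(metric_value / single_instance_assignment)
    return max(min_replicas, min(max_replicas, needed))

# If the metric reports 10 units of work and each VM handles 0.5 units,
# the group is scaled to 20 VMs (within the configured bounds).
print(approx_target_size(10, 0.5, 0, 100))  # -> 20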
Filtering per-group metrics
You can apply filters to per-group Cloud Monitoring metrics, which lets you scale MIGs using individual values from metrics that have multiple streams or labels.
Per-group metric filtering requirements
Autoscaler filtering is compatible with the Cloud Monitoring filter syntax with some limitations. The filters for per-group metrics must meet the following requirements:
- You can use only the AND operator for joining selectors.
- You cannot use the = direct equality comparison operator with any functions for each selector. For example, you cannot use the startswith() function with the = comparison operator.
- You cannot use wildcards.
- You must wrap the value of a filter in double quotes, for example: metric.label.state = "used".
- You can specify a metric type selector of metric.type = "..." in the filter and also include the original metric field. Optionally, you can use only the metric field. The metric must meet the following requirements:
  - The metric must be specified in at least one place.
  - If the metric is specified in both places, the values must be equal.
- You must specify the resource.type selector, but you cannot set it to gce_instance if you want to scale using per-group metrics.
- For best results, the filter should be specific enough to return a single time series for the group. If the filter returns multiple time series, they are added together.
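For example, a per-group filter that meets these requirements (using the Pub/Sub subscription from the example later on this page) might look like the following:

metric.type = "pubsub.googleapis.com/subscription/num_undelivered_messages" AND resource.type = "pubsub_subscription" AND resource.label.subscription_id = "our-subscription"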
Configuring autoscaling using per-group monitoring metrics
Use the Google Cloud Console, the
gcloud
command-line tool,
or the
Compute Engine API
to configure autoscaling with per-group metrics for a single-zone MIG.
Console
In the Cloud Console, go to the Instance groups page.
If you do not have a managed instance group, create one. Otherwise, click the name of a zonal MIG to open the instance group details page.
On the instance group details page, click the Edit Group button.
If no autoscaling configuration exists, under Autoscaling, click Configure autoscaling.
Under Autoscaling mode, select Autoscale to enable autoscaling.
In the Autoscaling metrics section, if a metric is already configured, click it to edit it, or click Add new metric to add another metric.
Set the Metric type to Stackdriver Monitoring metric.
In the Metric export scope section, select Single time series per group.
In the Metric identifier section, specify the metric name in the following format:
example.googleapis.com/path/to/metric
Specify the Metric resource type.
If you want to use individual values from metrics that have multiple streams or labels, provide an Additional filter expression. The filter must meet the autoscaler filtering requirements.
In the Scaling policy section, select either Instance assignment or Utilization target.
- If you select an instance assignment policy, then provide a Single instance assignment value that represents the amount of work to assign to each VM instance in the MIG. For example, specify 2 to assign two units of work to each VM. The autoscaler maintains enough VMs to complete the available work (as indicated by the metric). If the metric value is 10 and you assigned 2 units of work to each VM, the autoscaler creates 5 VMs in the MIG.
- If you select a utilization target policy:
- Provide a Utilization target value that represents the metric value that the autoscaler should try to maintain.
- Select the Utilization target type that represents the value type for the metric.
Save your changes when you are finished.
gcloud
Create an autoscaler for a MIG similarly to the
per-instance autoscaler, but specify the
--update-stackdriver-metric
flag. You can specify how you want the
autoscaler to provision instances by including one of the following
flags:
- Instance assignment: Specify the --stackdriver-metric-single-instance-assignment flag.
- Utilization target: Specify the --stackdriver-metric-utilization-target flag.
Instance assignment:
Specify a metric that you want to measure and specify the
--stackdriver-metric-single-instance-assignment
flag to indicate
the amount of work that you expect each instance to handle. You must also
specify a filter for the metric using the
--stackdriver-metric-filter
flag.
gcloud compute instance-groups managed set-autoscaling GROUP_NAME \
    --zone=ZONE \
    --max-num-replicas=MAX_INSTANCES \
    --min-num-replicas=MIN_INSTANCES \
    --update-stackdriver-metric='METRIC_URL' \
    --stackdriver-metric-filter='METRIC_FILTER' \
    --stackdriver-metric-single-instance-assignment=INSTANCE_ASSIGNMENT
Replace the following:
- GROUP_NAME: The name of the MIG where you want to add an autoscaler.
- ZONE: The zone where the MIG is located. You cannot specify a region for autoscalers on per-group metrics.
- MAX_INSTANCES: The maximum number of VMs that the MIG can have.
- MIN_INSTANCES: The minimum number of VMs that the MIG can have.
- METRIC_URL: A protocol-free URL of a Monitoring metric.
- METRIC_FILTER: A Cloud Monitoring filter that selects the relevant TimeSeries and MonitoredResource. The filter must meet the autoscaler filtering requirements.
- INSTANCE_ASSIGNMENT: The amount of work to assign to each VM instance in the MIG. For example, specify 2 to assign two units of work to each VM, or specify 0.5 to assign half a unit of work to each VM. The autoscaler scales the MIG to ensure that there are enough VMs to complete the available work, which is indicated by the metric. If the metric value is 10 and you've assigned 0.5 units of work to each VM, the autoscaler provisions 20 VMs in the MIG.
Utilization target:
In some situations, you might want to use utilization targets with
per-group metrics rather than specify a number of VMs relative
to the value of the metric that your autoscaler measures. You can
still point the autoscaler to a per-group metric, but the autoscaler
attempts to maintain the specified utilization target. Specify the target
and target type with the --stackdriver-metric-utilization-target
flag.
You must also specify a filter for the metric using the
--stackdriver-metric-filter
flag.
gcloud compute instance-groups managed set-autoscaling GROUP_NAME \
    --zone=ZONE \
    --max-num-replicas=MAX_INSTANCES \
    --min-num-replicas=MIN_INSTANCES \
    --update-stackdriver-metric='METRIC_URL' \
    --stackdriver-metric-filter='METRIC_FILTER' \
    --stackdriver-metric-utilization-target=TARGET_VALUE \
    --stackdriver-metric-utilization-target-type=TARGET_TYPE
Replace the following:
- GROUP_NAME: The name of the MIG where you want to add an autoscaler.
- ZONE: The zone where the MIG is located. You cannot specify a region for autoscalers on per-group metrics.
- MAX_INSTANCES: The maximum number of VMs that the MIG can have.
- MIN_INSTANCES: The minimum number of VMs that the MIG can have.
- METRIC_URL: A protocol-free URL of a Monitoring metric.
- METRIC_FILTER: A Cloud Monitoring filter that selects the relevant TimeSeries and MonitoredResource. You must specify a resource.type value, but you cannot specify gce_instance if you want to scale using per-group metrics. The filter must meet the autoscaler filtering requirements.
- TARGET_VALUE: The metric value that the autoscaler attempts to maintain.
- TARGET_TYPE: The value type for the metric. You can set the autoscaler to monitor the metric as a GAUGE, by the delta-per-minute of the value, or by the delta-per-second of the value.
To see a list of available autoscaler gcloud
command-line tool commands and flags
that work with per-group autoscaling, see the
gcloud
command-line tool reference.
API
Create an autoscaler for a MIG. You can specify how you want the autoscaler to provision instances by including one of the following parameters:
- Instance assignment: Specify the singleInstanceAssignment parameter.
- Utilization target: Specify the utilizationTarget parameter.
Instance assignment:
In the API, make a POST
request to create an autoscaler.
In the request body, include the normal parameters that you would use to create a per-instance autoscaler, but specify the singleInstanceAssignment parameter. The singleInstanceAssignment parameter specifies the amount of work that you expect each VM to handle.
POST https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/zones/ZONE/autoscalers

{
  "name": "example-autoscaler",
  "target": "zones/ZONE/instanceGroupManagers/GROUP_NAME",
  "autoscalingPolicy": {
    "maxNumReplicas": MAX_INSTANCES,
    "minNumReplicas": MIN_INSTANCES,
    "customMetricUtilizations": [
      {
        "metric": "METRIC_URL",
        "filter": "METRIC_FILTER",
        "singleInstanceAssignment": INSTANCE_ASSIGNMENT
      }
    ]
  }
}
Replace the following:
- PROJECT_ID: Your project ID.
- ZONE: The zone where the MIG is located. You cannot specify a region for autoscalers on per-group metrics.
- GROUP_NAME: The name of the MIG where you want to add an autoscaler.
- MAX_INSTANCES: The maximum number of VMs that the MIG can have.
- MIN_INSTANCES: The minimum number of VMs that the MIG can have.
- METRIC_URL: A protocol-free URL of a Monitoring metric.
- METRIC_FILTER: A Cloud Monitoring filter that selects the relevant TimeSeries and MonitoredResource. You must specify a resource.type value, but you cannot specify gce_instance if you want to scale using per-group metrics. The filter must meet the autoscaler filtering requirements.
- INSTANCE_ASSIGNMENT: The amount of work to assign to each VM instance in the MIG. For example, specify 2 to assign two units of work to each VM, or specify 0.5 to assign half a unit of work to each VM. The autoscaler scales the MIG to ensure that there are enough VMs to complete the available work, which is indicated by the metric. If the metric value is 10 and you've assigned 0.5 units of work to each VM, the autoscaler provisions 20 VMs in the MIG.
Utilization target:
In some situations, you might want to use utilization targets with
per-group metrics rather than specify a number of VMs relative
to the value of the metric that your autoscaler measures. You can
still point the autoscaler to a per-group metric, but the autoscaler
attempts to maintain the specified utilization target. Specify
those targets with the utilizationTarget
parameter. You must also
specify a filter for the metric using the filter
parameter.
POST https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/zones/ZONE/autoscalers

{
  "name": "example-autoscaler",
  "target": "zones/ZONE/instanceGroupManagers/GROUP_NAME",
  "autoscalingPolicy": {
    "maxNumReplicas": MAX_INSTANCES,
    "minNumReplicas": MIN_INSTANCES,
    "customMetricUtilizations": [
      {
        "metric": "METRIC_URL",
        "filter": "METRIC_FILTER",
        "utilizationTarget": TARGET_VALUE,
        "utilizationTargetType": TARGET_TYPE
      }
    ]
  }
}
Replace the following:
- PROJECT_ID: Your project ID.
- ZONE: The zone where the MIG is located. You cannot specify a region for autoscalers on per-group metrics.
- GROUP_NAME: The name of the MIG where you want to add an autoscaler.
- MAX_INSTANCES: The maximum number of VMs that the MIG can have.
- MIN_INSTANCES: The minimum number of VMs that the MIG can have.
- METRIC_URL: A protocol-free URL of a Monitoring metric.
- METRIC_FILTER: A Cloud Monitoring filter that selects the relevant TimeSeries and MonitoredResource. You must specify a resource.type value, but you cannot specify gce_instance if you want to scale using per-group metrics. The filter must meet the autoscaler filtering requirements.
- TARGET_VALUE: The metric value that the autoscaler attempts to maintain.
- TARGET_TYPE: The value type for the metric. You can set the autoscaler to monitor the metric as a GAUGE, by the DELTA_PER_MINUTE of the value, or by the DELTA_PER_SECOND of the value.
Example: Using instance assignment to scale based on a Pub/Sub queue
Assume the following setup:
- An active Pub/Sub topic receives messages from some source.
- An active Pub/Sub subscription is connected to the topic in a pull configuration. The subscription is named our-subscription.
- A pool of workers is pulling messages from that subscription and processing them. The pool is a zonal MIG named our-instance-group and is located in zone us-central1-a. The pool must not exceed 100 workers, and it should scale in to 0 workers when there are no messages in the queue.
- On average, a worker processes a single message in one minute.
To determine the optimal instance assignment value, consider several approaches:
- To process all messages in the queue as fast as possible, you can choose 1 as the instance assignment value. This creates one VM instance for each message in the queue (limited to the maximum number of VMs in the group). However, this can cause overprovisioning. In the worst case, a VM is created to process just one message before the autoscaler shuts it down, which consumes resources for much longer than doing actual work.
  - Note that if the workers are able to process multiple messages concurrently, it makes sense to increase the value to the number of concurrent processes.
  - Note that, in this example, it does not make sense to set the value below 1 because one message cannot be processed by more than one worker.
- Alternatively, if processing latency is less important than resource utilization and overhead costs, you can calculate how many messages each VM must process within its lifetime to be considered efficiently utilized. Take into account startup and shutdown time and the fact that autoscaling does not immediately delete VMs. For example, assuming that startup and shutdown take about 5 minutes in total and that autoscaling deletes VMs only after a period of approximately 10 minutes, you calculate that it is efficient to create an additional VM in the group as long as it can process at least 15 messages before the autoscaler shuts it down, which results in, at most, 25% overhead due to the total time it takes to create, start, and shut down the VM. In this case, you can choose 15 as the instance assignment value (see the sketch after this list).
- Both approaches can be balanced, resulting in a number between 1 and 15, depending on which factor takes priority: processing latency or resource utilization.
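For illustration, the overhead arithmetic in the second approach can be sketched as follows, using the numbers from this example (one message per minute per worker and about 5 minutes of combined startup and shutdown time):

# Rough illustration of the overhead calculation described above.
startup_shutdown_minutes = 5
messages_per_vm = 15                       # candidate instance assignment value
processing_minutes = messages_per_vm * 1   # one minute per message

overhead = startup_shutdown_minutes / (startup_shutdown_minutes + processing_minutes)
print(f"Overhead: {overhead:.0%}")  # -> Overhead: 25%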
Looking at the available Pub/Sub metrics, we find a metric that represents the subscription queue length: subscription/num_undelivered_messages.
Note that this metric exports the total number of messages in the queue, including messages that are currently being processed but that are not yet acknowledged. Using a metric that does not include the messages being processed is not recommended because such a metric can drop down to 0 when there is still work being done, which prompts autoscaling to scale in and possibly interrupt the actual work.
You can now configure autoscaling for the queue:
gcloud compute instance-groups managed set-autoscaling \
    our-instance-group \
    --zone=us-central1-a \
    --max-num-replicas=100 \
    --min-num-replicas=0 \
    --update-stackdriver-metric=pubsub.googleapis.com/subscription/num_undelivered_messages \
    --stackdriver-metric-filter="resource.type = pubsub_subscription AND resource.label.subscription_id = our-subscription" \
    --stackdriver-metric-single-instance-assignment=15
Example: Using a utilization target to scale based on average latency
There might be a situation when the metric providing a relevant signal does not represent a total amount of available work or another resource applicable to the group, as in the previous example, but instead an average, a percentile, or some other statistical property. For this example, assume you will scale based on the group's average processing latency.
Assume the following setup:
- A MIG named our-instance-group is assigned to perform a particular task. The group is located in zone us-central1-a.
- You have a Cloud Monitoring custom metric that exports a value that you would like to maintain at a particular level. For this example, assume the metric represents the average latency of processing queries assigned to the group.
  - The custom metric is named custom.googleapis.com/example_average_latency.
  - The custom metric has a label with a key named group_name and a value equal to the MIG's name, our-instance-group.
  - The custom metric exports data for the global monitored resource, that is, it is not associated with any specific VM.
You have determined that when the metric value goes above some specific value,
you need to add more VMs to the group to handle the load, while when it
goes below that value, you can free up some resources. Autoscaling gradually
adds or removes VMs at a rate that is proportional to how much the metric
is above or below the target. For this example, assume that the calculated
target value is 100
.
You can now configure autoscaling for the group using a per-group utilization
target of 100
, which represents the metric value that the autoscaler must
attempt to maintain:
gcloud compute instance-groups managed set-autoscaling \
    our-instance-group \
    --zone=us-central1-a \
    --max-num-replicas=100 \
    --min-num-replicas=0 \
    --update-stackdriver-metric=custom.googleapis.com/example_average_latency \
    --stackdriver-metric-filter="resource.type = global AND metric.label.group_name = our-instance-group" \
    --stackdriver-metric-utilization-target=100 \
    --stackdriver-metric-utilization-target-type=delta-per-second
What's next
- Learn about managing autoscalers.
- Learn how autoscalers make decisions.
- Learn how to use multiple autoscaling signals to scale your group.