You can set up autoscaler to scale based on the following metric types:
- Scale using per-instance metrics where the selected metric provides data for each instance in the managed instance group indicating resource utilization.
- Scale using per-group metrics (beta) where the group scales based on a metric that provides a value related to the whole managed instance group.
These metrics can be either standard metrics provided by the Stackdriver Monitoring service, or custom Stackdriver Monitoring metrics that you create.
Before you begin
- If you want to use the command-line examples in this guide:
- Install or update to the latest version of the gcloud command-line tool.
- Set a default region and zone.
- If you want to use the API examples in this guide, set up API access.
- Read the Before you begin section of the Autoscaling groups of instances overview topic for important setup steps.
Per-instance metrics
Per-instance metrics provide data for each instance in the managed instance group separately, indicating that instance's resource utilization. For per-instance metrics, the instance group cannot scale below a size of 1 because the autoscaler requires metrics about at least one running instance in order to operate.
If you need to scale using Stackdriver Monitoring metrics that aren't specific to individual instances or if you need to scale your instance groups down to zero instances from time to time, you can configure your instances to scale using per-group metrics instead.
Standard per-instance metrics
Stackdriver Monitoring has a set of standard metrics that you can use to monitor your virtual machine (VM) instances. However, not all standard metrics are a valid utilization metric that the autoscaler can use.
A valid utilization metric for scaling meets the following criteria:
- The standard metric must contain data for a gce_instance monitored resource. You can use the timeSeries.list API call to verify whether a specific metric exports data for this resource.
- The standard metric describes how busy an instance is, and the metric value increases or decreases proportionally to the number of VM instances in the group.
The following metric is invalid because the value does not change based on usage, and the autoscaler can't use the value to scale proportionally:
compute.googleapis.com/instance/cpu/reserved_cores
After you select a standard metric you want to use for your autoscaler, you can configure autoscaling using that metric.
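As a rough illustration of the criteria above, the following Python sketch builds a Monitoring filter string that you could pass to a timeSeries.list call, and applies a simple pre-check. The helper names and the idea of passing in a series count are assumptions made for this example. The sketch does not call the Monitoring API itself, and it cannot prove that a metric scales proportionally.

```python
# Metric named in the text as unusable for proportional scaling.
KNOWN_INVALID_METRICS = {"compute.googleapis.com/instance/cpu/reserved_cores"}


def gce_instance_filter(metric_type):
    """Build a Monitoring filter selecting one metric on gce_instance resources."""
    return f'metric.type = "{metric_type}" AND resource.type = "gce_instance"'


def looks_usable(metric_type, matching_series_count):
    """Hypothetical pre-check.

    matching_series_count is the number of time series that your own
    timeSeries.list call returned for gce_instance_filter(metric_type).
    """
    if metric_type in KNOWN_INVALID_METRICS:
        return False
    return matching_series_count > 0


if __name__ == "__main__":
    print(gce_instance_filter("compute.googleapis.com/instance/cpu/utilization"))
```

A metric that passes this check still needs to describe how busy an instance is; that judgment remains with you.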
Custom metrics
You can create custom metrics using Stackdriver Monitoring and write your own monitoring data to the Monitoring service. This gives you side-by-side access to standard Google Cloud data and your custom monitoring data, with a familiar data structure and consistent query syntax. If you have a custom metric, you can choose to scale based on the data from these metrics.
Prerequisites
To use custom metrics, you must first do the following:
- Create a custom metric. For information about creating a custom metric, see Using custom metrics.
- Set up your managed instance group to export the custom metric from all instances in the managed instance group.
Choose a valid custom metric
Not all custom metrics can be used by the autoscaler. To choose a valid custom metric, the metric must have all of the following properties:
- The metric must be a per-instance metric. The metric must export data that is relevant to each specific Compute Engine instance separately.
- The exported per-instance values must be associated with a gce_instance monitored resource, which contains the following labels:
  - zone with the name of the zone the instance is in.
  - instance_id with the value of the unique numerical ID assigned to the instance.
- The metric must export data at least every 60 seconds. If you export data more often than every 60 seconds, the autoscaler can respond to load changes more quickly. If you export data less often than every 60 seconds, the autoscaler might not respond to load changes quickly enough.
- The metric must be a valid utilization metric, which means that data from the metric can be used to proportionally scale up or down the number of virtual machines.
- The metric must export int64 or double data values.
For autoscaler to work with your custom metric, you must export data for this custom metric from all the instances in the managed instance group.
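To look up the numerical ID of an instance for the instance_id label, you can query the metadata server from within that instance:
curl http://metadata.google.internal/computeMetadata/v1/instance/id -H Metadata-Flavor:Google
For more information about using the metadata server, see Storing and retrieving instance metadata.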
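The custom metric requirements can be expressed as a small checklist. The following Python sketch is only an illustration: the MetricExport class and its field names are hypothetical and do not correspond to any Monitoring API type. It does not check whether the metric is a valid utilization metric, which depends on what the metric measures.

```python
from dataclasses import dataclass

REQUIRED_LABELS = {"zone", "instance_id"}
ALLOWED_VALUE_TYPES = {"INT64", "DOUBLE"}
MAX_EXPORT_INTERVAL_SECONDS = 60


@dataclass
class MetricExport:
    """Hypothetical description of how one instance exports a custom metric."""
    resource_type: str
    resource_labels: set
    value_type: str
    export_interval_seconds: float


def validation_problems(export):
    """Return a list of human-readable problems; an empty list means no problem found."""
    problems = []
    if export.resource_type != "gce_instance":
        problems.append("resource type must be gce_instance")
    missing = REQUIRED_LABELS - set(export.resource_labels)
    if missing:
        problems.append("missing resource labels: " + ", ".join(sorted(missing)))
    if export.export_interval_seconds > MAX_EXPORT_INTERVAL_SECONDS:
        problems.append("data must be exported at least every 60 seconds")
    if export.value_type.upper() not in ALLOWED_VALUE_TYPES:
        problems.append("value type must be int64 or double")
    return problems
```

For example, validation_problems(MetricExport("gce_instance", {"zone", "instance_id"}, "DOUBLE", 30)) returns an empty list.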
Configuring autoscaling using per-instance monitoring metrics
The process of setting up an autoscaler for a standard or custom metric is the same. To create an autoscaler that uses Stackdriver Monitoring metrics, you must provide the metric identifier, the desired target utilization level, and the utilization target type. Each of these properties is described briefly below:
Metric identifier: The name of the metric to use. If you use a custom metric, you defined this name when you created the metric. The identifier has the following format:
custom.googleapis.com/path/to/metric
See Using custom metrics for more information about creating, browsing, and reading metrics.
Target utilization level: The level that the autoscaler must maintain. This must be a positive number. For example, both 24.5 and 1100 are acceptable values. Note that this is different from CPU and load balancing, which must be a float value between 0.0 and 1.0.
Target type: How the autoscaler computes the data collected from the instances. The possible target types are:
- GAUGE. The autoscaler computes the average value of the data collected in the last couple minutes and compares that to the target utilization value of the autoscaler.
- DELTA_PER_MINUTE. The autoscaler calculates the average rate of growth per minute and compares that to the target utilization.
- DELTA_PER_SECOND. The autoscaler calculates the average rate of growth per second and compares that to the target utilization.
For accurate comparisons, if you set the target utilization in seconds, use DELTA_PER_SECOND as the autoscaler target type. Likewise, use DELTA_PER_MINUTE for a target utilization in minutes.
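To make the difference between the three target types concrete, the following Python sketch reduces a window of samples to the value that would be compared with the target. It is an approximation under stated assumptions: the sample format, the window, and the use of the first and last samples to estimate the growth rate are choices made for this example, not the autoscaler's documented algorithm.

```python
def observed_value(samples, target_type):
    """Reduce samples to one value comparable with the utilization target.

    samples: list of (timestamp_seconds, value) pairs, oldest first.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    if target_type == "GAUGE":
        # Average of the raw values in the window.
        return sum(value for _, value in samples) / len(samples)

    (first_time, first_value), (last_time, last_value) = samples[0], samples[-1]
    if last_time <= first_time:
        raise ValueError("rate types need samples that span some time")
    growth_per_second = (last_value - first_value) / (last_time - first_time)
    if target_type == "DELTA_PER_SECOND":
        return growth_per_second
    if target_type == "DELTA_PER_MINUTE":
        return growth_per_second * 60
    raise ValueError(f"unknown target type: {target_type}")


if __name__ == "__main__":
    window = [(0, 100), (60, 700), (120, 1300)]
    for kind in ("GAUGE", "DELTA_PER_MINUTE", "DELTA_PER_SECOND"):
        print(kind, observed_value(window, kind))
```

The same counter yields values that differ by a factor of 60 between the two rate types, which is why the target utilization must be expressed in the matching unit.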
Console
The instructions for configuring autoscaling are different for regional versus single-zone managed instance groups. Regional managed instance groups do not support filtering for per-instance metrics.
To configure autoscaling for a regional (multi-zone) managed instance group:
- Go to the Instance Groups page.
- If you do not have an instance group, create one. Otherwise, click the name of an instance group from the list to open the instance group details page. The group must be a regional group.
- On the instance group details page, click the Edit Group button.
- Under Autoscaling, select On to enable autoscaling.
- In the Autoscale based on section, select Stackdriver monitoring metric.
- In the Metric identifier section, enter the metric name in the following format: example.googleapis.com/path/to/metric.
- In the Target section, specify the target value.
- In the Target type section, specify the target type that corresponds to the metric's granularity of measurement.
- Save your changes when you are ready.
To configure autoscaling for a single-zone managed instance group:
- Go to the Instance Groups page.
- If you do not have an instance group, create one. Otherwise, click the name of an instance group to open the instance group details page. The instance group must be single-zone.
- On the instance group details page, click the Edit Group button.
- Under Autoscaling, select On to enable autoscaling.
- In the Autoscale based on section, select Stackdriver monitoring metric.
- In the Metric export scope section, select Time series per instance to configure autoscaling using per-instance metrics.
- In the Metric identifier section, enter the metric name in the following format: example.googleapis.com/path/to/metric.
- In the Additional filter expression section, optionally enter a filter to use individual values from metrics with multiple streams or labels. See Filtering per-instance metrics for more information.
- In the Utilization target section, specify the target value.
- In the Utilization target type section, verify that the target type corresponds to the metric's kind of measurement.
- Save your changes when you are ready.
gcloud
For example, in the gcloud command-line tool, the following command creates an autoscaler that uses the GAUGE target type. Along with the --custom-metric-utilization parameter, the --max-num-replicas parameter is also required when creating an autoscaler:
gcloud compute instance-groups managed set-autoscaling example-managed-instance-group \
--custom-metric-utilization metric=example.googleapis.com/path/to/metric,utilization-target-type=GAUGE,utilization-target=10 \
--max-num-replicas 20 \
--cool-down-period 90
Optionally, you can use the --cool-down-period flag, which tells the autoscaler how many seconds to wait after a new virtual machine has started before the autoscaler starts collecting usage information from it. This accounts for the amount of time it might take for the virtual machine to initialize, during which the collected usage is not reliable for autoscaling. The default cool down period is 60 seconds.
For multi-zonal managed instance groups, use the --region flag to specify where to find the instance group. For example:
gcloud compute instance-groups managed set-autoscaling example-managed-instance-group \
--custom-metric-utilization metric=example.googleapis.com/path/to/metric,utilization-target-type=GAUGE,utilization-target=10 \
--max-num-replicas 20 \
--cool-down-period 90 \
--region us-central1
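If you generate these commands from a script, a small helper can assemble the argument list. The following Python sketch uses only the flags shown in the commands above; the function name and its parameters are hypothetical, and the sketch only builds the arguments without running gcloud.

```python
import shlex


def set_autoscaling_args(group, metric, target, target_type, max_replicas,
                         cool_down=None, region=None):
    """Build the argument list for a per-instance metric autoscaler."""
    if target <= 0:
        raise ValueError("the target utilization must be a positive number")
    utilization = (
        f"metric={metric},"
        f"utilization-target-type={target_type},"
        f"utilization-target={target}"
    )
    args = [
        "gcloud", "compute", "instance-groups", "managed", "set-autoscaling",
        group,
        "--custom-metric-utilization", utilization,
        "--max-num-replicas", str(max_replicas),
    ]
    if cool_down is not None:
        args += ["--cool-down-period", str(cool_down)]
    if region is not None:
        # Only needed for regional (multi-zone) groups.
        args += ["--region", region]
    return args


if __name__ == "__main__":
    command = set_autoscaling_args(
        "example-managed-instance-group",
        "example.googleapis.com/path/to/metric",
        10, "GAUGE", 20, cool_down=90, region="us-central1",
    )
    print(shlex.join(command))
```

Returning a list rather than a single string avoids shell quoting problems if you later pass the arguments to subprocess.run.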
To see a full list of available commands and flags for the gcloud tool, see the gcloud reference.
API
Note: Although autoscaling is a feature of managed instance groups, it is a separate API resource. Keep that in mind when you construct API requests for autoscaling.
In the API, make a POST request to the following URL, replacing myproject with your own project ID and us-central1-f with the zone of your choice:
POST https://compute.googleapis.com/compute/v1/projects/myproject/zones/us-central1-f/autoscalers/
Your request body must contain the name, target, and autoscalingPolicy fields. In autoscalingPolicy, provide the maxNumReplicas and the customMetricUtilizations properties.
Optionally, you can use the coolDownPeriodSec parameter, which tells the autoscaler how many seconds to wait after a new instance has started before it starts to collect usage. After the cool-down period passes, the autoscaler begins to collect usage information from the new instance and determines whether the group requires additional instances. This accounts for the amount of time it can take for the instance to initialize, during which the collected usage is not reliable for autoscaling. The default cool-down period is 60 seconds.
POST https://compute.googleapis.com/compute/v1/projects/myproject/zones/us-central1-f/autoscalers
{
  "name": "example-autoscaler",
  "target": "zones/us-central1-f/instanceGroupManagers/example-managed-instance-group",
  "autoscalingPolicy": {
    "maxNumReplicas": 10,
    "coolDownPeriodSec": 90,
    "customMetricUtilizations": [
      {
        "metric": "example.googleapis.com/some/metric/name",
        "utilizationTarget": 10,
        "utilizationTargetType": "GAUGE"
      }
    ]
  }
}
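When you construct this request body in code, building it as a data structure and serializing it helps avoid malformed JSON. The following Python sketch mirrors the fields of the request above; the function name and parameters are hypothetical, and the sketch does not send the request or handle authentication.

```python
import json

TARGET_TYPES = ("GAUGE", "DELTA_PER_MINUTE", "DELTA_PER_SECOND")


def autoscaler_body(name, zone, group, metric, target, target_type,
                    max_replicas, cool_down_sec=60):
    """Build the request body for a per-instance metric autoscaler."""
    if target <= 0:
        raise ValueError("utilizationTarget must be a positive number")
    if target_type not in TARGET_TYPES:
        raise ValueError(f"unknown target type: {target_type}")
    return {
        "name": name,
        "target": f"zones/{zone}/instanceGroupManagers/{group}",
        "autoscalingPolicy": {
            "maxNumReplicas": max_replicas,
            "coolDownPeriodSec": cool_down_sec,
            "customMetricUtilizations": [
                {
                    "metric": metric,
                    "utilizationTarget": target,
                    "utilizationTargetType": target_type,
                }
            ],
        },
    }


if __name__ == "__main__":
    body = autoscaler_body(
        "example-autoscaler", "us-central1-f", "example-managed-instance-group",
        "example.googleapis.com/some/metric/name", 10, "GAUGE", 10, 90,
    )
    print(json.dumps(body, indent=2))
```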
Filtering per-instance metrics
You can apply filters to per-instance Stackdriver Monitoring metrics, which lets you scale single-zone managed instance groups using individual values from metrics with multiple streams or labels.
Per-instance metric filtering requirements
Autoscaler filtering is compatible with the Stackdriver Monitoring filter syntax. The filters for per-instance metrics must meet the following requirements:
- You can use only the AND operator for joining selectors.
- You can use only the = direct equality comparison operator, but you cannot use the operator with any functions. For example, you cannot use the startswith() function with the = comparison operator.
- You must not set the resource.type or resource.label.* selectors. Per-instance metrics always use all of the instance resources from the group.
- For best results, create a filter that is specific enough to return a single time series for each instance. If the filter returns multiple time series, they are added together.
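The following Python sketch shows one way to check a filter string against the first three requirements before you submit it. It is a deliberately simple approximation and not the parser that the autoscaler uses: it splits on the AND keyword and so mishandles quoted values that contain " AND ", and the function name is hypothetical.

```python
import re

# One selector: a dotted key, "=", then a quoted string or a bare token.
# Bare tokens may not contain parentheses, which rules out function calls.
SELECTOR = re.compile(r'^(?P<key>[A-Za-z_][\w.]*)\s*=\s*(?:"[^"]*"|[^\s()"]+)$')


def per_instance_filter_problems(filter_text):
    """Return a list of problems found in a per-instance metric filter."""
    problems = []
    for selector in re.split(r"\s+AND\s+", filter_text.strip()):
        match = SELECTOR.match(selector)
        if match is None:
            # Covers OR, !=, function calls, and anything else unexpected.
            problems.append(f"unsupported selector: {selector!r}")
            continue
        key = match.group("key")
        if key == "resource.type" or key.startswith("resource.label"):
            problems.append(f"resource selector is not allowed: {key}")
    return problems


if __name__ == "__main__":
    print(per_instance_filter_problems("metric.label.loadbalanced = true"))
```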
Configuring autoscalers to filter metrics
Use the Google Cloud Console, the gcloud command-line tool (beta), or the Compute Engine beta API to add metric filters for autoscaling of a single-zone managed instance group.
Console
The process for creating an autoscaler that filters a per-instance metric is similar to creating a normal per-instance autoscaler, but you also specify a metric filter. For example, the compute.googleapis.com/instance/network/received_bytes_count metric includes the instance_name and loadbalanced labels. To filter based on the loadbalanced Boolean value:
- Go to the Instance Groups page.
- If you do not have an instance group, create one. Otherwise, click the name of an instance group to open the instance group details page. The instance group must be single zone.
- On the instance group details page, click the Edit Group button.
- Under Autoscaling, select On to enable autoscaling.
- In the Autoscale based on section, select Stackdriver monitoring metric.
- In the Metric export scope section, select Time series per instance to configure autoscaling using per-instance metrics.
- In the Metric identifier section, enter the metric name. For example, compute.googleapis.com/instance/network/received_bytes_count.
- In the Additional filter expression section, enter a filter. For example, 'metric.label.loadbalanced = true'.
- Save your changes when you are ready.
gcloud
The process for creating an autoscaler that filters a per-instance metric is similar to creating a normal per-instance autoscaler, but you must specify a metric filter and individual flags for the utilization target and target type. For example, the compute.googleapis.com/instance/network/received_bytes_count metric includes the instance_name and loadbalanced labels. To filter based on the loadbalanced Boolean value, specify the --stackdriver-metric-filter flag with the 'metric.label.loadbalanced = true' value. Include the utilization target and target type flags individually.
gcloud beta compute instance-groups managed set-autoscaling example-managed-instance-group \
--update-stackdriver-metric=compute.googleapis.com/instance/network/received_bytes_count \
--stackdriver-metric-utilization-target=10 \
--stackdriver-metric-utilization-target-type=delta-per-second \
--stackdriver-metric-filter='metric.label.loadbalanced = true' \
--max-num-replicas 20 \
--cool-down-period 90
This example configures autoscaling to use only the loadbalanced traffic data as part of the utilization target.
To see a list of available gcloud commands and flags, see the gcloud tool reference (beta).
API
Note: Although autoscaling is a feature of managed instance groups, it is a separate API resource. Keep that in mind when you construct API requests for autoscaling.
The process for creating an autoscaler that filters a per-instance metric is similar to creating a normal per-instance autoscaler, but you must specify a metric filter and individual fields for the utilization target and target type. For example, the compute.googleapis.com/instance/network/received_bytes_count metric includes the instance_name and loadbalanced labels. To filter based on the loadbalanced Boolean value, specify the filter parameter with the "metric.label.loadbalanced = true" value.
In the API, make a POST request to the following URL, replacing myproject with your own project ID and us-central1-f with the zone of your choice. The request body must contain the name, target, and autoscalingPolicy fields. In autoscalingPolicy, provide the maxNumReplicas and the customMetricUtilizations properties.
POST https://compute.googleapis.com/compute/v1/projects/myproject/zones/us-central1-f/autoscalers
{
  "name": "example-autoscaler",
  "target": "zones/us-central1-f/instanceGroupManagers/example-managed-instance-group",
  "autoscalingPolicy": {
    "maxNumReplicas": 10,
    "coolDownPeriodSec": 90,
    "customMetricUtilizations": [
      {
        "metric": "compute.googleapis.com/instance/network/received_bytes_count",
        "filter": "metric.label.loadbalanced = true",
        "utilizationTarget": 10,
        "utilizationTargetType": "DELTA_PER_SECOND"
      }
    ]
  }
}
This example configures autoscaling to use only the loadbalanced traffic data as part of the utilization target.
Per-group metrics
Per-group metrics allow autoscaling with a standard or custom metric that does not export per-instance utilization data. Instead, the group scales based on a value that applies to the whole group and corresponds to how much work is available for the group or how busy the group is. The group scales based on the fluctuation of that group metric value and the configuration that you define.
When you configure autoscaling on per-group metrics, you must indicate how you want the autoscaler to provision instances relative to the metric:
- Instance assignment: Specify an instance assignment to indicate that you want the autoscaler to add or remove instances depending on how much work is available to assign to each instance. Specify a value for this parameter that represents how much work you expect each instance to handle. For example, specify 2 to assign two units of work to each instance, or specify 0.5 to assign half a unit of work to each instance. The autoscaler adds enough instances to the managed instance group to ensure that there are enough instances to complete the available work as indicated by the metric. If the metric value is 10 and you assigned 0.5 units of work to each instance, the autoscaler creates 20 instances in the managed instance group. Scaling with instance assignment allows the instance group to shrink to 0 instances when the metric value drops down to 0, and back up again when it rises above 0. The following diagram shows the proportional relationship between metric value and number of instances when scaling with an instance assignment policy.
- Utilization target: Specify a utilization target to indicate that you want the autoscaler to add or remove instances to try and maintain the metric at a specified value. When the metric is above the specified target, autoscaler gradually adds instances until the metric decreases to the target value. When the metric is below the specified target value, autoscaler gradually removes instances until the metric increases to the target value. Scaling with a utilization target cannot shrink the group to 0 instances. The following diagram shows how autoscaler adds and removes instances in response to a metric value to maintain a utilization target.
Each option has the following use cases:
- Instance assignment: Scale the size of your managed instance groups based on the number of unacknowledged messages in a Pub/Sub subscription or a total QPS rate of a network endpoint.
- Utilization target: Scale the size of your managed instance groups based on a utilization target for a custom metric that does not come from the standard per-instance CPU or memory use metrics. For example, you might scale the group based on a custom latency metric.
When you configure autoscaling with per-group metrics and you specify an instance assignment, your instance groups can scale down to 0 instances. If your metric indicates that there is no work for your instance group to complete, the group scales down to 0 instances until the metric detects that new work is available. In contrast to a per-group instance assignment, per-instance autoscaling requires resource utilization metrics from at least one instance, so the group cannot scale below a size of 1.
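The instance assignment arithmetic described above can be sketched in a few lines of Python. The function name is hypothetical, and rounding up to a whole instance and clamping to the group's minimum and maximum size are assumptions made for this example; the autoscaler's exact rounding and stabilization behavior may differ.

```python
import math


def instances_for_work(metric_value, instance_assignment,
                       min_instances=0, max_instances=None):
    """Estimate the group size for a per-group metric with instance assignment."""
    if instance_assignment <= 0:
        raise ValueError("the instance assignment must be a positive number")
    # Enough instances to cover all available work (assumed to round up).
    needed = math.ceil(metric_value / instance_assignment)
    needed = max(needed, min_instances)
    if max_instances is not None:
        needed = min(needed, max_instances)
    return needed


if __name__ == "__main__":
    print(instances_for_work(10, 0.5))  # metric value 10, half a unit per instance
    print(instances_for_work(10, 2))    # metric value 10, two units per instance
    print(instances_for_work(0, 0.5))   # no work available
```

With a metric value of 10, an assignment of 0.5 yields 20 instances and an assignment of 2 yields 5, matching the figures in the text; a metric value of 0 yields 0 instances when the minimum size is 0.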
Filtering per-group metrics
You can apply filters to per-group Stackdriver Monitoring metrics, which lets you scale managed instance groups using individual values from metrics that have multiple streams or labels.
Per-group metric filtering requirements
Autoscaler filtering is compatible with the Stackdriver Monitoring filter syntax. The filters for per-group metrics must meet the following requirements:
- You can use only the AND operator for joining selectors.
- In each selector, you can't use the = direct equality comparison operator with any functions.
- You can specify a metric type selector of metric.type = "..." in the filter and also include the original metric field. Optionally, you can use only the metric field. The metric must meet the following requirements:
  - The metric must be specified in at least one place.
  - The metric can be specified in both places, but the two values must be equal.
- You must specify the resource.type selector, but you cannot set it to gce_instance if you want to scale using per-group metrics.
- For best results, the filter should be specific enough to return a single time series for the group. If the filter returns multiple time series, they are added together.
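The metric and resource.type rules above lend themselves to a quick pre-submission check. The following Python sketch is an approximation with a hypothetical function name and a simplified selector parser (it does not handle quoted values that contain " AND "). The Pub/Sub metric and resource names in the usage lines are shown only as plausible inputs.

```python
import re

SELECTOR = re.compile(r'^([A-Za-z_][\w.]*)\s*=\s*(?:"([^"]*)"|([^\s()"]+))$')


def parse_selectors(filter_text):
    """Split a filter on AND and return a {key: value} mapping."""
    selectors = {}
    for part in re.split(r"\s+AND\s+", filter_text.strip()):
        match = SELECTOR.match(part)
        if match is None:
            raise ValueError(f"unsupported selector: {part!r}")
        key, quoted, bare = match.groups()
        selectors[key] = quoted if quoted is not None else bare
    return selectors


def per_group_filter_problems(metric_field, filter_text):
    """Check the metric and resource.type rules for per-group filters."""
    selectors = parse_selectors(filter_text)
    problems = []
    filter_metric = selectors.get("metric.type")
    if not metric_field and not filter_metric:
        problems.append("the metric must be specified in at least one place")
    elif metric_field and filter_metric and metric_field != filter_metric:
        problems.append("the metric field and metric.type selector must be equal")
    resource_type = selectors.get("resource.type")
    if resource_type is None:
        problems.append("the resource.type selector is required")
    elif resource_type == "gce_instance":
        problems.append("resource.type must not be gce_instance for per-group metrics")
    return problems


if __name__ == "__main__":
    print(per_group_filter_problems(
        "pubsub.googleapis.com/subscription/num_undelivered_messages",
        'resource.type = "pubsub_subscription" AND '
        'resource.label.subscription_id = "our-subscription"',
    ))
```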
Configuring autoscaling using per-group monitoring metrics
Use the Google Cloud Console, the gcloud command-line tool (beta), or the Compute Engine API (beta) to configure autoscaling with per-group metrics for a single-zone managed instance group.
Console
- Go to the Instance groups page.
- If you do not have an instance group, create one. Otherwise, click the name of an instance group to open the instance group details page. The instance group must be in a single zone.
- On the instance group details page, click the Edit Group button.
- Under Autoscaling, select On to enable autoscaling.
- In the Autoscale based on section, select Stackdriver monitoring metric.
- In the Metric export scope section, select Single time series per group.
- In the Metric identifier section, specify the metric name in the following format: example.googleapis.com/path/to/metric.
- Specify the Metric resource type.
- Provide an additional filter expression to use individual values from metrics that have multiple streams or labels. The filter must meet the autoscaler filtering requirements.
- In the Scaling policy section, select either Instance assignment or Utilization target.
  - If you select an instance assignment policy, then provide a Single instance assignment value that represents the amount of work to assign to each instance in the managed instance group. For example, specify 2 to assign two units of work to each instance. The autoscaler maintains enough instances to complete the available work (as indicated by the metric). If the metric value is 10 and you assigned 2 units of work to each instance, the autoscaler creates 5 instances in the managed instance group.
  - If you select a utilization target policy:
    - Provide a Utilization target value that represents the metric value that the autoscaler should try to maintain.
    - Select the Utilization target type that represents the value type for the metric.
- Save your changes when you are ready.
gcloud
Create an autoscaler for a managed instance group similarly to the per-instance autoscaler, but specify the --update-stackdriver-metric flag. You can specify how you want the autoscaler to provision instances by including one of the following flags:
- Instance assignment: Specify the --stackdriver-metric-single-instance-assignment flag.
- Utilization target: Specify the --stackdriver-metric-utilization-target flag.
Instance assignment:
Specify a metric that you want to measure and specify the --stackdriver-metric-single-instance-assignment flag to indicate the amount of work that you expect each instance to handle. You must also specify a filter for the metric using the --stackdriver-metric-filter flag.
gcloud beta compute instance-groups managed set-autoscaling [GROUP_NAME] \
--zone=[ZONE] \
--max-num-replicas=[MAX_INSTANCES] \
--min-num-replicas=[MIN_INSTANCES] \
--update-stackdriver-metric='[METRIC_URL]' \
--stackdriver-metric-filter='[METRIC_FILTER]' \
--stackdriver-metric-single-instance-assignment=[INSTANCE_ASSIGNMENT]
where:
- [GROUP_NAME] is the name of the managed instance group where you want to add an autoscaler.
- [ZONE] is the zone where the managed instance group is located. You cannot specify a region for autoscalers on per-group metrics.
- [MAX_INSTANCES] is the limit on the number of instances that the autoscaler can add to the managed instance group.
- [MIN_INSTANCES] is the limit on the minimum number of instances that the autoscaler can have in the managed instance group.
- [METRIC_URL] is a protocol-free URL of a Monitoring metric.
- [METRIC_FILTER] is a Stackdriver Monitoring filter where you specify a monitoring filter with a relevant TimeSeries and a MonitoredResource. The filter must meet the autoscaler filtering requirements.
- [INSTANCE_ASSIGNMENT] is the amount of work to assign to each instance in the managed instance group. For example, specify 2 to assign two units of work to each instance, or specify 0.5 to assign half a unit of work to each instance. The autoscaler adds enough instances to the managed instance group to ensure that there are enough instances to complete the available work, which is indicated by the metric. If the metric value is 10 and you've assigned 0.5 units of work to each instance, the autoscaler provisions 20 instances in the managed instance group.
Utilization target:
In some situations, you might want to use utilization targets with per-group metrics rather than specify a number of instances relative to the value of the metric that your autoscaler measures. You can still point the autoscaler to a per-group metric, but the autoscaler attempts to maintain the specified utilization target. Specify the target with the --stackdriver-metric-utilization-target flag and the target type with the --stackdriver-metric-utilization-target-type flag. You must also specify a filter for the metric using the --stackdriver-metric-filter flag.
gcloud beta compute instance-groups managed set-autoscaling [GROUP_NAME] \
--zone=[ZONE] \
--max-num-replicas=[MAX_INSTANCES] \
--min-num-replicas=[MIN_INSTANCES] \
--update-stackdriver-metric='[METRIC_URL]' \
--stackdriver-metric-filter='[METRIC_FILTER]' \
--stackdriver-metric-utilization-target=[TARGET_VALUE] \
--stackdriver-metric-utilization-target-type=[TARGET_TYPE]
where:
- [GROUP_NAME] is the name of the managed instance group where you want to add an autoscaler.
- [ZONE] is the zone where the managed instance group is located. You cannot specify a region for autoscalers on per-group metrics.
- [MAX_INSTANCES] is the limit on the number of instances that the autoscaler can add to the managed instance group.
- [MIN_INSTANCES] is the limit on the minimum number of instances that the autoscaler can have in the managed instance group.
- [METRIC_URL] is a protocol-free URL of a Monitoring metric.
- [METRIC_FILTER] is a Stackdriver Monitoring filter where you specify a monitoring filter with a relevant TimeSeries and a MonitoredResource. You must specify a resource.type value, but you cannot specify gce_instance if you want to scale using per-group metrics. The filter must meet the autoscaler filtering requirements.
- [TARGET_VALUE] is the metric value that the autoscaler attempts to maintain.
- [TARGET_TYPE] is the value type for the metric. You can set the autoscaler to monitor the metric as a GAUGE, by the delta-per-minute of the value, or by the delta-per-second of the value.
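Because the two scaling policies use different flags, a script that generates these commands should accept exactly one of them. The following Python sketch assembles the argument list using only the flags shown in the commands above; the function name and parameters are hypothetical, and the sketch does not run gcloud.

```python
def per_group_autoscaling_args(group, zone, min_instances, max_instances,
                               metric, metric_filter,
                               instance_assignment=None,
                               utilization_target=None, target_type=None):
    """Build the argument list for a per-group metric autoscaler."""
    use_assignment = instance_assignment is not None
    use_target = utilization_target is not None
    if use_assignment == use_target:
        raise ValueError("specify exactly one of instance_assignment "
                         "or utilization_target")
    args = [
        "gcloud", "beta", "compute", "instance-groups", "managed",
        "set-autoscaling", group,
        f"--zone={zone}",
        f"--max-num-replicas={max_instances}",
        f"--min-num-replicas={min_instances}",
        f"--update-stackdriver-metric={metric}",
        f"--stackdriver-metric-filter={metric_filter}",
    ]
    if use_assignment:
        args.append("--stackdriver-metric-single-instance-assignment="
                    f"{instance_assignment}")
    else:
        if target_type is None:
            raise ValueError("a utilization target also needs a target type")
        args.append(f"--stackdriver-metric-utilization-target={utilization_target}")
        args.append(f"--stackdriver-metric-utilization-target-type={target_type}")
    return args
```

Passing the list to subprocess.run keeps the filter expression intact without extra shell quoting.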
To see a list of available autoscaler gcloud command-line tool commands and flags that work with per-group autoscaling, see the gcloud command-line tool reference (beta).
API
Note: Although autoscaling is a feature of managed instance groups, [autoscalers](/compute/docs/reference/beta/autoscalers) are a separate API resource. Keep that in mind when you construct API requests for autoscaling.
Create an autoscaler for a managed instance group. You can specify how you want the autoscaler to provision instances by including one of the following parameters:
- Instance assignment: Specify the singleInstanceAssignment parameter.
- Utilization target: Specify the utilizationTarget parameter.
Instance assignment:
In the API, make a POST request to create an autoscaler. In the request body, include the normal parameters that you would use to create a per-instance autoscaler, but specify the singleInstanceAssignment parameter. The singleInstanceAssignment parameter specifies the amount of work that you expect each instance to handle.
POST https://compute.googleapis.com/compute/beta/projects/[PROJECT_ID]/zones/[ZONE]/autoscalers
{
  "name": "example-autoscaler",
  "target": "zones/[ZONE]/instanceGroupManagers/[GROUP_NAME]",
  "autoscalingPolicy": {
    "maxNumReplicas": [MAX_INSTANCES],
    "minNumReplicas": [MIN_INSTANCES],
    "customMetricUtilizations": [
      {
        "metric": "[METRIC_URL]",
        "filter": "[METRIC_FILTER]",
        "singleInstanceAssignment": [INSTANCE_ASSIGNMENT]
      }
    ]
  }
}
where:
- [PROJECT_ID] is your project ID.
- [ZONE] is the zone where the managed instance group is located.
- [GROUP_NAME] is the name of the managed instance group where you want to add an autoscaler.
- [MAX_INSTANCES] is the limit on the number of instances that the autoscaler can add to the managed instance group.
- [MIN_INSTANCES] is the limit on the minimum number of instances that the autoscaler can have in the managed instance group.
- [METRIC_URL] is a protocol-free URL of a Monitoring metric.
- [METRIC_FILTER] is a Stackdriver Monitoring filter where you specify a monitoring filter with a relevant TimeSeries and a MonitoredResource. You must specify a resource.type value, but you cannot specify gce_instance if you want to scale using per-group metrics. The filter must meet the autoscaler filtering requirements.
- [INSTANCE_ASSIGNMENT] is the amount of work to assign to each instance in the managed instance group. For example, specify 2 to assign two units of work to each instance, or specify 0.5 to assign half a unit of work to each instance. The autoscaler adds enough instances to the managed instance group to ensure that there are enough instances to complete the available work, which is indicated by the metric. If the metric value is 10 and you've assigned 0.5 units of work to each instance, the autoscaler provisions 20 instances in the managed instance group.
Utilization target:
In some situations, you might want to use utilization targets with per-group metrics rather than specify a number of instances relative to the value of the metric that your autoscaler measures. You can still point the autoscaler to a per-group metric, but the autoscaler attempts to maintain the specified utilization target. Specify those targets with the utilizationTarget parameter. You must also specify a filter for the metric using the filter parameter.
POST https://compute.googleapis.com/compute/beta/projects/[PROJECT_ID]/zones/[ZONE]/autoscalers
{
  "name": "example-autoscaler",
  "target": "zones/[ZONE]/instanceGroupManagers/[GROUP_NAME]",
  "autoscalingPolicy": {
    "maxNumReplicas": [MAX_INSTANCES],
    "minNumReplicas": [MIN_INSTANCES],
    "customMetricUtilizations": [
      {
        "metric": "[METRIC_URL]",
        "filter": "[METRIC_FILTER]",
        "utilizationTarget": [TARGET_VALUE],
        "utilizationTargetType": "[TARGET_TYPE]"
      }
    ]
  }
}
where:
- [GROUP_NAME] is the name of the managed instance group where you want to add an autoscaler.
- [ZONE] is the zone where the managed instance group is located.
- [MAX_INSTANCES] is the maximum number of instances that the autoscaler can have in the managed instance group.
- [MIN_INSTANCES] is the minimum number of instances that the autoscaler can have in the managed instance group.
- [METRIC_URL] is a protocol-free URL of a Monitoring metric.
- [METRIC_FILTER] is a Stackdriver Monitoring filter that identifies a relevant TimeSeries and a MonitoredResource. You must specify a resource.type value, but you cannot specify gce_instance if you want to scale using per-group metrics. The filter must meet the autoscaler filtering requirements.
- [TARGET_VALUE] is the metric value that the autoscaler attempts to maintain.
- [TARGET_TYPE] is the value type for the metric. You can set the autoscaler to monitor the metric as a GAUGE, by the DELTA_PER_MINUTE of the value, or by the DELTA_PER_SECOND of the value.
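As a rough aid, the following Python sketch assembles a request body with the fields listed above and applies two of the stated constraints before serializing it: the target type must be one of the three listed values, and a per-group filter must not name gce_instance. The helper name, its arguments, and the sample values are assumptions made for illustration. The substring check on the filter is deliberately crude and does not replace the autoscaler filtering requirements.

```python
import json

# The three target types named in the parameter list above.
VALID_TARGET_TYPES = {"GAUGE", "DELTA_PER_MINUTE", "DELTA_PER_SECOND"}


def build_autoscaler_body(zone, group_name, metric, metric_filter,
                          target_value, target_type,
                          min_instances, max_instances,
                          name="example-autoscaler"):
    """Build a per-group utilization-target request body (illustrative helper)."""
    if target_type not in VALID_TARGET_TYPES:
        raise ValueError(f"unsupported target type: {target_type}")
    # Crude check: per-group filters cannot use the gce_instance resource.
    if "gce_instance" in metric_filter:
        raise ValueError("per-group filters cannot specify gce_instance")
    return {
        "name": name,
        "target": f"zones/{zone}/instanceGroupManagers/{group_name}",
        "autoscalingPolicy": {
            "maxNumReplicas": max_instances,
            "minNumReplicas": min_instances,
            "customMetricUtilizations": [
                {
                    "metric": metric,
                    "filter": metric_filter,
                    "utilizationTarget": target_value,
                    "utilizationTargetType": target_type,
                }
            ],
        },
    }


# Sample values are placeholders chosen for this sketch.
body = build_autoscaler_body(
    zone="us-central1-a",
    group_name="our-instance-group",
    metric="custom.googleapis.com/example_average_latency",
    metric_filter="resource.type = global",
    target_value=100,
    target_type="GAUGE",
    min_instances=0,
    max_instances=100,
)
print(json.dumps(body, indent=2))
```

Serializing with json.dumps also guarantees that the body is valid JSON, which avoids mistakes such as trailing commas in hand-written requests.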
Example: Using instance assignment to scale based on a Pub/Sub queue
Assume the following setup:
- An active Pub/Sub topic receives messages from some source.
- An active Pub/Sub subscription is connected to the topic in a pull configuration. The subscription is named our-subscription.
- A pool of workers is pulling messages from that subscription and processing them. The pool is a single-zone managed instance group named our-instance-group and is located in zone us-central1-a. The pool must not exceed 100 workers, and should scale down to 0 workers when there are no messages in the queue.
- On average, a worker processes a single message in one minute.
To determine the optimal instance assignment value, consider several approaches:
- To process all messages in the queue as fast as possible, you can choose 1 as the instance assignment value. This creates one instance for each message in the queue (limited to the maximum number of instances in our group). However, this can cause overprovisioning. In the worst case, an instance is created to process just one message before the autoscaler shuts it down, which consumes resources for much longer than doing actual work.
  - Note that if the workers are able to process multiple messages concurrently, it makes sense to increase the value to the number of concurrent processes.
  - Note that, in this example, it does not make sense to set the value below 1 because one message cannot be processed by more than one worker.
- Alternatively, if processing latency is less important than resource utilization and overhead costs, you can calculate how many messages each instance must process within its lifetime to be considered efficiently utilized. Take into account startup and shutdown time and the fact that autoscaling does not immediately delete instances. For example, assuming that startup and shutdown time takes about 5 minutes in total and assuming that autoscaling deletes instances only after a period of approximately 10 minutes, you calculate that it is efficient to create an additional instance in the group as long as it can process at least 15 messages before the autoscaler shuts it down, which results in, at most, 25% overhead due to the total time it takes to create, start, and shut down the instance. In this case, you can choose 15 as the instance assignment value.
- Both approaches can be balanced out, resulting in a number between 1 and 15, depending on which factor takes priority: processing latency or resource utilization.
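The 25% figure in the second approach can be checked with a short calculation. The sketch below uses one plausible reading of the example: overhead is the share of an instance's lifetime spent on startup and shutdown rather than on processing messages. The function name and parameters are hypothetical, and the default values are taken from the example's assumptions (one minute per message, about 5 minutes of startup and shutdown).

```python
def overhead_fraction(messages_per_instance, minutes_per_message=1.0,
                      startup_shutdown_minutes=5.0):
    """Share of an instance's lifetime spent on startup and shutdown.

    Illustrative helper only; the values mirror the example's assumptions.
    """
    work_minutes = messages_per_instance * minutes_per_message
    return startup_shutdown_minutes / (startup_shutdown_minutes + work_minutes)


# 15 messages at one minute each, plus 5 minutes of startup and shutdown.
print(overhead_fraction(15))  # 0.25
# A single message per instance spends most of the lifetime on overhead.
print(overhead_fraction(1))
```

Values between 1 and 15 trade lower latency for a higher overhead share, which is the balance described in the last bullet.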
Looking at the available Pub/Sub metrics, we find a metric that represents the subscription queue length: subscription/num_undelivered_messages.
Note that this metric exports the total number of messages in the queue, including messages that are currently being processed but that are not yet acknowledged. Using a metric that does not include the messages being processed is not recommended because such a metric can drop down to 0 when there is still work being done, which prompts autoscaling to scale down and possibly interrupt the actual work.
You can now configure autoscaling for the queue:
gcloud beta compute instance-groups managed set-autoscaling \
our-instance-group \
--zone=us-central1-a \
--max-num-replicas=100 \
--min-num-replicas=0 \
--update-stackdriver-metric=pubsub.googleapis.com/subscription/num_undelivered_messages \
--stackdriver-metric-filter="resource.type = pubsub_subscription AND resource.label.subscription_id = our-subscription" \
--stackdriver-metric-single-instance-assignment=15
Example: Using a utilization target to scale based on average latency
Sometimes the metric that provides a relevant signal does not represent a total amount of available work or another resource applicable to the group, as in the previous example, but instead represents an average, a percentile, or some other statistical property. For this example, assume that you scale based on the group's average processing latency.
Assume the following setup:
- A managed instance group named our-instance-group is assigned to perform a particular task. The group is located in zone us-central1-a.
- You have a Stackdriver Monitoring custom metric that exports a value that you would like to maintain at a particular level. For this example, assume the metric represents the average latency of processing queries assigned to the group.
  - The custom metric is named custom.googleapis.com/example_average_latency.
  - The custom metric has a label with a key named group_name and value equal to the instance group's name, our-instance-group.
  - The custom metric exports data for the global Monitored Resource, that is, it is not associated with any specific instance.
You have determined that when the metric value goes above some specific value, you need to add more instances to the group to handle the load, while when it goes below that value, you can free up some resources. Autoscaling gradually adds or removes instances at a rate that is proportional to how much the metric is above or below the target. For this example, assume that the calculated target value is 100.
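To build intuition for proportional scaling, the sketch below scales the current group size by the ratio of the observed metric to the target. This is a simplified model assumed for illustration, not the autoscaler's actual algorithm, which adjusts the group gradually over time. The function name and parameters are hypothetical.

```python
import math


def suggested_size(current_size, metric_value, target,
                   min_instances=0, max_instances=100):
    """Simplified proportional model; not the autoscaler's real behavior."""
    if target <= 0:
        raise ValueError("target must be positive")
    # Scale the group by how far the metric is above or below the target.
    raw = math.ceil(current_size * metric_value / target)
    # Keep the result within the configured minimum and maximum.
    return max(min_instances, min(max_instances, raw))


# Metric at 150 against a target of 100: the model suggests growing the group.
print(suggested_size(10, 150, 100))  # 15
# Metric at 50 against a target of 100: the model suggests shrinking it.
print(suggested_size(10, 50, 100))   # 5
```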
You can now configure autoscaling for the group using a per-group utilization target of 100, which represents the metric value that the autoscaler must attempt to maintain:
gcloud beta compute instance-groups managed set-autoscaling \
our-instance-group \
--zone=us-central1-a \
--max-num-replicas=100 \
--min-num-replicas=0 \
--update-stackdriver-metric=custom.googleapis.com/example_average_latency \
--stackdriver-metric-filter="resource.type = global AND metric.label.group_name = our-instance-group" \
--stackdriver-metric-utilization-target=100 \
--stackdriver-metric-utilization-target-type=delta-per-second