Scaling based on Cloud Monitoring metrics

This document describes how to scale a managed instance group (MIG) based on Monitoring metrics.

You can also scale a MIG based on the CPU utilization or the serving capacity of an external HTTP(S) load balancer.

When you scale a MIG based on Monitoring metrics, you can scale based on the following metric types:

  • Scale using per-instance metrics, where the selected metric provides data for each virtual machine (VM) instance in the MIG, indicating its resource utilization.
  • Scale using per-group metrics, where the group scales based on a metric that provides a value related to the whole managed instance group.

These metrics can be either standard metrics provided by the Cloud Monitoring service, or custom Cloud Monitoring metrics that you create.

Limitations

Scaling based on Cloud Monitoring metrics is restricted by the limitations that apply to all autoscalers, as well as the following limitation:

  • You cannot autoscale based on Cloud Monitoring logs-based metrics.


Per-instance metrics

Per-instance metrics provide data for each VM in a MIG separately, indicating resource utilization for each instance. When using per-instance metrics, the MIG cannot scale below a size of 1 VM because the autoscaler requires metrics about at least one running VM in order to operate.

If you need to scale using Cloud Monitoring metrics that aren't specific to individual VMs or if you need to scale your MIG down to zero VMs from time to time, you can configure your MIG to scale using per-group metrics instead.

Standard per-instance metrics

Cloud Monitoring has a set of standard metrics that you can use to monitor your VMs. However, not every standard metric is a valid utilization metric that the autoscaler can use.

A valid utilization metric for scaling meets the following criteria:

  • The standard metric must contain data for a gce_instance monitored resource. You can use the timeSeries.list API call to verify whether a specific metric exports data for this resource, as shown in the example after this list.

  • The standard metric describes how busy an instance is, and the metric value increases or decreases proportionally to the number of VMs in the group.
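
For example, the following request is a minimal sketch of such a check using curl and an access token from the gcloud tool. Replace PROJECT_ID, set START_TIME and END_TIME to RFC 3339 timestamps, and substitute the metric you want to check (the metric shown is only an illustration); then confirm that the returned time series use the gce_instance resource type:

# List recent time series for the metric and inspect the resource type in the response.
curl -G -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    "https://monitoring.googleapis.com/v3/projects/PROJECT_ID/timeSeries" \
    --data-urlencode 'filter=metric.type = "compute.googleapis.com/instance/cpu/utilization"' \
    --data-urlencode 'interval.startTime=START_TIME' \
    --data-urlencode 'interval.endTime=END_TIME'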

The following metric is invalid because the value does not change based on usage, and the autoscaler can't use the value to scale proportionally:

compute.googleapis.com/instance/cpu/reserved_cores

After you select a standard metric you want to use for your autoscaler, you can configure autoscaling using that metric.

Custom metrics

You can create custom metrics using Cloud Monitoring and write your own monitoring data to the Monitoring service. This gives you side-by-side access to standard Google Cloud data and your custom monitoring data, with a familiar data structure and consistent query syntax. If you have custom metrics, you can choose to scale based on the data from those metrics.

Prerequisites

To use custom metrics, you must first do the following:

  • Create a custom metric. For information about creating a custom metric, see Using custom metrics.
  • Set up your MIG to export the custom metric from all VMs in the group.

Choose a valid custom metric

Not all custom metrics can be used by the autoscaler. To choose a valid custom metric, the metric must have all of the following properties:

  • The metric must be a per-instance metric. The metric must export data that is relevant to each specific Compute Engine VM instance separately.
  • The exported per-instance values must be associated with a gce_instance monitored resource, which contains the following labels:
    • zone with the name of the zone the instance is in.
    • instance_id with the unique numerical ID assigned to the VM.
  • The metric must export data at least every 60 seconds. If you export data more often than every 60 seconds, the autoscaler can respond to load changes more quickly. If you export data less often than every 60 seconds, the autoscaler might not respond to load changes quickly enough.
  • The metric must be a valid utilization metric, which means that data from the metric can be used to proportionally scale out or in the number of VMs.
  • The metric must export int64 or double data values.

For the autoscaler to work with your custom metric, you must export data for the metric from all VMs in the MIG.
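
For illustration, a single exported data point that satisfies these requirements might look like the following timeSeries.create request. This is a minimal sketch: the metric name, value, and timestamp are placeholders, and each VM in the group would write its own point with its own instance_id and zone labels.

POST https://monitoring.googleapis.com/v3/projects/PROJECT_ID/timeSeries

{
 "timeSeries": [
  {
   "metric": {
    "type": "custom.googleapis.com/path/to/metric"
   },
   "resource": {
    "type": "gce_instance",
    "labels": {
     "project_id": "PROJECT_ID",
     "instance_id": "INSTANCE_ID",
     "zone": "ZONE"
    }
   },
   "points": [
    {
     "interval": {
      "endTime": "TIMESTAMP"
     },
     "value": {
      "doubleValue": 0.7
     }
    }
   ]
  }
 ]
}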

Configuring autoscaling using per-instance monitoring metrics

The process of setting up an autoscaler for a standard or custom metric is the same. To create an autoscaler that uses Cloud Monitoring metrics, you must provide the metric identifier, the desired target utilization level, and the utilization target type. Each of these properties is described below:

  • Metric identifier: The name of the metric to use. If you use a custom metric, you defined this name when you created the metric. The identifier has the following format:

    custom.googleapis.com/path/to/metric
    

    See Using custom metrics for more information about creating, browsing, and reading metrics.

  • Target utilization level: The level that the autoscaler must maintain. This must be a positive number. For example, both 24.5 and 1100 are acceptable values. Note that this is different from scaling based on CPU utilization or load balancing serving capacity, where the target must be a float value between 0.0 and 1.0.

  • Target type: How the autoscaler computes the data collected from the instances. The possible target types are:

    • GAUGE. The autoscaler computes the average value of the data collected over the last few minutes and compares it to the target utilization value of the autoscaler (see the example that follows this list).
    • DELTA_PER_MINUTE. The autoscaler calculates the average rate of growth per minute and compares that to the target utilization.
    • DELTA_PER_SECOND. The autoscaler calculates the average rate of growth per second and compares that to the target utilization.

    For accurate comparisons, if you set the target utilization in seconds, use DELTA_PER_SECOND as the autoscaler target type. Likewise, use DELTA_PER_MINUTE for a target utilization in minutes.
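
For example, suppose you use a GAUGE metric with a target utilization of 10. If the three VMs in the group report an average value of 20 over the last few minutes, the autoscaler roughly doubles the group to six VMs so that the average per-VM value returns toward the target.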

Console

  1. In the Google Cloud Console, go to the Instance groups page.

    Go to Instance groups

  2. If you do not have a managed instance group, create one. Otherwise, click the name of a MIG from the list to open the instance group overview page.

  3. On the instance group overview page, click Edit.

  4. If no autoscaling configuration exists, under Autoscaling, click Configure autoscaling.

  5. Under Autoscaling mode, select Autoscale to enable autoscaling.

  6. In the Autoscaling policy section, if a metric already exists, click it to edit it, or click Add new metric to add another metric.

  7. Set the Metric type to Stackdriver Monitoring metric.

  8. In the Metric export scope section, select Time series per instance to configure autoscaling using per-instance metrics.

  9. In the Metric identifier section, enter the metric name in the following format: example.googleapis.com/path/to/metric.

  10. In the Additional filter expression for metric and resource labels section:

    • For a zonal MIG, optionally enter a filter to use individual values from metrics with multiple streams or labels. For more information, see Filtering per-instance metrics.
    • For a regional MIG, leave this section blank.
  11. In the Utilization target section, specify the target value.

  12. In the Utilization target type section, verify that the target type corresponds to the metric's kind of measurement.

  13. Save your changes when you are finished.

gcloud

In the gcloud command-line tool, for example, the following command creates an autoscaler that uses the GAUGE target type. In addition to the --custom-metric-utilization parameter, the --max-num-replicas parameter is required when creating an autoscaler:

gcloud compute instance-groups managed set-autoscaling example-managed-instance-group \
    --custom-metric-utilization metric=example.googleapis.com/path/to/metric,utilization-target-type=GAUGE,utilization-target=10 \
    --max-num-replicas 20 \
    --cool-down-period 90 \
    --region us-west1

You can use the --cool-down-period flag to tell the autoscaler how long it takes for your application to initialize. Specifying an accurate cool down period improves autoscaler decisions. For example, when scaling out, the autoscaler ignores data from VMs that are still initializing because those VMs might not yet represent normal usage of your application. The default cool down period is 60 seconds.

To see a full list of available commands and flags for the gcloud tool, see the gcloud reference.

API

In the API, make a POST request to the following URL, replacing myproject with your own project ID and us-central1-f with the zone of your choice:

POST https://compute.googleapis.com/compute/v1/projects/myproject/zones/us-central1-f/autoscalers/

Your request body must contain the name, target, and autoscalingPolicy fields. In autoscalingPolicy, provide the maxNumReplicas and the customMetricUtilizations properties.

You can use the coolDownPeriodSec field to tell the autoscaler how long it takes for your application to initialize. Specifying an accurate cool down period improves autoscaler decisions. For example, when scaling out, the autoscaler ignores data from VMs that are still initializing because those VMs might not yet represent normal usage of your application. The default cool down period is 60 seconds.

POST https://compute.googleapis.com/compute/v1/projects/myproject/zones/us-central1-f/autoscalers

{
 "name": "example-autoscaler",
 "target": "zones/us-central1-f/instanceGroupManagers/example-managed-instance-group",
 "autoscalingPolicy": {
  "maxNumReplicas": 10,
  "coolDownPeriodSec": 90,
  "customMetricUtilizations": [
   {
    "metric": "example.googleapis.com/some/metric/name",
    "utilizationTarget": 10,
    "utilizationTargetType": "GAUGE"
   }
  ]
 }
}
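
For example, assuming the request body above is saved to a local file (named request.json here for illustration), you can send the request with curl and an access token from the gcloud tool:

curl -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    -d @request.json \
    "https://compute.googleapis.com/compute/v1/projects/myproject/zones/us-central1-f/autoscalers"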

Filtering per-instance metrics

You can apply filters to per-instance Cloud Monitoring metrics, which lets you scale MIGs using individual values from metrics with multiple streams or labels.

Per-instance metric filtering requirements

Autoscaler filtering is compatible with the Cloud Monitoring filter syntax with some limitations. The filters for per-instance metrics must meet the following requirements:

  • You can use only the AND operator for joining selectors.
  • You can use only the = direct equality comparison operator, but you cannot use the operator with any functions. For example, you cannot use the startswith() function with the = comparison operator.
  • You must wrap the value of a filter in double quotes, for example: metric.labels.state = "used".
  • You cannot use wildcards.
  • You must not set the resource.type or resource.labels.* selectors. Per-instance metrics always use all of the instance resources from the group.
  • For best results, create a filter that is specific enough to return a single time series for each instance. If the filter returns multiple time series, they are added together.
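
For example, assuming a hypothetical per-instance metric that has state and device labels, a filter that meets these requirements could look like the following:

metric.labels.state = "used" AND metric.labels.device = "sda"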

Configuring autoscalers to filter metrics

Use the Google Cloud Console, the gcloud command-line tool, or the Compute Engine API to add metric filters for autoscaling of a MIG.

Console

The process for creating an autoscaler that filters a per-instance metric is similar to creating a normal per-instance autoscaler, but you also specify a metric filter. For example, the compute.googleapis.com/instance/network/received_bytes_count metric includes the instance_name and loadbalanced labels. To filter based on the loadbalanced boolean value:

  1. In the Cloud Console, go to the Instance groups page.

    Go to Instance groups

  2. If you do not have a MIG, create one. Otherwise, click the name of a MIG to open the instance group overview page.

  3. On the instance group overview page, click Edit.

  4. If no autoscaling configuration exists, under Autoscaling, click Configure autoscaling.

  5. Under Autoscaling mode, select Autoscale to enable autoscaling.

  6. In the Autoscaling metrics section, if a metric already exists, click it to edit it, or click Add new metric to add another metric.

  7. In the Metric type section, select Stackdriver Monitoring metric.

  8. In the Metric export scope section, select Time series per instance to configure autoscaling using per-instance metrics.

  9. In the Metric identifier section, enter the metric name. For example, compute.googleapis.com/instance/network/received_bytes_count.

  10. In the Additional filter expression for metric and resource labels section, enter a filter. For example, 'metric.labels.loadbalanced = true'.

  11. Save your changes when you are finished.

gcloud

To add metric filters for a zonal MIG, use the set-autoscaling command or, for a regional MIG, use the beta set-autoscaling command.

The process for creating an autoscaler that filters a per-instance metric is similar to creating a normal per-instance autoscaler, but you must specify a metric filter and individual flags for the utilization target and target type. For example, the compute.googleapis.com/instance/network/received_bytes_count metric includes the instance_name and loadbalanced labels. To filter based on the loadbalanced boolean value, specify the --stackdriver-metric-filter flag with the value 'metric.labels.loadbalanced = true'. Include the utilization target and target type flags individually.

gcloud compute instance-groups managed set-autoscaling example-managed-instance-group \
    --update-stackdriver-metric=compute.googleapis.com/instance/network/received_bytes_count \
    --stackdriver-metric-utilization-target=10 \
    --stackdriver-metric-utilization-target-type=delta-per-second \
    --stackdriver-metric-filter='metric.labels.loadbalanced = true' \
    --max-num-replicas 20 \
    --cool-down-period 90 \
    --zone us-central1-f

This example configures autoscaling to use only the loadbalanced traffic data as part of the utilization target.

To see a list of available gcloud commands and flags, see the gcloud tool reference.

API

To add metric filters for a zonal MIG, use the autoscalers resource or, for a regional MIG, use the beta regionAutoscalers resource.

The process for creating an autoscaler that filters a per-instance metric is similar to creating a normal per-instance autoscaler, but you must specify a metric filter along with individual fields for the utilization target and target type. For example, the compute.googleapis.com/instance/network/received_bytes_count metric includes the instance_name and loadbalanced labels. To filter based on the loadbalanced boolean value, specify the filter parameter with the "metric.labels.loadbalanced = true" value.

In the API, make a POST request to the following URL, replacing myproject with your own project ID and us-central1-f with the zone of your choice. The request body must contain the name, target, and autoscalingPolicy fields. In autoscalingPolicy, provide the maxNumReplicas and the customMetricUtilizations properties.

POST https://compute.googleapis.com/compute/v1/projects/myproject/zones/us-central1-f/autoscalers

{
 "name": "example-autoscaler",
 "target": "zones/us-central1-f/instanceGroupManagers/example-managed-instance-group",
 "autoscalingPolicy": {
  "maxNumReplicas": 10,
  "coolDownPeriodSec": 90,
  "customMetricUtilizations": [
   {
    "metric": "compute.googleapis.com/instance/network/received_bytes_count",
    "filter": "metric.labels.loadbalanced = true",
    "utilizationTarget": 10,
    "utilizationTargetType": "DELTA_PER_SECOND"
   }
  ]
 }
}

This example configures autoscaling to use only the loadbalanced traffic data as part of the utilization target.

Per-group metrics

Per-group metrics allow autoscaling with a standard or custom metric that does not export per-instance utilization data. Instead, the group scales based on a value that applies to the whole group and corresponds to how much work is available for the group or how busy the group is. The group scales based on the fluctuation of that group metric value and the configuration that you define.

When you configure autoscaling on per-group metrics, you must indicate how you want the autoscaler to provision instances relative to the metric:

  • Instance assignment: Specify an instance assignment to indicate that you want the autoscaler to add or remove VMs depending on how much work is available to assign to each VM. Specify a value for this parameter that represents how much work you expect each VM to handle. For example, specify 2 to assign two units of work to each VM, or specify 0.5 to assign half a unit of work to each VM. The autoscaler scales the MIG to ensure that there are enough VMs to complete the available work as indicated by the metric; the number of VMs is proportional to the metric value. If the metric value is 10 and you assigned 0.5 units of work to each VM, the autoscaler creates 20 VMs in the MIG. Scaling with instance assignment lets the group shrink to 0 VMs when the metric value drops to 0 and grow again when the value rises above 0.
  • Utilization target: Specify a utilization target to indicate that you want the autoscaler to add or remove VMs to try to maintain the metric at a specified value. When the metric is above the specified target, the autoscaler gradually adds VMs until the metric decreases to the target value. When the metric is below the specified target, the autoscaler gradually removes VMs until the metric increases to the target value. Scaling with a utilization target cannot shrink the group to 0 VMs.

Each option has the following use cases:

  • Instance assignment: Scale the size of your MIG based on the number of unacknowledged messages in a Pub/Sub subscription or a total QPS rate of a network endpoint.
  • Utilization target: Scale the size of your MIG based on a utilization target for a custom metric that does not come from the standard per-instance CPU or memory use metrics. For example, you might scale the group based on a custom latency metric.

When you configure autoscaling with per-group metrics and you specify an instance assignment, your MIG can scale in to 0 VMs. If your metric indicates that there is no work for your group to complete, the group scales in to 0 VMs until the metric detects that new work is available. In contrast to scaling based on per-group metrics, per-instance autoscaling requires resource utilization metrics from at least one VM, so the group cannot scale below a size of 1.

Filtering per-group metrics

You can apply filters to per-group Cloud Monitoring metrics, which lets you scale MIGs using individual values from metrics that have multiple streams or labels.

Per-group metric filtering requirements

Autoscaler filtering is compatible with the Cloud Monitoring filter syntax with some limitations. The filters for per-group metrics must meet the following requirements:

  • You can use only the AND operator for joining selectors.
  • You can use only the = direct equality comparison operator, and you cannot use the operator with any functions. For example, you cannot use the startswith() function with the = comparison operator.
  • You cannot use wildcards.
  • You must wrap the value of a filter in double quotes, for example: metric.labels.state = "used".
  • You can specify the metric type with a metric.type = "..." selector in the filter, with the separate metric field, or with both. The metric must meet the following requirements:
    • The metric must be specified in at least one place.
    • If the metric is specified in both places, the two values must be equal.
  • You must specify the resource.type selector, but you cannot set it to gce_instance if you want to scale using per-group metrics.
  • For best results, the filter should be specific enough to return a single time series for the group. If the filter returns multiple time series, they are added together.
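
For example, drawing on the Pub/Sub scenario used later in this document, a per-group filter that meets these requirements could look like the following (the subscription ID is a placeholder):

resource.type = "pubsub_subscription" AND resource.labels.subscription_id = "SUBSCRIPTION_ID"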

Configuring autoscaling using per-group monitoring metrics

Use the Google Cloud Console, the gcloud command-line tool, or the Compute Engine API to configure autoscaling with per-group metrics.

Console

  1. In the Cloud Console, go to the Instance groups page.

    Go to Instance groups

  2. If you do not have a managed instance group, create one. Otherwise, click the name of a MIG to open the instance group overview page.

  3. On the instance group overview page, click Edit.

  4. If no autoscaling configuration exists, under Autoscaling, click Configure autoscaling.

  5. Under Autoscaling mode, select Autoscale to enable autoscaling.

  6. In the Autoscaling metrics section, if a metric already exists, click it to edit it, or click Add new metric to add another metric.

  7. Set the Metric type to Stackdriver Monitoring metric.

  8. In the Metric export scope section, select Single time series per group.

  9. In the Metric identifier section, specify the metric name in the following format: example.googleapis.com/path/to/metric.

  10. Specify the Metric resource type.

  11. If you want to use individual values from metrics that have multiple streams or labels, provide an Additional filter expression for metric and resource labels. The filter must meet the autoscaler filtering requirements.

  12. In the Scaling policy section, select either Instance assignment or Utilization target.

    • If you select an instance assignment policy, then provide a Single instance assignment value that represents the amount of work to assign to each VM instance in the MIG. For example, specify 2 to assign two units of work to each VM. The autoscaler maintains enough VMs to complete the available work (as indicated by the metric). If the metric value is 10 and you assigned 2 units of work to each VM, the autoscaler creates 5 VMs in the MIG.
    • If you select a utilization target policy:
      • Provide a Utilization target value that represents the metric value that the autoscaler should try to maintain.
      • Select the Utilization target type that represents the value type for the metric.
  13. Save your changes when you are finished.

gcloud

To configure autoscaling using per-group monitoring metrics for a zonal MIG, use the set-autoscaling command or, for a regional MIG, use the beta set-autoscaling command.

In the command, specify the --update-stackdriver-metric flag to provide the monitoring metric URL. You must also specify how you want the autoscaler to provision instances by including one of the following flags:

  • Instance assignment: Specify the --stackdriver-metric-single-instance-assignment flag.
  • Utilization target: Specify the --stackdriver-metric-utilization-target flag.

Instance assignment:

Specify a metric that you want to measure and specify the --stackdriver-metric-single-instance-assignment flag to indicate the amount of work that you expect each instance to handle. You must also specify a filter for the metric using the --stackdriver-metric-filter flag.

The following command creates an autoscaler based on a per-group metric using the --stackdriver-metric-single-instance-assignment flag.

gcloud compute instance-groups managed set-autoscaling GROUP_NAME \
    --max-num-replicas=MAX_INSTANCES \
    --min-num-replicas=MIN_INSTANCES \
    --update-stackdriver-metric='METRIC_URL' \
    --stackdriver-metric-filter='METRIC_FILTER' \
    --stackdriver-metric-single-instance-assignment=INSTANCE_ASSIGNMENT \
    [--zone=ZONE | --region=REGION]

Replace the following:

  • GROUP_NAME: The name of the MIG where you want to add an autoscaler.
  • MAX_INSTANCES: The maximum number of VMs that the MIG can have.
  • MIN_INSTANCES: The minimum number of VMs that the MIG can have.
  • METRIC_URL: A protocol-free URL of a Monitoring metric.
  • METRIC_FILTER: A Cloud Monitoring filter where you specify a monitoring filter with a relevant TimeSeries and a MonitoredResource. The filter must meet the autoscaler filtering requirements.
  • INSTANCE_ASSIGNMENT: The amount of work to assign to each VM instance in the MIG. For example, specify 2 to assign two units of work to each VM, or specify 0.5 to assign half a unit of work to each VM. The autoscaler scales the MIG to ensure that there are enough VMs to complete the available work, which is indicated by the metric. If the metric value is 10 and you've assigned 0.5 units of work to each VM, the autoscaler provisions 20 VMs in the MIG.
  • ZONE: For zonal MIGs, the zone where the MIG is located.
  • REGION: For regional MIGs, the region where the MIG is located.

Utilization target:

In some situations, you might want to use utilization targets with per-group metrics rather than specify a number of VMs relative to the value of the metric that your autoscaler measures. You can still point the autoscaler to a per-group metric, but the autoscaler attempts to maintain the specified utilization target. Specify the target and target type with the --stackdriver-metric-utilization-target flag. You must also specify a filter for the metric using the --stackdriver-metric-filter flag.

The following command creates an autoscaler based on a per-group metric using the --stackdriver-metric-utilization-target flag.

gcloud compute instance-groups managed set-autoscaling GROUP_NAME \
    --max-num-replicas=MAX_INSTANCES \
    --min-num-replicas=MIN_INSTANCES \
    --update-stackdriver-metric='METRIC_URL' \
    --stackdriver-metric-filter='METRIC_FILTER' \
    --stackdriver-metric-utilization-target=TARGET_VALUE \
    --stackdriver-metric-utilization-target-type=TARGET_TYPE \
    [--zone=ZONE | --region=REGION]

Replace the following:

  • GROUP_NAME: The name of the MIG where you want to add an autoscaler.
  • MAX_INSTANCES: The maximum number of VMs that the MIG can have.
  • MIN_INSTANCES: The minimum number of VMs that the MIG can have.
  • METRIC_URL: A protocol-free URL of a Monitoring metric.
  • METRIC_FILTER: A Cloud Monitoring filter where you specify a monitoring filter with a relevant TimeSeries and a MonitoredResource. You must specify a resource.type value, but you cannot specify gce_instance if you want to scale using per-group metrics. The filter must meet the autoscaler filtering requirements.
  • TARGET_VALUE: The metric value that the autoscaler attempts to maintain.
  • TARGET_TYPE: The value type for the metric. You can set the autoscaler to monitor the metric as a GAUGE, by the delta-per-minute of the value, or by the delta-per-second of the value.
  • ZONE: For zonal MIGs, the zone where the MIG is located.
  • REGION: For regional MIGs, the region where the MIG is located.

To see a list of available autoscaler gcloud command-line tool commands and flags that work with per-group autoscaling, see the gcloud command-line tool reference.

API

To configure autoscaling using per-group monitoring metrics for a zonal MIG, use the autoscalers resource or, for a regional MIG, use the beta regionAutoscalers resource.

Specify how you want the autoscaler to provision instances by including one of the following parameters:

  • Instance assignment: Specify the singleInstanceAssignment parameter.
  • Utilization target: Specify the utilizationTarget parameter.

Instance assignment:

The singleInstanceAssignment parameter specifies the amount of work that you expect each VM to handle.

For example, make the following call to create an autoscaler that scales a zonal MIG based on the instance assignment of a per-group metric.

POST https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/zones/ZONE/autoscalers

{
 "name": "example-autoscaler",
 "target": "zones/ZONE/instanceGroupManagers/GROUP_NAME",
 "autoscalingPolicy": {
  "maxNumReplicas": MAX_INSTANCES,
  "minNumReplicas": MIN_INSTANCES,
  "customMetricUtilizations": [
    {
      "metric": "METRIC_URL",
      "filter": "METRIC_FILTER",
      "singleInstanceAssignment": INSTANCE_ASSIGNMENT
    }
  ]
 }
}

Replace the following:

  • PROJECT_ID: Your project ID.
  • ZONE: The zone where the MIG is located.
  • GROUP_NAME: The name of the MIG where you want to add an autoscaler.
  • MAX_INSTANCES: The maximum number of VMs that the MIG can have.
  • MIN_INSTANCES: The minimum number of VMs that the MIG can have.
  • METRIC_URL: A protocol-free URL of a Monitoring metric.
  • METRIC_FILTER: A Cloud Monitoring filter where you specify a monitoring filter with a relevant TimeSeries and a MonitoredResource. You must specify a resource.type value, but you cannot specify gce_instance if you want to scale using per-group metrics. The filter must meet the autoscaler filtering requirements.
  • INSTANCE_ASSIGNMENT: The amount of work to assign to each VM instance in the MIG. For example, specify 2 to assign two units of work to each VM, or specify 0.5 to assign half a unit of work to each VM. The autoscaler scales the MIG to ensure that there are enough VMs to complete the available work, which is indicated by the metric. If the metric value is 10 and you've assigned 0.5 units of work to each VM, the autoscaler provisions 20 VMs in the MIG.

Utilization target:

In some situations, you might want to use utilization targets with per-group metrics rather than specify a number of VMs relative to the value of the metric that your autoscaler measures. You can still point the autoscaler to a per-group metric, but the autoscaler attempts to maintain the specified utilization target. Specify those targets with the utilizationTarget parameter. You must also specify a filter for the metric using the filter parameter.

For example, make the following call to create an autoscaler that scales a zonal MIG based on the utilization target of a per-group metric.

POST https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/zones/ZONE/autoscalers

{
 "name": "example-autoscaler",
 "target": "zones/ZONE/instanceGroupManagers/GROUP_NAME",
 "autoscalingPolicy": {
  "maxNumReplicas": MAX_INSTANCES,
  "minNumReplicas": MIN_INSTANCES,
  "customMetricUtilizations": [
    {
      "metric": "METRIC_URL",
      "filter": "METRIC_FILTER",
      "utilizationTarget": TARGET_VALUE,
      "utilizationTargetType": TARGET_TYPE
    }
  ]
 }
}

Replace the following:

  • PROJECT_ID: Your project ID.
  • ZONE: The zone where the MIG is located.
  • GROUP_NAME: The name of the MIG where you want to add an autoscaler.
  • MAX_INSTANCES: The maximum number of VMs that the MIG can have.
  • MIN_INSTANCES: The minimum number of VMs that the MIG can have.
  • METRIC_URL: A protocol-free URL of a Monitoring metric.
  • METRIC_FILTER: A Cloud Monitoring filter where you specify a monitoring filter with a relevant TimeSeries and a MonitoredResource. You must specify a resource.type value, but you cannot specify gce_instance if you want to scale using per-group metrics. The filter must meet the autoscaler filtering requirements.
  • TARGET_VALUE: The metric value that the autoscaler attempts to maintain.
  • TARGET_TYPE: The value type for the metric. You can set the autoscaler to monitor the metric as a GAUGE, by the DELTA_PER_MINUTE of the value, or by the DELTA_PER_SECOND of the value.

Example: Using instance assignment to scale based on a Pub/Sub queue

Assume the following setup:

  • An active Pub/Sub topic receives messages from some source.
  • An active Pub/Sub subscription is connected to the topic in a pull configuration. The subscription is named our-subscription.
  • A pool of workers is pulling messages from that subscription and processing them. The pool is a zonal MIG named our-instance-group and is located in zone us-central1-a. The pool must not exceed 100 workers, and should scale in to 0 workers when there are no messages in the queue.
  • On average, a worker processes a single message in one minute.

To determine the optimal instance assignment value, consider several approaches:

  • To process all messages in the queue as fast as possible, you can choose 1 as the instance assignment value. This creates one VM instance for each message in the queue (limited to the maximum number of VMs in our group). However, this can cause overprovisioning. In the worst case, a VM is created to process just one message before the autoscaler shuts it down, which consumes resources for much longer than doing actual work.
    • Note that if the workers are able to process multiple messages concurrently, it makes sense to increase the value to the number of concurrent processes.
    • Note that, in this example, it does not make sense to set the value below 1 because one message cannot be processed by more than one worker.
  • Alternatively, if processing latency is less important than resource utilization and overhead costs, you can calculate how many messages each VM must process within its lifetime to be considered efficiently utilized. Take into account startup and shutdown time and the fact that autoscaling does not immediately delete VMs. For example, assuming that startup and shutdown take about 5 minutes in total, and that autoscaling deletes VMs only after a period of approximately 10 minutes, it is efficient to create an additional VM in the group as long as it can process at least 15 messages before the autoscaler shuts it down. That results in, at most, 25% overhead due to the total time it takes to create, start, and shut down the VM. In this case, you can choose 15 as the instance assignment value.
  • You can balance the two approaches, choosing a value between 1 and 15 depending on whether processing latency or resource utilization takes priority.

Among the available Pub/Sub metrics, subscription/num_undelivered_messages represents the subscription queue length.

Note that this metric exports the total number of messages in the queue, including messages that are currently being processed but that are not yet acknowledged. Using a metric that does not include the messages being processed is not recommended because such a metric can drop down to 0 when there is still work being done, which prompts autoscaling to scale in and possibly interrupt the actual work.

You can now configure autoscaling for the queue:

gcloud compute instance-groups managed set-autoscaling \
    our-instance-group \
    --zone=us-central1-a \
    --max-num-replicas=100 \
    --min-num-replicas=0 \
    --update-stackdriver-metric=pubsub.googleapis.com/subscription/num_undelivered_messages \
    --stackdriver-metric-filter="resource.type = pubsub_subscription AND resource.labels.subscription_id = our-subscription" \
    --stackdriver-metric-single-instance-assignment=15

Example: Using a utilization target to scale based on average latency

There might be a situation when the metric providing a relevant signal does not represent a total amount of available work or another resource applicable to the group, as in the previous example, but instead an average, a percentile, or some other statistical property. For this example, assume you will scale based on the group's average processing latency.

Assume the following setup:

  • A MIG named our-instance-group is assigned to perform a particular task. The group is located in zone us-central1-a.
  • You have a Cloud Monitoring custom metric that exports a value that you would like to maintain at a particular level. For this example, assume the metric represents the average latency of processing queries assigned to the group.
    • The custom metric is named: custom.googleapis.com/example_average_latency.
    • The custom metric has a label with a key named group_name and value equal to the MIG's name, our-instance-group.
    • The custom metric exports data for the global monitored resource, that is, it is not associated with any specific VM.

You have determined that when the metric value goes above a specific value, you need to add more VMs to the group to handle the load, and when it goes below that value, you can free up some resources. Autoscaling gradually adds or removes VMs at a rate that is proportional to how far the metric is above or below the target. For this example, assume that the calculated target value is 100.

You can now configure autoscaling for the group using a per-group utilization target of 100, which represents the metric value that the autoscaler must attempt to maintain:

gcloud compute instance-groups managed set-autoscaling \
    our-instance-group \
    --zone=us-central1-a \
    --max-num-replicas=100 \
    --min-num-replicas=0 \
    --update-stackdriver-metric=custom.googleapis.com/example_average_latency \
    --stackdriver-metric-filter "resource.type = global AND metric.labels.group_name = our-instance-group" \
    --stackdriver-metric-utilization-target=100 \
    --stackdriver-metric-utilization-target-type=delta-per-second

What's next