Scaling Based on a Queue-based Workload

This document describes how to set up autoscaling based on the workload of a queuing system, such as Google Cloud Pub/Sub.

Services like Google Cloud Pub/Sub allow you to send and receive messages between independent applications. In these queue-based systems, a publisher application sends messages to a topic. Subscriber applications create a subscription to the topic to receive messages from the publisher application. In a busy system, messages for a single topic might queue up while the subscriber application is still processing previous messages.
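
For example, you could create a topic and a subscription, and publish a test message, with the gcloud command-line tool (example-topic and example-subscription are placeholder names; depending on your gcloud release, the pubsub command group may require the beta component):

gcloud pubsub topics create example-topic
gcloud pubsub subscriptions create example-subscription --topic example-topic
gcloud pubsub topics publish example-topic --message "task payload"

# A subscriber (worker) instance pulls and acknowledges messages from the subscription:
gcloud pubsub subscriptions pull example-subscription --auto-ack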

To dynamically speed up message processing, you can enable queue-based autoscaling so that the system automatically adds VM instances to, or removes them from, the group that runs your subscriber application. This way, your application processes messages faster when the queue is long (because there are more workers) and saves costs by removing instances when the queue is short. For common queue-based workload scenarios, see the Pub/Sub documentation.

For example, if you set the queue-based autoscaling target to 10, the autoscaler ensures that the number of tasks in the queue does not exceed 10 per instance; it adds and removes instances from the instance group to maintain this target (with 45 queued tasks, the group needs at least 5 instances). In addition, the autoscaler accommodates steady incoming task traffic.

Currently, this feature supports scaling based only on Google Cloud Pub/Sub queues.

Before you begin

Queue-based scaling specifications

Queue-based autoscaling has the following semantics:

  • The autoscaler aims to keep the number of tasks in the queue divided by the number of workers (autoscaled instances) at or below the specified target, while keeping the size of the instance group between minNumReplicas and maxNumReplicas.

  • Effectively, specifying a target of 1 will result in the group being autoscaled whenever there’s any work in the queue not actively being processed.

  • Specifying a target of less than 1 is not allowed, because the count of available messages includes the messages that are currently being processed.

  • If you know how long a worker takes, on average, to process a task, you can calculate the approximate acceptable backlog using this formula:

    acceptable backlog = acceptable latency / average time to process a task
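
    For example, if a worker takes 2 seconds on average to process a task and a 10-second processing delay is acceptable, a reasonable acceptable backlog per instance is 10 / 2 = 5.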
    

Limitations

  • Currently, only topics with a constant message flow (at least one message per minute) are supported. This limitation will be addressed in future releases.
  • Queue-based scaling setup is available only through the gcloud command-line tool and the Alpha API. It is not yet available in the Cloud Platform Console.
  • Queue-based scaling does not support regional managed instance groups.

Setting up queue-based autoscaling

gcloud

Run the following command to set up queue-based autoscaling:

gcloud alpha compute instance-groups managed set-autoscaling \
    [MANAGED_INSTANCE_GROUP] --zone [ZONE] \
    --min-num-replicas 0 --max-num-replicas [MAX_NUM_REPLICAS] \
    --queue-scaling-cloud-pub-sub topic=[TOPIC_NAME],subscription=[SUBSCRIPTION_NAME] \
    --queue-scaling-acceptable-backlog-per-instance [NUMBER_OF_ACCEPTABLE_BACKLOG]

where:

  • [MANAGED_INSTANCE_GROUP] is the group you want to scale.
  • [ZONE] is the zone for the managed instance group.
  • [MAX_NUM_REPLICAS] is the maximum number of instances that the managed instance group can have.
  • [TOPIC_NAME] is the name of the Pub/Sub topic, without the projects/.../... prefix.
  • [SUBSCRIPTION_NAME] is the name of the Pub/Sub subscription, without the projects/.../... prefix.
  • [NUMBER_OF_ACCEPTABLE_BACKLOG] is the target number of acceptable tasks in the queue per virtual machine instance, as an integer. For example, if the number is 5, the autoscaler ensures that the managed instance group has enough virtual machine instances to handle 5 tasks per instance.
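
For example, the following command (using the placeholder names example-group, example-topic, and example-subscription in zone us-central1-b) configures the group to scale between 0 and 20 instances, targeting a backlog of 5 tasks per instance:

gcloud alpha compute instance-groups managed set-autoscaling \
    example-group --zone us-central1-b \
    --min-num-replicas 0 --max-num-replicas 20 \
    --queue-scaling-cloud-pub-sub topic=example-topic,subscription=example-subscription \
    --queue-scaling-acceptable-backlog-per-instance 5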

API

Make a POST request to the following URI to create an autoscaler:

POST https://www.googleapis.com/compute/alpha/projects/[PROJECT]/zones/[ZONE]/autoscalers/

Your request body must contain the autoscaler name, the target (the URL of the managed instance group), and the autoscalingPolicy field. The autoscalingPolicy must define queueBasedScaling.

{
  "autoscalingPolicy": {
    "minNumReplicas": 0,
    "maxNumReplicas": [MAX_NUM_REPLICAS],
    "queueBasedScaling": {
      "acceptableBacklogPerInstance": [NUMBER_OF_ACCEPTABLE_BACKLOG],
      "cloudPubSub": {
        "topic": [TOPIC_NAME],
        "subscription": [SUBSCRIPTION_NAME]
      }
    }
  },
  "name": [AUTOSCALER_NAME],
  "target": [URL_TO_MANAGED_INSTANCE_GROUP],
  "zone": [ZONE]
}

where:

  • [MAX_NUM_REPLICAS] is the maximum number of instances the autoscaler can scale to.
  • [NUMBER_OF_ACCEPTABLE_BACKLOG] is the target number of acceptable tasks in the queue per virtual machine instance, as an integer. For example, if the number is 5, the autoscaler ensures that the managed instance group has enough virtual machine instances to handle 5 tasks per instance.
  • [TOPIC_NAME] is the name of the Pub/Sub topic, without the projects/.../... prefix.
  • [SUBSCRIPTION_NAME] is the name of the Pub/Sub subscription, without the projects/.../... prefix.
  • [URL_TO_MANAGED_INSTANCE_GROUP] is the URL to the group you want to scale. For example, zones/[ZONE]/instanceGroupManagers/[MANAGED_INSTANCE_GROUP].
  • [AUTOSCALER_NAME] is the name you want to give this autoscaler.
  • [ZONE] is the zone for this autoscaler (must be the same zone as the managed instance group).
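
As a sketch, you could send this request with curl, using the gcloud tool to obtain an access token. All resource names below (example-project, example-group, example-topic, example-subscription, example-autoscaler, us-central1-b) are placeholders:

curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    -d '{
          "autoscalingPolicy": {
            "minNumReplicas": 0,
            "maxNumReplicas": 20,
            "queueBasedScaling": {
              "acceptableBacklogPerInstance": 5,
              "cloudPubSub": {
                "topic": "example-topic",
                "subscription": "example-subscription"
              }
            }
          },
          "name": "example-autoscaler",
          "target": "zones/us-central1-b/instanceGroupManagers/example-group",
          "zone": "us-central1-b"
        }' \
    https://www.googleapis.com/compute/alpha/projects/example-project/zones/us-central1-b/autoscalers/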

What's next
