Autoscaler tool overview

This page introduces the Autoscaler tool for Spanner (Autoscaler), an open source companion tool for Spanner. The Autoscaler tool lets you automatically increase or reduce the compute capacity of one or more Spanner instances based on how much capacity is in use.

For more information about scaling in Spanner, see Autoscaling Spanner.

This page presents the features, architecture, configuration, and deployment topologies of the Autoscaler tool. The topics that continue this series guide you through deploying the Autoscaler tool in each of the different topologies.

Autoscaler

The Autoscaler tool is useful for managing the utilization and performance of your Spanner deployments. To help you balance cost control with performance needs, the Autoscaler tool monitors your instances and automatically adds or removes nodes or processing units to help ensure that they stay within the recommended thresholds for high-priority CPU utilization, 24-hour rolling average CPU utilization, and storage utilization, plus or minus an allowed margin.

Autoscaling Spanner deployments enables your infrastructure to automatically adapt and scale to meet load requirements with little to no intervention. Autoscaling also right-sizes the provisioned infrastructure, which can help you to reduce costs.

Architecture

This section describes the components of Autoscaler and their respective purposes in more detail.

The Autoscaler tool architecture consists of Cloud Scheduler, two Pub/Sub topics, two Cloud Run functions, and Firestore. The Cloud Monitoring API is used to obtain CPU utilization and storage metrics for Spanner instances.

Cloud Scheduler

Using Cloud Scheduler, you define how often the Autoscaler tool checks the scaling metrics of your Spanner instances against their thresholds. For example, a unix-cron schedule of `*/5 * * * *` runs the check every 5 minutes. A Cloud Scheduler job can check a single instance or multiple instances at the same time, and you can define as many job schedules as you require.

Poller Cloud Run function

The Poller Cloud Run function is responsible for collecting and processing the time-series metrics for one or more Spanner instances. The Poller preprocesses the metrics data for each Spanner instance so that only the most relevant data points are evaluated and sent to the Scaler Cloud Run function. The preprocessing done by the Poller Cloud Run function also simplifies the process of evaluating thresholds for regional, dual-region, and multi-regional Spanner instances.

Scaler Cloud Run function

The Scaler Cloud Run function evaluates the data points received from the Poller Cloud Run function and determines whether the number of nodes or processing units needs to be adjusted and, if so, by how much. The Cloud Run function compares the metric values to the threshold, plus or minus an allowed margin, and adjusts the number of nodes or processing units based on the configured scaling method. For more details on scaling methods, see Autoscaler features.
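
The following TypeScript sketch illustrates the kind of threshold-plus-margin comparison described here. It is a simplified illustration under assumed names and structure, not the tool's actual implementation:

    // Simplified sketch of the Scaler's threshold check (illustrative only;
    // names and structure are assumptions, not the tool's actual code).
    interface MetricReading {
      value: number;     // observed utilization, as a percentage
      threshold: number; // configured or default threshold, as a percentage
      margin: number;    // allowed margin around the threshold, default 5
    }

    type ScalingDirection = 'SCALE_UP' | 'SCALE_DOWN' | 'NONE';

    // A metric only triggers scaling when it leaves the range
    // [threshold - margin, threshold + margin].
    function scalingDirection(metric: MetricReading): ScalingDirection {
      if (metric.value > metric.threshold + metric.margin) {
        return 'SCALE_UP';
      }
      if (metric.value < metric.threshold - metric.margin) {
        return 'SCALE_DOWN';
      }
      return 'NONE';
    }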

Operational flow

This section details the operational model of the Autoscaler tool, as shown in the following architectural diagram.

Autoscaler operational model.

  1. You define the schedule, time, and frequency of your autoscaling jobs in Cloud Scheduler.
  2. On the schedule that you define, Cloud Scheduler pushes a message containing a JSON payload with the Autoscaler tool configuration parameters for one or more Spanner instances into the Polling Pub/Sub topic.
  3. When the message is published into the Polling topic, an instance of the Poller Cloud Run function is created to handle the message.
  4. The Poller Cloud Run function reads the message payload and queries the Cloud Monitoring API to retrieve the utilization metrics for each Spanner instance (see the sketch that follows this list).
  5. For each Spanner instance enumerated in the message, the Poller function pushes one message into the Scaling Pub/Sub topic, containing the metrics and configuration parameters to assess for the specific Spanner instance.
  6. For each message pushed into the Scaler topic, the Scaler Cloud Run function does the following:

    1. Compares the Spanner instance metrics against the configured thresholds, plus or minus a configurable margin. You can configure the margin yourself, or use the default value.

    2. Determines whether the instance should be scaled.

    3. Calculates the number of nodes or processing units that the instance should be scaled to, based on the chosen scaling method.

  7. The Scaler Cloud Run function retrieves the time when the instance was last scaled from Firestore and compares it with the current time, to determine if scaling up or down is allowed based on the cooldown periods.

  8. If the configured cooldown period has passed, the Scaler Cloud Run function sends a request to the Spanner instance to scale up or down.
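
As an illustration of step 4, the following sketch shows one way a poller could query the Cloud Monitoring API for a Spanner CPU metric using the Node.js client library. The metric type, filter, and time window shown are assumptions for illustration; the actual Poller function queries a configurable set of metrics:

    // Illustrative sketch only: fetching a Spanner utilization metric with
    // the @google-cloud/monitoring Node.js client. The metric type and
    // window below are assumptions, not the Poller's exact queries.
    import {MetricServiceClient} from '@google-cloud/monitoring';

    async function fetchHighPriorityCpu(projectId: string, instanceId: string) {
      const client = new MetricServiceClient();
      const nowSeconds = Math.floor(Date.now() / 1000);
      const [timeSeries] = await client.listTimeSeries({
        name: client.projectPath(projectId),
        // Assumed filter: high-priority CPU utilization for one instance.
        filter:
          'metric.type="spanner.googleapis.com/instance/cpu/utilization_by_priority" ' +
          'AND metric.labels.priority="high" ' +
          `AND resource.labels.instance_id="${instanceId}"`,
        interval: {
          startTime: {seconds: nowSeconds - 300}, // last 5 minutes
          endTime: {seconds: nowSeconds},
        },
      });
      return timeSeries;
    }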

Throughout the flow, the Autoscaler tool writes a summary of its recommendations and actions to Cloud Logging for tracking and auditing.

Regardless of the deployment topology that you choose, the overall operation of the Autoscaler tool remains the same.

Autoscaler features

This section describes the main features of the Autoscaler tool.

Manage multiple instances

The Autoscaler tool can manage multiple Spanner instances across multiple projects. Multi-regional, dual-region, and regional instances each have different utilization thresholds that are used when scaling. For example, multi-regional and dual-region deployments are scaled at 45% high-priority CPU utilization, whereas regional deployments are scaled at 65% high-priority CPU utilization, both plus or minus an allowed margin. For more information on the different thresholds for scaling, see Alerts for high CPU utilization.

Independent configuration parameters

Each autoscaled Spanner instance can have one or more polling schedules. Each polling schedule has its own set of configuration parameters.

These parameters determine the following factors:

  • The minimum and maximum number of nodes or processing units that control how small or large your instance can be, helping you to control costs.
  • The scaling method used to adjust your Spanner instance specific to your workload.
  • The cooldown periods to let Spanner manage data splits.

Different scaling methods for different workloads

The Autoscaler tool provides three different methods for scaling your Spanner instances up and down: stepwise, linear, and direct. Each method is designed to support a different type of workload. You can apply one or more methods to each Spanner instance being autoscaled when you create independent polling schedules.

Stepwise

Stepwise scaling is useful for workloads that have small or multiple peaks. It provisions capacity to smooth them all out with a single autoscaling event.

The following chart shows a load pattern with multiple load plateaus or steps, where each step has multiple small peaks. This pattern is well suited for the stepwise method.

Load pattern with multiple steps.

When the load threshold is crossed, this method provisions and removes nodes or processing units using a fixed but configurable number. For example, three nodes are added or removed for each scaling action. By changing the configuration, you can allow for larger increments of capacity to be added or removed at any time.
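
As a rough sketch (the function and parameter names here are assumptions, not the tool's actual code), stepwise scaling can be expressed as follows:

    // Illustrative sketch of stepwise scaling: capacity changes by a fixed,
    // configurable step on each scaling action, clamped to the configured
    // minimum and maximum sizes.
    function stepwiseNewSize(
      currentSize: number, // current nodes or processing units
      stepSize: number,    // fixed increment, for example 3 nodes
      direction: 'SCALE_UP' | 'SCALE_DOWN',
      minSize: number,
      maxSize: number,
    ): number {
      const proposed =
        direction === 'SCALE_UP' ? currentSize + stepSize : currentSize - stepSize;
      // Clamp so the instance stays within its configured bounds.
      return Math.min(maxSize, Math.max(minSize, proposed));
    }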

Linear

Linear scaling is best used with load patterns that change more gradually or have a few large peaks. The method calculates the minimum number of nodes or processing units required to keep utilization below the scaling threshold. The number of nodes or processing units added or removed in each scaling event is not limited to a fixed step amount.

The sample load pattern in the following chart shows larger sudden increases and decreases in load. These fluctuations are not grouped in discernible steps as they are in the previous chart. This pattern is more easily handled using linear scaling.

Load pattern with fluctuations.

The Autoscaler tool uses the ratio of the observed utilization over the utilization threshold to calculate whether to add or subtract nodes or processing units from the current total number.

The formula to calculate the new number of nodes or processing units is as follows:

newSize = currentSize * currentUtilization / utilizationThreshold
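
For example, if an instance currently has 4 nodes running at 80% high-priority CPU utilization against a 65% threshold, the formula yields 4 × 80 / 65 ≈ 4.9, which is rounded up to 5 nodes so that the projected utilization stays below the threshold.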

Direct

Direct scaling provides an immediate increase in capacity. This method is intended to support batch workloads where a predetermined higher node count is periodically required on a schedule with a known start time. This method scales the instance up to the maximum number of nodes or processing units specified in the schedule, and is intended to be used in addition to a linear or stepwise method.

The following chart depicts a large planned increase in load, for which the Autoscaler tool pre-provisioned capacity by using the direct method.

Load pattern with direct scaling pre-provisioned.

Once the batch workload has completed and utilization returns to normal levels, depending on your configuration, either linear or stepwise scaling is applied to scale the instance down automatically.

Deployment methods

The Autoscaler tool can be deployed either in an individual project or alongside the Spanner instances it manages. The Autoscaler tool is designed to allow for flexibility and it can accommodate the existing separation of responsibilities between your operation and application teams. The responsibility to configure the autoscaling of Spanner instances can be centralized with a single operations team, or it can be distributed to the teams closer to the applications served by those Spanner instances.

The different deployment models are discussed in more detail in Deployment topologies.

Serverless for ease of deployment and management

The Autoscaler tool is built using only serverless and low management Google Cloud tools, such as Cloud Run functions, Pub/Sub, Cloud Scheduler, and Firestore. This approach minimizes the cost and operational overhead of running the Autoscaler tool.

By using built-in Google Cloud tools, the Autoscaler tool can take full advantage of Identity and Access Management (IAM) for authentication and authorization.

Configuration

The Autoscaler tool has different configuration options that you can use to manage the scaling of your Spanner deployments. The next sections describe the base configuration options and more advanced configuration options.

Base configuration

The Autoscaler tool manages Spanner instances through the configuration defined in Cloud Scheduler. If multiple Spanner instances need to be polled with the same interval, we recommend that you configure them in the same Cloud Scheduler job. The configuration of each instance is represented as a JSON object. The following is an example of a configuration where two Spanner instances are managed with one Cloud Scheduler job:

    [
      {
        "projectId": "my-spanner-project",
        "instanceId": "spanner1",
        "scalerPubSubTopic": "projects/my-spanner-project/topics/spanner-scaling",
        "units": "NODES",
        "minSize": 1,
        "maxSize": 3
      },
      {
        "projectId": "different-project",
        "instanceId": "another-spanner1",
        "scalerPubSubTopic": "projects/my-spanner-project/topics/spanner-scaling",
        "units": "PROCESSING_UNITS",
        "minSize": 500,
        "maxSize": 3000,
        "scalingMethod": "DIRECT"
      }
    ]

Spanner instances can have multiple configurations on different Cloud Scheduler jobs. For example, an instance can have one Autoscaler configuration with the linear method for normal operations, but also have another Autoscaler configuration with the direct method for planned batch workloads.
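
For example, the following two payloads, each attached to its own Cloud Scheduler job, sketch that pattern; the sizes and method values shown are illustrative assumptions. A job on the regular polling schedule uses the linear method:

    [
      {
        "projectId": "my-spanner-project",
        "instanceId": "spanner1",
        "scalerPubSubTopic": "projects/my-spanner-project/topics/spanner-scaling",
        "units": "NODES",
        "minSize": 1,
        "maxSize": 10,
        "scalingMethod": "LINEAR"
      }
    ]

A second job, scheduled shortly before a planned batch workload, uses the direct method to scale the instance straight to its configured maximum:

    [
      {
        "projectId": "my-spanner-project",
        "instanceId": "spanner1",
        "scalerPubSubTopic": "projects/my-spanner-project/topics/spanner-scaling",
        "units": "NODES",
        "minSize": 1,
        "maxSize": 20,
        "scalingMethod": "DIRECT"
      }
    ]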

When the Cloud Scheduler job runs, it sends a Pub/Sub message to the Polling Pub/Sub topic. The payload of this message is the JSON array of the configuration objects for all the instances configured in the same job. See the complete list of configuration options in the Poller README file.
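
Cloud Scheduler publishes this message for you on the schedule you define, but when testing a configuration you can publish the same payload manually. The following sketch, with an assumed Polling topic name, uses the Node.js Pub/Sub client library:

    // Illustrative sketch: manually publishing an Autoscaler configuration
    // to the Polling topic with @google-cloud/pubsub, as Cloud Scheduler
    // would on its schedule. The topic name is an assumption.
    import {PubSub} from '@google-cloud/pubsub';

    async function publishConfig() {
      const pubsub = new PubSub();
      const config = [
        {
          projectId: 'my-spanner-project',
          instanceId: 'spanner1',
          scalerPubSubTopic: 'projects/my-spanner-project/topics/spanner-scaling',
          units: 'NODES',
          minSize: 1,
          maxSize: 3,
        },
      ];
      const messageId = await pubsub
        .topic('projects/my-spanner-project/topics/spanner-polling')
        .publishMessage({json: config});
      console.log(`Published configuration message ${messageId}`);
    }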

Advanced configuration

The Autoscaler tool has advanced configuration options that let you more finely control when and how your Spanner instances are managed. The following sections introduce a selection of these controls.

Custom thresholds

The Autoscaler tool determines the number of nodes or processing units to be added to or removed from an instance by using the recommended Spanner thresholds for the following load metrics:

  • High priority CPU
  • 24-hour rolling average CPU
  • Storage utilization

We recommend that you use the default thresholds as described in Creating alerts for Spanner metrics. However, in some cases you might want to modify the thresholds used by the Autoscaler tool. For example, you can set lower thresholds so that the Autoscaler tool reacts sooner than it would at the higher default thresholds, which helps prevent monitoring alerts configured at those higher thresholds from being triggered.

Custom metrics

While the default metrics in the Autoscaler tool address most performance and scaling scenarios, in some cases you might need to specify your own metrics for determining when to scale in and out. For these scenarios, you define custom metrics in the configuration using the metrics property.
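
The exact schema of the metrics property is documented in the Poller README file. As a rough illustration only (the metric entry field names here are assumptions, so check the README for the actual schema), a configuration with a custom metric might look like the following:

    {
      "projectId": "my-spanner-project",
      "instanceId": "spanner1",
      "scalerPubSubTopic": "projects/my-spanner-project/topics/spanner-scaling",
      "units": "PROCESSING_UNITS",
      "minSize": 100,
      "maxSize": 2000,
      "metrics": [
        {
          "name": "my_custom_cpu_metric",
          "filter": "metric.type=\"spanner.googleapis.com/instance/cpu/utilization\"",
          "regional_threshold": 40,
          "multi_regional_threshold": 30
        }
      ]
    }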

Margins

A margin defines an upper and a lower limit around the threshold. The Autoscaler tool only triggers an autoscaling event if the value of the metric is more than the upper limit or less than the lower limit.

The objective of this parameter is to avoid autoscaling events being triggered for small workload fluctuations around the threshold, reducing the amount of fluctuation in Autoscaler actions. The threshold and margin together define the following range, within which you want the metric value to stay:

[threshold - margin, threshold + margin]

The smaller the margin, the narrower the range, resulting in a higher probability that an autoscaling event is triggered.

Specifying a margin parameter for a metric is optional; it defaults to five percentage points both above and below the threshold. For example, with a regional high-priority CPU threshold of 65% and the default margin of 5, the Autoscaler tool only acts when utilization rises above 70% or falls below 60%.

Deployment topologies

To deploy the Autoscaler tool, decide which of the following topologies is best to fulfill your technical and operational needs:

  • Per-project topology: The Autoscaler infrastructure is deployed in the same project as the Spanner instances that need to be autoscaled.
  • Centralized topology: The Autoscaler tool is deployed in one project and manages one or more Spanner instances in different projects.
  • Distributed topology: Most of the Autoscaler infrastructure is deployed in one project, but some infrastructure components are deployed with the Spanner instances being autoscaled in different projects.

Per-project topology

In a per-project topology deployment, each project with a Spanner instance needing to be autoscaled also has its own independent deployment of the Autoscaler components. We recommend this topology for independent teams who want to manage their own Autoscaler configuration and infrastructure. It's also a good starting point for testing the capabilities of the Autoscaler tool.

The following diagram shows a high-level conceptual view of a per-project deployment.

Conceptual per-project deployment.

The per-project deployments depicted in the preceding diagram have these characteristics:

  • Two applications, Application 1 and Application 2, each use their own Spanner instances.
  • Spanner instances (A) live in respective Application 1 and Application 2 projects.
  • An independent Autoscaler (B) is deployed into each project to control the autoscaling of the instances within a project.

For a more detailed diagram of a per-project deployment, see the Architecture section.

A per-project deployment has the following advantages and disadvantages.

Advantages:

  • Simplest design: The per-project topology is the simplest design of the three topologies since all the Autoscaler components are deployed alongside the Spanner instances that are being autoscaled.
  • Configuration: The control over scheduler parameters belongs to the team that owns the Spanner instance, which gives the team more freedom to adapt the Autoscaler tool to its needs than a centralized or distributed topology.
  • Clear boundary of infrastructure responsibility: The design of a per-project topology establishes a clear boundary of responsibility and security over the Autoscaler infrastructure, because the team that owns the Spanner instances also owns the Autoscaler infrastructure.

Disadvantages:

  • More overall maintenance: Each team is responsible for the Autoscaler configuration and infrastructure so it might become difficult to make sure that all of the Autoscaler tools across the company follow the same update guidelines.
  • More complex audit: Because each team has a high level of control, a centralized audit may become more complex.

To learn how to set up Autoscaler using a per-project topology, see Deploy a per-project or centralized Autoscaler tool for Spanner.

Centralized topology

As in the per-project topology, in a centralized topology deployment all of the components of the Autoscaler tool reside in the same project. However, the Spanner instances are located in different projects. This deployment is suited for a team managing the configuration and infrastructure of several Spanner instances from a single deployment of the Autoscaler tool in a central place.

The following diagram shows a high-level conceptual view of a centralized-project deployment:

Conceptual centralized-project deployment.

The centralized deployment shown in the preceding diagram has the following characteristics:

  • Two applications, Application 1 and Application 2, each use their own Spanner instances.
  • Spanner instances (A) are in respective Application 1 and Application 2 projects.
  • Autoscaler (B) is deployed into a separate project to control the autoscaling of the Spanner instances in both the Application 1 and Application 2 projects.

For a more detailed diagram of a centralized-project deployment, see Deploy a per-project or centralized Autoscaler tool for Spanner.

A centralized deployment has the following advantages and disadvantages.

Advantages:

  • Centralized configuration and infrastructure: A single team controls the scheduler parameters and the Autoscaler infrastructure. This approach can be useful in heavily regulated industries.
  • Less overall maintenance: Setup and ongoing maintenance generally require less effort than in a per-project deployment.
  • Centralized policies and audit: Best practices across teams might be easier to specify and enact. Audits might be easier to execute.

Disadvantages:

  • Centralized configuration: Any change to the Autoscaler parameters needs to go through the centralized team, even though the team requesting the change owns the Spanner instance.
  • Potential for additional risk: The centralized team itself might become a single point of failure even if the Autoscaler infrastructure is designed with high availability in mind.

For a step-by-step tutorial to set up the Autoscaler tool using this option, see Deploy a per-project or centralized Autoscaler tool for Spanner.

Distributed topology

In a distributed topology deployment, the Cloud Scheduler and Spanner instances that need to be autoscaled reside in the same project. The remaining components of the Autoscaler tool reside in a centrally managed project. This deployment is a hybrid deployment. Teams that own the Spanner instances manage only the Autoscaler configuration parameters for their instances, and a central team manages the remaining Autoscaler infrastructure.

The following diagram shows a high-level conceptual view of a distributed-project deployment.

Conceptual distributed-project deployment.

The hybrid deployment depicted in the preceding diagram has the following characteristics:

  • Two applications, Application 1 and Application 2, use their own Spanner instances.
  • The Spanner instances (A) are in both Application 1 and Application 2 projects.
  • An independent Cloud Scheduler component (C) is deployed into each project: Application 1 and Application 2.
  • The remaining Autoscaler components (B) are deployed into a separate project.
  • The Autoscaler tool autoscales the Spanner instances in both the Application 1 and Application 2 projects using the configurations sent by the independent Cloud Scheduler components in each project.

For a more detailed diagram of the distributed-project deployment, see Deploy a distributed Autoscaler tool for Spanner.

A distributed deployment has the following advantages and disadvantages.

Advantages:

  • Application teams control configuration and schedules: Cloud Scheduler is deployed alongside the Spanner instances that are being autoscaled, giving application teams more control over configuration and scheduling.
  • Operations team controls infrastructure: Core components of the Autoscaler tool are centrally deployed giving operations teams control over the Autoscaler infrastructure.
  • Centralized maintenance: Scaler infrastructure is centralized, reducing overhead.

Disadvantages:

  • More complex configuration: Application teams need to provide service accounts to write to the polling topic.
  • Potential for additional risk: The shared infrastructure might become a single point of failure even if the infrastructure is designed with high availability in mind.

To learn how to set up the Autoscaler tool in a distributed deployment, see Deploy a distributed Autoscaler tool for Spanner.

Data splits

Spanner assigns ranges of data called splits to nodes or to subdivisions of a node called processing units. The nodes or processing units independently manage and serve the data in their assigned splits. Data splits are created based on several factors, including data volume and access patterns. For more details, see Spanner schema and data model.

Because Spanner automatically manages how data is organized into splits, when the Autoscaler tool adds or removes nodes or processing units, it needs to allow the Spanner backend sufficient time to reassign and reorganize the splits as capacity is added to or removed from instances.

The Autoscaler tool uses cooldown periods on both scale-up and scale-down events to control how quickly it adds or removes nodes or processing units from an instance. This method allows the instance the necessary time to reorganize the relationships between compute nodes or processing units and data splits. By default, the scale-up and scale-down cooldown periods are set to the following minimum values, as illustrated in the sketch after this list:

  • Scale-up value: 5 minutes
  • Scale-down value: 30 minutes
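
A minimal sketch of this cooldown gate might look like the following; the function and parameter names are assumptions, and the real tool reads the last-scaling timestamp from Firestore as described earlier:

    // Illustrative sketch of the cooldown gate: scaling is only allowed once
    // the applicable cooldown period has elapsed since the last scaling event.
    const SCALE_UP_COOLDOWN_MS = 5 * 60 * 1000;    // 5 minutes
    const SCALE_DOWN_COOLDOWN_MS = 30 * 60 * 1000; // 30 minutes

    function isScalingAllowed(
      lastScalingTimeMs: number, // retrieved from Firestore in the real tool
      nowMs: number,
      direction: 'SCALE_UP' | 'SCALE_DOWN',
    ): boolean {
      const cooldownMs =
        direction === 'SCALE_UP' ? SCALE_UP_COOLDOWN_MS : SCALE_DOWN_COOLDOWN_MS;
      return nowMs - lastScalingTimeMs >= cooldownMs;
    }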

For more information about scaling recommendations and cooldown periods, see Scaling Spanner Instances.

Costs

The Autoscaler tool's resource consumption is minimal, so for most use cases costs are negligible. For example, running the Autoscaler tool to manage 3 Spanner instances, with a polling interval of 5 minutes for each instance, is available at no cost. This estimate includes the following:

  • 3 Cloud Scheduler jobs
  • 0.15 GB of Pub/Sub messages
  • 51,840 Cloud Run function invocations of 500 ms each
  • Less than 10 MB of data in Firestore
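
The invocation figure follows from the polling interval: with one Cloud Scheduler job per instance, each poll produces one Poller and one Scaler invocation per instance, so 3 instances × 2 invocations × 12 polls per hour × 24 hours × 30 days = 51,840 invocations per month.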

The estimate does not include the Spanner database operation costs. Use the Pricing Calculator to generate a cost estimate based on your projected usage.

What's next