Deploy a per-project or centralized Autoscaler tool for Spanner


This tutorial shows you how to set up the infrastructure of the Autoscaler tool for Spanner. It demonstrates two ways to set up the infrastructure, depending on your requirements:

  • A per-project deployment topology. We recommend this topology for independent teams who want to manage their own Autoscaler configuration and infrastructure. A per-project deployment topology is also a good starting point for testing the capabilities of Autoscaler.
  • A centralized deployment topology. We recommend this topology for teams who manage the configuration and infrastructure of one or more Spanner instances while keeping the components and configuration for Autoscaler in a central place. In the centralized topology, in addition to an Autoscaler project, you set up a second project, which in this tutorial is referred to as the Application project. The Application project holds the application resources, including Spanner. You set up and enable billing and APIs for these two projects separately in this tutorial.

This document is part of a series that is intended for IT, Operations, and Site Reliability Engineering (SRE) teams who want to reduce operational overhead and optimize the cost of Spanner deployments.

Objectives

  • Deploy Autoscaler using a per-project or centralized deployment topology.
  • Import existing Spanner instances into Terraform state.
  • Configure Autoscaler.

Costs

In this document, you use the following billable components of Google Cloud:

To generate a cost estimate based on your projected usage, use the pricing calculator. New Google Cloud users might be eligible for a free trial.

The costs associated with the operation of Autoscaler components when you implement this tutorial should be zero or close to zero. However, this estimate does not include the costs for the Spanner instances. For an example of how to calculate the costs of Spanner instances, see Autoscaling Spanner.

When you finish the tasks that are described in this document, you can avoid continued billing by deleting the resources that you created. For more information, see Clean up.

Before you begin

  1. In the Google Cloud console, activate Cloud Shell.

  2. In Cloud Shell, clone the following GitHub repository:

    git clone https://github.com/cloudspannerecosystem/autoscaler
    
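  3. Export a variable for the working directory where the Terraform configuration files for your topology reside. The following command uses the per-project topology; for the centralized topology, use the terraform/cloud-functions/centralized directory instead: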

    export AUTOSCALER_DIR="$(pwd)/autoscaler/terraform/cloud-functions/per-project"
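
As a sketch, the working directory could be derived from the topology name instead of being typed by hand. The function name autoscaler_dir is hypothetical, and the sketch assumes that the repository was cloned into ./autoscaler as in the previous step. The two directory names come from the directory table in Deploying the Autoscaler.

```shell
# Hypothetical helper: print the Terraform working directory for a topology.
# Assumes the repository was cloned into ./autoscaler in the current directory.
autoscaler_dir() {
  local topology="$1"
  case "${topology}" in
    per-project|centralized)
      printf '%s/autoscaler/terraform/cloud-functions/%s\n' "$(pwd)" "${topology}"
      ;;
    *)
      echo "unknown topology: ${topology}" >&2
      return 1
      ;;
  esac
}

# Example: export the directory for the centralized topology.
# export AUTOSCALER_DIR="$(autoscaler_dir centralized)"
```

Rejecting unknown names keeps a typo from silently producing a path that does not exist.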
    

Preparing the Autoscaler project

In this section, you prepare your Autoscaler project for deployment.

  1. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

  2. Make sure that billing is enabled for your Google Cloud project.

  3. Enable the Identity and Access Management (IAM), Resource Manager, App Engine Admin, Firestore, Spanner, Pub/Sub, Cloud Functions, Cloud Build, and Cloud Scheduler APIs.

  4. In Cloud Shell, set environment variables with the ID of your Autoscaler project:

    export PROJECT_ID=INSERT_YOUR_PROJECT_ID
    gcloud config set project "${PROJECT_ID}"
    
  5. Set the region, zone, and App Engine location (for Cloud Scheduler and Firestore) for the Autoscaler infrastructure:

    export REGION=us-central1
    export ZONE=us-central1-c
    export APP_ENGINE_LOCATION=us-central
    
  6. Create a service account for Terraform to use to create all the resources in your infrastructure:

    gcloud iam service-accounts create terraformer --display-name "Terraform service account"
    
  7. Give the project owner role (roles/owner) to the service account:

    gcloud projects add-iam-policy-binding "${PROJECT_ID}" \
        --member "serviceAccount:terraformer@${PROJECT_ID}.iam.gserviceaccount.com" \
        --role roles/owner
    
  8. Create a service account key file:

    gcloud iam service-accounts keys create \
        --iam-account "terraformer@${PROJECT_ID}.iam.gserviceaccount.com" "${AUTOSCALER_DIR}/key.json"
    
  9. If your project does not have a Firestore instance yet, create one:

    gcloud app create --region="${APP_ENGINE_LOCATION}"
    gcloud alpha firestore databases create --region="${APP_ENGINE_LOCATION}"
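
The App Engine location in step 5 is not always spelled the same as the region. As far as the App Engine documentation describes, the locations named us-central and europe-west correspond to the regions us-central1 and europe-west1, and other locations share the region name. A minimal sketch of that mapping follows. The function name app_engine_location is hypothetical, and the sketch does not check whether App Engine is offered in the region.

```shell
# Hypothetical helper: derive the App Engine location name from a region.
# Assumption: only us-central1 and europe-west1 differ from their App Engine names.
app_engine_location() {
  case "$1" in
    us-central1)  echo "us-central" ;;
    europe-west1) echo "europe-west" ;;
    *)            echo "$1" ;;
  esac
}

# Example: keep the two variables from step 5 consistent.
# export REGION=us-central1
# export APP_ENGINE_LOCATION="$(app_engine_location "${REGION}")"
```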
    

Preparing the Application project

If you are deploying Autoscaler in per-project mode, you can skip to Deploying the Autoscaler.

In the centralized deployment topology, all the components of Autoscaler reside in the same project. The Spanner instances can be located in different projects.

In this section, you configure the Application project where your Spanner instance resides. The Spanner instance serves one or more specific applications. In this tutorial, the teams responsible for these applications are assumed to be separate from the team responsible for the Autoscaler infrastructure and configuration.

  1. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

  2. Make sure that billing is enabled for your Google Cloud project.

  3. Enable the Spanner API.

  4. In Cloud Shell, set an environment variable with the ID of the Application project:

    export APP_PROJECT_ID=INSERT_YOUR_APP_PROJECT_ID
    

    Replace INSERT_YOUR_APP_PROJECT_ID with the ID of the Application project.

  5. Give the terraformer service account that you created the owner role (roles/owner) in the Application project:

    gcloud projects add-iam-policy-binding "${APP_PROJECT_ID}" \
        --member "serviceAccount:terraformer@${PROJECT_ID}.iam.gserviceaccount.com" \
        --role roles/owner
    

    Granting this role to the service account enables it to create resources.

  6. Set the Application project ID in the corresponding Terraform environment variable:

    export TF_VAR_app_project_id="${APP_PROJECT_ID}"
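
Because the later Terraform steps depend on the variables set so far, a quick check before deploying can catch a variable that was never exported. The following is a minimal sketch; the function name require_vars is hypothetical and is not part of the Autoscaler repository.

```shell
# Hypothetical pre-flight check: report every named variable that is unset or empty.
require_vars() {
  local name value missing=0
  for name in "$@"; do
    # Look up the variable whose name is stored in ${name}.
    eval "value=\${${name}:-}"
    if [ -z "${value}" ]; then
      echo "missing environment variable: ${name}" >&2
      missing=1
    fi
  done
  return "${missing}"
}

# Example for the centralized topology:
# require_vars PROJECT_ID APP_PROJECT_ID TF_VAR_app_project_id AUTOSCALER_DIR
```

The function lists all missing names in one pass rather than stopping at the first one, so a single run shows everything that still needs to be set.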
    

Deploying the Autoscaler

In this section, you deploy the components that make up Autoscaler using pre-configured Terraform modules. The Terraform files that define these modules are in the following directories:

Directory                                 Directory contents
terraform/                                Top-level configuration, which includes each of the deployment options and the reusable modules.
terraform/cloud-functions/per-project/    Instructions for the per-project deployment option.
terraform/cloud-functions/centralized/    Instructions for the centralized deployment option.
terraform/modules/autoscaler-functions/   Configuration of the Poller and Scaler Cloud Functions, and Pub/Sub topics.
terraform/modules/scheduler/              Configuration of Cloud Scheduler for triggering polling.
terraform/modules/spanner/                Configuration of the Spanner database.

  1. In Cloud Shell, set the project ID, region, and zone in the corresponding Terraform environment variables:

    export TF_VAR_project_id="${PROJECT_ID}"
    export TF_VAR_region="${REGION}"
    export TF_VAR_zone="${ZONE}"
    
  2. In this step, you set up an existing instance for the Autoscaler to monitor, or create and set up a new instance.

    If you have an existing Spanner instance, set the name of your instance in the following variable:

    export TF_VAR_spanner_name=INSERT_YOUR_SPANNER_INSTANCE_NAME
    

    If you want to create a new Spanner instance for testing Autoscaler, set the following variable:

    export TF_VAR_terraform_spanner=true
    

    The Spanner instance that Terraform creates is named autoscale-test.

    For more information about how to set up Terraform to manage your Spanner instance, see Importing your Spanner instances.

  3. Change your working directory to the Terraform per-project directory:

    cd "${AUTOSCALER_DIR}"
    terraform init
    

    The terraform init command initializes the Terraform per-project directory.

  4. Import the existing App Engine application into Terraform state:

    terraform import module.scheduler.google_app_engine_application.app "${PROJECT_ID}"
    
  5. Create the Autoscaler infrastructure:

    terraform apply -parallelism=2
    

    You see the following message asking you to verify that the list of resources for Terraform to create is correct:

       Do you want to perform these actions?
       Terraform will perform the actions described above.
       Only 'yes' will be accepted to approve.
       Enter a value:
       

    After you verify the resources, type yes when prompted.

    When you run this command in Cloud Shell, you might encounter the following error message:

    "Error: cannot assign requested address"

    This error is a known issue in the Terraform Google provider. In this case, retry with the following command: terraform apply -parallelism=1.
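
The retry described in step 5 can be wrapped in a small function. This is a sketch under stated assumptions: the function name apply_autoscaler is hypothetical, terraform is on the PATH, and the working directory is already initialized. The wrapper retries after any failure, not only after the "cannot assign requested address" error, so read the first error message before you rely on the second attempt.

```shell
# Hypothetical wrapper: try the apply with -parallelism=2, then fall back to 1.
# Extra arguments are passed through to terraform apply unchanged.
apply_autoscaler() {
  if terraform apply -parallelism=2 "$@"; then
    return 0
  fi
  echo "terraform apply failed; retrying with -parallelism=1" >&2
  terraform apply -parallelism=1 "$@"
}

# Example:
# cd "${AUTOSCALER_DIR}" && apply_autoscaler
```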

Importing your Spanner instances

If you have existing Spanner instances that you want Terraform to manage, follow the instructions in this section to import them. Otherwise, skip to Configuring the Autoscaler.

  1. In Cloud Shell, list your Spanner instances:

    gcloud spanner instances list
    
  2. Set the following variable with the instance name that you want to be autoscaled:

    SPANNER_INSTANCE_NAME=YOUR_SPANNER_INSTANCE_NAME
    
  3. Create a Terraform configuration file with an empty google_spanner_instance resource:

    echo "resource \"google_spanner_instance\" \"${SPANNER_INSTANCE_NAME}\" {}" > "${SPANNER_INSTANCE_NAME}.tf"
    
  4. Import the Spanner instance into the Terraform state:

    terraform import "google_spanner_instance.${SPANNER_INSTANCE_NAME}" "${SPANNER_INSTANCE_NAME}"
    
  5. When the import completes, update the Terraform configuration file for your instance with the actual instance attributes:

    terraform state show -no-color "google_spanner_instance.${SPANNER_INSTANCE_NAME}" \
        | grep -vE "(id|num_nodes|state|timeouts).*(=|\{)" \
        > "${SPANNER_INSTANCE_NAME}.tf"
    

    If you have additional Spanner instances to import, repeat the importing process.
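
When there are several instances to import, steps 3 through 5 can be collected into one function and called in a loop. The following sketch reuses the commands from the steps above unchanged. The function name import_spanner_instance and the instance names in the example are hypothetical.

```shell
# Hypothetical helper: run steps 3 to 5 for a single Spanner instance.
# Assumes terraform is on the PATH and the working directory is initialized.
import_spanner_instance() {
  local name="$1"
  local address="google_spanner_instance.${name}"
  # Steps 3 and 4: write an empty resource block, then import into the state.
  printf 'resource "google_spanner_instance" "%s" {}\n' "${name}" > "${name}.tf"
  terraform import "${address}" "${name}" || return 1
  # Step 5: replace the placeholder with the imported attributes.
  terraform state show -no-color "${address}" \
    | grep -vE "(id|num_nodes|state|timeouts).*(=|\{)" \
    > "${name}.tf"
}

# Example with two hypothetical instance names:
# for instance in orders-db inventory-db; do
#   import_spanner_instance "${instance}" || break
# done
```

Stopping the loop at the first failed import leaves the remaining instances untouched, which makes it easier to see which one needs attention.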

Configuring the Autoscaler

After you deploy Autoscaler, you configure its parameters.

  1. In the Google Cloud console, go to the Cloud Scheduler page.

  2. Select the checkbox next to the poll-main-instance-metrics job that was created by the Autoscaler deployment.

  3. Click Edit.

  4. Modify the parameters for the Autoscaler shown in the payload field.

    The following is an example of a payload:

        [
            {
                "projectId": "my-spanner-project",
                "instanceId": "spanner1",
                "scalerPubSubTopic": "projects/my-spanner-project/topics/spanner-scaling",
                "units": "NODES",
                "minSize": 1,
                "maxSize": 3
            },{
                "projectId": "different-project",
                "instanceId": "another-spanner1",
                "scalerPubSubTopic": "projects/my-spanner-project/topics/spanner-scaling",
                "units": "PROCESSING_UNITS",
                "minSize": 500,
                "maxSize": 3000,
                "scalingMethod": "DIRECT"
            }
        ]
       

    The payload is defined using a JSON array. Each element in the array represents a Spanner instance that shares the same Autoscaler job schedule.

    For more details about the parameters and their default values, see the README for the Poller component.

  5. To save the changes, click Update.

    The Autoscaler is now configured and ready to start monitoring and scaling your instances in the next scheduled job run.

    If there are syntax errors in your JSON payload, you can examine them in the Google Cloud console on the Logs Explorer page as log entries from the tf-poller-function function.

    The following is an example of an error that you might see:

    SyntaxError: Unexpected token errortext in JSON at position 15 JSON.parse

    To avoid syntax errors, use an editor that can reformat and validate JSON.
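
One way to reduce hand-editing mistakes is to generate each payload element from its parameters. The following is a minimal sketch. The function name payload_entry is hypothetical, only the fields shown in the example payload above are emitted, and the check that minSize does not exceed maxSize is an assumed sanity check rather than documented Poller behavior.

```shell
# Hypothetical helper: print one payload element for the Poller job.
# Arguments: project ID, instance ID, Scaler topic, units, minSize, maxSize.
payload_entry() {
  local project="$1" instance="$2" topic="$3" units="$4" min="$5" max="$6"
  if [ "${min}" -gt "${max}" ]; then
    echo "minSize (${min}) must not exceed maxSize (${max})" >&2
    return 1
  fi
  printf '{"projectId": "%s", "instanceId": "%s", "scalerPubSubTopic": "%s", "units": "%s", "minSize": %d, "maxSize": %d}\n' \
    "${project}" "${instance}" "${topic}" "${units}" "${min}" "${max}"
}

# Wrap a single element in the JSON array that the payload field expects.
printf '[%s]\n' "$(payload_entry my-spanner-project spanner1 \
  projects/my-spanner-project/topics/spanner-scaling NODES 1 3)"
```

The values are inserted without JSON escaping, so the sketch only suits identifiers that contain no quotes or backslashes.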

Monitoring the Autoscaler

In this section, you set up monitoring on the Poller and Scaler Cloud Functions.

  1. In the Google Cloud console, open the Logs Explorer page.

  2. Click Query preview and enter the following filter into Query builder:

    resource.type="cloud_function"
    resource.labels.function_name=~"tf-.*-function"
    
  3. Click Run Query.

    Under Query results, you can see all the messages from Autoscaler functions. Because the Poller runs only every 2 minutes, you might need to rerun the query to see the log messages.

  4. To see only messages from the Scaler Cloud Function, click the Query preview box and replace the previous filter in the Query builder text box with the following:

    resource.type="cloud_function"
    resource.labels.function_name="tf-scaler-function"
    
  5. Click Run Query.

    Under Query results, because the filter restricts results to the tf-scaler-function function, you see only the messages from the Scaler function, which relate to scaling suggestions and decisions.

    Using this filter query or similar filters, you can create logs-based metrics. These metrics are useful for purposes such as recording the frequency of autoscaling events, and you can use them in Cloud Monitoring charts and alerting policies.
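
The same filters can also be used from Cloud Shell. The sketch below builds the filter as a single line joined with AND. The function name autoscaler_log_filter is hypothetical, and the commented gcloud logging read call assumes an authenticated session with the Autoscaler project selected.

```shell
# Hypothetical helper: build a Logging filter for the Autoscaler functions.
# With no argument, the filter matches every function named tf-<something>-function.
autoscaler_log_filter() {
  local function_name="${1:-}"
  if [ -n "${function_name}" ]; then
    printf 'resource.type="cloud_function" AND resource.labels.function_name="%s"\n' \
      "${function_name}"
  else
    printf 'resource.type="cloud_function" AND resource.labels.function_name=~"tf-.*-function"\n'
  fi
}

# Example: show recent Scaler messages.
# gcloud logging read "$(autoscaler_log_filter tf-scaler-function)" --freshness=10m --limit=20
```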

Testing the Autoscaler

In this section, you verify the operation of Autoscaler by changing the minimum instance size and monitoring the logs.

When you deploy Autoscaler with a test database, Autoscaler is configured to use NODES as the unit for compute capacity. You can verify whether the tool is functioning by changing the setting for the minimum size (minSize) to 2. If the tool is running as expected, the Spanner instance scales out to 2 nodes. If you used an existing database for this tutorial, you might see different values.

  1. In the Google Cloud console, go to the Cloud Scheduler page.

  2. Select the checkbox next to the poll-main-instance-metrics job that was created by the Autoscaler deployment.

  3. Click Edit.

  4. In the Job payload field, change the minSize value from 1 to 2:

    "minSize": 2
    
  5. To save the changes, click Update.

  6. Go to the Logs Explorer page.

  7. Click Query preview and enter the following filter into Query builder:

    resource.type="cloud_function"
    resource.labels.function_name="tf-scaler-function"
    
  8. Click Run Query.

  9. Click Jump to Now until you see the following log message:

    Scaling spanner instance to 2 NODES

  10. To verify that Spanner has scaled out to 2 nodes, in the Google Cloud console go to the Spanner console page.

  11. Click the autoscale-test instance.

    Under Overview, verify that the number of nodes is now 2. This quick test demonstrates a scale-out event triggered by modifying Autoscaler parameters. To simulate Autoscaler triggering a scaling event based on utilization, you can perform a load test with a tool such as YCSB.
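
The node count can also be checked from Cloud Shell instead of the console. The following sketch polls the instance until it reports the expected number of nodes. The function name wait_for_nodes and its default retry values are hypothetical, and the sketch assumes that gcloud is authenticated and that the project containing the instance is the active project.

```shell
# Hypothetical helper: poll a Spanner instance until it reports the expected node count.
# Arguments: instance name, expected nodes, attempts (default 10), delay in seconds (default 30).
wait_for_nodes() {
  local instance="$1" expected="$2" attempts="${3:-10}" delay="${4:-30}"
  local current i=0
  while [ "${i}" -lt "${attempts}" ]; do
    current="$(gcloud spanner instances describe "${instance}" --format='value(nodeCount)')"
    if [ "${current}" = "${expected}" ]; then
      echo "${instance} has ${current} nodes"
      return 0
    fi
    i=$((i + 1))
    sleep "${delay}"
  done
  echo "${instance} did not reach ${expected} nodes" >&2
  return 1
}

# Example: wait for the test instance to scale out to 2 nodes.
# wait_for_nodes autoscale-test 2
```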

Clean up

To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, either delete the project that contains the resources, or keep the project and delete the individual resources.

Delete the project

  1. In the Google Cloud console, go to the Manage resources page.

  2. In the project list, select the project that you want to delete, and then click Delete.
  3. In the dialog, type the project ID, and then click Shut down to delete the project.

What's next