Deploy a per-project or centralized Autoscaler tool for Spanner

This tutorial shows you how to set up the infrastructure of the Autoscaler tool for Spanner. This tutorial demonstrates two ways that you can set up the infrastructure, according to your requirements:

A per-project deployment topology. We recommend this topology for independent teams who want to manage their own Autoscaler configuration and infrastructure. A per-project deployment topology is also a good starting point for testing the capabilities of Autoscaler.
A centralized deployment topology. We recommend this topology for teams who manage the configuration and infrastructure of one or more Spanner instances while keeping the components and configuration for Autoscaler in a central place. In the centralized topology, in addition to an Autoscaler project, you set up a second project, which in this tutorial is referred to as the Application project. The Application project holds the application resources, including Spanner. You set up and enable billing and APIs for these two projects separately in this tutorial.

This document is part of a series:

Autoscaling Spanner
Deploy a per-project or centralized Autoscaler tool for Spanner (this document)
Deploy a distributed Autoscaler tool for Spanner

This series is intended for IT, Operations, and Site Reliability Engineering (SRE) teams who want to reduce operational overhead and to optimize the cost of Spanner deployments.

Objectives

Deploy Autoscaler using a per-project or centralized deployment topology.
Import existing Spanner instances into Terraform state.
Configure Autoscaler.

Costs

In this document, you use the following billable components of Google Cloud:

Spanner
Pub/Sub
Cloud Run functions
Cloud Scheduler
Firestore

To generate a cost estimate based on your projected usage, use the pricing calculator. New Google Cloud users might be eligible for a free trial.

The costs associated with the operation of Autoscaler components when you implement this tutorial should be zero or close to zero. However, this estimate does not include the costs for the Spanner instances. For an example of how to calculate the costs of Spanner instances, see Autoscaling Spanner.

When you finish the tasks that are described in this document, you can avoid continued billing by deleting the resources that you created. For more information, see Clean up.

Before you begin

In the Google Cloud console, activate Cloud Shell.

In Cloud Shell, clone the following GitHub repository:

```
git clone https://github.com/cloudspannerecosystem/autoscaler
```

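Export a variable for the working directory that contains the Terraform configuration files for the per-project topology (the files for the centralized topology are under `terraform/cloud-functions/centralized/`):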
```
export AUTOSCALER_DIR="$(pwd)/autoscaler/terraform/cloud-functions/per-project"
```

Preparing the Autoscaler project

In this section, you prepare your Autoscaler project for deployment.

In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

Note: If you don't plan to keep the resources that you create in this procedure, create a project instead of selecting an existing project. After you finish these steps, you can delete the project, removing all resources associated with the project.

Make sure that billing is enabled for your Google Cloud project.
Enable the Identity and Access Management (IAM), Resource Manager, App Engine Admin, Firestore, Spanner, Pub/Sub, Cloud Run functions, Cloud Build, and Cloud Scheduler APIs.

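In Cloud Shell, set an environment variable with the ID of your Autoscaler project, and make it the default project for gcloud:
```
export PROJECT_ID=INSERT_YOUR_PROJECT_ID
gcloud config set project "${PROJECT_ID}"
```
Replace INSERT_YOUR_PROJECT_ID with the ID of your Autoscaler project.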

Set the region, zone, and App Engine location (used by Cloud Scheduler and Firestore) for the Autoscaler infrastructure:
```
export REGION=us-central1
export ZONE=us-central1-c
export APP_ENGINE_LOCATION=us-central
```
Create a service account for Terraform to use to create all the resources in your infrastructure:
```
gcloud iam service-accounts create terraformer --display-name "Terraform service account"
```

Give the project owner role (roles/owner) to the service account:

```
gcloud projects add-iam-policy-binding "${PROJECT_ID}" \
    --member "serviceAccount:terraformer@${PROJECT_ID}.iam.gserviceaccount.com" \
    --role roles/owner
```

Create a service account key file:

```
gcloud iam service-accounts keys create \
    --iam-account "terraformer@${PROJECT_ID}.iam.gserviceaccount.com" "${AUTOSCALER_DIR}/key.json"
```

If your project does not have a Firestore database yet, create an App Engine application and a Firestore database in the App Engine location:
```
gcloud app create --region="${APP_ENGINE_LOCATION}"
gcloud alpha firestore databases create --region="${APP_ENGINE_LOCATION}"
```

Preparing the Application project

If you are deploying Autoscaler in the per-project topology, skip to Deploying the Autoscaler.

In the centralized deployment topology, all the components of Autoscaler reside in the same project, while the Spanner instances can be located in different projects.

In this section, you configure the Application project where your Spanner instance resides. The Spanner instance serves one or more specific applications. In this tutorial, the teams responsible for these applications are assumed to be separate from the team responsible for the Autoscaler infrastructure and configuration.

In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

Note: If you don't plan to keep the resources that you create in this procedure, create a project instead of selecting an existing project. After you finish these steps, you can delete the project, removing all resources associated with the project.

Make sure that billing is enabled for your Google Cloud project.
Enable the Spanner API.
In Cloud Shell, set the following environment variable:
```
export APP_PROJECT_ID=INSERT_YOUR_APP_PROJECT_ID
```
Replace INSERT_YOUR_APP_PROJECT_ID with the ID of the Application project.

Give the terraformer service account that you created the owner role (roles/owner) in the Application project:

```
gcloud projects add-iam-policy-binding "${APP_PROJECT_ID}" \
    --member "serviceAccount:terraformer@${PROJECT_ID}.iam.gserviceaccount.com" \
    --role roles/owner
```

Granting this role to the service account lets Terraform create resources in the Application project.
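To confirm that the binding took effect, you can list the roles that the `terraformer` service account holds in the Application project. This bash sketch wraps the common `get-iam-policy` flatten-and-filter pattern in a hypothetical function, `show_sa_roles`. After the grant above, the output should include `roles/owner`.

```shell
# Hypothetical helper: print the roles that a member holds on a project.
# Usage: show_sa_roles PROJECT_ID MEMBER_EMAIL
show_sa_roles() {
  gcloud projects get-iam-policy "$1" \
    --flatten="bindings[].members" \
    --filter="bindings.members:$2" \
    --format="value(bindings.role)"
}
```

For example: `show_sa_roles "${APP_PROJECT_ID}" "terraformer@${PROJECT_ID}.iam.gserviceaccount.com"`.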

Set the Application project ID in the corresponding Terraform environment variable:
```
export TF_VAR_app_project_id="${APP_PROJECT_ID}"
```

Deploying the Autoscaler

In this section, you deploy the components that make up Autoscaler using pre-configured Terraform modules. The Terraform files that define these modules are in the following directories:

| Directory | Directory contents |
| --- | --- |
| `terraform/` | Top-level configuration, which includes each of the deployment options and the reusable modules. |
| `terraform/cloud-functions/per-project/` | Instructions for the per-project deployment option. |
| `terraform/cloud-functions/centralized/` | Instructions for the centralized deployment option. |
| `terraform/modules/autoscaler-functions/` | Configuration of the Poller and Scaler Cloud Run functions, and the Pub/Sub topics. |
| `terraform/modules/scheduler/` | Configuration of Cloud Scheduler for triggering polling. |
| `terraform/modules/spanner/` | Configuration of the Spanner database. |

In Cloud Shell, set the project ID, region, and zone in the corresponding Terraform environment variables:

```
export TF_VAR_project_id="${PROJECT_ID}"
export TF_VAR_region="${REGION}"
export TF_VAR_zone="${ZONE}"
```
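Terraform reads an exported environment variable named `TF_VAR_<name>` as the value of the input variable `<name>`. If one of these exports is missing, Terraform either prompts for the value or silently uses the variable's default, if it has one. The following bash sketch defines a hypothetical helper, `require_vars`, that stops you before you run Terraform. It uses `printenv`, which sees only exported variables, the same ones that Terraform sees.

```shell
# Hypothetical helper: report each listed variable that is not exported or is
# empty, and return non-zero if any are missing.
# Usage: require_vars NAME [NAME...]
require_vars() {
  local name missing=0
  for name in "$@"; do
    # printenv sees only exported variables, which are the ones Terraform reads.
    if [ -z "$(printenv "${name}")" ]; then
      echo "missing or empty: ${name}" >&2
      missing=1
    fi
  done
  return "${missing}"
}
```

For example, run `require_vars TF_VAR_project_id TF_VAR_region TF_VAR_zone` before `terraform init`. For the centralized topology, also include `TF_VAR_app_project_id`.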

In this step, you either point Autoscaler at an existing Spanner instance to monitor, or have Terraform create a new instance for testing.

If you have an existing Spanner instance, set the name of your instance in the following variable:
```
export TF_VAR_spanner_name=INSERT_YOUR_SPANNER_INSTANCE_NAME
```
If you want to create a new Spanner instance for testing Autoscaler, set the following variable:
```
export TF_VAR_terraform_spanner=true
```
The Spanner instance that Terraform creates is named autoscale-test.

For more information about how to set up Terraform to manage your Spanner instance, see Importing your Spanner instances.
Change your working directory to the Terraform per-project directory and initialize it:
```
cd "${AUTOSCALER_DIR}"
terraform init
```

Import the existing App Engine application into the Terraform state:
```
terraform import module.scheduler.google_app_engine_application.app "${PROJECT_ID}"
```

Create the Autoscaler infrastructure:
```
terraform apply -parallelism=2
```
You see the following message asking you to verify that the list of resources for Terraform to create is correct:
```
Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value:
```
After you verify the resources, type yes when prompted.

When you run this command in Cloud Shell, you might encounter the following error message:
```
Error: cannot assign requested address
```
This error is a known issue in the Terraform Google provider. If you see it, retry with a lower parallelism setting:
```
terraform apply -parallelism=1
```
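To avoid retrying by hand, you could wrap both attempts in a single command. This bash sketch uses a hypothetical function, `apply_with_fallback`. It retries with `-parallelism=1` only when the first run fails and its output contains the known error text. It assumes that Terraform's confirmation prompt still reaches you when its output is piped through `tee`.

```shell
# Hypothetical wrapper: run `terraform apply -parallelism=2` and, only if it
# fails with "cannot assign requested address", retry once with
# -parallelism=1. Requires bash (uses PIPESTATUS).
apply_with_fallback() {
  local log status
  log="$(mktemp)"
  # tee shows the output on screen while saving a copy to search afterwards.
  terraform apply -parallelism=2 2>&1 | tee "${log}"
  status="${PIPESTATUS[0]}"
  if [ "${status}" -ne 0 ] && grep -q "cannot assign requested address" "${log}"; then
    echo "Retrying with -parallelism=1" >&2
    terraform apply -parallelism=1
    status=$?
  fi
  rm -f "${log}"
  return "${status}"
}
```

Any other failure, such as declining the prompt, is returned as is and is not retried.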

Importing your Spanner instances

If you have existing Spanner instances that you want Terraform to manage, import them by following the instructions in this section. Otherwise, skip to Configuring the Autoscaler.

In Cloud Shell, list your Spanner instances:
```
gcloud spanner instances list
```
Set the following variable to the name of the instance that you want Autoscaler to scale:
```
SPANNER_INSTANCE_NAME=YOUR_SPANNER_INSTANCE_NAME
```

Create a Terraform configuration file with an empty google_spanner_instance resource:

```
echo "resource \"google_spanner_instance\" \"${SPANNER_INSTANCE_NAME}\" {}" > "${SPANNER_INSTANCE_NAME}.tf"
```

Import the Spanner instance into the Terraform state:

```
terraform import "google_spanner_instance.${SPANNER_INSTANCE_NAME}" "${SPANNER_INSTANCE_NAME}"
```

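When the import completes, overwrite the Terraform configuration file for your instance with the instance's actual attributes. The grep filter drops computed attributes such as id and state, the timeouts block, and num_nodes. Dropping num_nodes prevents Terraform from resetting the node count that Autoscaler manages: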
```
terraform state show -no-color "google_spanner_instance.${SPANNER_INSTANCE_NAME}" \
    | grep -vE "(id|num_nodes|state|timeouts).*(=|\{)" \
    > "${SPANNER_INSTANCE_NAME}.tf"
```
If you have additional Spanner instances to import, repeat the importing process.
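Because the three import steps differ only in the instance name, you might wrap them in a small bash function when you import several instances. The function name `import_spanner_instance` is hypothetical. The commands and the `grep` filter are the ones shown above. This sketch also captures the `terraform state show` output before it writes the file, so a failed command leaves the placeholder file in place rather than an empty one.

```shell
# Hypothetical helper: import one existing Spanner instance into Terraform.
# Run it from the Terraform working directory. Usage: import_spanner_instance NAME
import_spanner_instance() {
  local name="$1" state
  # Empty resource block: gives `terraform import` an address to import into.
  echo "resource \"google_spanner_instance\" \"${name}\" {}" > "${name}.tf"
  # Import the live instance into the Terraform state.
  terraform import "google_spanner_instance.${name}" "${name}" || return 1
  # Read the imported attributes back from the state.
  state="$(terraform state show -no-color "google_spanner_instance.${name}")" || return 1
  # Drop computed attributes (id, state), timeouts, and num_nodes, then
  # overwrite the placeholder file with the resulting configuration.
  printf '%s\n' "${state}" \
    | grep -vE "(id|num_nodes|state|timeouts).*(=|\{)" \
    > "${name}.tf"
}
```

Call it once for each instance that you want Autoscaler to manage, for example `import_spanner_instance "${SPANNER_INSTANCE_NAME}"`.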

Configuring the Autoscaler

After you deploy Autoscaler, you configure its parameters.

In the Google Cloud console, go to the Cloud Scheduler page.

Select the checkbox next to the poll-main-instance-metrics job that was created by the Autoscaler deployment.
Click Edit.

Modify the parameters for the Autoscaler shown in the payload field.

The following is an example of a payload:

```
[
    {
        "projectId": "my-spanner-project",
        "instanceId": "spanner1",
        "scalerPubSubTopic": "projects/my-spanner-project/topics/spanner-scaling",
        "units": "NODES",
        "minSize": 1,
        "maxSize": 3
    },{
        "projectId": "different-project",
        "instanceId": "another-spanner1",
        "scalerPubSubTopic": "projects/my-spanner-project/topics/spanner-scaling",
        "units": "PROCESSING_UNITS",
        "minSize": 500,
        "maxSize": 3000,
        "scalingMethod": "DIRECT"
    }
]
```

The payload is defined using a JSON array. Each element in the array represents a Spanner instance that shares the same Autoscaler job schedule.

For more details about the parameters and their default values, see the README for the Poller component.

To save the changes, click Update.

The Autoscaler is now configured and ready to start monitoring and scaling your instances in the next scheduled job run.

If there are syntax errors in your JSON payload, you can examine them in the Google Cloud console on the Logs Explorer page as log entries from the tf-poller-function function.

The following is an example of an error that you might see:

```
SyntaxError: Unexpected token errortext in JSON at position 15 JSON.parse
```

To avoid syntax errors, use an editor that can reformat and validate JSON.
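If you keep a copy of the payload in Cloud Shell, you can also check it before you paste it into Cloud Scheduler. The following bash sketch uses a hypothetical function, `validate_payload`, and the Python 3 interpreter that is available in Cloud Shell. It confirms that a file parses as JSON and that its top-level value is an array, as the payload format requires. It checks syntax only, not parameter names or values.

```shell
# Hypothetical helper: exit non-zero if the file is not valid JSON or if its
# top-level value is not an array. Usage: validate_payload payload.json
validate_payload() {
  python3 - "$1" <<'EOF'
import json
import sys

with open(sys.argv[1]) as f:
    payload = json.load(f)  # raises json.JSONDecodeError on syntax errors

if not isinstance(payload, list):
    sys.exit("payload must be a JSON array")
print(f"OK: {len(payload)} instance configuration(s)")
EOF
}
```

On a syntax error, Python reports the line and column of the problem, which is easier to locate than a character position in a log entry.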

Monitoring the Autoscaler

In this section, you use Logs Explorer to monitor the Poller and Scaler Cloud Run functions.

In the Google Cloud console, open the Logs Explorer page.

Click Query preview and enter the following filter into Query builder:

```
resource.type="cloud_function"
resource.labels.function_name=~"tf-.*-function"
```

Click Run Query.

Under Query results, you can see all the messages from the Autoscaler functions. Because the Poller runs only every 2 minutes, you might need to re-run the query to see new log messages.
To see only messages from the Scaler Cloud Run function, click the Query preview box and replace the previous filter in the Query builder text box with the following:
```
resource.type="cloud_function"
resource.labels.function_name="tf-scaler-function"
```
Click Run Query.

Under Query results, because the filter matches only the tf-scaler-function function name, you see only the messages from the Scaler function, which relate to scaling suggestions and decisions.
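The same conditions can also be queried from Cloud Shell. As a sketch, the hypothetical function below passes them to `gcloud logging read`, joined with an explicit `AND`. The `--freshness` and `--limit` values are arbitrary choices for this example.

```shell
# Hypothetical helper: print recent log entries from one Autoscaler function.
# Usage: read_function_logs tf-scaler-function
read_function_logs() {
  local filter
  filter="resource.type=\"cloud_function\" AND resource.labels.function_name=\"$1\""
  # At most 20 matching entries from the last 30 minutes.
  gcloud logging read "${filter}" --freshness=30m --limit=20
}
```

Pass `tf-poller-function` instead to see the Poller's messages, including JSON payload errors.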

Using this filter query or similar filters, you can create logs-based metrics. These metrics are useful for purposes such as recording the frequency of autoscaling events, or for use in Cloud Monitoring charts and alerting policies.
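For example, you could create a counter metric for scaling events from Cloud Shell with `gcloud logging metrics create`. In this sketch, the function and the default metric name `autoscaler-scaling-events` are hypothetical. The filter assumes that the Scaler logs the `Scaling spanner instance` message shown in the testing section. The quoted string at the end of the filter is a global search across the fields of each log entry.

```shell
# Hypothetical helper: create a counter metric that counts Scaler log entries
# announcing a scaling operation. Usage: create_scaling_metric [METRIC_NAME]
create_scaling_metric() {
  gcloud logging metrics create "${1:-autoscaler-scaling-events}" \
    --description="Scaling events logged by the Autoscaler Scaler function" \
    --log-filter='resource.type="cloud_function" AND resource.labels.function_name="tf-scaler-function" AND "Scaling spanner instance"'
}
```

You can then chart the metric or use it in an alerting policy in Cloud Monitoring.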

Testing the Autoscaler

In this section, you verify the operation of Autoscaler by changing the minimum instance size and monitoring the logs.

When you deploy Autoscaler with the test instance that Terraform creates (autoscale-test), Autoscaler is configured to use NODES as the unit for compute capacity. You can verify that the tool is working by changing the minimum size (minSize) to 2. If the tool is running as expected, the Spanner instance scales out to 2 nodes. If you used an existing instance for this tutorial, you might see different values.

In the Google Cloud console, go to the Cloud Scheduler page.

Select the checkbox next to the poll-main-instance-metrics job that was created by the Autoscaler deployment.
Click Edit.
In the Job payload field, change the minSize value from 1 to 2:
```
"minSize": 2
```
To save the changes, click Update.
Go to the Logs Explorer page.

Click Query preview and enter the following filter into Query builder:

```
resource.type="cloud_function"
resource.labels.function_name="tf-scaler-function"
```

Click Run Query.
Click Jump to Now until you see the following log message:

```
Scaling spanner instance to 2 NODES
```
To verify that Spanner has scaled out to 2 nodes, go to the Spanner page in the Google Cloud console.

Click the autoscale-test instance.

Under Overview, verify that the number of nodes is now 2. This quick test demonstrates a scale-out event that you trigger by modifying the Autoscaler parameters. To see Autoscaler trigger a scaling event based on utilization, run a load test with a tool such as YCSB.
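Instead of refreshing the console, you can poll the node count from Cloud Shell. This bash sketch uses a hypothetical function, `wait_for_node_count`. It reads the instance's `nodeCount` field with `gcloud spanner instances describe` every 20 seconds, for up to 30 attempts, until the count matches the expected value. The optional third argument is for the centralized topology, where the instance lives in the Application project.

```shell
# Hypothetical helper: poll until a Spanner instance reports the expected node
# count. Usage: wait_for_node_count INSTANCE EXPECTED [PROJECT]
wait_for_node_count() {
  local instance="$1" expected="$2" current="" attempt=0
  while [ "${attempt}" -lt 30 ]; do
    # ${3:+...} adds --project only when a project ID is passed.
    current="$(gcloud spanner instances describe "${instance}" \
      ${3:+--project="$3"} --format='value(nodeCount)')"
    if [ "${current}" = "${expected}" ]; then
      echo "${instance} now has ${current} nodes"
      return 0
    fi
    attempt=$((attempt + 1))
    sleep 20
  done
  echo "gave up waiting for ${expected} nodes (last seen: ${current})" >&2
  return 1
}
```

For example: `wait_for_node_count autoscale-test 2`, or `wait_for_node_count autoscale-test 2 "${APP_PROJECT_ID}"` for the centralized topology.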

Clean up

To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, either delete the project that contains the resources, or keep the project and delete the individual resources.

Delete the project

In the Google Cloud console, go to the Manage resources page.
In the project list, select the project that you want to delete, and then click Delete.
In the dialog, type the project ID, and then click Shut down to delete the project.
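From Cloud Shell, the equivalent command is `gcloud projects delete`, which asks you to confirm before it shuts a project down. The hypothetical function below deletes the Autoscaler project. If `APP_PROJECT_ID` is set (centralized topology), it also deletes the Application project. Use it only with projects that you created for this tutorial.

```shell
# Hypothetical helper: delete the tutorial project(s). gcloud asks for
# confirmation before deleting each project.
delete_tutorial_projects() {
  if [ -z "${PROJECT_ID:-}" ]; then
    echo "PROJECT_ID is not set" >&2
    return 1
  fi
  gcloud projects delete "${PROJECT_ID}" || return 1
  # Centralized topology only: the Application project, if one was set.
  if [ -n "${APP_PROJECT_ID:-}" ]; then
    gcloud projects delete "${APP_PROJECT_ID}"
  fi
}
```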

What's next

Read about the Autoscaler Architecture.
Read about how to deploy Autoscaler in distributed mode.
Read more about Spanner recommended thresholds.
Read more about Spanner CPU utilization metrics and Latency metrics.
Read more about Spanner schema design best practices.
Explore reference architectures, diagrams, and best practices about Google Cloud in the Cloud Architecture Center.