Jump to Content
Developers & Practitioners

Tracking index backfill operation progress in Cloud Spanner

April 16, 2021
Shirdon Gorse

Software Engineer, Cloud Spanner

One of the cool things about Google Cloud Spanner, a horizontally scalable relational database, is that you can do an online schema update. Your database is never down for schema update operations. Cloud Spanner continues to serve data during ongoing schema updates. 

Imagine your application is querying data from a large table in the Cloud Spanner database and you want to add a secondary index on a column to make the data lookup more efficient. Cloud Spanner automatically starts backfilling, or populating, the index to reflect an up-to-date view of the data being indexed. Depending on the size of the dataset, load on the instance etc, it can take several minutes to many hours for that index backfill to complete. While the database continues to serve the traffic, you may want to check the progress of index backfill to plan for deploying application changes that rely on the new indexes once the backfill is complete.

Here is some good news for you. Cloud Spanner now provides an ability to track the progress of index backfill. Let us dive deep to understand how you can use the index backfill progress reporting feature. 

Index Creation

Suppose you want to speed up the queries against an example Singers table, and we realize that it is common for queries to also specify both the FirstName and LastName. The schema for the Singers table is shown below:

https://storage.googleapis.com/gweb-cloudblog-publish/images/image1_5xZ4PON.max-900x900.png

This problem could be solved by creating a secondary index that contains the FirstName and LastName as part of the index key. Let us say you issue the following index creation statement for the index SingersByFirstLastNames through the GCP Console:

https://storage.googleapis.com/gweb-cloudblog-publish/images/image3_sj8jruQ.max-1000x1000.png

This statement will trigger the index backfill operation for a non-interleaved index. The primary key for the secondary index will now contain SingerId, FirstName, and LastName. Once the schema update operation is initiated, you go back to the Indexes tab, and see a spinning wheel next to the SingerByFirstLastNames index. 

https://storage.googleapis.com/gweb-cloudblog-publish/images/image2_ELdwmdU.max-800x800.png

A few minutes go by, and you are confused as to when the SingerByFirstLastNames index will be available to use for your queries. How can you understand how much progress has been made on the creation of the secondary index?

Tracking Index Backfill Progress

You can use gcloud command line tool, REST API, or RPC API to monitor the Index Backfill progress. We are also in the process of adding support for this field

The next steps will be to monitor the progress of index backfill, and this will be done using the gcloud command in our example. You can view the progress of the index backfill using the OPERATION_ID. If you don't have the OPERATION_ID, find it by using gcloud spanner operations list:

Loading...

Output of the “operations list” command:

https://storage.googleapis.com/gweb-cloudblog-publish/images/Screen_Shot_2021-04-16_at_10.33.00_AM.max-1100x1100.png

To track the progress of the secondary index backfill operation, use gcloud spanner operations describe:

Loading...

Output of the “operations describe” command when the index backfill has not completed:



https://storage.googleapis.com/gweb-cloudblog-publish/images/image5_SvbwMup.max-700x700.png


Here you can observe that the Index Backfill process triggered due to the Index Creation statement has progressed 64%. Once the process is completed, output of the “operations describe” command shows the progress percent as 100% as shown below.

https://storage.googleapis.com/gweb-cloudblog-publish/images/image1_RLZICGw.max-800x800.png


The “progress” array is where you will find information related to the progress of the index backfill operation. It contains the “startTime”, “progressPercent”, and “endTime” when available for each schema change statement. This example shows only one index creation statement for simplicity, but there can be multiple schema change statements per schema update operation. For more information on interpreting the index backfill progress for multi-statement schema change operations, please refer to the official documentation. 

You can then periodically track the progress made on the secondary index backfill operation by invoking the “gcloud spanner operations describe” command until the operation is complete.

Summary

New introspection feature “Index Backfill progress reporting” helps you to get visibility into the progress of the index backfill long-running operation. Similarly you can also get visibility into the progress of Backup/Restore operations as described in official documentation.

References

Managing Long-Running Operations

Secondary Indexes Documentation

Posted in