In a Cloud Spanner database, Cloud Spanner automatically creates an index for each table's primary key. For example, you don't need to do anything to index the primary key of Singers, because it's automatically indexed for you.
You can also create secondary indexes for other columns. Adding a secondary index on a column makes it more efficient to look up data in that column. For example, if you need to quickly look up a set of SingerId values for a given range of LastName values, you should create a secondary index on LastName, so Cloud Spanner does not need to scan the entire table.
If the lookup in the example above is done within a read-write transaction, the more efficient lookup also avoids holding locks on the entire table, which allows concurrent inserts and updates to the table for rows outside of the LastName lookup range.
Cloud Spanner stores the following data in each secondary index:
- All key columns from the base table
- All columns that are included in the index
- All columns specified in the optional STORING clause of the index definition
Over time, Cloud Spanner analyzes your tables to ensure that your secondary indexes are used for the appropriate queries.
Adding a secondary index
The most efficient time to add a secondary index is when you create the table. To create a table and its indexes at the same time, send the DDL statements for the new table and the new indexes in a single request to Cloud Spanner.
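For illustration, here is a minimal sketch of sending the table and index DDL together in one request with the Python client library (google-cloud-spanner); the instance ID test-instance and database ID example-db are placeholders:
from google.cloud import spanner

client = spanner.Client()
instance = client.instance("test-instance")  # placeholder instance ID

# Pass the table and index DDL statements in a single creation request.
database = instance.database(
    "example-db",  # placeholder database ID
    ddl_statements=[
        """CREATE TABLE Singers (
            SingerId INT64 NOT NULL,
            FirstName STRING(1024),
            LastName STRING(1024),
        ) PRIMARY KEY (SingerId)""",
        "CREATE INDEX SingersByFirstLastName ON Singers(FirstName, LastName)",
    ],
)
operation = database.create()
operation.result(300)  # wait up to 300 seconds for the schema change to finish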
In Cloud Spanner, you can also add a new secondary index to an existing table while the database continues to serve traffic. Like any other schema changes in Cloud Spanner, adding an index to an existing database does not require taking the database offline and does not lock entire columns or tables.
Whenever a new index is added to an existing table, Cloud Spanner automatically backfills, or populates, the index to reflect an up-to-date view of the data being indexed. Cloud Spanner manages this backfill process for you, and it uses additional resources during the index backfill.
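As a rough sketch, assuming the Python client library, the update_ddl call below submits a CREATE INDEX statement to an existing database and returns the long-running operation that tracks the backfill; the instance and database IDs are placeholders:
from google.cloud import spanner

client = spanner.Client()
database = client.instance("test-instance").database("example-db")  # placeholders

# Submit the schema change; Cloud Spanner backfills the new index in the background.
operation = database.update_ddl(
    ["CREATE INDEX SongsBySongName ON Songs(SongName)"]
)

# Optionally block until the backfill finishes (this can take minutes to hours).
operation.result(3600)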
Index creation can take from several minutes to many hours. Because index creation is a schema update, it is bound by the same performance constraints as any other schema update. The time needed to create a secondary index depends on several factors:
- The size of the data set
- The number of nodes in the instance
- The load on the instance
To view the progress of an index backfill, see Viewing index backfill progress.
Be aware that using a commit timestamp column as the first key part of a secondary index can create hotspots and reduce write performance.
Use the CREATE INDEX statement to define a secondary index in your schema. Here are some examples:
To index all Singers in the database by their first and last name:
CREATE INDEX SingersByFirstLastName ON Singers(FirstName, LastName)
To create an index of all Songs in the database by the value of SongName:
CREATE INDEX SongsBySongName ON Songs(SongName)
To index only the songs for a particular singer, use the INTERLEAVE IN clause to interleave the index in the table Singers:
CREATE INDEX SongsBySingerSongName ON Songs(SingerId, SongName),
INTERLEAVE IN Singers
To index only the songs on a particular album:
CREATE INDEX SongsBySingerAlbumSongName ON Songs(SingerId, AlbumId, SongName),
INTERLEAVE IN Albums
To index by descending order of SongName:
CREATE INDEX SongsBySingerAlbumSongNameDesc ON Songs(SingerId, AlbumId, SongName DESC),
INTERLEAVE IN Albums
Note that the DESC annotation above applies only to SongName. To index by descending order of other index keys, annotate them with DESC as well: SingerId DESC, AlbumId DESC.
Also note that PRIMARY_KEY is a reserved word and cannot be used as the name of an index. It is the name given to the pseudo-index that is created when you create a table with a PRIMARY KEY specification.
For more details and best practices for choosing non-interleaved indexes and interleaved indexes, see Index options and Use an interleaved index on a column whose value monotonically increases or decreases.
Viewing index backfill progress
Steps to view index backfill progress
An index backfill process is part of a schema-update long-running operation, since adding a secondary index requires a schema update. You can view the progress of the index backfill by using the operation ID. If you don't have the operation ID, find it by using gcloud spanner operations list:
gcloud spanner operations list --instance=INSTANCE --database=DATABASE
Usage notes:
- To limit the list of operations returned by this command, specify the --filter flag. For example, use the following filter to return schema-update operations:
  --filter="@TYPE:UpdateDatabaseDdlMetadata"
- For information on filter syntax, see gcloud topic filters. For information on filtering database operations, see the filter field in ListDatabaseOperationsRequest.
Here is an example of the output:
OPERATION_ID      STATEMENTS                                                                                           DONE   @TYPE
_auto_op_123456   CREATE INDEX SingersByFirstLastName ON Singers(FirstName, LastName)                                  False  UpdateDatabaseDdlMetadata
                  CREATE INDEX SongsBySingerAlbumSongName ON Songs(SingerId, AlbumId, SongName), INTERLEAVE IN Albums
_auto_op_234567                                                                                                        True   CreateDatabaseMetadata
To track the progress of one or multiple secondary index backfill processes, use gcloud spanner operations describe:
gcloud spanner operations describe _auto_op_123456 \
--instance=INSTANCE \
--database=DATABASE
Here is an example of the output of a schema-update long-running operation that contains two index backfill processes:
done: true
metadata:
  '@type': type.googleapis.com/google.spanner.admin.database.v1.UpdateDatabaseDdlMetadata
  commitTimestamps:
  - '2021-01-22T21:58:42.912540Z'
  database: projects/my-project/instances/test-instance/databases/example-db
  progress:
  - endTime: '2021-01-22T21:58:42.912540Z'
    progressPercent: 100
    startTime: '2021-01-22T21:58:11.053996Z'
  - progressPercent: 67
    startTime: '2021-01-22T21:58:11.053996Z'
  statements:
  - CREATE INDEX SingersByFirstLastName ON Singers(FirstName, LastName)
  - CREATE INDEX SongsBySingerAlbumSongName ON Songs(SingerId, AlbumId, SongName), INTERLEAVE IN Albums
name: projects/my-project/instances/test-instance/databases/example-db/operations/_auto_op_123456
response:
  '@type': type.googleapis.com/google.protobuf.Empty
The progress of each index backfill statement can be found in the progress field. For each statement in the statements array, there is a corresponding entry in the progress array, and the startTime and endTime fields are populated when available. The order of the progress array corresponds to the order of the statements array.
Scenarios when viewing index backfill progress
You might encounter different scenarios when checking the progress of an index backfill. Index creation statements that require an index backfill are part of schema update operations, and a schema update operation can contain several statements.
The first scenario is the simplest: the index creation statement is the first statement in the schema update operation. Because it is the first statement, it is processed and executed first, so its startTime field is populated with the start time of the schema update operation. The index creation statement's progressPercent field is populated when the progress of the index backfill is above 0%. The endTime field is populated once the statement has been committed.
The second scenario is when the index creation statement is not the first statement in the schema update operation. Due to the order of execution, no fields related to the index creation statement are populated until the previous statements have been committed. Once the previous statements have been committed, the startTime of the index creation statement's progress entry is the first attribute populated. Then the progressPercent field is populated when the progress of the index backfill is above 0%. The endTime field is populated once the statement has been committed.
Canceling index creation
You can use the Cloud SDK to cancel index creation. To retrieve a list of schema-update operations for a Cloud Spanner database, use the gcloud spanner operations list command, and include the --filter option:
gcloud spanner operations list \
--instance=INSTANCE \
--database=DATABASE \
--filter="@TYPE:UpdateDatabaseDdlMetadata"
Find the OPERATION_ID for the operation you want to cancel, then use the gcloud spanner operations cancel command to cancel it:
gcloud spanner operations cancel OPERATION_ID \
--instance=INSTANCE \
--database=DATABASE
Viewing existing indexes
To view information about existing indexes in a database, you can use the Google Cloud Console or the gcloud command-line tool:
Console
1. Go to the Cloud Spanner Instances page in the Cloud Console.
2. Click the name of the instance you want to view.
3. In the left pane, click the database you want to view, then click the table you want to view.
4. Click the Indexes tab. The Cloud Console shows a list of indexes.
5. Optional: To get details about an index, such as the columns that it includes, click the name of the index.
gcloud
Use the gcloud spanner databases ddl describe command:
gcloud spanner databases ddl describe DATABASE \
--instance=INSTANCE
The gcloud tool prints the Data Definition Language (DDL) statements used to create the database's tables and indexes. The CREATE INDEX statements describe the existing indexes. For example:
--- |-
CREATE TABLE Singers (
SingerId INT64 NOT NULL,
FirstName STRING(1024),
LastName STRING(1024),
SingerInfo BYTES(MAX),
) PRIMARY KEY(SingerId)
---
CREATE INDEX SingersByFirstLastName ON Singers(FirstName, LastName)
Querying with a specific index
The following sections explain how to specify an index in a SQL statement and with the read interface for Cloud Spanner. The examples in these sections assume that you added a MarketingBudget column to the Albums table and created an index called AlbumsByAlbumTitle:
CREATE TABLE Albums (
SingerId INT64 NOT NULL,
AlbumId INT64 NOT NULL,
AlbumTitle STRING(MAX),
MarketingBudget INT64,
) PRIMARY KEY (SingerId, AlbumId),
INTERLEAVE IN PARENT Singers ON DELETE CASCADE;
CREATE INDEX AlbumsByAlbumTitle ON Albums(AlbumTitle);
Specifying an index in a SQL statement
When you use SQL to query a Cloud Spanner table, Cloud Spanner automatically uses any indexes that are likely to make the query more efficient. As a result, you typically don't need to specify an index for SQL queries.
In a few cases, though, Cloud Spanner might choose an index that causes query latency to increase. If you've followed the troubleshooting steps for performance regressions and confirmed that it makes sense to try a different index for the query, you can specify the index as part of your query.
To specify an index in a SQL statement, use FORCE_INDEX to provide an index directive. Index directives use the following syntax:
FROM MyTable@{FORCE_INDEX=MyTableIndex}
You can also use an index directive to tell Cloud Spanner to scan the base table instead of using an index:
FROM MyTable@{FORCE_INDEX=_BASE_TABLE}
The following example shows a SQL query that specifies an index:
SELECT AlbumId, AlbumTitle, MarketingBudget
FROM Albums@{FORCE_INDEX=AlbumsByAlbumTitle}
WHERE AlbumTitle >= "Aardvark" AND AlbumTitle < "Goo";
An index directive might force Cloud Spanner's query processor to read additional columns that are required by the query but not stored in the index. The query processor retrieves these columns by joining the index and the base table. To avoid this extra join, use a STORING clause to store the additional columns in the index.
For instance, in the example shown above, the MarketingBudget column is not stored in the index, but the SQL query selects this column. As a result, Cloud Spanner must look up the MarketingBudget column in the base table, then join it with data from the index, to return the query results.
Cloud Spanner raises an error if the index directive has any of the following issues:
- The index does not exist.
- The index is on a different base table.
- The query is missing a required NULL filtering expression for a NULL_FILTERED index.
The following example shows how to write and execute a query that fetches the values of AlbumId, AlbumTitle, and MarketingBudget using the index AlbumsByAlbumTitle.
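This is a non-authoritative sketch using the Python client library; the query parameters start_title and end_title are illustrative, and the instance and database IDs are placeholders:
from google.cloud import spanner
from google.cloud.spanner_v1 import param_types

client = spanner.Client()
database = client.instance("test-instance").database("example-db")  # placeholders

params = {"start_title": "Aardvark", "end_title": "Goo"}
types = {"start_title": param_types.STRING, "end_title": param_types.STRING}

with database.snapshot() as snapshot:
    # The FORCE_INDEX directive in the SQL text tells Cloud Spanner which index to use.
    results = snapshot.execute_sql(
        "SELECT AlbumId, AlbumTitle, MarketingBudget "
        "FROM Albums@{FORCE_INDEX=AlbumsByAlbumTitle} "
        "WHERE AlbumTitle >= @start_title AND AlbumTitle < @end_title",
        params=params,
        param_types=types,
    )
    for row in results:
        print(row)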
Specifying an index in the read interface
When you use the read interface to Cloud Spanner, and you want Cloud Spanner to use an index, you must specify the index. The read interface does not select the index automatically.
In addition, your index must contain all of the data that appears in the query results, excluding columns that are part of the primary key. This restriction exists because the read interface does not support joins between the index and the base table. If you need to include other columns in the query results, you have a few options:
- Use a STORING clause to store the additional columns in the index.
- Query without including the additional columns, then use the primary keys to send another query that reads the additional columns.
Cloud Spanner returns values from the index in ascending sort order by index key. To retrieve values in descending order, complete these steps:
1. Annotate the index key with DESC. For example:
CREATE INDEX AlbumsByAlbumTitle ON Albums(AlbumTitle DESC);
The DESC annotation applies to a single index key. If the index includes more than one key, and you want the query results to appear in descending order based on all keys, include a DESC annotation for each key.
2. If the read specifies a key range, ensure that the key range is also in descending order. In other words, the value of the start key must be greater than the value of the end key.
The following example shows how to retrieve the values of AlbumId and AlbumTitle using the index AlbumsByAlbumTitle.
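As a hedged sketch with the Python client library (instance and database IDs are placeholders), a read call that names the index might look like this:
from google.cloud import spanner

client = spanner.Client()
database = client.instance("test-instance").database("example-db")  # placeholders

with database.snapshot() as snapshot:
    # The index argument tells the read call which index to use; rows come back
    # in AlbumTitle order because that is the index key.
    results = snapshot.read(
        table="Albums",
        columns=("AlbumId", "AlbumTitle"),
        keyset=spanner.KeySet(all_=True),
        index="AlbumsByAlbumTitle",
    )
    for row in results:
        print(row)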
STORING clause
Optionally, you can use the STORING clause to store a copy of a column in the index. This type of index provides advantages for queries and read calls using the index, at the cost of using extra storage:
- SQL queries that use the index and select columns stored in the STORING clause do not require an extra join to the base table.
- Read calls that use the index can read columns stored in the STORING clause.
For example, suppose you created an alternate version of AlbumsByAlbumTitle that stores a copy of the MarketingBudget column in the index (note the STORING clause):
CREATE INDEX AlbumsByAlbumTitle2 ON Albums(AlbumTitle) STORING (MarketingBudget);
With the old AlbumsByAlbumTitle index, Cloud Spanner must join the index with the base table, then retrieve the column from the base table. With the new AlbumsByAlbumTitle2 index, Cloud Spanner reads the column directly from the index, which is more efficient.
If you use the read interface instead of SQL, the new AlbumsByAlbumTitle2 index also lets you read the MarketingBudget column directly.
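Again as a sketch with the Python client library (instance and database IDs are placeholders), the read call can request MarketingBudget because the STORING clause keeps a copy of that column in AlbumsByAlbumTitle2:
from google.cloud import spanner

client = spanner.Client()
database = client.instance("test-instance").database("example-db")  # placeholders

with database.snapshot() as snapshot:
    # MarketingBudget is served from the index itself, so no base-table join is needed.
    results = snapshot.read(
        table="Albums",
        columns=("AlbumId", "AlbumTitle", "MarketingBudget"),
        keyset=spanner.KeySet(all_=True),
        index="AlbumsByAlbumTitle2",
    )
    for row in results:
        print(row)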
Indexing of NULL values
By default, Cloud Spanner indexes NULL values. For example, recall the definition of the index SingersByFirstLastName on the table Singers:
CREATE INDEX SingersByFirstLastName ON Singers(FirstName, LastName);
All rows of Singers are indexed even if either FirstName or LastName, or both, are NULL.
When NULL values are indexed, you can perform efficient SQL queries and reads over data that includes NULL values. For example, use this SQL query statement to find all Singers with a NULL FirstName:
SELECT s.SingerId, s.FirstName, s.LastName
FROM Singers@{FORCE_INDEX=SingersByFirstLastName} AS s
WHERE s.FirstName IS NULL;
Sort order for NULL values
Cloud Spanner sorts NULL as the smallest value for any given type. For a column in ascending (ASC) order, NULL values sort first. For a column in descending (DESC) order, NULL values sort last.
Disabling indexing of NULL values
To disable the indexing of NULL values, add the NULL_FILTERED keyword to the index definition. NULL_FILTERED indexes are particularly useful for indexing sparse columns, where most rows contain a NULL value. In these cases, the NULL_FILTERED index can be considerably smaller and more efficient to maintain than a normal index that includes NULL values.
Here's an alternate definition of SingersByFirstLastName that does not index NULL values:
CREATE NULL_FILTERED INDEX SingersByFirstLastNameNoNulls
ON Singers(FirstName, LastName);
The NULL_FILTERED keyword applies to all index key columns. You cannot specify NULL filtering on a per-column basis.
Making an index NULL_FILTERED prevents Cloud Spanner from using it for some queries. For example, Cloud Spanner does not use the index for the following query, because the index omits any Singers rows for which LastName is NULL; as a result, using the index would prevent the query from returning the correct rows:
SELECT FirstName, LastName
FROM Singers@{FORCE_INDEX=SingersByFirstLastNameNoNulls}
WHERE FirstName = "John";
To enable Cloud Spanner to use the index, you must rewrite the query so it excludes the rows that are also excluded from the index:
SELECT FirstName, LastName
FROM Singers@{FORCE_INDEX=SingersByFirstLastNameNoNulls}
WHERE FirstName = "John" AND LastName IS NOT NULL;
Unique indexes
Indexes can be declared UNIQUE. UNIQUE indexes add a constraint to the data being indexed that prohibits duplicate entries for a given index key. This constraint is enforced by Cloud Spanner at transaction commit time. Specifically, any transaction that would cause multiple index entries for the same key to exist will fail to commit.
If a table already contains data that violates the uniqueness constraint, attempting to create a UNIQUE index on it fails.
A note about UNIQUE NULL_FILTERED indexes
A UNIQUE NULL_FILTERED index does not enforce index key uniqueness when at least one of the index's key parts is NULL.
For example, suppose that you created the following table and index:
CREATE TABLE ExampleTable (
Key1 INT64 NOT NULL,
Key2 INT64,
Key3 INT64,
Col1 INT64,
) PRIMARY KEY (Key1, Key2, Key3);
CREATE UNIQUE NULL_FILTERED INDEX ExampleIndex ON ExampleTable (Key1, Key2, Col1);
The following two rows in ExampleTable have the same values for the secondary index keys Key1, Key2, and Col1:
1, NULL, 1, 1
1, NULL, 2, 1
Because Key2 is NULL and the index is NULL_FILTERED, the rows are not present in the index ExampleIndex. Because they are not inserted into the index, the index does not reject them for violating uniqueness on (Key1, Key2, Col1).
If you want the index to enforce the uniqueness of values of the tuple (Key1, Key2, Col1), then you must annotate Key2 with NOT NULL in the table definition or create the index without NULL_FILTERED.
Dropping an index
Use the DROP INDEX statement to drop a secondary index from your schema.
To drop the index named SingersByFirstLastName:
DROP INDEX SingersByFirstLastName;
What's next
- Learn about SQL best practices for Cloud Spanner.
- Understand query execution plans for Cloud Spanner.
- Find out how to troubleshoot performance regressions in SQL queries.