Cloud Spanner allows you to make schema updates with no downtime. You can update the schema of an existing database in several ways:
- In the Cloud Console
- Using the
- Using the client libraries
Supported schema updates
Cloud Spanner supports the following schema updates of an existing database:
- Create a new table. Columns in new tables can be
- Delete a table, if no other tables are interleaved within it, and it has no secondary indexes.
- Create or delete a table with a foreign key.
- Add or remove a foreign key from an existing table.
- Add a non-key column to any table. New non-key columns cannot be
- Drop a non-key column from any table, unless it is used by a secondary index, foreign key, stored generated column, or check constraint.
NOT NULLto a non-key column, excluding
NOT NULLfrom a non-key column.
- Change a
STRINGcolumn to a
BYTEScolumn or a
BYTEScolumn to a
- Increase or decrease the length limit for a
BYTEStype (including to
MAX), unless it is a primary key column inherited by one or more child tables.
- Enable or disable commit timestamps in value and primary key columns.
- Add or remove a secondary index.
- Add or remove a check constraint from an existing table.
- Add or remove a stored generated column from an existing table.
Schema update performance
Schema updates in Cloud Spanner do not require downtime. When you issue a batch of DDL statements to a Cloud Spanner database, you can continue writing and reading from the database without interruption while Cloud Spanner applies the update as a long-running operation.
The time it takes to execute a DDL statement depends on whether the update
requires validation of the existing data or backfill of any data. For example,
if you add the
NOT NULL annotation to an existing column, Cloud Spanner must
read all the values in the column to make sure that the column does not contain
NULL values. This step can take a long time if there is a lot of data to
validate. Another example is if you're adding an index to a database:
Cloud Spanner backfills the index using existing data, and that process can
take a long time depending on how the index's definition and the size of the
corresponding base table. However, if you add a new column to a table, there is
no existing data to validate, so Cloud Spanner can make the update more
In summary, schema updates that do not require Cloud Spanner to validate existing data can happen in minutes. Schema updates that require validation can take longer, depending on the amount of existing data that needs to be validated, but data validation happens in the background at a lower priority than production traffic. Schema updates that require data validation are discussed in more detail in the next section.
Schema updates validated against view definitions
When you make a schema update, Cloud Spanner validates that the update will not invalidate the queries used to define existing views. If validation is successful, the schema update succeeds. If validation is not successful, the schema update fails. Check Best practices when creating views for details.
Schema updates that require data validation
You can make schema updates that require validating that the existing data meets the new constraints. When a schema update requires data validation, Cloud Spanner disallows conflicting schema updates to the affected schema entities and validates the data in the background. If validation is successful, the schema update succeeds. If validation is not successful, the schema update does not succeed. Validation operations are executed as long-running operations. You can check the status of these operations to determine if they succeeded or failed.
For example, suppose you have defined a
Songwriters table in your schema:
CREATE TABLE Songwriters ( Id INT64 NOT NULL, FirstName STRING(1024), LastName STRING(1024), Nickname STRING(MAX), OpaqueData BYTES(MAX), ) PRIMARY KEY (Id);
The following schema updates are allowed, but they require validation and might take longer to complete, depending on the amount of existing data:
NOT NULLannotation to a non-key column. For example:
ALTER TABLE Songwriters ALTER COLUMN Nickname STRING(MAX) NOT NULL;
Reducing the length of a column. For example:
ALTER TABLE Songwriters ALTER COLUMN FirstName STRING(10);
STRING. For example:
ALTER TABLE Songwriters ALTER COLUMN OpaqueData STRING(MAX);
Enabling commit timestamps on an existing
TIMESTAMPcolumn. For example:
ALTER TABLE Albums ALTER COLUMN LastUpdateTime SET OPTIONS (allow_commit_timestamp = true);
Adding a check constraint to an existing table.
Adding a stored generated column to an existing table.
Creating a new table with a foreign key.
Adding a foreign key to an existing table.
These schema updates fail if the underlying data does not satisfy the new
constraints. For example, the
ALTER TABLE Songwriters ALTER COLUMN Nickname
STRING(MAX) NOT NULL statement above fails if any value in the
NULL, because the existing data does not meet the
constraint of the new definition.
Data validation can take from several minutes to many hours. The time to complete data validation depends on:
- The size of the data set
- The compute capacity of the instance
- The load on the instance
Some schema updates can change the behavior of requests to the database before
the schema update completes. For example, if you're adding
NOT NULL to a
column, Cloud Spanner almost immediately begins rejecting writes for new
requests that use
NULL for the column. If the new schema update ultimately
fails for data validation, there will have been a period of time when writes
were blocked, even if they would have been accepted by the old schema.
Order of execution of statements in batches
If you use the gcloud tool, REST API, or the RPC API, you can issue a batch of
one or more
Cloud Spanner applies statements from the same batch in order, stopping at the first error. If applying a statement results in an error, that statement is rolled back. The results of any previously applied statements in the batch are not rolled back.
Cloud Spanner might combine and reorder statements from different batches,
potentially mixing statements from different batches into one atomic change that
is applied to the database. Within each atomic change, statements from different
batches happen in an arbitrary order. For example, if one batch of statements
ALTER TABLE MyTable ALTER COLUMN MyColumn STRING(50) and another
batch of statements contains
ALTER TABLE MyTable ALTER COLUMN MyColumn
STRING(20), Cloud Spanner will leave that column in one of those two states,
but it's not specified which.
Schema versions created during schema updates
Cloud Spanner uses schema versioning so that there is no downtime during a schema update to a large database. Cloud Spanner maintains the older schema version to support reads while the schema update is processed. Cloud Spanner then creates one or more new versions of the schema to process the schema update. Each version contains the result of a collection of statements in a single atomic change, as described above.
The schema versions don't necessarily correspond one-to-one with either batches of DDL statements or individual DDL statements. Some individual DDL statements, such as index creation for existing base tables or statements that require data validation, result in multiple schema versions. In other cases, several DDL statements can be batched together in a single version. Old schema versions can consume significant server and storage resources, and they are retained until they expire (no longer needed to serve reads of older versions of data).
The following table shows how long it takes Cloud Spanner to update a schema.
|Schema operation||Estimated duration|
Minutes to hours, if the base table is created before the index.
Minutes, if the statement is executed at the same time as the
Minutes to hours, if background validation is required.
Minutes, if background validation is not required.
Best practices for schema updates
The following sections describe best practices for updating schemas.
Procedures before you issue the schema update
Before you issue a schema update:
Verify that all of the existing data in the database that you're changing meets the constraints that the schema update is imposing. Because the success of some types of schema updates depends on the data in the database and not just its current schema, a successful schema update of a test database does not guarantee a successful schema update of a production database. Here are some common examples:
- If you're adding a
NOT NULLannotation to an existing column, check that the column does not contain any existing
- If you're shortening the allowed length of a
BYTEScolumn, check that all existing values in that column meet the desired length constraint.
- If you're adding a
If you're writing to a column, table, or index that is undergoing a schema update, ensure that the values that you're writing meet the new constraints.
If you're dropping a column, table, or index, make sure you are not still writing to or reading from it.
Limit the frequency of schema updates
If you perform too many schema updates in a short period of time, Cloud Spanner
processing of queued schema updates. This is because Cloud Spanner limits the
amount of space for storing schema versions. Your schema update may be throttled
if there are too many old schema versions within the retention period. The
maximum rate of schema changes depends on many factors,
one of them being the total number of columns in the database. For example, a
database with 2000 columns (roughly 2000 rows in
is able to perform at most 1500 simple schema changes (fewer if the schema
change requires multiple versions) within the retention period. To see the state
of ongoing schema updates, use the
gcloud spanner operations list
command and filter by operations of type
DATABASE_UPDATE_DDL. To cancel an
ongoing schema update, use the
gcloud spanner operations cancel
command and specify the operation ID.
How your DDL statements are batched, and their order within each batch, can affect the number of schema versions that result. To maximize the number of schema updates you can perform over any given period of time, you should use batching that minimizes the number of schema versions. Some rules of thumb are described in large updates.
As described in schema versions, some DDL statements will create multiple schema versions, and these ones are important when considering batching and order within each batch. There are two main types of statements that might create multiple schema versions:
- Statements that might need to backfill index data, like
- Statements that might need to validate existing data, like adding
These types of statements don't always create multiple schema versions,
though. Cloud Spanner will try to detect when these types of statements can be
optimized to avoid using multiple schema versions, which depends on batching.
For example, a
CREATE INDEX statement that occurs in the same batch as a
CREATE TABLE statement for the index's base table, without any intervening
statements for other tables, can avoid needing to backfill the index data
because Cloud Spanner can guarantee that the base table is empty at the time
the index is created. The large updates section describes how
to use this property to create many indexes efficiently.
If you cannot batch your DDL statements to avoid creating many schema versions, you should limit the number of schema updates to a single database's schema within its retention period. Increase the time window in which you make schema updates to allow Cloud Spanner to remove old versions of the schema before new versions are created.
- For some relational database management systems, there are software packages that make a long series of upgrade and downgrade schema updates to the database on every production deployment. These types of processes are not recommended for Cloud Spanner.
- Cloud Spanner is optimized to use primary keys to partition data for multitenancy solutions. Multitenancy solutions that use separate tables for each customer can result in a large backlog of schema update operations that take a long time to complete.
- Schema updates that require validation or index backfill use more server resources because each statement creates multiple versions of the schema internally.
Options for large schema updates
The best way to create a table and a large number of indexes on that table is to create all of them at the same time, so that there will be only a single schema version created. It's best practice to create the indexes immediately following the table in the list of DDL statements. You can create the table and its indexes when you create the database, or in a single large batch of statements. If you need to create many tables, each with many indexes, you can include all the statements in a single batch. You can include several thousand statements in a single batch when all the statements can be executed together using a single schema version.
When a statement requires backfilling index data or performing data validation,
it can't be executed in a single schema version. This happens for
statements when the index's base table already exists (either because it was
created in a previous batch of DDL statements, or because there was a statement
in the batch between the
CREATE TABLE and
CREATE INDEX statements that
required multiple schema versions). Cloud Spanner requires that there are no
more than 10 such statements in a single batch. Index creation that requires
backfilling, in particular, uses several schema versions per index, and so it is
a good rule of thumb to create no more than 3 new
indexes requiring backfilling per day (no matter how they are batched, unless
such batching can avoid backfilling).
For example, this batch of statements will use a single schema version:
CREATE TABLE Singers ( SingerId INT64 NOT NULL, FirstName STRING(1024), LastName STRING(1024), ) PRIMARY KEY (SingerId); CREATE INDEX SingersByFirstName ON Singers(FirstName); CREATE INDEX SingersByLastName ON Singers(LastName); CREATE TABLE Albums ( SingerId INT64 NOT NULL, AlbumId INT64 NOT NULL, AlbumTitle STRING(MAX), ) PRIMARY KEY (SingerId, AlbumId); CREATE INDEX AlbumsByTitle ON Albums(AlbumTitle);
In contrast, this batch will use many schema versions, because
requires backfilling (since its base table must have already existed), and that
forces all the following indexes to also require backfilling (even though
they're in the same batch as their base tables):
CREATE TABLE Singers ( SingerId INT64 NOT NULL, FirstName STRING(1024), LastName STRING(1024), ) PRIMARY KEY (SingerId); CREATE TABLE Albums ( SingerId INT64 NOT NULL, AlbumId INT64 NOT NULL, AlbumTitle STRING(MAX), ) PRIMARY KEY (SingerId, AlbumId); CREATE INDEX UnrelatedIndex ON UnrelatedTable(UnrelatedIndexKey); CREATE INDEX SingersByFirstName ON Singers(FirstName); CREATE INDEX SingersByLastName ON Singers(LastName); CREATE INDEX AlbumsByTitle ON Albums(AlbumTitle);
It would be better to move the creation of
UnrelatedIndex to the end of the
batch, or to a different batch, to minimize schema versions.
Wait for API requests to complete
UpdateDatabaseDdl (RPC API) requests, use
projects.instances.databases.operations.get (REST API)
GetOperation (RPC API), respectively, to wait for
each request to complete before starting a new request. Waiting for each request
to complete allows your application to track the progress of your schema
updates. It also keeps the backlog of pending schema updates to a manageable
If you are bulk loading data into your tables after they are created, it is usually more efficient to create indexes after the data is loaded. If you are adding several indexes, it might be more efficient to create the database with all tables and indexes in the initial schema, as described in the options for large updates.