Schema Updates

Cloud Spanner allows you to make schema updates with no downtime. You can update the schema of an existing database in several ways.

Supported schema updates

Cloud Spanner supports the following schema updates of an existing database:

  • Create new tables. Columns in new tables can be NOT NULL.
  • Delete a table, if no other tables are interleaved within it, and it has no secondary indexes.
  • Add a non-key column to any table. New non-key columns can't be NOT NULL.
  • Add NOT NULL to a non-key column.
  • Remove NOT NULL from a non-key column.
  • Drop a non-key column from any table, unless it is used by a secondary index.
  • Change a STRING column to a BYTES column or a BYTES column to a STRING column.
  • Increase or decrease the length limit for a STRING or BYTES type (including to MAX), unless it is a primary key column inherited by one or more child tables.
  • Enable or disable commit timestamps in value and primary key columns.
  • Add or remove any secondary index.
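
For example, assuming a hypothetical Singers table with a FirstName column, each of the following statements is a supported update:

ALTER TABLE Singers ADD COLUMN BirthDate DATE;
ALTER TABLE Singers DROP COLUMN BirthDate;
CREATE INDEX SingersByFirstName ON Singers(FirstName);
DROP INDEX SingersByFirstName;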

Schema update performance

Schema updates in Cloud Spanner do not require downtime. When you issue a batch of DDL statements to a Cloud Spanner database, you can continue writing and reading from the database without interruption while Cloud Spanner applies the update as a long-running operation.
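
For example, the following gcloud command issues a one-statement batch against placeholder instance, database, and object names. By default the command waits for the schema change to finish; the --async flag returns immediately while Cloud Spanner applies the update in the background:

gcloud spanner databases ddl update example-db \
  --instance=test-instance \
  --ddl='CREATE INDEX SingersByFirstName ON Singers(FirstName)' \
  --async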

The time it takes to execute a DDL statement depends on whether the update requires validation of existing data or backfill of any data. For example, if you add the NOT NULL annotation to an existing column, Cloud Spanner must read all the values in the column to make sure that the column does not contain any NULL values. This step can take a long time if there is a lot of data to validate. Another example is adding an index to a database: Cloud Spanner backfills the index using existing data, and that process can take a long time, depending on the index's definition and the size of the corresponding base table. However, if you add a new column to a table, there is no existing data to validate, so Cloud Spanner can make the update more quickly.

In summary, schema updates that do not require Cloud Spanner to validate existing data can happen in minutes. Schema updates that require validation can take longer, depending on the amount of existing data that needs to be validated, but data validation happens in the background at a lower priority than production traffic. Schema updates that require data validation are discussed in more detail in the next section.

Schema updates that require data validation

You can make schema updates that require validating that the existing data meets the new constraints. When a schema update requires data validation, Cloud Spanner disallows conflicting schema updates to the affected schema entities and validates the data in the background. If validation is successful, the schema update succeeds. If validation is not successful, the schema update does not succeed. Validation operations are executed as long-running operations. You can check the status of these operations to determine if they succeeded or failed.
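
With the gcloud tool, for instance, you can list pending schema-change operations and inspect one of them; the instance, database, and operation ID below are placeholders:

gcloud spanner operations list --instance=test-instance \
  --database=example-db --type=DATABASE_UPDATE_DDL

gcloud spanner operations describe OPERATION_ID \
  --instance=test-instance --database=example-db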

For example, suppose you have defined a Songwriters table in your schema:

CREATE TABLE Songwriters (
  Id         INT64 NOT NULL,
  FirstName  STRING(1024),
  LastName   STRING(1024),
  Nickname   STRING(MAX),
  OpaqueData BYTES(MAX),
) PRIMARY KEY (Id);

The following schema updates are allowed, but they require validation and might take longer to complete, depending on the amount of existing data:

  • Adding the NOT NULL annotation to a non-key column.

    Example: ALTER TABLE Songwriters ALTER COLUMN Nickname STRING(MAX) NOT NULL

  • Reducing the length of a column.

    Example: ALTER TABLE Songwriters ALTER COLUMN FirstName STRING(10)

  • Altering from BYTES to STRING.

    Example: ALTER TABLE Songwriters ALTER COLUMN OpaqueData STRING(MAX)

  • Enabling commit timestamps on an existing TIMESTAMP column.

    Example: ALTER TABLE Albums ALTER COLUMN LastUpdateTime SET OPTIONS (allow_commit_timestamp = true)

These schema updates fail if the underlying data does not satisfy the new constraints. For example, the ALTER TABLE Songwriters ALTER COLUMN Nickname STRING(MAX) NOT NULL statement above fails if any value in the Nickname column is NULL, because the existing data does not meet the NOT NULL constraint of the new definition.

Data validation can take from several minutes to many hours. The time to complete data validation depends on:

  • The size of the data set
  • The number of nodes in the instance
  • The load on the instance

Some schema updates can change the behavior of requests to the database before the schema update completes. For example, if you're adding NOT NULL to a column, Cloud Spanner almost immediately begins rejecting writes for new requests that use NULL for the column. If the schema update ultimately fails data validation, there will have been a period of time when writes were blocked, even though the old schema would have accepted them.
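
As an illustration, while the NOT NULL change to the Nickname column from the earlier example is still validating, a write like this hypothetical DML statement is already rejected, because it leaves Nickname NULL:

INSERT INTO Songwriters (Id, FirstName) VALUES (42, 'Alice');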

You can cancel a long-running data validation operation with the projects.instances.databases.operations.cancel method or the gcloud spanner operations cancel command.
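
For example, with placeholder names:

gcloud spanner operations cancel OPERATION_ID \
  --instance=test-instance --database=example-db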

Order of execution of statements in batches

If you use the gcloud tool, the REST API, or the RPC API, you can issue a batch of one or more CREATE, ALTER, or DROP statements.

Cloud Spanner applies statements from the same batch in order, stopping at the first error. If applying a statement results in an error, that statement is rolled back. The results of any previously applied statements in the batch are not rolled back.

Cloud Spanner might combine and reorder statements from different batches, potentially mixing statements from different batches into one atomic change that is applied to the database. Within each atomic change, statements from different batches happen in an arbitrary order. For example, if one batch of statements contains ALTER TABLE MyTable ALTER COLUMN MyColumn STRING(50) and another batch of statements contains ALTER TABLE MyTable ALTER COLUMN MyColumn STRING(20), Cloud Spanner will leave that column in one of those two states, but it's not specified which.

Schema versions created during schema updates

Cloud Spanner uses schema versioning so that there is no downtime during a schema update to a large database. Cloud Spanner maintains the older schema version to support reads while the schema update is processed. Cloud Spanner then creates one or more new versions of the schema to process the schema update. Each version contains the result of a collection of statements in a single atomic change, as described above. The schema versions don't necessarily correspond one-to-one with either batches of DDL statements or individual DDL statements. Some individual DDL statements, for example non-interleaved index creation or statements that require data validation, result in multiple schema versions. In other cases, several DDL statements can be batched together in a single version. Schema versions can consume significant server and storage resources, and they persist for up to a week.

The following list shows the estimated duration of schema update operations:

  • CREATE TABLE: minutes.
  • CREATE INDEX: minutes to hours if the index is non-interleaved; creating the index can take a long time even for small or empty tables. Minutes if the index is interleaved and is executed at the same time as CREATE TABLE for the base table.
  • DROP TABLE: minutes.
  • DROP INDEX: minutes.
  • ALTER TABLE ... ADD COLUMN: minutes.
  • ALTER TABLE ... ALTER COLUMN: several hours if background validation is required; minutes if it is not.
  • ALTER TABLE ... DROP COLUMN: minutes.

Best practices for schema updates

The following sections describe best practices for updating schemas.

Procedures before you issue the schema update

Before you issue a schema update:

  • Verify that all of the existing data in the database that you're changing meets the constraints that the schema update is imposing. Because the success of some types of schema updates depends on the data in the database and not just its current schema, a successful schema update of a test database does not guarantee a successful schema update of a production database. Here are some common examples:
    • If you're adding a NOT NULL annotation to an existing column, check that the column does not contain any existing NULL values.
    • If you're shortening the allowed length of a STRING or BYTES column, check that all existing values in that column meet the desired length constraint. Example queries for both checks appear after this list.
  • If you're writing to a column, table, or index that is undergoing a schema update, ensure that the values that you're writing meet the new constraints.
  • If you're dropping a column, table, or index, make sure you are not still writing to or reading from it.
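
For example, against the Songwriters table defined earlier, queries like the following verify those two conditions before you add NOT NULL to Nickname or shorten FirstName to STRING(10); any nonzero count means the schema update would fail validation:

SELECT COUNT(*) FROM Songwriters WHERE Nickname IS NULL;
SELECT COUNT(*) FROM Songwriters WHERE CHAR_LENGTH(FirstName) > 10;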

Limit the number of schema updates in a 7-day period

Avoid making many updates to a single database's schema in any given 7-day period. Spread your schema updates over a longer window so that Cloud Spanner can remove old versions of the schema before new versions are created.

  • For some relational database management systems, there are software packages that make a long series of upgrade and downgrade schema updates to the database on every production deployment. These types of processes are not recommended for Cloud Spanner.
  • Cloud Spanner is optimized to use primary keys to partition data for multitenancy solutions. Multitenancy solutions that use separate tables for each customer can result in a large backlog of schema update operations that take a long time to complete.

Avoid more than 30 DDL statements that require validation or index backfill in a given 7-day period, because each statement creates multiple versions of the schema internally.

Options for large schema updates

If you are creating a database with a large number of indexes, especially if the indexes are non-interleaved, use one of these methods:

  • Create both the database and its schema in the database creation request, as shown in the example after this list. Creating tables and indexes at database creation is a fast operation, because Cloud Spanner does not need to validate data, store multiple schema versions, or coordinate schema changes with concurrent transactions.
  • Create the database and then add new indexes at an average rate of 3 per day.
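
A sketch of the first option with the gcloud tool, using placeholder names, supplies the initial DDL in the same request that creates the database:

gcloud spanner databases create example-db \
  --instance=test-instance \
  --ddl='CREATE TABLE Singers (
           SingerId  INT64 NOT NULL,
           FirstName STRING(1024),
         ) PRIMARY KEY (SingerId);
         CREATE INDEX SingersByFirstName ON Singers(FirstName)'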

If you have a schema update with a large number of DDL statements, do not execute them a few at a time using projects.instances.databases.updateDdl (REST API) or UpdateDatabaseDdl (RPC API) requests. Instead, batch them in larger groups. You can batch several thousand DDL statements in a single request, though no more than 10 of the statements can require validation or backfill. The validation and backfill limit is due to Cloud Spanner's ability to combine the statements as well as the number of schema versions that Cloud Spanner creates. Even with batching, limit the number of indexes that you add to approximately 3 per day.
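
For example, one gcloud request can carry a batch of semicolon-separated statements (all names are placeholders):

gcloud spanner databases ddl update example-db \
  --instance=test-instance \
  --ddl='ALTER TABLE Singers ADD COLUMN BirthDate DATE;
         ALTER TABLE Singers ADD COLUMN Hometown STRING(1024);
         CREATE INDEX SingersByHometown ON Singers(Hometown)'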

Wait for API requests to complete

When making projects.instances.databases.updateDdl (REST API) or UpdateDatabaseDdl (RPC API) requests, use projects.instances.databases.operations.get (REST API) or GetOperation (RPC API), respectively, to wait for each request to complete before starting a new request. Waiting for each request to complete allows your application to track the progress of your schema updates. It also keeps the backlog of pending schema updates to a manageable size.
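
The same pattern works from the command line. This sketch assumes placeholder names and polls the operation's done field; OPERATION_ID comes from the output of the first command or from gcloud spanner operations list:

# Start the schema update without waiting for it to complete.
gcloud spanner databases ddl update example-db \
  --instance=test-instance \
  --ddl='ALTER TABLE Singers ADD COLUMN BirthDate DATE' \
  --async

# Poll until the operation reports done: True.
until gcloud spanner operations describe OPERATION_ID \
    --instance=test-instance --database=example-db \
    --format='value(done)' | grep -q True; do
  sleep 10
done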

Bulk loading

If you are bulk loading data into your tables after they are created, it is usually more efficient to create indexes after the data is loaded. If you are adding several indexes, it might be more efficient to create the database with all tables and indexes in the initial schema, as described in Options for large schema updates.
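
For the first approach, the order of operations might look like the following sketch, with a hypothetical Songs table:

CREATE TABLE Songs (
  SongId   INT64 NOT NULL,
  SongName STRING(MAX),
) PRIMARY KEY (SongId);

-- Bulk load data into Songs here, then create the index so that the
-- backfill runs once over the loaded data:
CREATE INDEX SongsBySongName ON Songs(SongName);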
