Data model summary
A Cloud Spanner database can contain one or more tables. Tables look like relational database tables in that they are structured with rows, columns, and values, and they contain primary keys. Data in Cloud Spanner is strongly typed: you must define a schema for each database and that schema must specify the data types of each column of each table. Allowable data types include scalar and array types, which are explained in more detail in Data Types. You can also define one or more secondary indexes on a table, as described in Secondary Indexes.
Parent-child table relationships
You can define multiple tables in a database, and you can optionally define
parent-child relationships between tables if you want Cloud Spanner to
physically co-locate their rows for efficient retrieval. For example, if you
Customers table and an
Invoices table, and your application
frequently fetches all the invoices for a given customer, you can define
Invoices as a child table of
Customers. In doing so, you're declaring a data
locality relationship between two logically independent tables: you're telling
Cloud Spanner to physically store one or more rows of
Invoices with one
How do you tell Cloud Spanner which
Invoices rows to store with which
Customers rows? You do so using the primary key of these tables. Every table
must have a primary key, and that primary key can be composed of zero or more
columns of that table. If you declare a table to be a child of another table,
the primary key column(s) of the parent table must be the prefix of the primary
key of the child table. This means if a parent table's primary key is composed
of N columns, the primary key of each of its child tables must also be composed
of those same N columns, in the same order and starting with the same column.
Cloud Spanner stores rows in sorted order by primary key values, with child rows inserted between parent rows that share the same primary key prefix. This insertion of child rows between parent rows along the primary key dimension is called interleaving, and child tables are also called interleaved tables. (See an illustration of interleaved rows in the physical view of schema example 3 below.)
In summary, primary keys in Cloud Spanner allow you to physically co-locate rows of related tables. The schema examples below show what this physical layout looks like.
Choosing a primary key
If your primary key is composed of one or more columns, you'll need a unique
primary key for each row. Often your application already has a field that's a
natural fit for use as the primary key. For example, in the
example above, there might be an application-supplied
CustomerId that serves
well as the primary key. In other cases, you may need to generate a primary key
when inserting the row, like a unique
INT64 value that you generate using a
In all cases, you should be careful not to create hotspots with the choice of your primary key. For example, if you insert records with a monotonically increasing integer as the key, you'll always insert at the end of your key space. This is undesirable because Cloud Spanner divides data among servers by key ranges, which means your inserts will be directed at a single server, creating a hotspot. A common technique for spreading the load across multiple servers is to hash the actual unique key, and use the hash (or the hash + the unique key) as the primary key. This will help ensure that inserted records are spread across your key space to avoid hotspots. Read more about this technique in Best Practices.
You can define hierarchies of parent-child relationships between tables up to seven layers deep, which means you can co-locate rows of seven logically independent tables. If the size of the data in your tables is small, your database can probably be handled by a single Cloud Spanner server. But what happens when your related tables grow and start reaching the resource limits of an individual server? Cloud Spanner is a distributed database, which means that as your database grows, Cloud Spanner divides your data into chunks called "splits", where individual splits can move independently from each other and get assigned to different servers, which can be in different physical locations. A split is defined as a range of rows in a top-level (in other words, non-interleaved) table, where the rows are ordered by primary key. The start and end keys of this range are called "split boundaries". Cloud Spanner automatically adds and removes split boundaries, which changes the number of splits in the database.
Cloud Spanner splits data based on load: it adds split boundaries automatically when it detects high read or write load spread among many keys in a split. You have some control over how your data is split because Cloud Spanner can only draw split boundaries between rows of tables that are at the root of a hierarchy (i.e. tables that are not interleaved in a parent table). Additionally, rows of an interleaved table cannot be split from their corresponding row in their parent table because the rows of the interleaved table are stored in sorted primary key order together with the row from their parent table that shares the same primary key prefix. (See an illustration of this in the physical view of schema example 4 below.) Thus, the parent-child table relationships that you define, along with the primary key values that you set for rows of related tables, give you control over how data is sharded under the hood.
As an example of how Cloud Spanner performs load-based splitting to mitigate read hotspots, suppose your database contains a table with 10 rows that are read more frequently than all of the other rows in the table. As long as that table is at the root of the database hierarchy (in other words, it's not an interleaved table), Cloud Spanner can add split boundaries between each of those 10 rows so that they're each handled by a different server, rather than allowing all the reads of those rows to consume the resources of a single server.
As a rule of thumb, if you follow best practices for schema design, Cloud Spanner can mitigate hotspots on reads that target rows of a non-interleaved table such that the read throughput should improve every few minutes until you saturate the resources in your instance, or you run into cases where no new split boundaries can be added (because you have a split that covers just a single row and its interleaved children).
Split size limits
As a rule of thumb, the size of every set of related rows in a hierarchy of parent-child tables should be less than a few GiB. A set of related rows in a hierarchy of parent-child tables is defined as: (a single row of a table at the root of a database hierarchy) + (all rows of that table's descendent tables that share the row's primary key) + (all rows of interleaved indexes that share the row's primary key).
The schema examples below show how to create Cloud Spanner tables with and without parent-child relationships and illustrate the corresponding physical layouts of data.
Creating a table
Suppose you're creating a music application and you need a simple table that stores rows of singer data:
Logical view of rows in a simple Singers table. The primary key column appears to the left of the bolded line.
Note that the table contains one primary key column,
SingerId, which appears
to the left of the bolded line, and that tables are organized by rows, columns,
You can define the table with a Cloud Spanner schema like this:
CREATE TABLE Singers ( SingerId INT64 NOT NULL, FirstName STRING(1024), LastName STRING(1024), SingerInfo BYTES(MAX), ) PRIMARY KEY (SingerId);
Schema example 1 defines the Singers table
Note the following about the example schema:
Singersis a table at the root of the database hierarchy (because it's not defined as a child of another table).
- Primary key columns are usually annotated with
NOT NULL(though you can omit this annotation if you want to allow
NULLvalues in key columns; see more in Notes About Key Columns).
- Columns that are not included in the primary key are called non-key columns,
and they can have an optional
- Columns that use the
BYTEStype must be defined with a length, which represents the maximum number of Unicode characters that can be stored in the field. (More details in Scalar Data Types.)
What does the physical layout of the rows in the
Singers table look like? The
diagram below shows rows of the Singers table stored by contiguous (aka sorted
order of) primary key (i.e. "Singers(1)", then "Singers(2)", and so on, where
"Singers(1)" represents the row in the Singers table keyed by 1.).
Physical layout of rows in the Singers table, with an example split boundary that results in splits handled by different servers.
The diagram also illustrates possible split boundaries, which can occur between any
Singers is at the root of the database
hierarchy. It also illustrates an example split boundary between the rows keyed
Singers(4), with the data from the resulting splits
assigned to different servers. This means that as this table grows,
it's possible for rows of
Singers data to be stored in different locations.
Creating multiple tables
Assume you now want to add some basic data about each singer's albums to the music application:
Logical view of rows in an Albums table. Primary key columns appear to the left of the bolded line
Note that the primary key of
Albums is composed of two columns:
AlbumId, to associate each album with its singer. The following example schema
defines both the
Singers tables at the root of the database
hierarchy, which makes them sibling tables:
-- Schema hierarchy: -- + Singers (sibling table of Albums) -- + Albums (sibling table of Singers)
CREATE TABLE Singers ( SingerId INT64 NOT NULL, FirstName STRING(1024), LastName STRING(1024), SingerInfo BYTES(MAX), ) PRIMARY KEY (SingerId); CREATE TABLE Albums ( SingerId INT64 NOT NULL, AlbumId INT64 NOT NULL, AlbumTitle STRING(MAX), ) PRIMARY KEY (SingerId, AlbumId);
Schema example 2 defines the Singers and Albums tables, both at the root of the database hierarchy.
The physical layout of the rows of
Albums looks like this, with
rows of the
Albums table stored by contiguous primary key, then rows of
Singers stored by contiguous primary key:
Physical layout of rows of Singers and Albums tables, both at the root of the database hierarchy.
One important note about the schema above is that Cloud Spanner assumes no
data locality relationships between the
Albums tables, because
they are sibling tables, i.e. at the same level of hierarchy in the database. As
the database grows, Cloud Spanner can add split boundaries between any of the
rows shown above. This means the rows of the
Albums table could end up in a
different split from the rows of the
Singers table, and the two splits could
move independently from each other.
Depending on your application's needs, it might be fine to allow
to be located on different splits from
Singers data. However, if your
application frequently needs to retrieve information about all the albums for a
given singer, then you should create
Albums as a child table of
which co-locates rows from the two tables along the primary key dimension. The
next example explains this in more detail.
Creating interleaved tables
As you're designing your music application, suppose you realize that the app
needs to frequently access rows from both
Albums tables for a
given primary key (e.g. each time you access the row
Singers(1), you also need
to access the rows
Albums(1, 1) and
Albums(1, 2)). In other words,
Albums need to have a strong data locality relationship.
You can declare this data locality relationship by creating
Albums as a child,
or "interleaved", table of
Singers. As mentioned above, an
interleaved table is a table that you declare to be a child of another table
because you want the rows of the child table to be physically stored together
with the associated parent row. As mentioned above, the prefix of the primary
key of a child table must be the primary key of the parent table.
The bolded line in the schema below shows how to create
Albums as an
interleaved table of
-- Schema hierarchy: -- + Singers -- + Albums (interleaved table, child table of Singers)
CREATE TABLE Singers ( SingerId INT64 NOT NULL, FirstName STRING(1024), LastName STRING(1024), SingerInfo BYTES(MAX), ) PRIMARY KEY (SingerId); CREATE TABLE Albums ( SingerId INT64 NOT NULL, AlbumId INT64 NOT NULL, AlbumTitle STRING(MAX), ) PRIMARY KEY (SingerId, AlbumId), INTERLEAVE IN PARENT Singers ON DELETE CASCADE;
Schema example 3 defines Albums as a child of Singers, with Singers at the root of the database hierarchy.
Notes about this schema:
SingerId, which is the prefix of the primary key of the child table
Albums, is also the primary key of its parent table
Singers. This is not required if
Albumsare at the same level of the hierarchy, but is required in this schema because
Albumsis declared to be a child table of
ON DELETE CASCADEannotation signifies that when a row from the parent table is deleted, its child rows in this table will automatically be deleted as well (i.e., all rows that start with the same primary key). If a child table does not have this property, then you must delete the child rows before you can delete the parent row.
- Interleaved rows are ordered first by rows of the parent table, then by contiguous rows of the child table that share the parent's primary key, i.e. "Singers(1)", then "Albums(1, 1)", then "Albums(1, 2)", and so on.
- The data locality relationship of each singer and his or her album data would
be preserved if this database splits, because splits can only be inserted
between rows of the
Physical layout of rows of Singers and its child table Albums.
Creating a hierarchy of interleaved tables
The parent-child relationship between
Albums can be extended to
more descendant tables. For example, you could create an interleaved table
Songs as a child of
Albums to store the track list of each album:
Logical view of rows in an Songs table. Primary key columns appear to the left of the bolded line
Songs must have a primary key that's composed of all the primary keys of the
tables above it in the hierarchy, i.e.
-- Schema hierarchy: -- + Singers -- + Albums (interleaved table, child table of Singers) -- + Songs (interleaved table, child table of Albums)
CREATE TABLE Singers ( SingerId INT64 NOT NULL, FirstName STRING(1024), LastName STRING(1024), SingerInfo BYTES(MAX), ) PRIMARY KEY (SingerId); CREATE TABLE Albums ( SingerId INT64 NOT NULL, AlbumId INT64 NOT NULL, AlbumTitle STRING(MAX), ) PRIMARY KEY (SingerId, AlbumId), INTERLEAVE IN PARENT Singers ON DELETE CASCADE; CREATE TABLE Songs ( SingerId INT64 NOT NULL, AlbumId INT64 NOT NULL, TrackId INT64 NOT NULL, SongName STRING(MAX), ) PRIMARY KEY (SingerId, AlbumId, TrackId), INTERLEAVE IN PARENT Albums ON DELETE CASCADE;
Schema example 4 defines Songs as a child of Albums, which is a child of Singers. Singers is at the root of the database hierarchy.
The physical view of interleaved rows shows that the data locality relationship is preserved between a singer and his or her albums and songs data:
Physical layout of rows of Singers, Albums, and Songs tables, which form a hierarchy of interleaved tables.
In summary, a parent table along with all of its child and descendant tables forms a hierarchy of tables in the schema. Although each table in the hierarchy is logically independent, physically interleaving them this way can improve performance, effectively pre-joining the tables and allowing you to access related rows together while minimizing disk accesses.
Interleaving tables in Cloud Spanner is not required, but is recommended for tables with strong data locality relationships. Avoid interleaving tables if there is a chance that the size of a single row and its descendents will become larger than a few GiB.
Notes about key columns
The keys of a table can't change; you can't add a key column to an existing table or remove a key column from an existing table.
Primary key columns can be defined to store NULLs. If you would like to store
NULLs in a primary key column, omit the
NOT NULL clause for that column in the
Here's an example of omitting the
NOT NULL clause on the primary key column
SingerId. Note that because
SingerId is the primary key, there can be at
most only one row in the
Singers table that stores
NULL in that column.
CREATE TABLE Singers ( SingerId INT64, FirstName STRING(1024), LastName STRING(1024), ) PRIMARY KEY (SingerId);
The nullable property of the primary key column must match between the parent
and the child table declarations. In this example,
Albums.SingerId INT64 NOT
NULL is not allowed. The key declaration must omit the
NOT NULL clause
Singers.SingerId omits it.
CREATE TABLE Singers ( SingerId INT64, FirstName STRING(1024), LastName STRING(1024), ) PRIMARY KEY (SingerId); CREATE TABLE Albums ( SingerId INT64 NOT NULL, AlbumId INT64 NOT NULL, AlbumTitle STRING(MAX), ) PRIMARY KEY (SingerId, AlbumId), INTERLEAVE IN PARENT Singers ON DELETE CASCADE;
These cannot be of type
- A table's key columns.
- An index's key columns.
Cloud Spanner allows you to do online schema changes with no downtime. To
update the schema of an existing database, use
projects.instances.databases.updateDdl (REST API) or UpdateDatabaseDdl
(RPC API) to issue a batch of one or more of the
statements that are documented in Data Definition Language.
Cloud Spanner applies statements from the same batch in order, stopping at the first error. If applying a statement results in an error, that statement is rolled back, but the results of any previously-applied statements in a batch persist.
Cloud Spanner might interleave statements from different batches,
potentially mixing statements from different batches into the same atomic change
that is applied to the database. Within each atomic change, statements from
different batches happen in an arbitrary order. For example, if one batch of
ALTER TABLE MyTable ALTER COLUMN MyColumn STRING(50) and
another batch of statements contains
ALTER TABLE MyTable ALTER COLUMN MyColumn
STRING(20), Cloud Spanner will leave that column in one of those two states,
but it's not specified which.
Supported schema updates
Cloud Spanner supports the following updates to the schema of an existing database:
- Add new tables. (Columns in such tables can be
- Entirely drop a table, as long as no other tables or indexes are interleaved within it.
- Add a non-primary-key column to any table. (Such columns cannot be
- Remove a non-primary-key column from any table, unless it is used by a secondary index.
- Add or remove any secondary index.
- Change a
STRINGtype to a
BYTEStype or a
BYTEStype to a
- Increase or decrease the length limit for a
BYTEStype (including to
MAX), unless it is a primary key column inherited by one or more child tables.
- Add or remove the
NOT NULLannotation to or from a non-primary-key column.
- Enable commit timestamps in value and primary key columns. For details, see Commit Timestamps.
Schema change performance
As mentioned above, schema changes in Cloud Spanner do not require downtime. This means that when you issue a batch of update statements in DDL to a Cloud Spanner database, you can continue writing and reading from the database without interruption while Cloud Spanner applies the changes as a long running operation.
The amount of time it takes for Cloud Spanner to apply the changes that result
from any individual statement depends on if that change requires Cloud Spanner
to validate existing data or backfill any data. For example, if you add the
NULL annotation to an existing column, Cloud Spanner must read all values of
existing data for that column to make sure that it does not contain any
values. This step can potentially take a long time if there is a lot of data to
validate. Another example is if you're adding an index to a database:
Cloud Spanner backfills the index using existing data, and that process can
take a long time depending on how the index is defined and the size of the
corresponding base table. However, if you add a new column to a table, there is
no existing data to validate so Cloud Spanner can make the change within
In summary, schema changes that do not require Cloud Spanner to validate existing data can happen in minutes. Schema changes that do require validation can take longer, depending on the amount of existing data that needs to be validated, but data validation happens in the background without impacting live traffic. Schema changes that require data validation are discussed in more detail in the next section.
Schema changes that require data validation
As described above, Cloud Spanner allows you to change schemas in ways that require validating that the existing data meets the new constraints. When such a schema change is executed, Cloud Spanner disallows conflicting schema changes to the affected schema entities and validates the data in the background. If validation is successful, the schema change completes successfully. If the validation is not successful, the schema change is cancelled with an error message.
For example, suppose you have defined a table
Songwriters in your schema:
CREATE TABLE Songwriters ( Id INT64 NOT NULL, FirstName STRING(1024), LastName STRING(1024), Nickname STRING(MAX), OpaqueData BYTES(MAX), ) PRIMARY KEY (Id);
Schema example for schema changes that require validation.
Then the following schema changes are allowed, but require validation and might take time to complete, depending on the amount of existing data:
NOT NULLannotation to a non-primary-key column. Example:
ALTER TABLE Songwriters ALTER COLUMN Nickname STRING(MAX) NOT NULL
Reducing the length of a column. Example:
ALTER TABLE Songwriters ALTER COLUMN FirstName STRING(10)
ALTER TABLE Songwriters ALTER COLUMN OpaqueData STRING(MAX)
It's important to remember that these schema changes fail if the underlying data
does not satisfy the new constraints. For example, the
ALTER TABLE Songwriters
ALTER COLUMN Nickname STRING(MAX) NOT NULL statement above fails if any
Nickname columns contains a
NULL, because the existing data does not meet
the constraints of the new definition.
Data validation can take from several minutes to many hours. The time to complete data validation depends on:
- the size of the data set
- the number of nodes in the instance
- the load on the instance
Some schema changes can alter the behavior of requests to the database before
the change completes. For example, if you're adding
NOT NULL to a column,
Cloud Spanner almost immediately begins rejecting writes for new requests that
NULL for the column. If the new schema change ultimately fails for data
validation, there will have been a period of time when writes were blocked,
even if they would have been accepted by the old schema.
You can cancel a long-running data validation operation using the projects.instances.databases.operations.cancel method.
Tips for updating schemas
Before you issue a schema update:
- Verify that all of the existing data in the database that you're making
changes to meet the constraints that the schema change is imposing. Here are
some common examples:
- If you're adding a
NOT NULLannotation to an existing column, check that the column does not contain any existing
- If you're shortening the allowed length of a string or bytes column, check that all existing values in that column meet the desired length constraint.
- If you're adding a
- Remember that the success of a schema update can depend on all of the existing data in the database meeting the new constraint, not only on the actual schema itself. This means if you have a test database that uses the same schema as a production database, a successful update of the test database does not guarantee a successful update of the production database.
- If you're writing to a column, table, or index that is undergoing a schema change, ensure that the values that you're writing meet the new constraints.
- If you're dropping a column, table, or index, make sure you are not still writing to or reading from it.