Data model summary
A Cloud Spanner database can contain one or more tables. Tables look like relational database tables in that they are structured with rows, columns, and values, and they contain primary keys. Data in Cloud Spanner is strongly typed: you must define a schema for each database and that schema must specify the data types of each column of each table. Allowable data types include scalar and array types, which are explained in more detail in Data types. You can also define one or more secondary indexes on a table.
Parent-child table relationships
There are two ways to define parent-child relationships in Cloud Spanner, table interleaving and foreign keys.
Cloud Spanner's table interleaving is a good choice for many parent-child
relationships where the child table's primary key includes the parent table's
primary key columns. The co-location of child rows with their parent rows can
significantly improve performance. For example, if you have a
Invoices table, and your application frequently fetches all the
invoices for a given customer, you can define
Invoices as a child table of
Customers. In doing so, you're declaring a data locality relationship between
two logically independent tables: you're telling Cloud Spanner to physically
store one or more rows of
Invoices with one
For a deeper discussion about interleaving, refer to Creating interleaved tables below.
Foreign keys are a more general parent-child solution and address additional use cases. They are not limited to primary key columns, and tables can have multiple foreign key relationships, both as a parent in some relationships and a child in others. However, a foreign key relationship does not imply co-location of the tables in the storage layer.
Google recommends that you choose to represent parent-child relationships either as interleaved tables or as foreign keys, but not both. For more information on foreign keys and their comparison to interleaved tables, see Foreign keys overview.
How do you tell Cloud Spanner which
Invoices rows to store with which
Customers rows? You do so using the primary key of these tables. Every table
must have a primary key, and that primary key can be composed of zero or more
columns of that table. If you declare a table to be a child of another table,
the primary key column(s) of the parent table must be the prefix of the primary
key of the child table. This means if a parent table's primary key is composed
of N columns, the primary key of each of its child tables must also be composed
of those same N columns, in the same order and starting with the same column.
Cloud Spanner stores rows in sorted order by primary key values, with child rows inserted between parent rows that share the same primary key prefix. This insertion of child rows between parent rows along the primary key dimension is called interleaving, and child tables are also called interleaved tables. (See an illustration of interleaved rows in the Creating interleaved tables below.)
In summary, Cloud Spanner can physically co-locate rows of related tables. The schema examples below show what this physical layout looks like.
Choosing a primary key
The primary key uniquely identifies each row in a table. If you want to update
or delete existing rows in a table, then the table must have a primary key
composed of one or more columns. (A table with no primary key columns can
have only one row.) Often your application already has a field that's a natural
fit for use as the primary key. For example, in the
Customers table example
above, there might be an application-supplied
CustomerId that serves well as
the primary key. In other cases, you may need to generate a primary key when
inserting the row, like a unique
INT64 value that you generate.
In all cases, you should be careful not to create hotspots with the choice of your primary key. For example, if you insert records with a monotonically increasing integer as the key, you'll always insert at the end of your key space. This is undesirable because Cloud Spanner divides data among servers by key ranges, which means your inserts will be directed at a single server, creating a hotspot. There are techniques that can spread the load across multiple servers and avoid hotspots:
- Hash the key and store it in a column. Use the hash column (or the hash column and the unique key columns together) as the primary key.
- Swap the order of the columns in the primary key.
- Use a Universally Unique Identifier (UUID). Version 4 UUID is recommended, because it uses random values in the high-order bits. Don't use a UUID algorithm (such as version 1 UUID) that stores the timestamp in the high order bits.
- Bit-reverse sequential values.
Adding secondary indexes based on primary keys
In certain circumstances, your database usage can benefit from adding secondary indexes based on primary keys. This is particularly true if you frequently run queries that require reverse-order scans of a table's primary key.
You can define hierarchies of parent-child relationships between tables up to seven layers deep, which means you can co-locate rows of seven logically independent tables. If the size of the data in your tables is small, your database can probably be handled by a single Cloud Spanner server. But what happens when your related tables grow and start reaching the resource limits of an individual server? Cloud Spanner is a distributed database, which means that as your database grows, Cloud Spanner divides your data into chunks called "splits", where individual splits can move independently from each other and get assigned to different servers, which can be in different physical locations. A split holds a range of contiguous rows. The start and end keys of this range are called "split boundaries". Cloud Spanner automatically adds and removes split boundaries based on size and/or load, which changes the number of splits in the database.
As an example of how Cloud Spanner performs load-based splitting to mitigate read hotspots, suppose your database contains a table with 10 rows that are read more frequently than all of the other rows in the table. Cloud Spanner can add split boundaries between each of those 10 rows so that they're each handled by a different server, rather than allowing all the reads of those rows to consume the resources of a single server.
As a general rule, if you follow best practices for schema design, Cloud Spanner can mitigate hotspots such that the read throughput should improve every few minutes until you saturate the resources in your instance or run into cases where no new split boundaries can be added (because you have a split that covers just a single row with no interleaved children).
The schema examples below show how to create Cloud Spanner tables with and without parent-child relationships and illustrate the corresponding physical layouts of data.
Creating a table
Suppose you're creating a music application and you need a simple table that stores rows of singer data:
Logical view of rows in a simple Singers table. The primary key column appears to the left of the bolded line.
Note that the table contains one primary key column,
SingerId, which appears
to the left of the bolded line, and that tables are organized by rows, columns,
You can define the table with a Cloud Spanner schema like this:
CREATE TABLE Singers ( SingerId INT64 NOT NULL, FirstName STRING(1024), LastName STRING(1024), SingerInfo BYTES(MAX), ) PRIMARY KEY (SingerId);
Note the following about the example schema:
Singersis a table at the root of the database hierarchy (because it's not defined as a child of another table).
- Primary key columns are usually annotated with
NOT NULL(though you can omit this annotation if you want to allow
NULLvalues in key columns; see more in Key Columns).
- Columns that are not included in the primary key are called non-key columns,
and they can have an optional
- Columns that use the
BYTEStype must be defined with a length, which represents the maximum number of Unicode characters that can be stored in the field. (More details in Scalar Data Types.)
What does the physical layout of the rows in the
Singers table look like? The
diagram below shows rows of the
Singers table stored by contiguous (that is,
sorted order of) primary key (that is, "Singers(1)", then "Singers(2)", and so
on, where "Singers(1)" represents the row in the Singers table keyed by 1).
Physical layout of rows in the Singers table, with an example split boundary that results in splits handled by different servers.
The above diagram illustrates an example split boundary between the rows keyed
Singers(4), with the data from the resulting splits
assigned to different servers. This means that as this table grows, it's
possible for rows of
Singers data to be stored in different locations.
Creating multiple tables
Assume you now want to add some basic data about each singer's albums to the music application:
Logical view of rows in an Albums table. Primary key columns appear to the left of the bolded line
Note that the primary key of
Albums is composed of two columns:
AlbumId, to associate each album with its singer. The following example schema
defines both the
Singers tables at the root of the database
hierarchy, which makes them sibling tables:
-- Schema hierarchy: -- + Singers (sibling table of Albums) -- + Albums (sibling table of Singers)
CREATE TABLE Singers ( SingerId INT64 NOT NULL, FirstName STRING(1024), LastName STRING(1024), SingerInfo BYTES(MAX), ) PRIMARY KEY (SingerId); CREATE TABLE Albums ( SingerId INT64 NOT NULL, AlbumId INT64 NOT NULL, AlbumTitle STRING(MAX), ) PRIMARY KEY (SingerId, AlbumId);
The physical layout of the rows of
Albums looks like the
diagram, with rows of the
Albums table stored by contiguous primary key, then
Singers stored by contiguous primary key:
Physical layout of rows of Singers and Albums tables, both at the root of the database hierarchy.
One important note about the schema above is that Cloud Spanner assumes no
data locality relationships between the
Albums tables, because
they are top-level tables. As the database grows, Cloud Spanner can add split
boundaries between any of the rows shown above. This means the rows of the
Albums table could end up in a different split from the rows of the
table, and the two splits could move independently from each other.
Depending on your application's needs, it might be fine to allow
to be located on different splits from
Singers data. However, if your
application frequently needs to retrieve information about all the albums for a
given singer, then you should create
Albums as a child table of
which co-locates rows from the two tables along the primary key dimension. The
next example explains this in more detail.
Creating interleaved tables
As you're designing your music application, suppose you realize that the app
needs to frequently access rows from both
Albums tables for a
given primary key (e.g. each time you access the row
Singers(1), you also need
to access the rows
Albums(1, 1) and
Albums(1, 2)). In other words,
Albums need to have a strong data locality relationship.
You can declare this data locality relationship by creating
Albums as a child,
or "interleaved", table of
Singers. As mentioned in
Primary keys, an interleaved table is a table that you
declare to be a child of another table because you want the rows of the child
table to be physically stored together with the associated parent row. As
mentioned above, the prefix of the primary key of a child table must be the
primary key of the parent table.
The bolded line in the schema below shows how to create
Albums as an
interleaved table of
-- Schema hierarchy: -- + Singers -- + Albums (interleaved table, child table of Singers)
CREATE TABLE Singers ( SingerId INT64 NOT NULL, FirstName STRING(1024), LastName STRING(1024), SingerInfo BYTES(MAX), ) PRIMARY KEY (SingerId); CREATE TABLE Albums ( SingerId INT64 NOT NULL, AlbumId INT64 NOT NULL, AlbumTitle STRING(MAX), ) PRIMARY KEY (SingerId, AlbumId), INTERLEAVE IN PARENT Singers ON DELETE CASCADE;
Notes about this schema:
SingerId, which is the prefix of the primary key of the child table
Albums, is also the primary key of its parent table
Singers. This is not required if
Albumsare at the same level of the hierarchy, but is required in this schema because
Albumsis declared to be a child table of
ON DELETE CASCADEannotation signifies that when a row from the parent table is deleted, its child rows in this table will automatically be deleted as well (that is, all rows that start with the same primary key). If a child table does not have this annotation, or the annotation is
ON DELETE NO ACTION, then you must delete the child rows before you can delete the parent row.
- Interleaved rows are ordered first by rows of the parent table, then by contiguous rows of the child table that share the parent's primary key, i.e. "Singers(1)", then "Albums(1, 1)", then "Albums(1, 2)", and so on.
- The data locality relationship of each singer and their album data would
be preserved if this database splits, as long as the size of a singer
row and all its
Albumsrows stay below the 8 GB and there is no hotspot in any of these
- The parent row must exist before you can insert child rows. The parent row can either already exist in the database or can be inserted before the insertion of the child rows in the same transaction.
Physical layout of rows of Singers and its child table Albums.
Creating a hierarchy of interleaved tables
The parent-child relationship between
Albums can be extended to
more descendant tables. For example, you could create an interleaved table
Songs as a child of
Albums to store the track list of each album:
Logical view of rows in an Songs table. Primary key columns appear to the left of the bolded line
Songs must have a primary key that's composed of all the primary keys of the
tables above it in the hierarchy, that is,
-- Schema hierarchy: -- + Singers -- + Albums (interleaved table, child table of Singers) -- + Songs (interleaved table, child table of Albums)
CREATE TABLE Singers ( SingerId INT64 NOT NULL, FirstName STRING(1024), LastName STRING(1024), SingerInfo BYTES(MAX), ) PRIMARY KEY (SingerId); CREATE TABLE Albums ( SingerId INT64 NOT NULL, AlbumId INT64 NOT NULL, AlbumTitle STRING(MAX), ) PRIMARY KEY (SingerId, AlbumId), INTERLEAVE IN PARENT Singers ON DELETE CASCADE; CREATE TABLE Songs ( SingerId INT64 NOT NULL, AlbumId INT64 NOT NULL, TrackId INT64 NOT NULL, SongName STRING(MAX), ) PRIMARY KEY (SingerId, AlbumId, TrackId), INTERLEAVE IN PARENT Albums ON DELETE CASCADE;
The following diagram represents a physical view of interleaved rows. In this example, as the number of singers grows, Cloud Spanner adds split boundaries between singers to preserve data locality between a singer and its album and song data. However, if the size of a singer row and its child rows exceeds the 8 GB limit, or a hotspot is detected in the child rows, Cloud Spanner will attempt to add split boundaries to interleaved tables in order to isolate that hotspot row along with all child rows below it.
Below is a physical view of interleaved rows. Unless the size of a single singer row and all its child rows grows beyond the 8 GB limit, or a hotspot is detected in the child rows, Cloud Spanner prefers adding split boundaries between singers to preserve data locality between a singer and its albums and songs data.
Physical layout of rows of Singers, Albums, and Songs tables, which form a hierarchy of interleaved tables.
In summary, a parent table along with all of its child and descendant tables forms a hierarchy of tables in the schema. Although each table in the hierarchy is logically independent, physically interleaving them this way can improve performance, effectively pre-joining the tables and allowing you to access related rows together while minimizing disk accesses.
If possible, join data in interleaved tables by primary key. Because each
interleaved row is usually stored physically in the same split as its parent
row, Cloud Spanner can perform joins by primary key locally, minimizing disk
access and network traffic. In the following example,
joined on the primary key,
SELECT s.FirstName, a.AlbumTitle FROM Singers AS s JOIN Albums AS a ON s.SingerId = a.SingerId;
Interleaving tables in Cloud Spanner is recommended for one-to-many related data that is frequently accessed together.
The keys of a table can't change; you can't add a key column to an existing table or remove a key column from an existing table.
Primary key columns can be defined to store NULLs. If you would like to store
NULLs in a primary key column, omit the
NOT NULL clause for that column in the
Here's an example of omitting the
NOT NULL clause on the primary key column
SingerId. Note that because
SingerId is the primary key, there can be at
most only one row in the
Singers table that stores
NULL in that column.
CREATE TABLE Singers ( SingerId INT64, FirstName STRING(1024), LastName STRING(1024), ) PRIMARY KEY (SingerId);
The nullable property of the primary key column must match between the parent
and the child table declarations. In this example,
Albums.SingerId INT64 NOT
NULL is not allowed. The key declaration must omit the
NOT NULL clause
Singers.SingerId omits it.
CREATE TABLE Singers ( SingerId INT64, FirstName STRING(1024), LastName STRING(1024), ) PRIMARY KEY (SingerId); CREATE TABLE Albums ( SingerId INT64 NOT NULL, -- NOT ALLOWED! AlbumId INT64 NOT NULL, AlbumTitle STRING(MAX), ) PRIMARY KEY (SingerId, AlbumId), INTERLEAVE IN PARENT Singers ON DELETE CASCADE;
These cannot be of type
- A table's key columns.
- An index's key columns.
Designing for multi-tenancy
You might want to provide multi-tenancy if you are storing data that belongs to different customers. For example, a music service might want to store each individual record label's separately.
The classic way to design for multi-tenancy is to create a separate database
for each customer. In this example, each database has its own
Using a schema data management pattern for Cloud Spanner multi-tenancy
Another way to design for multi-tenancy in Cloud Spanner is to use a
different primary key value for each customer. You include a
or similar key, column in your tables. If you make
CustomerId the first key
column, then the data for each customer has good locality. Cloud Spanner
can then effectively use database splits to maximize
performance based on data size and load patterns. In this example, there is a
Singers table for all customers:
If you must have separate databases for each tenant, there are constraints to be aware of:
- There are limits on the number of databases per instance and tables per database. Depending on the number of customers, it might not be possible to have separate databases or tables.
- Adding new tables and non-interleaved indexes can take a long time. You might not be able to get the performance you want if your schema design depends on adding new tables and indexes.
If you want to create separate databases, you might have more success if you distribute your tables across databases in such a way that each database has a low number of schema changes per week.
If you create separate tables and indexes for each customer of your application, do not put all of the tables and indexes in the same database. Instead, split them across many databases, to mitigate the performance issues with creating a large number of indexes. There are also limits on the number of tables and indexes per database.
To learn more about other data management patterns and application design for multi-tenancy, see Implementing Multi-Tenancy in Cloud Spanner