This page discusses schemas and introduces interleaved tables, which can improve query performance when querying tables in a parent-child relationship.
Cloud Spanner databases contain one or more tables. Tables are structured as rows and columns. One or more of the columns are defined as the table's primary key, which uniquely identifies each row. Primary keys are always indexed for quick row lookup, and you can define secondary indexes on one or more columns.
Data in Cloud Spanner is strongly typed. You must define a schema for each database, and that schema must specify the data type of each column of each table. Data types include scalar and complex types, which are described in Data types in Google Standard SQL for Google Standard SQL-dialect databases and PostgreSQL data types for PostgreSQL-dialect databases.
Parent-child table relationships
There are two ways to define parent-child relationships in Cloud Spanner: table interleaving and foreign keys.
Cloud Spanner's table interleaving is a good choice for many parent-child
relationships. With interleaving, Cloud Spanner physically co-locates child
rows with parent rows in storage. Co-location can
significantly improve performance. For example, if you have a Customers
table
and an Invoices
table, and your application frequently fetches all the
invoices for a customer, you can define Invoices
as an interleaved child
table of Customers
. In doing so, you're declaring a data locality
relationship between two independent tables. You're telling Cloud Spanner
to store one or more rows of Invoices
with one Customers
row.
You associate a child table with a parent table by using DDL that declares the child table as interleaved in the parent, and by including the parent table primary key as the first part of the child table composite primary key. For more information about interleaving, see Create interleaved tables later in this topic.
Foreign keys are a more general parent-child solution and address additional use cases. They are not limited to primary key columns, and tables can have multiple foreign key relationships, both as a parent in some relationships and a child in others. However, a foreign key relationship does not imply co-location of the tables in the storage layer.
Google recommends that you choose to represent parent-child relationships either as interleaved tables or as foreign keys, but not both. For more information on foreign keys and their comparison to interleaved tables, see Foreign keys overview.
Primary keys in interleaved tables
How do you tell Cloud Spanner which Invoices
rows to store with which
Customers
rows? You do so by using the primary key of these tables. For
interleaving, every table
must have a primary key. If you declare a table to be an
interleaved child of another table,
the table must have a composite primary key that includes all of the components
of the parent's primary key, in the same order, and, typically, one or more
additional child table columns.
Cloud Spanner stores rows in sorted order by primary key values, with child rows inserted between parent rows. See an illustration of interleaved rows in Create interleaved tables later in this topic.
In summary, Cloud Spanner can physically co-locate rows of related tables. The schema examples show what this physical layout looks like.
Choose a primary key
The primary key uniquely identifies each row in a table. If you want to update or delete existing rows in a table, then the table must have a primary key composed of one or more columns. A table with no primary key columns can have only one row. Only Google Standard SQL-dialect databases can have tables without a primary key.
Often your application already has a field that's a natural
fit for use as the primary key. For example, for a Customers
table,
there might be an application-supplied CustomerId
that serves well as
the primary key. In other cases, you may need to generate a primary key when
inserting the row. This would typically be a unique integer value with no
business significance (a surrogate primary key).
In all cases, you should be careful not to create hotspots with the choice of your primary key. For example, if you insert records with a monotonically increasing integer as the key, you'll always insert at the end of your key space. This is undesirable because Cloud Spanner divides data among servers by key ranges, which means your inserts will be directed at a single server, creating a hotspot. There are techniques that can spread the load across multiple servers and avoid hotspots:
- Hash the key and store it in a column. Use the hash column (or the hash column and the unique key columns together) as the primary key.
- Swap the order of the columns in the primary key.
- Use a Universally Unique Identifier (UUID). Version 4 UUID is recommended, because it uses random values in the high-order bits. Don't use a UUID algorithm (such as version 1 UUID) that stores the timestamp in the high order bits.
- Bit-reverse sequential values.
Add secondary indexes based on primary keys
In certain circumstances, your database usage can benefit from adding secondary indexes based on primary keys. This is particularly true if you frequently run queries that require reverse-order scans of a table's primary key.
Database splits
You can define hierarchies of interleaved parent-child relationships up to seven layers deep, which means that you can co-locate rows of seven independent tables. If the size of the data in your tables is small, a single Cloud Spanner server can probably handle your database. But what happens when your related tables grow and start reaching the resource limits of an individual server? Cloud Spanner is a distributed database, which means that as your database grows, Cloud Spanner divides your data into chunks called "splits." Individual splits can move independently from each other and get assigned to different servers, which can be in different physical locations. A split holds a range of contiguous rows. The start and end keys of this range are called "split boundaries". Cloud Spanner automatically adds and removes split boundaries based on size and load, which changes the number of splits in the database.
Load-based splitting
As an example of how Cloud Spanner performs load-based splitting to mitigate read hotspots, suppose your database contains a table with 10 rows that are read more frequently than all of the other rows in the table. Cloud Spanner can add split boundaries between each of those 10 rows so that they're each handled by a different server, rather than allowing all the reads of those rows to consume the resources of a single server.
As a general rule, if you follow best practices for schema design, Cloud Spanner can mitigate hotspots such that the read throughput should improve every few minutes until you saturate the resources in your instance or run into cases where no new split boundaries can be added (because you have a split that covers just a single row with no interleaved children).
Schema examples
The schema examples below show how to create parent and child tables with and without interleaving, and illustrate the corresponding physical layouts of data.
Create a parent table
Suppose you're creating a music application and you need a simple table that stores rows of singer data:
Note that the table contains one primary key column, SingerId
, which appears
to the left of the bolded line, and that tables are organized by rows
and columns.
You can define the table with a Cloud Spanner schema like this:
Google Standard SQL
CREATE TABLE Singers ( SingerId INT64 NOT NULL, FirstName STRING(1024), LastName STRING(1024), SingerInfo BYTES(MAX), ) PRIMARY KEY (SingerId);
PostgreSQL
CREATE TABLE singers ( singer_id BIGINT PRIMARY KEY, first_name VARCHAR(1024), last_name VARCHAR(1024), singer_info BYTEA );
Note the following about the example schema:
Singers
is a table at the root of the database hierarchy (because it's not defined as an interleaved child of another table).- For Google Standard SQL-dialect databases, primary key columns are usually annotated with
NOT NULL
(though you can omit this annotation if you want to allowNULL
values in key columns. For more information, see Key Columns). - Columns that are not included in the primary key are called non-key columns,
and they can have an optional
NOT NULL
annotation. - Columns that use the
STRING
orBYTES
type in Google Standard SQL must be defined with a length, which represents the maximum number of Unicode characters that can be stored in the field. The length specification is optional for the PostgreSQLvarchar
andcharacter varying
types. For more information, see Scalar Data Types for Google Standard SQL-dialect databases and PostgreSQL data types for PostgreSQL-dialect databases.
What does the physical layout of the rows in the Singers
table look like? The
following diagram shows rows of the Singers
table stored by primary key
("Singers(1)", then "Singers(2)", and so
on, where "Singers(1)" represents the row in the Singers table keyed by 1).
The preceding diagram illustrates an example split boundary between the rows keyed
by Singers(3)
and Singers(4)
, with the data from the resulting splits
assigned to different servers. As this table grows, it's
possible for rows of Singers
data to be stored in different locations.
Create parent and child tables
Assume that you now want to add some basic data about each singer's albums to the music application.
Note that the primary key of Albums
is composed of two columns: SingerId
and
AlbumId
, to associate each album with its singer. The following example schema
defines both the Albums
and Singers
tables at the root of the database
hierarchy, which makes them sibling tables.
-- Schema hierarchy: -- + Singers (sibling table of Albums) -- + Albums (sibling table of Singers)
Google Standard SQL
CREATE TABLE Singers ( SingerId INT64 NOT NULL, FirstName STRING(1024), LastName STRING(1024), SingerInfo BYTES(MAX), ) PRIMARY KEY (SingerId); CREATE TABLE Albums ( SingerId INT64 NOT NULL, AlbumId INT64 NOT NULL, AlbumTitle STRING(MAX), ) PRIMARY KEY (SingerId, AlbumId);
PostgreSQL
CREATE TABLE singers ( singer_id BIGINT PRIMARY KEY, first_name VARCHAR(1024), last_name VARCHAR(1024), singer_info BYTEA ); CREATE TABLE albums ( singer_id BIGINT, album_id BIGINT, album_title VARCHAR, PRIMARY KEY (singer_id, album_id) );
The physical layout of the rows of Singers
and Albums
looks like the following
diagram, with rows of the Albums
table stored by contiguous primary key, then
rows of Singers
stored by contiguous primary key:
One important note about the schema is that Cloud Spanner assumes no
data locality relationships between the Singers
and Albums
tables, because
they are top-level tables. As the database grows, Cloud Spanner can add split
boundaries between any of the rows. This means the rows of the
Albums
table could end up in a different split from the rows of the Singers
table, and the two splits could move independently from each other.
Depending on your application's needs, it might be fine to allow Albums
data
to be located on different splits from Singers
data. However, this might incur
a performance penalty due to the need to coordinate reads and updates across
distinct resources. If your
application frequently needs to retrieve information about all the albums for a
particular singer, then you should create Albums
as an interleaved child table
of Singers
, which co-locates rows from the two tables along the primary key
dimension. The next example explains this in more detail.
Create interleaved tables
An interleaved table is a table that you declare to be an interleaved child of another table because you want the rows of the child table to be physically stored with the associated parent row. As mentioned earlier, the parent table primary key must be the first part of the child table composite primary key.
As you're designing your music application, suppose you realize that the app
needs to frequently access child rows from the Albums
table when it accesses a
Singers
row. For example, when you access the row Singers(1)
, you also need
to access the rows Albums(1, 1)
and Albums(1, 2)
). In this case, Singers
and Albums
need to have a strong data locality relationship.
You can declare this data locality relationship by creating Albums
as an
interleaved child table of Singers
.
The bolded line in the schema below shows how to create Albums
as an
interleaved table of Singers
.
-- Schema hierarchy: -- + Singers -- + Albums (interleaved table, child table of Singers)
Google Standard SQL
CREATE TABLE Singers ( SingerId INT64 NOT NULL, FirstName STRING(1024), LastName STRING(1024), SingerInfo BYTES(MAX), ) PRIMARY KEY (SingerId); CREATE TABLE Albums ( SingerId INT64 NOT NULL, AlbumId INT64 NOT NULL, AlbumTitle STRING(MAX), ) PRIMARY KEY (SingerId, AlbumId), INTERLEAVE IN PARENT Singers ON DELETE CASCADE;
PostgreSQL
CREATE TABLE singers ( singer_id BIGINT PRIMARY KEY, first_name VARCHAR(1024), last_name VARCHAR(1024), singer_info BYTEA ); CREATE TABLE albums ( singer_id BIGINT, album_id BIGINT, album_title VARCHAR, PRIMARY KEY (singer_id, album_id) ) INTERLEAVE IN PARENT singers ON DELETE CASCADE;
Notes about this schema:
SingerId
, which is the first part of the primary key of the child tableAlbums
, is also the primary key of its parent tableSingers
. This is not required ifSingers
andAlbums
are at the same level of the hierarchy, but is required in this schema becauseAlbums
is declared to be an interleaved child table ofSingers
.- The
ON DELETE CASCADE
annotation signifies that when a row from the parent table is deleted, its child rows are automatically deleted as well. If a child table doesn't have this annotation, or the annotation isON DELETE NO ACTION
, then you must delete the child rows before you can delete the parent row. - Interleaved rows are ordered first by rows of the parent table, then by contiguous rows of the child table that share the parent's primary key. For example, "Singers(1)", then "Albums(1, 1)", then "Albums(1, 2)", and so on.
- The data locality relationship of each singer and their album data is
preserved if this database splits, provided that the size of a
Singers
row and all itsAlbums
rows stays below the split size limit and that there is no hotspot in any of theseAlbums
rows. - The parent row must exist before you can insert child rows. The parent row can either already exist in the database or can be inserted before the insertion of the child rows in the same transaction.
Create a hierarchy of interleaved tables
The parent-child relationship between Singers
and Albums
can be extended to
more descendant tables. For example, you could create an interleaved table
called Songs
as a child of Albums
to store the track list of each album:
Songs
must have a primary key that includes all the primary keys of the
tables above it in the hierarchy, that is, SingerId
and AlbumId
.
-- Schema hierarchy: -- + Singers -- + Albums (interleaved table, child table of Singers) -- + Songs (interleaved table, child table of Albums)
Google Standard SQL
CREATE TABLE Singers ( SingerId INT64 NOT NULL, FirstName STRING(1024), LastName STRING(1024), SingerInfo BYTES(MAX), ) PRIMARY KEY (SingerId); CREATE TABLE Albums ( SingerId INT64 NOT NULL, AlbumId INT64 NOT NULL, AlbumTitle STRING(MAX), ) PRIMARY KEY (SingerId, AlbumId), INTERLEAVE IN PARENT Singers ON DELETE CASCADE; CREATE TABLE Songs ( SingerId INT64 NOT NULL, AlbumId INT64 NOT NULL, TrackId INT64 NOT NULL, SongName STRING(MAX), ) PRIMARY KEY (SingerId, AlbumId, TrackId), INTERLEAVE IN PARENT Albums ON DELETE CASCADE;
PostgreSQL
CREATE TABLE singers ( singer_id BIGINT PRIMARY KEY, first_name VARCHAR(1024), last_name VARCHAR(1024), singer_info BYTEA ); CREATE TABLE albums ( singer_id BIGINT, album_id BIGINT, album_title VARCHAR, PRIMARY KEY (singer_id, album_id) ) INTERLEAVE IN PARENT singers ON DELETE CASCADE; CREATE TABLE songs ( singer_id BIGINT, album_id BIGINT, track_id BIGINT, song_name VARCHAR, PRIMARY KEY (singer_id, album_id, track_id) ) INTERLEAVE IN PARENT albums ON DELETE CASCADE;
The following diagram represents a physical view of interleaved rows.
In this example, as the number of singers grows, Cloud Spanner adds split boundaries between singers to preserve data locality between a singer and its album and song data. However, if the size of a singer row and its child rows exceeds the split size limit, or a hotspot is detected in the child rows, Cloud Spanner attempts to add split boundaries to isolate that hotspot row along with all child rows below it.
In summary, a parent table along with all of its child and descendant tables forms a hierarchy of tables in the schema. Although each table in the hierarchy is logically independent, physically interleaving them this way can improve performance, effectively pre-joining the tables and allowing you to access related rows together while minimizing storage accesses.
Joins with interleaved tables
If possible, join data in interleaved tables by primary key. Because each
interleaved row is usually stored physically in the same split as its parent
row, Cloud Spanner can perform joins by primary key locally, minimizing
storage access and network traffic. In the following example,
Singers
and Albums
are joined on the primary key SingerId
.
Google Standard SQL
SELECT s.FirstName, a.AlbumTitle FROM Singers AS s JOIN Albums AS a ON s.SingerId = a.SingerId;
PostgreSQL
SELECT s.first_name, a.album_title FROM singers AS s JOIN albums AS a ON s.singer_id = a.singer_id;
Key columns
The keys of a table can't change; you can't add a key column to an existing table or remove a key column from an existing table.
Store NULLs
In Google Standard SQL, if you
would like to store NULL in a primary key column, omit the NOT NULL
clause
for that column in the schema. (PostgreSQL-dialect databases don't support NULLs in a primary
key column.)
Here's an example of omitting the NOT NULL
clause on the primary key column
SingerId
. Note that because SingerId
is the primary key, there can be at
most only one row in the Singers
table that stores NULL
in that column.
CREATE TABLE Singers ( SingerId INT64, FirstName STRING(1024), LastName STRING(1024), ) PRIMARY KEY (SingerId);
The nullable property of the primary key column must match between the parent
and the child table declarations. In this example, Albums.SingerId INT64 NOT
NULL
is not allowed. The key declaration must omit the NOT NULL
clause
because Singers.SingerId
omits it.
CREATE TABLE Singers ( SingerId INT64, FirstName STRING(1024), LastName STRING(1024), ) PRIMARY KEY (SingerId); CREATE TABLE Albums ( SingerId INT64 NOT NULL, -- NOT ALLOWED! AlbumId INT64 NOT NULL, AlbumTitle STRING(MAX), ) PRIMARY KEY (SingerId, AlbumId), INTERLEAVE IN PARENT Singers ON DELETE CASCADE;
Disallowed types
These cannot be of type ARRAY
:
- A table's key columns.
- An index's key columns.
Design for multi-tenancy
You might want to provide multi-tenancy if you are storing data that belongs to different customers. For example, a music service might want to store each individual record label's content separately.
Classic multi-tenancy
The classic way to design for multi-tenancy is to create a separate database
for each customer. In this example, each database has its own Singers
table:
SingerId | FirstName | LastName |
---|---|---|
1 | Marc | Richards |
2 | Catalina | Smith |
SingerId | FirstName | LastName |
---|---|---|
1 | Alice | Trentor |
2 | Gabriel | Wright |
SingerId | FirstName | LastName |
---|---|---|
1 | Benjamin | Martinez |
2 | Hannah | Harris |
Schema-managed multi-tenancy
Another way to design for multi-tenancy in Cloud Spanner is to
have all customers in a single table in a single database, and to use a
different primary key value for each customer. For example, you could include a
CustomerId
key column in your tables. If you make CustomerId
the first key
column, then the data for each customer has good locality. Cloud Spanner
can then effectively use database splits to maximize
performance based on data size and load patterns. In the following example,
there is a single Singers
table for all customers:
CustomerId | SingerId | FirstName | LastName |
---|---|---|---|
1 | 1 | Marc | Richards |
1 | 2 | Catalina | Smith |
2 | 1 | Alice | Trentor |
2 | 2 | Gabriel | Wright |
3 | 1 | Benjamin | Martinez |
3 | 2 | Hannah | Harris |
If you must have separate databases for each tenant, there are constraints to be aware of:
- There are limits on the number of databases per instance and the number of tables and indexes per database. Depending on the number of customers, it might not be possible to have separate databases or tables.
- Adding new tables and non-interleaved indexes can take a long time. You might not be able to get the performance you want if your schema design depends on adding new tables and indexes.
If you want to create separate databases, you might have more success if you distribute your tables across databases in such a way that each database has a low number of schema changes per week.
If you create separate tables and indexes for each customer of your application, do not put all of the tables and indexes in the same database. Instead, split them across many databases, to mitigate the performance issues with creating a large number of indexes.
To learn more about other data management patterns and application design for multi-tenancy, see Implementing Multi-Tenancy in Cloud Spanner