This topic describes foreign keys in Cloud Spanner and how they can be used to enforce referential integrity in your database solution.
Foreign keys allow you to define relationships between tables. Cloud Spanner ensures the data integrity of these relationships is maintained.
Imagine you are the lead developer for an e-commerce business. You are designing a database to process customer orders. The database needs to store information about each order, customer and product. Figure 1 illustrates the basic database structure for the application.
Figure 1. Diagram of an order processing database
You define a
Customers table to store customer information, an
to keep track of all orders made, and a
Products table to store information
about each product that customers can order.
Figure 1 also shows links between the tables that map to the following real- world relationships:
An order was placed by a customer
An order was placed for a product
You decide your database should enforce the following rules to make sure orders in our system are valid.
You can't create an order for a customer that doesn't exist.
A customer can't place an order for a product you don't carry.
When we enforce these rules, or constraints, we say that we are maintaining the referential integrity of our data. When a database maintains referential integrity, all attempts to add invalid data, which would result in invalid links or references between data, will fail. Referential integrity prevents user errors. Cloud Spanner enforces referential integrity through foreign keys.
Enforcing referential integrity with foreign keys
Let's look at our order processing example again, with more detail added to the design as shown in Figure 2.
Figure 2. Diagram of our database schema with foreign keys
The design now shows column names and types in each table. The
also defines two foreign key relationships.
FK_CustomerOrder ensures that all
Orders have a valid
FK_ProductOrder foreign key
ensures that all
ProductID values in the
Orders table are valid. The
following table maps these constraints back to the real-world rules we want to
|Foreign Key Name||Constraint||Real-world description|
|FK_CustomerOrder||Ensures that all rows in
||An order was placed by a valid customer|
|FK_ProductOrder||Ensures that all rows in
||An order was placed for a valid product|
Cloud Spanner fails any transaction that attempts to insert or update a row in
Orders table that has a
ProductID not found in the
Products tables. It also fails transactions that attempt to
update or delete rows in the
Products tables that would
invalidate the ids in the
Orders table. For more details about how
Cloud Spanner validates constraints, refer to the Transaction constraint
Defining foreign keys
Foreign keys are created and removed from your Cloud Spanner database
using DDL. Foreign keys are added to a new table with the
TABLE statement. Similarly, a foreign key can be added to, or removed from,
an existing table with the
ALTER TABLE statement. The following is an
example of creating a new table with a foreign key.
CREATE TABLE Orders ( OrderID INT64 NOT NULL, CustomerID INT64 NOT NULL, Quantity INT64 NOT NULL, ProductID INT64 NOT NULL, CONSTRAINT FK_CustomerOrder FOREIGN KEY (CustomerID) REFERENCES Customers (CustomerID) ) PRIMARY KEY (OrderID);
For more examples of how to create and manage foreign keys, see Create and manage foreign key relationships.
The following is a list of characteristics of foreign keys in Cloud Spanner.
The table that defines the foreign key is the referencing table and foreign key columns are the referencing columns.
The foreign key references the referenced columns of the referenced table.
As in the example above, you can name each foreign key constraint. If not, Cloud Spanner generates a name automatically. The generated name is queryable from Cloud Spanner's INFORMATION_SCHEMA. Constraint names are scoped to the schema, along with the names for tables and indexes, for example, and must be unique within the schema.
The number of referencing and referenced columns must be the same. Order is significant. The first referencing column refers to the first referenced column, the second to the second, etc.
A referencing column and its referenced counterpart must be the same type. The columns must also be indexable.
Foreign keys cannot be created on columns with the
Array columns are not supported.
A foreign key may reference columns of the same table (a "self-referencing" foreign key). An example is an Employee table with a ManagerId column that references the table's EmployeeId column.
Foreign keys may also form circular relationships between tables where two tables reference each other, either directly or indirectly. The referenced table must exist before creating a foreign key, so at least one of the foreign keys must be added using the
The referenced keys must be unique. Cloud Spanner uses the
PRIMARY KEYof the referenced table if the foreign key's referenced columns match the referenced table's primary key columns. If Cloud Spanner cannot use the referenced table's primary key, it creates a
UNIQUE NULL_FILTERED INDEXover the referenced columns.
Cloud Spanner may also be able to use the primary key of the referencing table, although this is less common. If not, Cloud Spanner creates a
NULL_FILTERED INDEXover the referencing columns.
Foreign keys do not use secondary indexes you have created; they create their own backing indexes. These backing indexes are usable in query evaluations, including in explicit force_index directives. The name of the backing indexes are queryable from Cloud Spanner's INFORMATION_SCHEMA. For more information, see Backing indexes.
Long-running schema changes
Adding a foreign key to an existing table, or creating a new table with a foreign key, can lead to long-running operations. In the case of a new table, the table is not writable until the long-running operation is completed.
When creating a new table with a foreign key, Cloud Spanner needs to backfill the referenced indexes as needed for each foreign key.
When adding a foreign key to an existing table, Cloud Spanner needs to backfill the referencing and referenced indexes as needed. In addition, Cloud Spanner validates existing data in the table to ensure it complies with the foreign key's referential integrity constraint. The schema change will fail if any data is invalid.
Either of the above schema changes can fail if the referenced index can't be
created due to a
UNIQUE constraint violation.
You can query INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS.SPANNER_STATE to check foreign key creation state.
Constraint validation for a transaction
Cloud Spanner validates foreign key constraints when a transaction is being committed, or when the effects of writes are being made visible to subsequent operations in the transaction.
A value inserted into the referencing column(s) is matched against the values of
the referenced table and referenced columns. Rows with
NULL referencing values
are not checked, that is, they can be added to the referencing table.
Cloud Spanner validates all applicable foreign key referential constraints when attempting to update data via either DML statements or an API. All pending changes are rolled back if any constraints are invalid.
Validation occurs immediately after each DML statement. You must, for example, insert the referenced row before inserting its referencing rows. When using a mutation API, mutations are buffered until the transaction is committed. Foreign key validation is deferred until the transaction is committed. In this case it is permissible to insert the referencing rows first.
Each transaction is evaluated for modifications that affect foreign key constraints. These evaluations may require additional requests to the server. Backing indexes also require additional processing time to evaluate transaction modifications and to maintain the indexes. Additional storage is also required for each index.
Foreign keys do not use user-created indexes. They create their own backing indexes.
Cloud Spanner can create up to two secondary backing indexes for each foreign key, one for the referencing columns, and a second for the referenced columns. However, a foreign key usually references the primary keys of the referenced table, so the second index on the referenced table is typically not needed.
The backing index for the referenced table is a
index; the creation of the foreign key fails if any existing data violates the
index's uniqueness constraint. The backing index for the referencing table is
If two or more foreign keys require the same backing index, Cloud Spanner will create a single index for all of them. The backing indexes are dropped when the foreign keys using them are dropped. Users can't alter or drop the backing indexes.
The foreign key backing indexes are usable in query evaluations. Their names are queryable from Cloud Spanner's INFORMATION_SCHEMA.
Comparison of foreign keys and table interleaving
Cloud Spanner's table interleaving is a good choice for many parent-child relationships where the child table's primary key includes the parent table's primary key columns. The co-location of child rows with their parent rows can significantly improve performance.
Foreign keys are a more general parent-child solution and address additional use cases. They are not limited to primary key columns, and tables can have multiple foreign key relationships, both as a parent in some relationships and a child in others. However, a foreign key relation does not imply co-location of the tables in the storage layer.
Let's look at an example, using the order processing schema we discussed earlier
in this topic. Recall our
Orders table was defined as follows:
Figure 3. Diagram of our database schema with foreign keys
The design in Figure 3 has some limitations. For example, each order can currently only contain one order item.
Let's imagine our customers tell us that they'd like to be able to order more
than one product per order. We can enhance our design by introducing an
OrderItems table that contains an entry for each product the customer
ordered. We can introduce another foreign key to represent this new one-to-many
OrderItems. But, we also know that a lot of
the time we'll want to run queries across orders and their respective order
items. Co-location of this data would boost performance, so we'll create the
parent-child relationship using Cloud Spanner's table interleaving capability.
Here's how we define the
OrderItems table, interleaved with
CREATE TABLE OrderItems ( OrderID INT64 NOT NULL, ProductID INT64 NOT NULL, Quantity INT64 NOT NULL, FOREIGN KEY (ProductID) REFERENCES Products (ProductID) ) PRIMARY KEY (OrderID, ProductID), INTERLEAVE IN PARENT Orders ON DELETE CASCADE;
Figure 4 is a visual representation of the updated database schema as a result
of introducing this new table,
OrderItems, interleaved with
Orders. Here you
can also see the one-to-many relationship between those two tables.
Figure 4. Addition of an interleaved OrderItems table
In this configuration, we can have multiple
OrderItems entries in each order and
the OrderItem entries for each order are interleaved, and therefore co-located
with the orders. Physically interleaving
OrderItems in this way can
improve performance, effectively pre-joining the tables and allowing you to
access related rows together while minimizing disk accesses. For example,
Cloud Spanner can perform joins by primary key locally, minimizing disk access
and network traffic.
The following table summarizes how foreign keys and table interleaving compare. You can use this information to decide what is right for your design.
|Parent-child relationship type||Table Interleaving||Foreign Keys|
|Can use primary keys||Yes||Yes|
|Can use non-primary-key columns||No||Yes|
|Number of parents supported||0 .. 1||0 .. N|
|Stores parent and child data together||Yes||No|
|Supports cascade delete||Yes||No|
|Null matching mode||Passes if all referencing values are not distinct from the referenced values.
Null values are not distinct from null values; null values are distinct from non-null values.
|Passes if any referencing values are null.
Passes if all referencing values are non-null, and the referenced table has a row with values equal to the referencing values.
Fails if no matching row was found.
|Enforcement Timing||Per operation when using the mutation API.
Per statement when using DML.
|Per transaction when using the mutation API.
Per statement when using DML.
|Can be removed easily||No. Table interleaving cannot be removed after it's created, unless you delete the whole child table.||Yes|