This document describes how to create efficient queries by using best practices for designing Spanner Graph schemas. Schema design can be iterative, so we recommend that you first identify critical query patterns to guide your schema design.
For general information about Spanner schema design best practices, see Schema design best practices.
Optimize edge traversal
Edge traversal is the process of navigating through a graph by following its edges, starting at a particular node and moving along connected edges to reach other nodes. Edge traversal is a fundamental operation in Spanner Graph, so improving edge traversal efficiency is key to your application performance.
You can traverse an edge in two directions: traversing from the source node to the destination node is called forward edge traversal, while traversing from the destination node to the source node is called reverse edge traversal.
Optimize forward edge traversal using interleaving
To improve forward edge traversal performance, interleave the edge input table into the source node input table to colocate edges with source nodes. Interleaving is a storage optimization technique in Spanner that physically colocates child table rows with their corresponding parent rows in storage. For more information about interleaving, see Schemas overview.
The following example demonstrates these best practices:
CREATE TABLE Person (
id INT64 NOT NULL,
name STRING(MAX),
) PRIMARY KEY (id);
CREATE TABLE PersonOwnAccount (
id INT64 NOT NULL,
account_id INT64 NOT NULL,
create_time TIMESTAMP,
) PRIMARY KEY (id, account_id),
INTERLEAVE IN PARENT Person ON DELETE CASCADE;
Optimize reverse edge traversal using foreign key
To efficiently traverse reverse edges, create a foreign key constraint between the edge and the destination node. This foreign key automatically creates a secondary index on the edge keyed by the destination node keys. The secondary index is automatically used during query execution.
The following example demonstrates these best practices:
CREATE TABLE Person (
id INT64 NOT NULL,
name STRING(MAX),
) PRIMARY KEY (id);
CREATE TABLE Account (
id INT64 NOT NULL,
create_time TIMESTAMP,
) PRIMARY KEY (id);
CREATE TABLE PersonOwnAccount (
id INT64 NOT NULL,
account_id INT64 NOT NULL,
create_time TIMESTAMP,
CONSTRAINT FK_Account FOREIGN KEY (account_id) REFERENCES Account (id),
) PRIMARY KEY (id, account_id),
INTERLEAVE IN PARENT Person ON DELETE CASCADE;
Optimize reverse edge traversal using secondary index
If you don't want to create a foreign key on the edge, for example, due to the strict data integrity it enforces, you can directly create a secondary index on the edge input table, as shown in the following example:
CREATE TABLE PersonOwnAccount (
id INT64 NOT NULL,
account_id INT64 NOT NULL,
create_time TIMESTAMP,
) PRIMARY KEY (id, account_id),
INTERLEAVE IN PARENT Person ON DELETE CASCADE;
CREATE INDEX Reverse_PersonOwnAccount
ON PersonOwnAccount (account_id);
Disallow dangling edges
A dangling edge is an edge that connects fewer than two nodes. A dangling edge can occur when a node is deleted without removing its associated edges, or when an edge is created without properly linking it to its nodes.
Disallowing dangling edges provides the following benefits:
- Enforces graph structure integrity.
- Improves query performance by avoiding the extra work to filter out edges where endpoints don't exist.
Disallow dangling edges using referential constraints
To disallow dangling edges, specify constraints on both endpoints:
- Interleave the edge input table into the source node input table. This approach ensures that the source node of an edge always exists.
- Create a foreign key constraint on edges to ensure that the destination node of an edge always exists.
The following example uses interleaving and a foreign key to enforce referential integrity:
CREATE TABLE PersonOwnAccount (
id INT64 NOT NULL,
account_id INT64 NOT NULL,
create_time TIMESTAMP,
CONSTRAINT FK_Account FOREIGN KEY (account_id) REFERENCES Account (id) ON DELETE CASCADE,
) PRIMARY KEY (id, account_id),
INTERLEAVE IN PARENT Person ON DELETE CASCADE;
Use ON DELETE CASCADE to automatically remove edges when deleting a node
When you use interleaving or a foreign key to disallow dangling edges, use the
ON DELETE
clause to control the behavior when you want to delete a node with
edges that are still attached. For more information, see
delete cascading for interleaved tables
and
delete cascading with foreign keys.
You can use ON DELETE
in the following ways:
ON DELETE NO ACTION
(or omitting theON DELETE
clause): Deleting a node with edges will fail.ON DELETE CASCADE
: Deleting a node automatically removes the associated edges in the same transaction.
Delete cascade for edges connecting different types of nodes
Delete edges when the source node is deleted. For example,
INTERLEAVE IN PARENT Person ON DELETE CASCADE
deletes all outgoingPersonOwnAccount
edges from thePerson
node being deleted. For more information, see Create interleaved tables.Delete edges when the destination node is deleted. For example,
CONSTRAINT FK_Account FOREIGN KEY(account_id) REFERENCES Account(id) ON DELETE CASCADE
deletes all incomingPersonOwnAccount
edges into theAccount
node being deleted. For more information, see Foreign keys.
Delete cascade for edges connecting the same type of nodes
When the source and destination nodes of an edge have the same type and the
edge is interleaved into the source node, you can define ON DELETE CASCADE
only for the source node or destination node (but not both nodes).
To automatically remove dangling edges in both cases, create a foreign key on the edge source node reference instead of interleaving the edge input table into the source node input table.
We recommend interleaving for
optimizing forward edge traversal.
Be sure to verify the impact on your workloads before you proceed. See the
following example, which uses AccountTransferAccount
as the edge input
table:
--Define two Foreign Keys, each on one end Node of Transfer Edge, both with ON DELETE CASCADE action:
CREATE TABLE AccountTransferAccount (
id INT64 NOT NULL,
to_id INT64 NOT NULL,
amount FLOAT64,
create_time TIMESTAMP NOT NULL,
order_number STRING(MAX),
CONSTRAINT FK_FromAccount FOREIGN KEY (id) REFERENCES Account (id) ON DELETE CASCADE,
CONSTRAINT FK_ToAccount FOREIGN KEY (to_id) REFERENCES Account (id) ON DELETE CASCADE,
) PRIMARY KEY (id, to_id);
Filter by node or edge properties with secondary indexes
Secondary indexes are essential for efficient query processing. They support quick lookups of nodes and edges based on specific property values, without having to traverse the entire graph structure. This is important when you're working with large graphs, because traversing all nodes and edges can be very inefficient.
Speed up filtering nodes by property
To speed up filtering by node properties, create secondary indexes on
properties. For example, the following query finds accounts for a given
nickname. Without a secondary index, all Account
nodes are scanned to match
the filtering criteria.
GRAPH FinGraph
MATCH (acct:Account)
WHERE acct.nick_name = "abcd"
RETURN acct.id;
To speed up the query, create a secondary index on the filtered property, as shown in the following example:
CREATE TABLE Account (
id INT64 NOT NULL,
create_time TIMESTAMP,
is_blocked BOOL,
nick_name STRING(MAX),
) PRIMARY KEY (id);
CREATE INDEX AccountByEmail
ON Account (nick_name);
Tip: Use NULL-filtered indexes for sparse properties. For more information, see Disable indexing of NULL values.
Speed up forward edge traversal with filtering on edge properties
When you traverse an edge while filtering on its properties, you can speed up the query by creating a secondary index on the edge properties and interleaving the index into the source node.
For example, the following query finds accounts owned by a given person after a certain time:
GRAPH FinGraph
MATCH (person:Person)-[owns:Owns]->(acct:Account)
WHERE person.id = 1
AND owns.create_time >= PARSE_TIMESTAMP("%c", "Thu Dec 25 07:30:00 2008")
RETURN acct.id;
By default, this query reads all edges of the specified person and then filters
those edges that satisfy the condition on create_time
.
The following example shows you how to improve query efficiency by creating
a secondary index on the edge source node reference (id
) and the edge property
(create_time
). Interleave the index under the source node input table to
colocate the index with the source node.
CREATE TABLE PersonOwnAccount (
id INT64 NOT NULL,
account_id INT64 NOT NULL,
create_time TIMESTAMP,
) PRIMARY KEY (id, account_id),
INTERLEAVE IN PARENT Person ON DELETE CASCADE;
CREATE INDEX PersonOwnAccountByCreateTime
ON PersonOwnAccount (id, create_time)
INTERLEAVE IN Person;
Using this approach, the query can efficiently find all edges satisfying the
condition on create_time
.
Speed up reverse edge traversal with filtering on edge properties
When you traverse a reverse edge while filtering on its properties, you can speed up the query by creating a secondary index using the destination node and the edge properties for filtering.
The following example query performs reverse edge traversal with filtering on edge properties:
GRAPH FinGraph
MATCH (acct:Account)<-[owns:Owns]-(person:Person)
WHERE acct.id = 1
AND owns.create_time >= PARSE_TIMESTAMP("%c", "Thu Dec 25 07:30:00 2008")
RETURN person.id;
To speed up this query using a secondary index, use one of the following options:
Create a secondary index on the edge destination node reference (
account_id
) and the edge property (create_time
), as shown in the following example:CREATE TABLE PersonOwnAccount ( id INT64 NOT NULL, account_id INT64 NOT NULL, create_time TIMESTAMP, ) PRIMARY KEY (id, account_id), INTERLEAVE IN PARENT Person ON DELETE CASCADE; CREATE INDEX PersonOwnAccountByCreateTime ON PersonOwnAccount (account_id, create_time);
This approach provides better performance because the reverse edges are sorted by
account_id
andcreate_time
, which lets the query engine efficiently find edges foraccount_id
satisfying the condition oncreate_time
. However, if different query patterns filter on different properties, then each property might require a separate index, which can add overhead.Create a secondary index on the edge destination node reference (
account_id
) and store the edge property (create_time
) in a storing column, as shown in the following example:CREATE TABLE PersonOwnAccount ( id INT64 NOT NULL, account_id INT64 NOT NULL, create_time TIMESTAMP, ) PRIMARY KEY (id, account_id), INTERLEAVE IN PARENT Person ON DELETE CASCADE; CREATE INDEX PersonOwnAccountByCreateTime ON PersonOwnAccount (account_id) STORING (create_time);
This approach can store multiple properties; however, the query must read all edges of the destination node and then filter on edge properties.
You can combine these approaches by following these guidelines:
- Use edge properties in index columns if they are used in performance-critical queries.
- For properties used in less performance-sensitive queries, add them in the storing columns.
Model node and edge types with labels and properties
Node and edge types are commonly modeled with labels. However, you can also use
properties to model types. Consider an example in which there are many different
types of accounts, like BankAccount
, InvestmentAccount
, and
RetirementAccount
. You can store the accounts in separate input tables and
model them as separate labels, or you can store the accounts in a single input
table and use a property to differentiate between the types.
Start the modeling process by modeling the types with labels. Consider using properties in the following scenarios.
Improve schema management
If your graph has many different node and edge types, managing a separate input table for each can become difficult. To make schema management easier, model the type as a property.
Model types in a property to manage frequently changing types
When you model types as labels, adding or removing types requires changes to the schema. If you perform too many schema updates in a short period of time, Spanner might throttle the processing of queued schema updates. For more information, see Limit the frequency of schema updates.
If you need to change the schema frequently, we recommend that you model the type in a property to work around limitations on the frequency of schema updates.
Speed up queries
Modeling types with properties might speed up queries when the node or edge pattern
references multiple labels. The following example query finds all instances of
SavingsAccount
and InvestmentAccount
owned by a Person
, assuming account
types are modeled with labels:
GRAPH FinGraph
MATCH (:Person {id: 1})-[:Owns]->(acct:SavingsAccount|InvestmentAccount)
RETURN acct.id;
The acct
node pattern references two labels. If this is a
performance-critical query, consider modeling Account
using a property. This
approach might provide better query performance, as shown in the following query
example. We recommend that you benchmark both queries.
GRAPH FinGraph
MATCH (:Person {id: 1})-[:Owns]->(acct:Account)
WHERE acct.type IN ("Savings", "Investment")
RETURN acct.id;
Store type in the node element key to speed up queries
To speed up queries with filtering on the node type when a node type is modeled with a property and the type doesn't change across the node lifetime, follow these steps:
- Include the property as part of the node element key.
- Add the node type in the edge input table.
- Include the node type in the edge referencing keys.
The following example applies this optimization to the Account
node and the
AccountTransferAccount
edge.
CREATE TABLE Account (
type STRING(MAX) NOT NULL,
id INT64 NOT NULL,
create_time TIMESTAMP,
) PRIMARY KEY (type, id);
CREATE TABLE AccountTransferAccount (
type STRING(MAX) NOT NULL,
id INT64 NOT NULL,
to_type STRING(MAX) NOT NULL,
to_id INT64 NOT NULL,
amount FLOAT64,
create_time TIMESTAMP NOT NULL,
order_number STRING(MAX),
) PRIMARY KEY (type, id, to_type, to_id),
INTERLEAVE IN PARENT Account ON DELETE CASCADE;
CREATE PROPERTY GRAPH FinGraph
NODE TABLES (
Account
)
EDGE TABLES (
AccountTransferAccount
SOURCE KEY (type, id) REFERENCES Account
DESTINATION KEY (to_type, to_id) REFERENCES Account
);
Configure TTL on nodes and edges
Spanner Time to live (TTL) is a mechanism that supports automatic expiration and removal of data after a specified period. This is often used for data that has a limited lifespan or relevance, like session information, temporary caches, or event logs. In these cases, TTL helps to maintain database size and performance.
The following example uses TTL to delete accounts 90 days after their closure:
CREATE TABLE Account (
id INT64 NOT NULL,
create_time TIMESTAMP,
close_time TIMESTAMP,
) PRIMARY KEY (id),
ROW DELETION POLICY (OLDER_THAN(close_time, INTERVAL 90 DAY));
If the node table has a TTL and an edge table interleaved within, the
interleave must be defined with
ON DELETE CASCADE
.
Similarly, if the node table has a TTL and is referenced by an edge table
through a foreign key, the foreign key must be defined with
ON DELETE CASCADE
.
In the following example, AccountTransferAccount
is stored for up to ten years
while an account remains active. When an account is deleted, the transfer
history is also deleted.
CREATE TABLE AccountTransferAccount (
id INT64 NOT NULL,
to_id INT64 NOT NULL,
amount FLOAT64,
create_time TIMESTAMP NOT NULL,
order_number STRING(MAX),
) PRIMARY KEY (id, to_id),
INTERLEAVE IN PARENT Account ON DELETE CASCADE,
ROW DELETION POLICY (OLDER_THAN(create_time, INTERVAL 3650 DAY));
Merge node and edge input tables
You can use the same input table to define more than one node and edge in your schema.
In the following example tables, the Account
nodes have a composite key
(owner_id, account_id)
. There is an implicit edge definition, Person
node
with key (id
) owns the Account
node with composite key
(owner_id, account_id)
when id
equals owner_id
.
CREATE TABLE Person (
id INT64 NOT NULL,
) PRIMARY KEY (id);
-- Assume each account has exactly one owner.
CREATE TABLE Account (
owner_id INT64 NOT NULL,
account_id INT64 NOT NULL,
) PRIMARY KEY (owner_id, account_id);
In this case, you can use the Account
input table to define the Account
node and the PersonOwnAccount
edge, as shown in the following schema example.
To ensure that all element table names are unique, the example gives the edge
table definition the alias Owns
.
CREATE PROPERTY GRAPH FinGraph
NODE TABLES (
Person,
Account
)
EDGE TABLES (
Account AS Owns
SOURCE KEY (owner_id) REFERENCES Person
DESTINATION KEY (owner_id, account_id) REFERENCES Account
);