A generated column is a column that is always computed from other columns in a row. These columns can make a query simpler, save the cost of evaluating an expression at query time, and can be indexed or used as a foreign key. In this article, we'll describe how to manage this column type in your database.
Adding a generated column to a new table
In the following CREATE TABLE
snippet, we create a table to store information about users. We
have columns for FirstName
and LastName
and define a generated column for
FullName
, which is the concatenation of FirstName
and LastName
CREATE TABLE Users (
Id STRING(20) NOT NULL,
FirstName STRING(50),
LastName STRING(50),
Age INT64 NOT NULL,
FullName STRING(100) AS (ARRAY_TO_STRING([FirstName, LastName], " ")) STORED,
) PRIMARY KEY (Id);
The value of FullName
is computed when a new row is inserted or when
FirstName
and/or LastName
is updated for an existing row. The computed value
is stored along with other columns from the table. The SQL in parentheses is
called the generation expression.
expression can be any valid SQL expression that is assignable to the column data type with the following restrictions.
The expression can only reference columns in the same table.
The expression can't contain subqueries.
The expression can't contain non-deterministic functions such as
PENDING_COMMIT_TIMESTAMP()
,CURRENT_DATE()
, andCURRENT_TIMESTAMP()
.You can't modify the expression of a generated column.
The
STORED
attribute following the expression causes the result of the function to be stored along with other columns of the table. Subsequent updates to any of the referenced columns causes the expression to be re-evaluated and stored.Generated columns without the
STORED
attribute are not allowed.Direct writes to generated columns are not allowed.
Generated columns can't be used as, or part of, a primary key. They can, however, be secondary index keys.
Column option
allow_commit_timestamp
is not allowed on generated columns, or any columns referenced by generated columns.You can't change the data type of a generated column, or any columns referenced by the generated column.
You can't drop a column referenced by a generated column.
The generated column can be queried just like any other column, as shown in the following example.
SELECT Id, FullName
FROM Users;
This is a equivalent to the following statement, which doesn't use the stored generated column.
SELECT Id, ARRAY_TO_STRING([FirstName, LastName], " ") as FullName
FROM Users;
Apart from making the query simpler and saving the cost of evaluating the expression at query time, a generated column can also be indexed or used as a foreign key.
Creating an index on a generated column
To help with lookups on our FullName
generated column, we can create a
secondary index as shown in the following snippet.
CREATE INDEX UsersByFullName ON Users (FullName);
Adding a generated column to an existing table
Using the following ALTER TABLE
statement, we add a generated column to the
Users
table to generate and store the user's initials.
ALTER TABLE Users ADD COLUMN Initials STRING(2)
AS (ARRAY_TO_STRING([SUBSTR(FirstName, 0, 1), SUBSTR(LastName, 0, 1)], "")) STORED;
Adding a stored generated column to an existing table is a long-running operation to backfill the column values. During backfilling, the stored generated columns cannot be read or queried. The backfilling state is reflected in the INFORMATION_SCHEMA.
Creating a partial index using a generated column
What if we only wanted to query users who are over 18? A full scan of the table would be inefficient, so we use a partial index.
Use the following statement to add another generated column which returns the user's age if they are over 18, and returns
NULL
otherwise.ALTER TABLE Users ADD COLUMN AgeAbove18 INT64 AS (IF(Age > 18, Age, NULL)) STORED;
Create an index on this new column, and disable the indexing of
NULL
values with theNULL_FILTERED
keyword. This partial index is smaller and more efficient than a normal index because it excludes everyone who is 18 or younger.CREATE NULL_FILTERED INDEX UsersAbove18ByAge ON Users (AgeAbove18);
To retrieve the
Id
andAge
of all users who are over 18, run the following query.SELECT Id, Age FROM Users@{FORCE_INDEX=UsersAbove18ByAge} WHERE AgeAbove18 IS NOT NULL;
To filter on a different age, for example, to retrieve all users who are over 21, use the same index and filter on the generated column as follows:
SELECT Id, Age FROM Users@{FORCE_INDEX=UsersAbove18ByAge} WHERE AgeAbove18 > 21;
Removing a generated column
The following DDL statement drops a generated column from the Users
table.
ALTER TABLE Users
DROP COLUMN Initials;
Modifying a generated column expression
Modifying the expression of a generated column is not allowed. Instead, you need to drop the existing column and create a new generated column with the new expression.
Viewing properties of a generated column
Cloud Spanner's INFORMATION_SCHEMA contains information about the generated columns on your database. The following are some examples of the questions you can answer by querying the information schema.
What generated columns are defined in my database?
SELECT c.TABLE_NAME, c.COLUMN_NAME
FROM INFORMATION_SCHEMA.COLUMNS as c
WHERE c.GENERATION_EXPRESSION IS NOT NULL;
What is the current state of generated columns in table Users
?
If you have added a generated column to an existing table, you might want to
view its current state to determine, for example, whether the column has been
backfilled and fully usable. If SPANNER_STATE
returns WRITE_ONLY
in the following query, it means Cloud Spanner is still in the process of
backfilling the column and no reads are allowed.
SELECT c.TABLE_NAME, c.COLUMN_NAME, c.SPANNER_STATE
FROM INFORMATION_SCHEMA.COLUMNS as c
WHERE c.TABLE_NAME="Users" AND c.GENERATION_EXPRESSION IS NOT NULL;
What's next
Learn more about Cloud Spanner's INFORMATION SCHEMA.
See more details about generated columns in the CREATE TABLE parameter details.