A generated column is a column that is always computed from other columns in a row. These columns can make a query simpler, save the cost of evaluating an expression at query time, and can be indexed or used as a foreign key. In this article, we'll describe how to manage this column type in your database.
Add a generated column to a new table
In the following CREATE TABLE
snippet, we create a table
to store information about users. We
have columns for FirstName
and LastName
and define a generated column for
FullName
, which is the concatenation of FirstName
and LastName
Google Standard SQL
CREATE TABLE Users (
Id STRING(20) NOT NULL,
FirstName STRING(50),
LastName STRING(50),
Age INT64 NOT NULL,
FullName STRING(100) AS (ARRAY_TO_STRING([FirstName, LastName], " ")) STORED,
) PRIMARY KEY (Id);
PostgreSQL
CREATE TABLE users (
id VARCHAR(20) NOT NULL,
firstname VARCHAR(50),
lastname VARCHAR(50),
age BIGINT NOT NULL,
fullname VARCHAR(100) GENERATED ALWAYS AS (firstname || ' ' || lastname) STORED,
PRIMARY KEY(id)
);
The value of FullName
is computed when a new row is inserted or when
FirstName
and/or LastName
is updated for an existing row. The computed value
is stored along with other columns from the table. The SQL in parentheses is
called the generation expression.
expression
can be any valid SQL expression that is assignable to the column data type with the following restrictions.The expression can only reference columns in the same table.
The expression can't contain subqueries.
The expression can't contain non-deterministic functions such as
PENDING_COMMIT_TIMESTAMP()
,CURRENT_DATE()
, andCURRENT_TIMESTAMP()
.You can't modify the expression of a generated column.
The
STORED
attribute following the expression causes the result of the function to be stored along with other columns of the table. Subsequent updates to any of the referenced columns causes the expression to be re-evaluated and stored.Generated columns without the
STORED
attribute are not allowed.Direct writes to generated columns are not allowed.
Generated columns can't be used as, or part of, a primary key. They can, however, be secondary index keys.
Column option
allow_commit_timestamp
is not allowed on generated columns or any columns referenced by generated columns.You can't change the data type of a generated column, or any columns referenced by the generated column.
You can't drop a column referenced by a generated column.
The generated column can be queried just like any other column, as shown in the following example.
Google Standard SQL
SELECT Id, FullName
FROM Users;
PostgreSQL
SELECT id, fullname
FROM users;
This is a equivalent to the following statement, which doesn't use the stored generated column.
Google Standard SQL
SELECT Id, ARRAY_TO_STRING([FirstName, LastName], " ") as FullName
FROM Users;
PostgreSQL
SELECT id, firstname || ' ' || lastname as fullname
FROM users;
Apart from making the query simpler and saving the cost of evaluating the expression at query time, a generated column can also be indexed or used as a foreign key.
Create an index on a generated column
To help with lookups on our FullName
generated column, we can create a
secondary index as shown in the following snippet.
Google Standard SQL
CREATE INDEX UsersByFullName ON Users (FullName);
PostgreSQL
CREATE INDEX UserByFullName ON users (fullname);
Add a generated column to an existing table
Using the following ALTER TABLE
statement, we add a generated column to the
Users
table to generate and store the user's initials.
Google Standard SQL
ALTER TABLE Users ADD COLUMN Initials STRING(2)
AS (ARRAY_TO_STRING([SUBSTR(FirstName, 0, 1), SUBSTR(LastName, 0, 1)], "")) STORED;
PostgreSQL
ALTER TABLE users ADD COLUMN Initials VARCHAR(2)
GENERATED ALWAYS AS (SUBSTR(FirstName, 0, 1) || SUBSTR(LastName, 0, 1)) STORED;
Adding a stored generated column to an existing table is a long-running operation to backfill the column values. During backfilling, the stored generated columns cannot be read or queried. The backfilling state is reflected in the INFORMATION_SCHEMA.
Create a partial index using a generated column
What if we only wanted to query users who are over 18? A full scan of the table would be inefficient, so we use a partial index.
Use the following statement to add another generated column which returns the user's age if they are over 18, and returns
NULL
otherwise.Google Standard SQL
ALTER TABLE Users ADD COLUMN AgeAbove18 INT64 AS (IF(Age > 18, Age, NULL)) STORED;
PostgreSQL
ALTER TABLE Users ADD COLUMN AgeAbove18 BIGINT GENERATED ALWAYS AS (nullif( Age , least( 18, Age) )) STORED;
Create an index on this new column, and disable the indexing of
NULL
values with theNULL_FILTERED
keyword in Google Standard SQL or theIS NOT NULL
predicate in PostgreSQL. This partial index is smaller and more efficient than a normal index because it excludes everyone who is 18 or younger.Google Standard SQL
CREATE NULL_FILTERED INDEX UsersAbove18ByAge ON Users (AgeAbove18);
PostgreSQL
CREATE INDEX UsersAbove18ByAge ON users (AgeAbove18) WHERE AgeAbove18 IS NOT NULL;
To retrieve the
Id
andAge
of all users who are over 18, run the following query.Google Standard SQL
SELECT Id, Age FROM Users@{FORCE_INDEX=UsersAbove18ByAge} WHERE AgeAbove18 IS NOT NULL;
PostgreSQL
SELECT Id, Age FROM users /*@ FORCE_INDEX = UsersAbove18ByAge */ WHERE AgeAbove18 IS NOT NULL;
To filter on a different age, for example, to retrieve all users who are over 21, use the same index and filter on the generated column as follows:
Google Standard SQL
SELECT Id, Age FROM Users@{FORCE_INDEX=UsersAbove18ByAge} WHERE AgeAbove18 > 21;
PostgreSQL
SELECT Id, Age FROM users /*@ FORCE_INDEX = UsersAbove18ByAge */ WHERE AgeAbove18 > 21;
Remove a generated column
The following DDL statement drops a generated column from the Users
table.
ALTER TABLE Users
DROP COLUMN Initials;
Modify a generated column expression
Modifying the expression of a generated column is not allowed. Instead, you need to drop the existing column and create a new generated column with the new expression.
View properties of a generated column
Cloud Spanner's INFORMATION_SCHEMA contains information about the generated columns on your database. The following are some examples of the questions you can answer by querying the information schema.
What generated columns are defined in my database?
SELECT c.TABLE_NAME, c.COLUMN_NAME
FROM INFORMATION_SCHEMA.COLUMNS as c
WHERE c.GENERATION_EXPRESSION IS NOT NULL;
What is the current state of generated columns in table Users
?
If you have added a generated column to an existing table, you might want to
view its current state to determine, for example, whether the column has been
backfilled and fully usable. If SPANNER_STATE
returns WRITE_ONLY
in the following query, it means Cloud Spanner is still in the process of
backfilling the column and no reads are allowed.
Google Standard SQL
SELECT c.TABLE_NAME, c.COLUMN_NAME, c.SPANNER_STATE
FROM INFORMATION_SCHEMA.COLUMNS AS c
WHERE c.TABLE_NAME="Users" AND c.GENERATION_EXPRESSION IS NOT NULL;
PostgreSQL
SELECT c.TABLE_NAME, c.COLUMN_NAME, c.SPANNER_STATE
FROM INFORMATION_SCHEMA.COLUMNS AS c
WHERE c.TABLE_NAME='users' AND c.GENERATION_EXPRESSION IS NOT NULL;
What's next
Learn more about Cloud Spanner's INFORMATION SCHEMA.
See more details about generated columns in the CREATE TABLE parameter details.