A generated column is a column that is always computed from other columns in a row. These columns can make a query simpler, save the cost of evaluating an expression at query time, and can be indexed or used as a foreign key. In this article, we'll describe how to manage this column type in your database.
Add a generated column to a new table
In the following CREATE TABLE
snippet, we create a table to store information
about users. We have columns for FirstName
and LastName
and define a
generated column for FullName
, which is the concatenation of FirstName
and
LastName
. The SQL in parentheses is called the generation expression.
A generated column can be marked as STORED
to save the cost of evaluating the
expression at query time. As a result, the value of FullName
is only computed
when a new row is inserted or when FirstName
or LastName
is updated for an
existing row. The computed value is stored along with other columns in the
table.
GoogleSQL
CREATE TABLE Users (
Id STRING(20) NOT NULL,
FirstName STRING(50),
LastName STRING(50),
Age INT64 NOT NULL,
FullName STRING(100) AS (FirstName || ' ' || LastName) STORED
) PRIMARY KEY (Id);
PostgreSQL
CREATE TABLE users (
id VARCHAR(20) NOT NULL,
firstname VARCHAR(50),
lastname VARCHAR(50),
age BIGINT NOT NULL,
fullname VARCHAR(100) GENERATED ALWAYS AS (firstname || ' ' || lastname) STORED,
PRIMARY KEY(id)
);
You can create a non-stored generated column by omitting the STORED
attribute in the DDL. This kind of generated column is evaluated at query time
and can make a query simpler. In PostgreSQL, you can create a non-stored
generated column using the VIRTUAL
attribute.
GoogleSQL
FullName STRING(MAX) AS (CONCAT(FirstName, " ", LastName))
PostgreSQL
fullname text GENERATED ALWAYS AS (firstname || ' ' || lastname) VIRTUAL
expression
can be any valid SQL expression that's assignable to the column data type with the following restrictions.The expression can only reference columns in the same table.
The expression can't contain subqueries.
Expressions with non-deterministic functions such as
PENDING_COMMIT_TIMESTAMP()
,CURRENT_DATE()
, andCURRENT_TIMESTAMP()
can't be made into aSTORED
generated column or a generated column that is indexed.You can't modify the expression of a
STORED
or indexed generated column.
The
STORED
attribute that follows the expression stores the result of the expression along with other columns of the table. Subsequent updates to any of the referenced columns cause Spanner to re-evaluate and store the expression.Generated columns that are not
STORED
can't be marked asNOT NULL
.Direct writes to generated columns aren't allowed.
Column option
allow_commit_timestamp
isn't allowed on generated columns or any columns that generated columns reference.For
STORED
or generated columns that are indexed, you can't change the data type of the column, or of any columns that the generated column references.You can't drop a column a generated column references.
You can use a generated column as a primary key with the following additional restrictions:
The generated primary key can't reference other generated columns.
The generated primary key can reference, at most, one non-key column.
The generated primary key can't depend on a non-key column with a
DEFAULT
clause.
The following rules apply when using generated key columns:
- Read APIs: You must fully specify the key columns, including the generated key columns.
- Mutation APIs: For
INSERT
,INSERT_OR_UPDATE
, andREPLACE
, Spanner doesn't allow you to specify generated key columns. ForUPDATE
, you can optionally specify generated key columns. ForDELETE
, you need to fully specify the key columns including the generated keys. - DML: You can't explicitly write to generated keys in
INSERT
orUPDATE
statements. - Query: In general, we recommend that you use the generated key column as a
filter in your query. Optionally, if the expression for the generated key
column uses only one column as a reference, the query can apply an equality
(
=
) orIN
condition to the referenced column. For more information and an example, see Create a unique key derived from a value column.
The generated column can be queried just like any other column, as shown in the following example.
GoogleSQL
SELECT Id, FullName
FROM Users;
PostgreSQL
SELECT id, fullname
FROM users;
The query using Fullname
is equivalent to the query with the generated
expression. Hence a generated column can make the query simpler.
GoogleSQL
SELECT Id, ARRAY_TO_STRING([FirstName, LastName], " ") as FullName
FROM Users;
PostgreSQL
SELECT id, firstname || ' ' || lastname as fullname
FROM users;
Create an index on a generated column
You can also index or use a generated column as a foreign key.
To help with lookups on our FullName
generated column, we can create a
secondary index as shown in the following snippet.
GoogleSQL
CREATE INDEX UsersByFullName ON Users (FullName);
PostgreSQL
CREATE INDEX UserByFullName ON users (fullname);
Add a generated column to an existing table
Using the following ALTER TABLE
statement, we can add a generated column to
the Users
table to generate and store the user's initials.
GoogleSQL
ALTER TABLE Users ADD COLUMN Initials STRING(2)
AS (ARRAY_TO_STRING([SUBSTR(FirstName, 0, 1), SUBSTR(LastName, 0, 1)], "")) STORED;
PostgreSQL
ALTER TABLE users ADD COLUMN initials VARCHAR(2)
GENERATED ALWAYS AS (SUBSTR(firstname, 0, 1) || SUBSTR(lastname, 0, 1)) STORED;
If you add a stored generated column to an existing table, a long-running operation to backfill the column values is started. During backfilling, the stored generated columns can't be read or queried. The backfilling state is reflected in the INFORMATION_SCHEMA table.
Create a partial index using a generated column
What if we only wanted to query users who are over 18? A full scan of the table would be inefficient, so we use a partial index.
Use the following statement to add another generated column which returns the user's age if they are over 18, and returns
NULL
otherwise.GoogleSQL
ALTER TABLE Users ADD COLUMN AgeAbove18 INT64 AS (IF(Age > 18, Age, NULL));
PostgreSQL
ALTER TABLE Users ADD COLUMN AgeAbove18 BIGINT GENERATED ALWAYS AS (nullif( Age , least( 18, Age) )) VIRTUAL;
Create an index on this new column, and disable the indexing of
NULL
values with theNULL_FILTERED
keyword in GoogleSQL or theIS NOT NULL
predicate in PostgreSQL. This partial index is smaller and more efficient than a normal index because it excludes everyone who is 18 or younger.GoogleSQL
CREATE NULL_FILTERED INDEX UsersAbove18ByAge ON Users (AgeAbove18);
PostgreSQL
CREATE INDEX UsersAbove18ByAge ON users (AgeAbove18) WHERE AgeAbove18 IS NOT NULL;
To retrieve the
Id
andAge
of all users who are over 18, run the following query.GoogleSQL
SELECT Id, Age FROM Users@{FORCE_INDEX=UsersAbove18ByAge} WHERE AgeAbove18 IS NOT NULL;
PostgreSQL
SELECT Id, Age FROM users /*@ FORCE_INDEX = UsersAbove18ByAge */ WHERE AgeAbove18 IS NOT NULL;
To filter on a different age, for example, to retrieve all users who are over 21, use the same index and filter on the generated column as follows:
GoogleSQL
SELECT Id, Age FROM Users@{FORCE_INDEX=UsersAbove18ByAge} WHERE AgeAbove18 > 21;
PostgreSQL
SELECT Id, Age FROM users /*@ FORCE_INDEX = UsersAbove18ByAge */ WHERE AgeAbove18 > 21;
An indexed generated column can save the cost of evaluating an expression at query time and avoid storing the values twice (in the base table and index) as compared to a
STORED
generated column.
Remove a generated column
The following DDL statement drops a generated column from the Users
table:
GoogleSQL
ALTER TABLE Users DROP COLUMN Initials;
PostgreSQL
ALTER TABLE users DROP COLUMN initials;
Modify a generated column expression
GoogleSQL
ALTER TABLE Users ALTER COLUMN FullName STRING(100)
AS (ARRAY_TO_STRING(ARRAY_TO_STRING([LastName, FirstName ], " ")));
PostgreSQL
ALTER TABLE users ADD COLUMN Initials VARCHAR(2)
GENERATED ALWAYS AS (lastname || ' ' || firstname) VIRTUAL;
Updating the expression of a STORED
generated column or an indexed non-stored
generated column is not allowed.
Create a primary key on a generated column
In Spanner, you can use a STORED
generated column in the
primary key.
The following example shows a DDL statement that creates the UserInfoLog
table with a ShardId
generated column. The value of the ShardId
column
depends on another column. It is derived by using a MOD
function on the
UserId
column. ShardId
is declared as part of the primary key.
GoogleSQL
CREATE TABLE UserInfoLog (
ShardId INT64 NOT NULL
AS (MOD(UserId, 2048)) STORED,
UserId INT64 NOT NULL,
FullName STRING(1024) NOT NULL,
) PRIMARY KEY (ShardId, UserId);
PostgreSQL
CREATE TABLE UserInfoLog (
ShardId BIGINT GENERATED ALWAYS
AS (MOD(UserId, '2048'::BIGINT)) STORED NOT NULL,
UserId BIGINT NOT NULL,
FullName VARCHAR(1024) NOT NULL,
PRIMARY KEY(ShardId, UserId));
Normally, to efficiently access a specific row you need to specify all key
columns. In the previous example, this would mean providing both a ShardId
and
UserId
. However, Spanner can sometimes infer the value of the
generated primary key column if it depends on a single other column and if the
value of the column it depends on is fully determined. This is true if the
column referenced by the generated primary key column meets one of the following
conditions:
- It's equal to a constant value or bound parameter in the
WHERE
clause, or - It gets its value set by an
IN
operator in theWHERE
clause - It gets its value from an equi-join condition
For example, for the following query:
GoogleSQL
SELECT * FROM UserInfoLog
AS T WHERE T.UserId=1;
PostgreSQL
SELECT * FROM UserInfoLog
AS T WHERE T.UserId=1;
Spanner can infer the value of ShardId
from the provided UserId
.
The previous query is equivalent to the following query after query
optimization:
GoogleSQL
SELECT * FROM UserInfoLog
AS T WHERE T.ShardId = MOD(1, 2048)
AND T.UserId=1;
PostgreSQL
SELECT * FROM UserInfoLog
AS T WHERE T.ShardId = MOD(1, 2048)
AND T.UserId=1;
The next example shows how to create the Students
table and use an
expression that retrieves the id
field of the StudentInfo
JSON column and
uses it as the primary key:
GoogleSQL
CREATE TABLE Students (
StudentId INT64 NOT NULL
AS (CAST(JSON_VALUE(StudentInfo, "$.id") AS INT64)) STORED,
StudentInfo JSON NOT NULL,
) PRIMARY KEY (StudentId);
PostgreSQL
CREATE TABLE Students (
StudentId BIGINT GENERATED ALWAYS
AS (((StudentInfo ->> 'id'::TEXT))::BIGINT) STORED NOT NULL,
StudentInfo JSONB NOT NULL,
PRIMARY KEY(StudentId));
View properties of a generated column
Spanner's INFORMATION_SCHEMA
contains information about the generated
columns on your database. The following are some examples of the questions you
can answer when you query the information schema.
What generated columns are defined in my database?
GoogleSQL
SELECT c.TABLE_NAME, c.COLUMN_NAME, C.IS_STORED
FROM INFORMATION_SCHEMA.COLUMNS as c
WHERE c.GENERATION_EXPRESSION IS NOT NULL;
PostgreSQL
SELECT c.TABLE_NAME, c.COLUMN_NAME, C.IS_STORED
FROM INFORMATION_SCHEMA.COLUMNS as c
WHERE c.GENERATION_EXPRESSION IS NOT NULL;
IS_STORED
is either YES
for stored generated columns, NO
for non-stored
generated columns, or NULL
for non-generated columns.
What is the current state of generated columns in table Users
?
If you have added a generated column to an existing table, you might want to
pass SPANNER_STATE
in a query to find out the current state of the column.
SPANNER_STATE
returns the following values:
COMMITTED
: The column is fully usable.WRITE_ONLY
: The column is being backfilled. No read is allowed.
Use the following query to find the state of a column:
GoogleSQL
SELECT c.TABLE_NAME, c.COLUMN_NAME, c.SPANNER_STATE
FROM INFORMATION_SCHEMA.COLUMNS AS c
WHERE c.TABLE_NAME="Users" AND c.GENERATION_EXPRESSION IS NOT NULL;
PostgreSQL
SELECT c.TABLE_NAME, c.COLUMN_NAME, c.SPANNER_STATE
FROM INFORMATION_SCHEMA.COLUMNS AS c
WHERE c.TABLE_NAME='users' AND c.GENERATION_EXPRESSION IS NOT NULL;
Note: A generated column that is non-stored can only be accessed using the SQL query. However, if it is indexed, you can use the read API to access the value from the index.
Performance
A STORED
generated column doesn't affect the performance of a read or query
operation. However, non-stored generated columns used in a query can impact its
performance due to the overhead of evaluating the generated column expression.
Performance of write operations (DML statements and mutations) is impacted
when using either a STORED
generated column or a generated column that is
indexed. The overhead is due to evaluating the generated column expression when
the write operation inserts or modifies any of the columns referenced in the
generated column expression. Since the overhead varies depending on the write
workload for the application, schema design, and dataset characteristics, we
recommend that you benchmark your applications before using a generated column.
What's next
Learn more about Spanner's Information schema for GoogleSQL-dialect databases and Information schema for PostgreSQL-dialect databases.
See more details about generated columns in the CREATE TABLE parameter details.