Creating and managing generated columns

A generated column is a column that is always computed from other columns in a row. These columns can make a query simpler, save the cost of evaluating an expression at query time, and can be indexed or used as a foreign key. In this article, we'll describe how to manage this column type in your database.

Adding a generated column to a new table

In the following CREATE TABLE snippet, we create a table to store information about users. We have columns for FirstName and LastName and define a generated column for FullName, which is the concatenation of FirstName and LastName

CREATE TABLE Users (
  Id STRING(20) NOT NULL,
  FirstName STRING(50),
  LastName STRING(50),
  Age INT64 NOT NULL,
  FullName STRING(100) AS (ARRAY_TO_STRING([FirstName, LastName], " ")) STORED,
) PRIMARY KEY (Id);

The value of FullName is computed when a new row is inserted or when FirstName and/or LastName is updated for an existing row. The computed value is stored along with other columns from the table. The SQL in parentheses is called the generation expression.

  • expression can be any valid SQL expression that is assignable to the column data type with the following restrictions.

  • The STORED attribute following the expression causes the result of the function to be stored along with other columns of the table. Subsequent updates to any of the referenced columns causes the expression to be re-evaluated and stored.

  • Generated columns without the STORED attribute are not allowed.

  • Direct writes to generated columns are not allowed.

  • Generated columns can't be used as, or part of, a primary key. They can, however, be secondary index keys.

  • Column option allow_commit_timestamp is not allowed on generated columns, or any columns referenced by generated columns.

  • You can't change the data type of a generated column, or any columns referenced by the generated column.

  • You can't drop a column referenced by a generated column.

The generated column can be queried just like any other column, as shown in the following example.

SELECT Id, FullName
FROM Users;

This is a equivalent to the following statement, which doesn't use the stored generated column.

SELECT Id, ARRAY_TO_STRING([FirstName, LastName], " ") as FullName
FROM Users;

Apart from making the query simpler and saving the cost of evaluating the expression at query time, a generated column can also be indexed or used as a foreign key.

Creating an index on a generated column

To help with lookups on our FullName generated column, we can create a secondary index as shown in the following snippet.

CREATE INDEX UsersByFullName ON Users (FullName);

Adding a generated column to an existing table

Using the following ALTER TABLE statement, we add a generated column to the Users table to generate and store the user's initials.

ALTER TABLE Users ADD COLUMN Initials STRING(2)
AS (ARRAY_TO_STRING([SUBSTR(FirstName, 0, 1), SUBSTR(LastName, 0, 1)], "")) STORED;

Adding a stored generated column to an existing table is a long-running operation to backfill the column values. During backfilling, the stored generated columns cannot be read or queried. The backfilling state is reflected in the INFORMATION_SCHEMA.

Creating a partial index using a generated column

What if we only wanted to query users who are over 18? A full scan of the table would be inefficient, so we use a partial index.

  1. Use the following statement to add another generated column which returns the user's age if they are over 18, and returns NULL otherwise.

    ALTER TABLE Users ADD COLUMN AgeAbove18 INT64
    AS (IF(Age > 18, Age, NULL)) STORED;
    
  2. Create an index on this new column, and disable the indexing of NULL values with the NULL_FILTERED keyword. This partial index is smaller and more efficient than a normal index because it excludes everyone who is 18 or younger.

    CREATE NULL_FILTERED INDEX UsersAbove18ByAge
    ON Users (AgeAbove18);
    
  3. To retrieve the Id and Age of all users who are over 18, run the following query.

    SELECT Id, Age
        FROM Users@{FORCE_INDEX=UsersAbove18ByAge}
        WHERE AgeAbove18 IS NOT NULL;
    
  4. To filter on a different age, for example, to retrieve all users who are over 21, use the same index and filter on the generated column as follows:

    SELECT Id, Age
        FROM Users@{FORCE_INDEX=UsersAbove18ByAge}
        WHERE AgeAbove18 > 21;
    

Removing a generated column

The following DDL statement drops a generated column from the Users table.

ALTER TABLE Users
DROP COLUMN Initials;

Modifying a generated column expression

Modifying the expression of a generated column is not allowed. Instead, you need to drop the existing column and create a new generated column with the new expression.

Viewing properties of a generated column

Cloud Spanner's INFORMATION_SCHEMA contains information about the generated columns on your database. The following are some examples of the questions you can answer by querying the information schema.

What generated columns are defined in my database?

SELECT c.TABLE_NAME, c.COLUMN_NAME
FROM INFORMATION_SCHEMA.COLUMNS as c
WHERE c.GENERATION_EXPRESSION IS NOT NULL;

What is the current state of generated columns in table Users?

If you have added a generated column to an existing table, you might want to view its current state to determine, for example, whether the column has been backfilled and fully usable. If SPANNER_STATE returns WRITE_ONLY in the following query, it means Cloud Spanner is still in the process of backfilling the column and no reads are allowed.

SELECT c.TABLE_NAME, c.COLUMN_NAME, c.SPANNER_STATE
FROM INFORMATION_SCHEMA.COLUMNS as c
WHERE c.TABLE_NAME="Users" AND c.GENERATION_EXPRESSION IS NOT NULL;

What's next