Create a search index

A search index is a data structure designed to enable very efficient search. Much like the index you'd find in the back of a book, a search index for a column of string data acts like an auxiliary table that has one column for unique words and another for where in the data those words occur.

Create a search index

To create a search index, use the CREATE SEARCH INDEX DDL statement. You can create a search index on these column types:

  • STRING
  • ARRAY<STRING>
  • STRUCT containing at least one nested field of type STRING or ARRAY<STRING>
  • JSON

For example, the following will create a search index on columns a and c of simple_table.

CREATE TABLE dataset.simple_table(a STRING, b INT64, c JSON);

CREATE SEARCH INDEX my_index
ON dataset.simple_table(a, c);

When you create a search index on ALL COLUMNS, all STRING or JSON data in the table is indexed. If the table contains no such data, for example if all columns contain integers, the index creation fails. When you specify a STRUCT column to be indexed, all nested subfields are indexed.

In the following example, a search index is created on a, c.e, and c.f.g.

CREATE TABLE dataset.my_table(
  a STRING,
  b INT64,
  c STRUCT <d INT64,
            e ARRAY<STRING>,
            f STRUCT<g STRING, h INT64>>) AS
SELECT 'hello' AS a, 10 AS b, (20, ['x', 'y'], ('z', 30)) AS c;

CREATE SEARCH INDEX my_index
ON dataset.my_table(ALL COLUMNS);

Since the search index was created on ALL COLUMNS, any columns added to the table are automatically indexed if they contain STRING data.

Understand index refresh

Search indexes are fully managed by BigQuery and automatically refreshed when the base table changes. The following schema changes to the base table can trigger a full refresh:

  • A new indexable column is added to a table with a search index on ALL COLUMNS.
  • An indexed column's type is changed into another containing STRING data.
  • A KMS change is performed.

If you delete the only indexed column in a table or rename the table itself, then the search index is deleted automatically.

Get information about search indexes

You can verify the existence and the readiness of a search index by querying INFORMATION_SCHEMA. There are two views that contain metadata on search indexes. The SEARCH_INDEXES view has information on each search index created on a dataset. The SEARCH_INDEX_COLUMNS view has information on which columns of each table in the dataset are indexed.

The following example shows all active search indexes on tables in the dataset my_dataset, located in the project my_project. It includes their names, the DDL statements used to create them, and the coverage. If a table is less than 1GB, then no index will be created, in which case coverage_percentage is 0.

SELECT table_name, index_name, ddl, coverage_percentage
FROM my_project.my_dataset.INFORMATION_SCHEMA.SEARCH_INDEXES
WHERE index_status = 'ACTIVE';

The results should look like the following:

+-------------+-------------+--------------------------------------------------------------------------------------+---------------------+
| table_name  | index_name  | ddl                                                                                  | coverage_percentage |
+-------------+-------------+--------------------------------------------------------------------------------------+---------------------+
| small_table | names_index | CREATE SEARCH INDEX `names_index` ON `my_project.my_dataset.small_table`(names)      | 0                   |
| large_table | logs_index  | CREATE SEARCH INDEX `logs_index` ON `my_project.my_dataset.large_table`(ALL COLUMNS) | 100                 |
+-------------+-------------+--------------------------------------------------------------------------------------+---------------------+

The following example creates a search index on all columns of my_table.

CREATE TABLE dataset.my_table(
  a STRING,
  b INT64,
  c STRUCT <d INT64,
            e ARRAY<STRING>,
            f STRUCT<g STRING, h INT64>>) AS
SELECT 'hello' AS a, 10 AS b, (20, ['x', 'y'], ('z', 30)) AS c;

CREATE SEARCH INDEX my_index
ON dataset.my_table(ALL COLUMNS);

The following query extracts information on which fields are indexed. The index_field_path indicates which field of a column is indexed. This differs from the index_column_name only in the case of a STRUCT, where the full path to the indexed field is given. In this example, column c contains an ARRAY<STRING> field e and another STRUCT called f which contains a STRING field g, each of which is indexed.

SELECT table_name, index_name, index_column_name, index_field_path
FROM my_project.dataset.INFORMATION_SCHEMA.SEARCH_INDEX_COLUMNS

The result is similar to the following:

+------------+------------+-------------------+------------------+
| table_name | index_name | index_column_name | index_field_path |
+------------+------------+-------------------+------------------+
| my_table   | my_index   | a                 | a                |
| my_table   | my_index   | c                 | c.e              |
| my_table   | my_index   | c                 | c.f.g            |
+------------+------------+-------------------+------------------+

The following query joins the INFORMATION_SCHEMA.SEARCH_INDEX_COUMNS view with the INFORMATION_SCHEMA.SEARCH_INDEXES and INFORMATION_SCHEMA.COLUMNS views to include the search index status and the data type of each column:

SELECT
  index_columns_view.index_catalog AS project_name,
  index_columns_view.index_SCHEMA AS dataset_name,
  indexes_view.TABLE_NAME AS table_name,
  indexes_view.INDEX_NAME AS index_name,
  indexes_view.INDEX_STATUS AS status,
  index_columns_view.INDEX_COLUMN_NAME AS column_name,
  index_columns_view.INDEX_FIELD_PATH AS field_path,
  columns_view.DATA_TYPE AS data_type
FROM
  mydataset.INFORMATION_SCHEMA.SEARCH_INDEXES indexes_view
INNER JOIN
  mydataset.INFORMATION_SCHEMA.SEARCH_INDEX_COLUMNS index_columns_view
  ON
    indexes_view.TABLE_NAME = index_columns_view.TABLE_NAME
    AND indexes_view.INDEX_NAME = index_columns_view.INDEX_NAME
LEFT OUTER JOIN
  mydataset.INFORMATION_SCHEMA.COLUMNS columns_view
  ON
    indexes_view.INDEX_CATALOG = columns_view.TABLE_CATALOG
    AND indexes_view.INDEX_SCHEMA = columns_view.TABLE_SCHEMA
    AND index_columns_view.TABLE_NAME = columns_view.TABLE_NAME
    AND index_columns_view.INDEX_COLUMN_NAME = columns_view.COLUMN_NAME
ORDER BY
  project_name,
  dataset_name,
  table_name,
  column_name;

The result is similar to the following:

+------------+------------+----------+------------+--------+-------------+------------+---------------------------------------------------------------+
| project    | dataset    | table    | index_name | status | column_name | field_path | data_type                                                     |
+------------+------------+----------+------------+--------+-------------+------------+---------------------------------------------------------------+
| my_project | my_dataset | my_table | my_index   | ACTIVE | a           | a          | STRING                                                        |
| my_project | my_dataset | my_table | my_index   | ACTIVE | c           | c.e        | STRUCT<d INT64, e ARRAY<STRING>, f STRUCT<g STRING, h INT64>> |
| my_project | my_dataset | my_table | my_index   | ACTIVE | c           | c.f.g      | STRUCT<d INT64, e ARRAY<STRING>, f STRUCT<g STRING, h INT64>> |
+------------+------------+----------+------------+--------+-------------+------------+---------------------------------------------------------------+

Search index quota

During the Preview, there is a daily limit of 10TiB for search index creation. Each time you call CREATE SEARCH INDEX, the size of the base table is counted against this limit. However, you may index a single table in a given day that is larger than 10TiB, but no larger than 100TiB. Note that the quotas are expected to be relaxed over time. Reach out to your account team if you need to index more data than these limits.

Incremental index refresh that occurs when data is added to a table with an existing search index is not counted towards this quota.

What's next