Stay organized with collections Save and categorize content based on your preferences.

Search with an index

The SEARCH function provides tokenized search on data. This page provides examples of searching in BigQuery. For more details about the syntax of the SEARCH function, see the reference section on Search functions.

SEARCH is designed to be used with an index to optimize lookups. The SEARCH function always returns correct results from all ingested data, even if some of the data isn't indexed yet. However, its performance is greatly improved with an index. Savings in bytes processed and slot milliseconds are maximized when the number of search results make up a relatively small fraction of the total rows because less data is scanned. To determine whether an index was used for a query, see search index usage.

Create a table with a search index

The following table called Logs will be used in examples to show different ways of using the SEARCH function. This example table is quite small, but in practice the performance gains you get with SEARCH improve with the size of the dataset.

CREATE TABLE my_dataset.Logs (Level STRING, Source STRING, Message STRING)
AS (
  SELECT 'INFO' as Level, '65.177.8.234' as Source, 'Entry Foo-Bar created' as Message
  UNION ALL
  SELECT 'WARNING', '132.249.240.10', 'Entry Foo-Bar already exists, created by 65.177.8.234'
  UNION ALL
  SELECT 'INFO', '94.60.64.181', 'Entry Foo-Bar deleted'
  UNION ALL
  SELECT 'SEVERE', '4.113.82.10', 'Entry Foo-Bar does not exist, deleted by 94.60.64.181'
  UNION ALL
  SELECT 'INFO', '181.94.60.64', 'Entry Foo-Baz created'
);

The table looks like the following:

+---------+----------------+-------------------------------------------------------+
| Level   | Source         | Message                                               |
+---------+----------------+-------------------------------------------------------+
| INFO    | 65.177.8.234   | Entry Foo-Bar created                                 |
| WARNING | 132.249.240.10 | Entry Foo-Bar already exists, created by 65.177.8.234 |
| INFO    | 94.60.64.181   | Entry Foo-Bar deleted                                 |
| SEVERE  | 4.113.82.10    | Entry Foo-Bar does not exist, deleted by 94.60.64.181 |
| INFO    | 181.94.60.64   | Entry Foo-Baz created                                 |
+---------+----------------+-------------------------------------------------------+

Create a search index on the Logs table using the default text analyzer:

CREATE SEARCH INDEX my_index ON my_dataset.Logs(ALL COLUMNS);

Search an entire table

The following query searches across all columns of the Logs table for the value bar and returns the rows that contain this value, regardless of capitalization. Since the search index uses the default text analyzer, you don't need to specify it in the SEARCH function.

SELECT * FROM my_dataset.Logs WHERE SEARCH(Logs, 'bar');
+---------+----------------+-------------------------------------------------------+
| Level   | Source         | Message                                               |
+---------+----------------+-------------------------------------------------------+
| INFO    | 65.177.8.234   | Entry Foo-Bar created                                 |
| WARNING | 132.249.240.10 | Entry Foo-Bar already exists, created by 65.177.8.234 |
| INFO    | 94.60.64.181   | Entry Foo-Bar deleted                                 |
| SEVERE  | 4.113.82.10    | Entry Foo-Bar does not exist, deleted by 94.60.64.181 |
+---------+----------------+-------------------------------------------------------+

The following query searches across all columns of the Logs table for the value `94.60.64.181` and returns the rows that contain this value. The backticks allow for an exact search, which is why the last row of the Logs table which contains 181.94.60.64 is omitted.

SELECT * FROM my_dataset.Logs WHERE SEARCH(Logs, '`94.60.64.181`');
+---------+----------------+-------------------------------------------------------+
| Level   | Source         | Message                                               |
+---------+----------------+-------------------------------------------------------+
| INFO    | 94.60.64.181   | Entry Foo-Bar deleted                                 |
| SEVERE  | 4.113.82.10    | Entry Foo-Bar does not exist, deleted by 94.60.64.181 |
+---------+----------------+-------------------------------------------------------+

Search a subset of columns

SEARCH makes it easy to specify a subset of columns within which to search for data. The following query searches the Message column of the Logs table for the value 94.60.64.181 and returns the rows that contain this value.

SELECT * FROM my_dataset.Logs WHERE SEARCH(Message, '`94.60.64.181`');
+---------+----------------+-------------------------------------------------------+
| Level   | Source         | Message                                               |
+---------+----------------+-------------------------------------------------------+
| SEVERE  | 4.113.82.10    | Entry Foo-Bar does not exist, deleted by 94.60.64.181 |
+---------+----------------+-------------------------------------------------------+

The following query searches both the Source and Message columns of the Logs table. It returns the rows that contain the value 94.60.64.181 from either column.

SELECT * FROM my_dataset.Logs WHERE SEARCH((Source, Message), '`94.60.64.181`');
+---------+----------------+-------------------------------------------------------+
| Level   | Source         | Message                                               |
+---------+----------------+-------------------------------------------------------+
| INFO    | 94.60.64.181   | Entry Foo-Bar deleted                                 |
| SEVERE  | 4.113.82.10    | Entry Foo-Bar does not exist, deleted by 94.60.64.181 |
+---------+----------------+-------------------------------------------------------+

If a table table has many columns and you want to search most of them, it may be easier to specify only the columns to exclude from the search. The following query searches across all columns of the Logs table except for the Message column. It returns the rows of any columns other than Message that contains the value 94.60.64.181.

SELECT *
FROM my_dataset.Logs
WHERE SEARCH(
  (SELECT AS STRUCT Logs.* EXCEPT (Message)), '`94.60.64.181`');
+---------+----------------+---------------------------------------------------+
| Level   | Source         | Message                                           |
+---------+----------------+---------------------------------------------------+
| INFO    | 94.60.64.181   | Entry Foo-Bar deleted                             |
+---------+----------------+---------------------------------------------------+

Use a different text analyzer

The following example creates a table called contact_info with an index that uses the NO_OP_ANALYZER text analyzer:

CREATE TABLE my_dataset.contact_info (name STRING, email STRING)
AS (
  SELECT 'Kim Lee' AS name, 'kim.lee@example.com' AS email
  UNION ALL
  SELECT 'Kim' AS name, 'kim@example.com' AS email
  UNION ALL
  SELECT 'Sasha' AS name, 'sasha@example.com' AS email
);
CREATE SEARCH INDEX noop_index ON my_dataset.contact_info(ALL COLUMNS)
OPTIONS (analyzer = 'NO_OP_ANALYZER');
+---------+---------------------+
| name    | email               |
+---------+---------------------+
| Kim Lee | kim.lee@example.com |
| Kim     | kim@example.com     |
| Sasha   | sasha@example.com   |
+---------+---------------------+

The following query searches for Kim in the name column and kim in the email column. Since the search index doesn't use the default text analyzer, you must pass the name of the analyzer to the SEARCH function.

SELECT
  name,
  SEARCH(name, 'Kim', analyzer=>'NO_OP_ANALYZER') AS name_Kim,
  email,
  SEARCH(email, 'kim', analyzer=>'NO_OP_ANALYZER') AS email_kim
FROM
  my_dataset.contact_info;

The NO_OP_ANALYZER doesn't modify the text, so the SEARCH function only returns TRUE for exact matches:

+---------+----------+---------------------+-----------+
| name    | name_Kim | email               | email_kim |
+---------+----------+---------------------+-----------+
| Kim Lee | FALSE    | kim.lee@example.com | FALSE     |
| Kim     | TRUE     | kim@example.com     | FALSE     |
| Sasha   | FALSE    | sasha@example.com   | FALSE     |
+---------+----------+---------------------+-----------+

Search index usage

To determine whether a search index was used for a query, look at the query's Job Information in Query results. The Index Usage Mode and Index Unused Reasons fields provide detailed information on search index usage.

Job information showing why a search index was unused.

Information on search index usage is also available through the searchStatistics field in the Jobs.Get API method. The indexUsageMode field in searchStatistics indicates whether a search index was used with the following values:

  • UNUSED: no search index was used.
  • PARTIALLY_USED: part of the query used search indexes and part did not.
  • FULLY_USED: every SEARCH function in the query used a search index.

When indexUsageMode is UNUSED or PARTIALLY_USED, the indexUnusuedReasons field contains information about why search indexes were not used in the query.

To view searchStatistics for a query, run the bq show command.

bq show --format=prettyjson -j JOB_ID

Example

Suppose you run a query that calls the SEARCH function on data in a table. You can view the job details of the query to find the job ID, then run the bq show command to see more information:

bq show --format=prettyjson --j my_project:US.bquijob_123x456_789y123z456c

The output contains many fields, including searchStatistics, which looks similar to the following. In this example, indexUsageMode indicates that the index was not used. The reason is that the table doesn't have a search index. To solve this problem, create a search index on the table. See the indexUnusedReason code field for a list of all reasons a search index might not be used in a query.

"searchStatistics": {
  "indexUnusedReasons": [
    {
      "baseTable": {
        "datasetId": "my_dataset",
        "projectId": "my_project",
        "tableId": "my_table"
      },
      "code": "INDEX_CONFIG_NOT_AVAILABLE",
      "message": "There is no search index configuration for the base table `my_project:my_dataset.my_table`."
    }
  ],
  "indexUsageMode": "UNUSED"
},

Best practices

The following sections describe best practices when using the SEARCH function.

Search selectively

Searching works best when your search has few results. Make your searches as specific as possible.

If you know which columns of a table should contain your search terms, then restrict your search to only those columns. This improves performance and reduces the number of bytes that need to be scanned.

Use backticks

Enclosing your search query in backticks forces an exact match. This is helpful if your search is case-sensitive or contains characters that shouldn't be interpreted as delimiters. For example, to search for the IP address 192.0.2.1, use `192.0.2.1`. Without the backticks, the search returns any row that contains the individual tokens 192, 0, 2, and 1, in any order.

Query optimization

Queries that use the SEARCH function on a very large partitioned table are optimized when you use an ORDER BY clause on the partitioned field and a LIMIT clause. The optimization is applied whether or not the table is indexed. This works well if you're searching for a common term. For example, suppose the Logs table created earlier is partitioned on an additional DATE type column called day. The following query is optimized:

SELECT
  Level, Source, Message
FROM
  my_dataset.Logs
WHERE
  SEARCH(Message, "foo")
ORDER BY
  day
LIMIT 10;