Search functions

BigQuery supports the following search functions.

SEARCH(search_data, search_query, [json_scope=>values])

values:
  { 'JSON_VALUES' | 'JSON_KEYS' | 'JSON_KEYS_AND_VALUES' }

Description

The SEARCH function checks to see whether a BigQuery table or other search data contains a set of search terms. It returns TRUE if all tokens in the search_query appear in the search_data based on the tokenization described in the text analyzer, and FALSE otherwise. If search_query is NULL or doesn't contain any search terms, an error is thrown.

SEARCH is designed to work with search indexes to optimize point lookups. Although the SEARCH function works for columns of a table that are not indexed, its performance will be greatly improved with a search index.

Arguments

  • search_data: the data to search over, in the form of a STRING literal, list of columns, or table reference. A table reference is evaluated as a STRUCT whose fields are the columns of the table. A column reference must be one of the following types:

    • ARRAY<STRING>
    • ARRAY<STRUCT>
    • JSON
    • STRING
    • STRUCT
  • search_query: a STRING literal that represents the terms of the search query. If search_query is NULL or empty, an error is returned.

  • json_scope: This optional mandatory-named argument takes one of the following values to indicate the scope of JSON data to be searched. It has no effect if search_data is not a JSON or does not contain a JSON field.

    • 'JSON_VALUES': Only the JSON values are searched. If json_scope is not provided, this is used by default.
    • 'JSON_KEYS': Only the JSON keys are searched.
    • 'JSON_KEYS_AND_VALUES': The JSON keys and values are searched.

Text analyzer

The search_data and the search_query are run through a text analyzer that splits them into individual terms according to the following rules:

  • Text is made lowercase. ASCII values greater than 127 are kept as is.
  • Text is split into individual terms called tokens by the following breakers:

    [ ] < > ( ) { } | ! ; , ' " * & ? + / : = @ . - $ % \ _ \n \r \s \t %21 %26 %2526 %3B %7C %20 %2B %3D %2520 %5D %5B %3A %0A %2C %28 %29

    For example, Hello|WORLD becomes hello and world.

The search_query has the following additional rules for parsing which do not apply to search_data:

  • Text enclosed in backticks forces an exact match.

    For example, `Hello World` happy days becomes Hello World, happy, and days.

  • The backtick itself can be escaped using a backslash, as in \`foobar\`.

  • The following are reserved words and must be enclosed in backticks, regardless of capitalization:

    AND, NOT, OR, IN, and NEAR

  • Text not enclosed in backticks requires the following reserved characters to be escaped by a double backslash \\:

    • [ ] < > ( ) { } | ! ' " * & ? + / : = - \ ~ ^
    • If the quoted string is preceded by the character r or R, such as r"my\+string", then it is treated as a raw string and only a single backslash is required to escape the reserved characters. For more information about raw strings and escape sequences, see String and byte literals.

The terms within search_data must contain all query terms, in any order, from the search_query for the function to return TRUE. To perform a cross-field search, the search_data must be a STRUCT or ARRAY. Each STRING field of the STRUCT or ARRAY is individually searched for the query terms. If at least one field includes all search_query tokens, in any order, SEARCH returns TRUE. Otherwise it has the following behavior:

  • If at least one field is a NULL STRING, SEARCH returns NULL.
  • Otherwise, SEARCH returns FALSE.

Return type

BOOL

Examples

The following table shows examples of how search_data is broken into searchable terms by the text analyzer. All entries should be read as strings.

search_data searchable terms
127.0.0.1 127
0
1
127.0.0.1
127.0.0
127.0
0.0
0.0.1
0.1
foo@bar.com foo
bar
com
foo@bar
bar.com
foo@bar.com
The fox. the
fox
The
The fox
The fox.
fox
fox.

The following table shows examples of how search_query is broken into query terms by the text analyzer. All entries should be read as strings.

search_query query terms
127.0.0.1 127
0
1
`127.0.0.1` 127.0.0.1
foo@bar.com foo
bar
com
`foo@bar.com` foo@bar.com

The following table shows examples of calls to the SEARCH function and reasons for various return values.

+------------------------------------+---------+----------------------------------------------------------------+
| function call                      | returns | reason                                                         |
+------------------------------------+---------+----------------------------------------------------------------+
| SEARCH('foobar', NULL)             | error   | The search_query is NULL.                                      |
| SEARCH('foobar', '')               | error   | The search_query is empty.                                     |
| SEARCH('foo-bar', 'foo bar')       | true    | '-' and ' ' are breakers.                                      |
| SEARCH('foo-bar', 'foobar')        | false   | The search_query is not split.                                 |
| SEARCH('foo-bar', 'foo\\&bar')     | true    | The double backslash escapes the ampersand which is a breaker. |
| SEARCH('foo-bar', R'foo\&bar')     | true    | The single backslash escapes the ampersand in a raw string.    |
| SEARCH('foo-bar', '`foo&bar`')     | false   | The backticks require an exact match for foo&bar.              |
| SEARCH('foo&bar', '`foo&bar`')     | true    | An exact match is found.                                       |
| SEARCH('foo-bar', 'bar foo')       | true    | The order of terms doesn't matter.                             |
| SEARCH('foo-bar', 'Foo Bar')       | true    | Tokens are made lower-case.                                    |
| SEARCH('foo-bar', '`foo-bar`')     | true    | An exact match is found.                                       |
| SEARCH('foo-bar', '`Foo`')         | false   | Backticks preserve capitalization.                             |
| SEARCH('`foo-bar`', '`foo-bar`')   | false   | Backticks don't have special meaning for search_data.          |
| SEARCH('foo@bar.com', '`bar.com`') | true    | An exact match is found after the breaker in search_data.      |
| SEARCH('a foo-bar b', '`foo-bar`') | true    | An exact match is found between the space breakers.            |
| SEARCH(['foo', 'bar'], 'foo bar')  | false   | No single array entry matches all search terms.                |
| SEARCH('foo=', '`foo\\=`')         | false   | The search_query is equivalent to foo\=.                       |
| SEARCH('foo=', R'`foo\=`')         | false   | This is equivalent to the previous example.                    |
| SEARCH('foo=', 'foo\\=')           | true    | The equals sign is a breaker in the data and query.            |
| SEARCH('foo=', R'foo\=')           | true    | This is equivalent to the previous example.                    |
+------------------------------------+---------+----------------------------------------------------------------+

Consider the following table called meals with columns breakfast, lunch, and dinner:

+-------------------+-------------------------+------------------+
| breakfast         | lunch                   | dinner           |
+-------------------+-------------------------+------------------+
| Potato pancakes   | Toasted cheese sandwich | Beef soup        |
| Avocado toast     | Tomato soup             | Chicken soup     |
+-------------------+-------------------------+------------------+

The following query shows how to search single columns, multiple columns, and whole tables.

SELECT
  SEARCH(lunch, 'soup') AS lunch_soup,
  SEARCH((breakfast, dinner), 'soup') AS breakfast_or_dinner_soup,
  SEARCH(meals, 'soup') AS anytime_soup
FROM meals;

+------------+--------------------------+--------------+
| lunch_soup | breakfast_or_dinner_soup | anytime_soup |
+------------+--------------------------+--------------+
| false      | true                     | true         |
| true       | true                     | true         |
+------------+--------------------------+--------------+