Search functions

Stay organized with collections Save and categorize content based on your preferences.

Google Standard SQL for BigQuery supports the following search functions.

SEARCH(search_data, search_query[, json_scope=>json_scope_value][, analyzer=>analyzer_name])

json_scope_value:
  { 'JSON_VALUES' | 'JSON_KEYS' | 'JSON_KEYS_AND_VALUES' }

analyzer_name:
  { 'LOG_ANALYZER' | 'NO_OP_ANALYZER' }

Description

The SEARCH function checks to see whether a BigQuery table or other search data contains a set of search terms. It returns TRUE if all tokens in the search_query appear in the search_data based on the tokenization described in the text analyzer, and FALSE otherwise. If search_query is NULL or doesn't contain any search terms, an error is thrown.

SEARCH is designed to work with search indexes to optimize point lookups. Although the SEARCH function works for columns of a table that are not indexed, its performance will be greatly improved with a search index.

Arguments

  • search_data: the data to search over, in the form of a STRING literal, list of columns, or table reference. A table reference is evaluated as a STRUCT whose fields are the columns of the table. A column reference must be one of the following types:

    • ARRAY<STRING>
    • ARRAY<STRUCT>
    • JSON
    • STRING
    • STRUCT
  • search_query: a STRING literal that represents the terms of the search query. If search_query is NULL, an error is returned. If search_query contains no tokens and the text analyzer is LOG_ANALYZER, an error is returned.

  • json_scope: This optional mandatory-named argument takes one of the following values to indicate the scope of JSON data to be searched. It has no effect if search_data is not a JSON or does not contain a JSON field.

    • 'JSON_VALUES': Only the JSON values are searched. If json_scope is not provided, this is used by default.
    • 'JSON_KEYS': Only the JSON keys are searched.
    • 'JSON_KEYS_AND_VALUES': The JSON keys and values are searched.
  • analyzer: This optional mandatory-named argument takes one of the following values to indicate the text analyzer to use:

    • 'LOG_ANALYZER': (Default) This text analyzer tokenizes and normalizes the text according to the rules in the following section.
    • 'NO_OP_ANALYZER': No tokenization or normalization is applied to the text. If the underlying search index uses NO_OP_ANALYZER, then you must set this argument when you search against the table; otherwise, the index is not used.

Text analyzer rules

The search_data and the search_query are run through a text analyzer. The analyzer should match the one used to create the index.

The SEARCH function supports two text analyzers: NO_OP_ANALYZER and LOG_ANALYZER. Use the NO_OP_ANALYZER when you have pre-processed data that you want to match exactly. No tokenization or normalization is applied to the text. The LOG_ANALYZER modifies text data in the following ways:

  • Text is made lowercase. ASCII values greater than 127 are kept as is.
  • Text is split into individual terms called tokens by the following delimiters:

    [ ] < > ( ) { } | ! ; , ' " * & ? + / : = @ . - $ % \ _ \n \r \s \t %21 %26 %2526 %3B %7C %20 %2B %3D %2520 %5D %5B %3A %0A %2C %28 %29

    For example, Hello|WORLD becomes hello and world.

The search_query has the following additional rules for parsing which do not apply to search_data:

  • Text enclosed in backticks forces an exact match.

    For example, `Hello World` happy days becomes Hello World, happy, and days.

  • Search query text enclosed in backticks must match exactly in search_data, subject to the following conditions:

    • It appears at the start of search_data or is immediately preceded by a delimiter.
    • It appears at the end of search_data or is immediately followed by a delimiter.

    For example, SEARCH('foo.bar', '`foo.`') returns FALSE because the text enclosed in the backticks foo. is immediately followed by the character b in the search data foo.bar, rather than by a delimiter or the end of the string. However, SEARCH('foo..bar', '`foo.`') returns TRUE because foo. is immediately followed by the delimiter . in the search data.

  • The backtick itself can be escaped using a backslash, as in \`foobar\`.

  • The following are reserved words and must be enclosed in backticks:

    AND, NOT, OR, IN, and NEAR

  • Text not enclosed in backticks requires the following reserved characters to be escaped by a double backslash \\:

    • [ ] < > ( ) { } | ! ' " * & ? + / : = - \ ~ ^
    • If the quoted string is preceded by the character r or R, such as r"my\+string", then it is treated as a raw string and only a single backslash is required to escape the reserved characters. For more information about raw strings and escape sequences, see String and byte literals.

The terms within search_data must contain all query terms, in any order, from the search_query for the function to return TRUE. To perform a cross-field search, search_data must be a STRUCT, ARRAY, or JSON. Each STRING field of a STRUCT, ARRAY, or JSON is individually searched for the query terms. If at least one field includes all search_query tokens, in any order, SEARCH returns TRUE. Otherwise it has the following behavior:

  • If at least one field is a NULL STRING, SEARCH returns NULL.
  • Otherwise, SEARCH returns FALSE.

Return type

BOOL

Examples

The following table shows examples of how search_data is broken into searchable terms by the LOG_ANALYZER text analyzer. All entries are strings.

search_data searchable terms
127.0.0.1 127
0
1
127.0.0.1
127.0.0
127.0
0.0
0.0.1
0.1
foo@bar.com foo
bar
com
foo@bar
bar.com
foo@bar.com
The fox. the
fox
The
The fox
The fox.
fox
fox.

The following table shows examples of how search_query is broken into query terms by the LOG_ANALYZER text analyzer. All entries are strings.

search_query query terms
127.0.0.1 127
0
1
`127.0.0.1` 127.0.0.1
foo@bar.com foo
bar
com
`foo@bar.com` foo@bar.com

The following table shows examples of calls to the SEARCH function and reasons for various return values:

+------------------------------------+---------+-----------------------------------------------------------------+
| function call                      | returns | reason                                                          |
+------------------------------------+---------+-----------------------------------------------------------------+
| SEARCH('foobar', NULL)             | ERROR   | The search_query is NULL.                                       |
| SEARCH('foobar', '')               | ERROR   | The search_query contains no tokens.                            |
| SEARCH('foo-bar', 'foo bar')       | TRUE    | '-' and ' ' are delimiters.                                     |
| SEARCH('foo-bar', 'foobar')        | FALSE   | The search_query is not split.                                  |
| SEARCH('foo-bar', 'foo\\&bar')     | TRUE    | The double backslash escapes the ampersand which is a delimiter.|
| SEARCH('foo-bar', R'foo\&bar')     | TRUE    | The single backslash escapes the ampersand in a raw string.     |
| SEARCH('foo-bar', '`foo&bar`')     | FALSE   | The backticks require an exact match for foo&bar.               |
| SEARCH('foo&bar', '`foo&bar`')     | TRUE    | An exact match is found.                                        |
| SEARCH('foo-bar', 'bar foo')       | TRUE    | The order of terms doesn't matter.                              |
| SEARCH('foo-bar', 'Foo Bar')       | TRUE    | Tokens are made lower-case.                                     |
| SEARCH('foo-bar', '`foo-bar`')     | TRUE    | An exact match is found.                                        |
| SEARCH('foo-bar', '`Foo`')         | FALSE   | Backticks preserve capitalization.                              |
| SEARCH('`foo-bar`', '`foo-bar`')   | FALSE   | Backticks don't have special meaning for search_data and are    |
|                                    |         | not delimiters in the default LOG_ANALYZER.                     |
| SEARCH('foo@bar.com', '`bar.com`') | TRUE    | An exact match is found after the delimiter in search_data.     |
| SEARCH('a foo-bar b', '`foo-bar`') | TRUE    | An exact match is found between the space delimiters.           |
| SEARCH(['foo', 'bar'], 'foo bar')  | FALSE   | No single array entry matches all search terms.                 |
| SEARCH('foo=', '`foo\\=`')         | FALSE   | The search_query is equivalent to foo\=.                        |
| SEARCH('foo=', R'`foo\=`')         | FALSE   | This is equivalent to the previous example.                     |
| SEARCH('foo=', 'foo\\=')           | TRUE    | The equals sign is a delimiter in the data and query.           |
| SEARCH('foo=', R'foo\=')           | TRUE    | This is equivalent to the previous example.                     |
| SEARCH('foo.bar', '`foo`')         | TRUE    | An exact match is found.                                        |
| SEARCH('foo.bar', '`foo.`')        | FALSE   | `foo.` is not analyzed because of backticks; it is not followed |
|                                    |         | by a delimiter in search_data 'foo.bar'.                        |
| SEARCH('foo..bar', '`foo.`')       | TRUE    | `foo.` is not analyzed because of backticks; it is followed by  |
|                                    |         | the delimiter '.' in search_data 'foo..bar'.                    |
+------------------------------------+---------+-----------------------------------------------------------------+

The following table shows examples of calls to the SEARCH function using the NO_OP_ANALYZER text analyzer and reasons for various return values:

+------------------------------------------------------+---------+----------------------------------------------------------+
| function call                                        | returns | reason                                                   |
+------------------------------------------------------+---------+----------------------------------------------------------+
| SEARCH('foo', 'foo', analyzer=>'NO_OP_ANALYZER')     | TRUE    | An exact match is found.                                 |
| SEARCH('foo', '`foo`', analyzer=>'NO_OP_ANALYZER')   | FALSE   | Backticks are not special characters for NO_OP_ANALYZER. |
| SEARCH('Foo', 'foo', analyzer=>'NO_OP_ANALYZER')     | FALSE   | The capitalization does not match.                       |
| SEARCH('foo bar', 'foo', analyzer=>'NO_OP_ANALYZER') | FALSE   | There are no delimiters for NO_OP_ANALYZER.              |
| SEARCH('', '', analyzer=>'NO_OP_ANALYZER')           | TRUE    | An exact match is found.                                 |
+------------------------------------------------------+---------+----------------------------------------------------------+

Consider the following table called meals with columns breakfast, lunch, and dinner:

+-------------------+-------------------------+------------------+
| breakfast         | lunch                   | dinner           |
+-------------------+-------------------------+------------------+
| Potato pancakes   | Toasted cheese sandwich | Beef soup        |
| Avocado toast     | Tomato soup             | Chicken soup     |
+-------------------+-------------------------+------------------+

The following query shows how to search single columns, multiple columns, and whole tables.

SELECT
  SEARCH(lunch, 'soup') AS lunch_soup,
  SEARCH((breakfast, dinner), 'soup') AS breakfast_or_dinner_soup,
  SEARCH(meals, 'soup') AS anytime_soup
FROM meals;

+------------+--------------------------+--------------+
| lunch_soup | breakfast_or_dinner_soup | anytime_soup |
+------------+--------------------------+--------------+
| false      | true                     | true         |
| true       | true                     | true         |
+------------+--------------------------+--------------+