Wildcard Tables

Introduction

Wildcard tables enable you to query multiple tables using concise SQL statements. Wildcard tables are available only in standard SQL. For equivalent functionality in legacy SQL, see Table Wildcard Functions.

A wildcard table represents a union of all the tables that match the wildcard expression. For example, the following FROM clause uses the wildcard expression gsod* to match all tables in the noaa_gsod dataset that begin with the string gsod.

FROM
  `bigquery-public-data.noaa_gsod.gsod*`

Each row in the wildcard table contains a special column that contains the value matched by the wildcard character.

For instructions and sample queries that use wildcards tables, see Querying Multiple Tables Using a Wildcard Table.

When to use wildcard tables

Wildcard tables are useful when a dataset contains multiple, similarly named tables that have compatible schemas. Typically, such datasets contain tables that each represent data from a single day, month, or year. For example, a public dataset hosted by BigQuery, the NOAA Global Surface Summary of the Day Weather Data, contains a table for each year from 1929 through the present.

A query that scans all the table IDs from 1929 through 1940 would be very long if you have to name all 12 tables in the FROM clause (most of the tables are omitted in this sample):

#standardSQL
SELECT
  max,
  ROUND((max-32)*5/9,1) celsius,
  mo,
  da,
  year
FROM (
  SELECT
    *
  FROM
    `bigquery-public-data.noaa_gsod.gsod1929` UNION ALL
  SELECT
    *
  FROM
    `bigquery-public-data.noaa_gsod.gsod1930` UNION ALL
  SELECT
    *
  FROM
    `bigquery-public-data.noaa_gsod.gsod1931` UNION ALL

  # ... Tables omitted for brevity

  SELECT
    *
  FROM
    `bigquery-public-data.noaa_gsod.gsod1940` )
WHERE
  max != 9999.9 # code for missing data
ORDER BY
  max DESC

The same query using a wildcard table is much more concise:

#standardSQL
SELECT
  max,
  ROUND((max-32)*5/9,1) celsius,
  mo,
  da,
  year
FROM
  `bigquery-public-data.noaa_gsod.gsod19*`
WHERE
  max != 9999.9 # code for missing data
  AND _TABLE_SUFFIX BETWEEN '29'
  AND '40'
ORDER BY
  max DESC

Wildcard table syntax

Wildcard table syntax:

SELECT
FROM
  `<project-id>.<dataset-id>.<table-prefix>*`
WHERE
  bool_expression
<project-id>
Cloud Platform project ID. Optional if you use your default project ID.
<dataset-id>
BigQuery dataset ID.
<table-prefix>
A string that is common across all tables that are matched by the wildcard character. The table prefix is optional. Omitting the table prefix matches all tables in the dataset. See Best practices for information about omitting the table prefix.
* (wildcard character)
The wildcard character, "*", represents one more characters of a table name. The wildcard character can appear only as the final character of a wildcard table name.

The wildcard table WHERE clause also supports the _TABLE_SUFFIX pseudo column, which contains values matched by the wildcard character. You can use _TABLE_SUFFIX to filter your query using a comparison operator. For example, the following WHERE clauses use comparison operators:

WHERE
  _TABLE_SUFFIX BETWEEN '29' AND '40'

WHERE
  _TABLE_SUFFIX = '1929'

WHERE
  _TABLE_SUFFIX < '1941'

For more information about the _TABLE_SUFFIX pseudo column, see Filtering selected tables using _TABLE_SUFFIX.

Enclose table names with wildcards in backticks

The wildcard table name contains the special character (*), which means that you must enclose the wildcard table name in backtick (`) characters. For example, the following query is valid because it uses backticks:

#standardSQL
/* Valid standard SQL query */
SELECT
  max
FROM
  `bigquery-public-data.noaa_gsod.gsod*`
WHERE
  max != 9999.9 # code for missing data
  AND _TABLE_SUFFIX = '1929'
ORDER BY
  max DESC

The following query is NOT valid because it isn't properly quoted with backticks:

#standardSQL
/* Syntax error: Expected end of statement but got "-" at [4:11] */
SELECT
  max
FROM
  # missing backticks
  bigquery-public-data.noaa_gsod.gsod*
WHERE
  max != 9999.9 # code for missing data
  AND _TABLE_SUFFIX = '1929'
ORDER BY
  max DESC

Quotation marks do not work:

#standardSQL
/* Syntax error: Unexpected string literal: 'bigquery-public-data.noaa_gsod.gsod*' at [4:3] */
SELECT
  max
FROM
  # quotes are not backticks
  'bigquery-public-data.noaa_gsod.gsod*'
WHERE
  max != 9999.9 # code for missing data
  AND _TABLE_SUFFIX = '1929'
ORDER BY
  max DESC

Best practices

Longer prefixes generally perform better than shorter prefixes. For example, the following query uses a long prefix (gsod200):

#standardSQL
SELECT
  max
FROM
  `bigquery-public-data.noaa_gsod.gsod200*`
WHERE
  max != 9999.9 # code for missing data
  AND _TABLE_SUFFIX BETWEEN '0' AND '1'
ORDER BY
  max DESC

The following query generally performs worse because it uses an empty prefix:

#standardSQL
SELECT
  max
FROM
  `bigquery-public-data.noaa_gsod.*`
WHERE
  max != 9999.9 # code for missing data
  AND _TABLE_SUFFIX BETWEEN 'gsod2000' AND 'gsod2001'
ORDER BY
  max DESC

Send feedback about...

BigQuery Documentation