Querying Multiple Tables Using a Wildcard Table

Wildcard tables enable you to query multiple tables using concise SQL statements. Wildcard tables are available only in standard SQL. For equivalent functionality in legacy SQL, see Table Wildcard Functions.

A wildcard table represents a union of all the tables that match the wildcard expression. For example, the following FROM clause uses the wildcard expression gsod* to match all tables in the noaa_gsod dataset that begin with the string gsod.

FROM
  `bigquery-public-data.noaa_gsod.gsod*`

Each row in the wildcard table contains a special column that contains the value matched by the wildcard character.

For information on wilcard table syntax, see Wildcard Tables in the standard SQL reference.

Before you begin

When to use wildcard tables

Wildcard tables are useful when a dataset contains multiple, similarly named tables that have compatible schemas. Typically, such datasets contain tables that each represent data from a single day, month, or year. For example, a public dataset hosted by BigQuery, the NOAA Global Surface Summary of the Day Weather Data, contains a table for each year from 1929 through the present.

A query that scans all the table IDs from 1929 through 1940 would be very long if you have to name all 12 tables in the FROM clause (most of the tables are omitted in this sample):

#standardSQL
SELECT
  max,
  ROUND((max-32)*5/9,1) celsius,
  mo,
  da,
  year
FROM (
  SELECT
    *
  FROM
    `bigquery-public-data.noaa_gsod.gsod1929` UNION ALL
  SELECT
    *
  FROM
    `bigquery-public-data.noaa_gsod.gsod1930` UNION ALL
  SELECT
    *
  FROM
    `bigquery-public-data.noaa_gsod.gsod1931` UNION ALL

  # ... Tables omitted for brevity

  SELECT
    *
  FROM
    `bigquery-public-data.noaa_gsod.gsod1940` )
WHERE
  max != 9999.9 # code for missing data
ORDER BY
  max DESC

The same query using a wildcard table is much more concise:

#standardSQL
SELECT
  max,
  ROUND((max-32)*5/9,1) celsius,
  mo,
  da,
  year
FROM
  `bigquery-public-data.noaa_gsod.gsod19*`
WHERE
  max != 9999.9 # code for missing data
  AND _TABLE_SUFFIX BETWEEN '29'
  AND '40'
ORDER BY
  max DESC
Wildcard tables support native BigQuery storage only. You cannot use wildcards when querying an external table or a view.

Querying sets of tables using wildcard tables

Wildcard tables enable you to query several tables concisely. For example, the National Oceanic and Atmospheric Administration (NOAA) GSOD weather dataset contains annual tables that all share the common prefix gsod followed by the four-digit year. The tables are named gsod1929, gsod1930, gsod1931, and so on.

To query a group of tables that share a common prefix, use the table wildcard symbol (*) after the table prefix in your FROM statement. For example, the following query finds the maximum temperature reported during the 1940s:

#standardSQL
SELECT
  max,
  ROUND((max-32)*5/9,1) celsius,
  mo,
  da,
  year
FROM
  `bigquery-public-data.noaa_gsod.gsod194*`
WHERE
  max != 9999.9 # code for missing data
ORDER BY
  max DESC

Filtering selected tables using _TABLE_SUFFIX

To restrict the query so that it scans an arbitrary set of tables, use the _TABLE_SUFFIX pseudo column in the WHERE clause. The _TABLE_SUFFIX pseudo column contains the values matched by the table wildcard. For example, the previous sample query that scans all tables from the 1940s uses a table wildcard to represent the last digit of the year:

FROM
  `bigquery-public-data.noaa_gsod.gsod194*`

The corresponding _TABLE_SUFFIX pseudo column contains values in the range 0 through 9, representing the tables gsod1940 through gsod1949. These _TABLE_SUFFIX values can be used in the WHERE clause to filter for specific tables.

For example, to filter for the maximum temperature in the years 1940 and 1944, use the values 0 and 4 for _TABLE_SUFFIX:

#standardSQL
SELECT
  max,
  ROUND((max-32)*5/9,1) celsius,
  mo,
  da,
  year
FROM
  `bigquery-public-data.noaa_gsod.gsod194*`
WHERE
  max != 9999.9 # code for missing data
  AND ( _TABLE_SUFFIX = '0'
    OR _TABLE_SUFFIX = '4' )
ORDER BY
  max DESC

Using _TABLE_SUFFIX can greatly reduce the number of bytes scanned, which reduces the cost of running your queries.

Filters on _TABLE_SUFFIX that include subqueries cannot be used to limit the number of tables scanned for a wildcard table. For example, the following query does not limit the tables scanned for the wildcard table bigquery-public-data.noaa_gsod.gsod19*

#standardSQL
# Scans all tables that match the prefix `gsod19`
SELECT
  ROUND((max-32)*5/9,1) celsius
FROM
  `bigquery-public-data.noaa_gsod.gsod19*`
WHERE
  _TABLE_SUFFIX = (SELECT SUBSTR(MAX(table_id), LENGTH('gsod19') + 1)
      FROM `bigquery-public-data.noaa_gsod.__TABLES_SUMMARY__`
      WHERE table_id LIKE 'gsod194%')

The following query limits the scan based on the filter condition, _TABLE_SUFFIX BETWEEN '40' and '60', but does not limit it based on the condition involving the subquery.

#standardSQL
# Scans all tables with names that fall between `gsod1940` and `gsod1960`
SELECT
  ROUND((max-32)*5/9,1) celsius
FROM
  `bigquery-public-data.noaa_gsod.gsod19*`
WHERE
  _TABLE_SUFFIX BETWEEN '40' AND '60'
  AND _TABLE_SUFFIX = (SELECT SUBSTR(MAX(table_id), LENGTH('gsod19') + 1)
      FROM `bigquery-public-data.noaa_gsod.__TABLES_SUMMARY__`
      WHERE table_id LIKE 'gsod194%')

To limit it based on the condition involving the subquery, you can perform two separate queries.

First query:

#standardSQL
# Get the list of tables that match the required table name prefixes
SELECT SUBSTR(MAX(table_id), LENGTH('gsod19') + 1)
      FROM `bigquery-public-data.noaa_gsod.__TABLES_SUMMARY__`
      WHERE table_id LIKE 'gsod194%'

Second query:

#standardSQL
# Construct the second query based on the values from the first query
SELECT
  ROUND((max-32)*5/9,1) celsius
FROM
  `bigquery-public-data.noaa_gsod.gsod19*`
WHERE _TABLE_SUFFIX = '49'

Scanning a range of tables using _TABLE_SUFFIX

To scan a range of tables, use the _TABLE_SUFFIX pseudo column along with the BETWEEN clause. For example, to find the maximum temperature reported in the years between 1929 and 1935 inclusive, use the table wildcard to represent the last two digits of the year:

#standardSQL
SELECT
  max,
  ROUND((max-32)*5/9,1) celsius,
  mo,
  da,
  year
FROM
  `bigquery-public-data.noaa_gsod.gsod19*`
WHERE
  max != 9999.9 # code for missing data
  AND _TABLE_SUFFIX BETWEEN '29' and '35'
ORDER BY
  max DESC

Querying all tables in a dataset

To scan all tables in a dataset, you can use an empty prefix and the table wildcard, which means that the _TABLE_SUFFIX pseudo column contains full table names. For example, the following FROM clause scans all tables in the GSOD dataset:

FROM
  `bigquery-public-data.noaa_gsod.*`

With an empty prefix, the _TABLE_SUFFIX pseudo column contains full table names. For example, the following query is equivalent to the previous example that finds the maximum temperature between the years 1929 and 1935, but uses full table names in the WHERE clause:

#standardSQL
SELECT
  max,
  ROUND((max-32)*5/9,1) celsius,
  mo,
  da,
  year
FROM
  `bigquery-public-data.noaa_gsod.*`
WHERE
  max != 9999.9 # code for missing data
  AND _TABLE_SUFFIX BETWEEN 'gsod1929' and 'gsod1935'
ORDER BY
  max DESC

Note, however, that longer prefixes generally perform better. For more information, see Best practices.

Query execution details

Schema used for query evaluation

In order to execute a standard SQL query that uses a wildcard table, BigQuery automatically infers the schema for that table. BigQuery uses the schema for the most recently created table that matches the wildcard as the schema for the wildcard table. If the schema is inconsistent across tables matched by the wildcard table, BigQuery returns an error.

Best practices

Longer prefixes generally perform better than shorter prefixes. For example, the following query uses a long prefix (gsod200):

#standardSQL
SELECT
  max
FROM
  `bigquery-public-data.noaa_gsod.gsod200*`
WHERE
  max != 9999.9 # code for missing data
  AND _TABLE_SUFFIX BETWEEN '0' AND '1'
ORDER BY
  max DESC

The following query generally performs worse because it uses an empty prefix:

#standardSQL
SELECT
  max
FROM
  `bigquery-public-data.noaa_gsod.*`
WHERE
  max != 9999.9 # code for missing data
  AND _TABLE_SUFFIX BETWEEN 'gsod2000' AND 'gsod2001'
ORDER BY
  max DESC

Limitations

Wildcard table queries are subject to the following limitations.

  • The wildcard table does not support views. The query returns an error if the wildcard table matches any views in the dataset.
  • Currently, cached results are not supported for queries against multiple tables using a wildcard function (even if the Use Cached Results option is checked). If you run the same wildcard query multiple times, you are billed for each query.
  • Wildcard tables support native BigQuery storage only. You cannot use wildcards when querying an external table or a view.

Next steps

Monitor your resources on the go

Get the Google Cloud Console app to help you manage your projects.

Send feedback about...