Querying Multiple Tables Using a Wildcard Table

Before you begin

Querying sets of tables using wildcard tables

Wildcard tables enable you to query several tables concisely. For example, the National Oceanic and Atmospheric Administration (NOAA) GSOD weather dataset contains annual tables that all share the common prefix gsod followed by the four-digit year. The tables are named gsod1929, gsod1930, gsod1931, and so on.

To query a group of tables that share a common prefix, use the table wildcard symbol (*) after the table prefix in your FROM statement. For example, the following query finds the maximum temperature reported during the 1940s:

SELECT
  max,
  ROUND((max-32)*5/9,1) celsius,
  mo,
  da,
  year
FROM
  `bigquery-public-data.noaa_gsod.gsod194*`
WHERE
  max != 9999.9 # code for missing data
ORDER BY
  max DESC

Filtering selected tables using _TABLE_SUFFIX

To restrict the query so that it scans an arbitrary set of tables, use the _TABLE_SUFFIX pseudo column in the WHERE clause. The _TABLE_SUFFIX pseudo column contains the values matched by the table wildcard. For example, the previous sample query that scans all tables from the 1940s uses a table wildcard to represent the last digit of the year:

FROM
  `bigquery-public-data.noaa_gsod.gsod194*`

The corresponding _TABLE_SUFFIX pseudo column contains values in the range 0 through 9, representing the tables gsod1940 through gsod1949. These _TABLE_SUFFIX values can be used in the WHERE clause to filter for specific tables.

For example, to filter for the maximum temperature in the years 1940 and 1944, use the values 0 and 4 for _TABLE_SUFFIX:

SELECT
  max,
  ROUND((max-32)*5/9,1) celsius,
  mo,
  da,
  year
FROM
  `bigquery-public-data.noaa_gsod.gsod194*`
WHERE
  max != 9999.9 # code for missing data
  AND ( _TABLE_SUFFIX = '0'
    OR _TABLE_SUFFIX = '4' )
ORDER BY
  max DESC

Using _TABLE_SUFFIX can greatly reduce the number of bytes scanned, which reduces the cost of running your queries.

Scanning a range of tables using _TABLE_SUFFIX

To scan a range of tables, use the _TABLE_SUFFIX pseudo column along with the BETWEEN clause. For example, to find the maximum temperature reported in the years between 1929 and 1935 inclusive, use the table wildcard to represent the last two digits of the year:

SELECT
  max,
  ROUND((max-32)*5/9,1) celsius,
  mo,
  da,
  year
FROM
  `bigquery-public-data.noaa_gsod.gsod19*`
WHERE
  max != 9999.9 # code for missing data
  AND _TABLE_SUFFIX BETWEEN '29' and '35'
ORDER BY
  max DESC

Querying all tables in a dataset

To scan all tables in a dataset, you can use an empty prefix and the table wildcard, which means that the _TABLE_SUFFIX pseudo column contains full table names. For example, the following FROM clause scans all tables in the GSOD dataset:

FROM
  `bigquery-public-data.noaa_gsod.*`

With an empty prefix, the _TABLE_SUFFIX pseudo column contains full table names. For example, the following query is equivalent to the previous example that finds the maximum temperature between the years 1929 and 1935, but uses full table names in the WHERE clause:

SELECT
  max,
  ROUND((max-32)*5/9,1) celsius,
  mo,
  da,
  year
FROM
  `bigquery-public-data.noaa_gsod.*`
WHERE
  max != 9999.9 # code for missing data
  AND _TABLE_SUFFIX BETWEEN 'gsod1929' and 'gsod1935'
ORDER BY
  max DESC

Note, however, that longer prefixes generally perform better. For more information, see Best practices.

Migrating legacy SQL table wildcard functions

In legacy SQL, you can use the following table wildcard functions to query multiple tables.

  • TABLE_DATE_RANGE() and TABLE_DATE_RANGE_STRICT()
  • TABLE_QUERY()

The TABLE_DATE_RANGE() functions

The legacy SQL TABLE_DATE_RANGE() functions work on tables that conform to a specific naming scheme: <prefix>YYYYMMDD, where the <prefix> represents the first part of a table name and YYYYMMDD represents the date associated with that table's data.

For example, the following legacy SQL query finds the average temperature from a set of daily tables that contain Seattle area weather data:

# Legacy SQL query
SELECT
  ROUND(AVG(TemperatureF),1) AS AVG_TEMP_F
FROM
  TABLE_DATE_RANGE([mydataset.sea_weather_],
                    TIMESTAMP("2016-05-01"),
                    TIMESTAMP("2016-05-09"))

In standard SQL, an equivalent query uses a table wildcard and the BETWEEN clause.

# Standard SQL query
SELECT
  ROUND(AVG(TemperatureF),1) AS AVG_TEMP_F
FROM
  `mydataset.sea_weather_*`
WHERE
  _TABLE_SUFFIX BETWEEN '20160501' AND '20160509'

The TABLE_QUERY() function

The legacy SQL TABLE_QUERY() function enables you to find table names based on patterns. When migrating a TABLE_QUERY() function to standard SQL, which does not support the TABLE_QUERY() function, you can instead filter using the _TABLE_SUFFIX pseudo column. Keep the following differences in mind when migrating:

  • In legacy SQL, you place the TABLE_QUERY() function in the FROM clause, whereas in standard SQL, you filter using the _TABLE_SUFFIX pseudo column in the WHERE clause.

  • In legacy SQL, the TABLE_QUERY() function operates on the entire table name (or table_id), whereas in standard SQL, the _TABLE_SUFFIX pseudo column contains part or all of the table name, depending on how you use the wildcard character.

Filter in the WHERE clause

When migrating from legacy SQL to standard SQL, move the filter to the WHERE clause. For example, the following query finds the maximum temperatures across all years that end in the number 0:

# Legacy SQL query
SELECT
  max,
  ROUND((max-32)*5/9,1) celsius,
  year
FROM
  TABLE_QUERY([bigquery-public-data:noaa_gsod],
               'REGEXP_MATCH(table_id, r"0$")')
WHERE
  max != 9999.9 # code for missing data
  AND max > 100 # to improve ORDER BY performance
ORDER BY
  max DESC

In standard SQL, an equivalent query uses a table wildcard and places the regular expression function, REGEXP_CONTAINS(), in the WHERE clause:

# Standard SQL query
SELECT
  max,
  ROUND((max-32)*5/9,1) celsius,
  year
FROM
  `bigquery-public-data.noaa_gsod.gsod*`
WHERE
  max != 9999.9 # code for missing data
  AND max > 100 # to improve ORDER BY performance
  AND REGEXP_CONTAINS(_TABLE_SUFFIX, r"0$")
ORDER BY
  max DESC

Differences between table_id and _TABLE_SUFFIX

In the legacy SQL TABLE_QUERY(dataset, expr) function, the second parameter is an expression that operates over the entire table name, using the value table_id. When migrating to standard SQL, the filter that you create in the WHERE clause operates on the value of _TABLE_SUFFIX, which can include part or all of the table name, depending on your use of the wildcard character.

For example, the following legacy SQL query uses the entire table name in a regular expression to find the maximum temperatures across all years that end in the number 0:

# Legacy SQL query
SELECT
  max,
  ROUND((max-32)*5/9,1) celsius,
  year
FROM
  TABLE_QUERY([bigquery-public-data:noaa_gsod],
               'REGEXP_MATCH(table_id, r"gsod\d{3}0")')
WHERE
  max != 9999.9 # code for missing data
  AND max > 100 # to improve ORDER BY performance
ORDER BY
  max DESC

In standard SQL, an equivalent query can use the entire table name or only a part of the table name. You can use an empty prefix in standard SQL so that your filter operates over the entire table name:

# Standard SQL empty prefix
FROM
  `bigquery-public-data.noaa_gsod.*`

However, longer prefixes perform better than empty prefixes, so the following example uses a longer prefix, which means that the value of _TABLE_SUFFIX is only part of the table name.

# Standard SQL query
SELECT
  max,
  ROUND((max-32)*5/9,1) celsius,
  year
FROM
  `bigquery-public-data.noaa_gsod.gsod*`
WHERE
  max != 9999.9 # code for missing data
  AND max > 100 # to improve ORDER BY performance
  AND REGEXP_CONTAINS(_TABLE_SUFFIX, r"\d{3}0")
ORDER BY
  max DESC

Query execution details

Schema used for query evaluation

In order to execute a standard SQL query that uses a wildcard table, BigQuery automatically infers the schema for that table. BigQuery uses the schema for the most recently created table that matches the wildcard as the schema for the wildcard table. If the schema is inconsistent across tables matched by the wildcard table, BigQuery returns an error.

Wildcard tables do not support views

The wildcard table does not support views. The query returns an error if the wildcard table matches any views in the dataset.

Next steps

Send feedback about...

BigQuery Documentation