Wildcard Tables

Wildcard tables enable you to query multiple tables using concise SQL statements. Wildcard tables are available only in standard SQL. For equivalent functionality in legacy SQL, see Table Wildcard Functions.

Wildcard table syntax

Wildcard table syntax:

SELECT
FROM
  `<project-id>.<dataset-id>.<table-prefix>*`
WHERE
  bool_expression
<project-id>
Cloud Platform project ID. Optional if you use your default project ID.
<dataset-id>
BigQuery dataset ID.
<table-prefix>
A string that is common across all tables that are matched by the wildcard character. The table prefix is optional. Omitting the table prefix matches all tables in the dataset. See Best practices for information about omitting the table prefix.
* (wildcard character)
The wildcard character, "*", represents one more characters of a table name. The wildcard character can appear only as the final character of a wildcard table name.

The wildcard table WHERE clause also supports the _TABLE_SUFFIX pseudo column, which contains values matched by the wildcard character. You can use _TABLE_SUFFIX to filter your query using a comparison operator. For example, the following WHERE clauses use comparison operators:

WHERE
  _TABLE_SUFFIX BETWEEN '29' AND '40'

WHERE
  _TABLE_SUFFIX = '1929'

WHERE
  _TABLE_SUFFIX < '1941'

For more information about the _TABLE_SUFFIX pseudo column, see Filtering selected tables using _TABLE_SUFFIX.

Enclose table names with wildcards in backticks

The wildcard table name contains the special character (*), which means that you must enclose the wildcard table name in backtick (`) characters. For example, the following query is valid because it uses backticks:

#standardSQL
/* Valid standard SQL query */
SELECT
  max
FROM
  `bigquery-public-data.noaa_gsod.gsod*`
WHERE
  max != 9999.9 # code for missing data
  AND _TABLE_SUFFIX = '1929'
ORDER BY
  max DESC

The following query is NOT valid because it isn't properly quoted with backticks:

#standardSQL
/* Syntax error: Expected end of statement but got "-" at [4:11] */
SELECT
  max
FROM
  # missing backticks
  bigquery-public-data.noaa_gsod.gsod*
WHERE
  max != 9999.9 # code for missing data
  AND _TABLE_SUFFIX = '1929'
ORDER BY
  max DESC

Quotation marks do not work:

#standardSQL
/* Syntax error: Unexpected string literal: 'bigquery-public-data.noaa_gsod.gsod*' at [4:3] */
SELECT
  max
FROM
  # quotes are not backticks
  'bigquery-public-data.noaa_gsod.gsod*'
WHERE
  max != 9999.9 # code for missing data
  AND _TABLE_SUFFIX = '1929'
ORDER BY
  max DESC

Migrating legacy SQL table wildcard functions

In legacy SQL, you can use the following table wildcard functions to query multiple tables.

  • TABLE_DATE_RANGE() and TABLE_DATE_RANGE_STRICT()
  • TABLE_QUERY()

The TABLE_DATE_RANGE() functions

The legacy SQL TABLE_DATE_RANGE() functions work on tables that conform to a specific naming scheme: <prefix>YYYYMMDD, where the <prefix> represents the first part of a table name and YYYYMMDD represents the date associated with that table's data.

For example, the following legacy SQL query finds the average temperature from a set of daily tables that contain Seattle area weather data:

#legacySQL
SELECT
  ROUND(AVG(TemperatureF),1) AS AVG_TEMP_F
FROM
  TABLE_DATE_RANGE([mydataset.sea_weather_],
                    TIMESTAMP("2016-05-01"),
                    TIMESTAMP("2016-05-09"))

In standard SQL, an equivalent query uses a table wildcard and the BETWEEN clause.

#standardSQL
SELECT
  ROUND(AVG(TemperatureF),1) AS AVG_TEMP_F
FROM
  `mydataset.sea_weather_*`
WHERE
  _TABLE_SUFFIX BETWEEN '20160501' AND '20160509'

The TABLE_QUERY() function

The legacy SQL TABLE_QUERY() function enables you to find table names based on patterns. When migrating a TABLE_QUERY() function to standard SQL, which does not support the TABLE_QUERY() function, you can instead filter using the _TABLE_SUFFIX pseudo column. Keep the following differences in mind when migrating:

  • In legacy SQL, you place the TABLE_QUERY() function in the FROM clause, whereas in standard SQL, you filter using the _TABLE_SUFFIX pseudo column in the WHERE clause.

  • In legacy SQL, the TABLE_QUERY() function operates on the entire table name (or table_id), whereas in standard SQL, the _TABLE_SUFFIX pseudo column contains part or all of the table name, depending on how you use the wildcard character.

Filter in the WHERE clause

When migrating from legacy SQL to standard SQL, move the filter to the WHERE clause. For example, the following query finds the maximum temperatures across all years that end in the number 0:

#legacySQL
SELECT
  max,
  ROUND((max-32)*5/9,1) celsius,
  year
FROM
  TABLE_QUERY([bigquery-public-data:noaa_gsod],
               'REGEXP_MATCH(table_id, r"0$")')
WHERE
  max != 9999.9 # code for missing data
  AND max > 100 # to improve ORDER BY performance
ORDER BY
  max DESC

In standard SQL, an equivalent query uses a table wildcard and places the regular expression function, REGEXP_CONTAINS(), in the WHERE clause:

#standardSQL
SELECT
  max,
  ROUND((max-32)*5/9,1) celsius,
  year
FROM
  `bigquery-public-data.noaa_gsod.gsod*`
WHERE
  max != 9999.9 # code for missing data
  AND max > 100 # to improve ORDER BY performance
  AND REGEXP_CONTAINS(_TABLE_SUFFIX, r"0$")
ORDER BY
  max DESC

Differences between table_id and _TABLE_SUFFIX

In the legacy SQL TABLE_QUERY(dataset, expr) function, the second parameter is an expression that operates over the entire table name, using the value table_id. When migrating to standard SQL, the filter that you create in the WHERE clause operates on the value of _TABLE_SUFFIX, which can include part or all of the table name, depending on your use of the wildcard character.

For example, the following legacy SQL query uses the entire table name in a regular expression to find the maximum temperatures across all years that end in the number 0:

#legacySQL
SELECT
  max,
  ROUND((max-32)*5/9,1) celsius,
  year
FROM
  TABLE_QUERY([bigquery-public-data:noaa_gsod],
               'REGEXP_MATCH(table_id, r"gsod\d{3}0")')
WHERE
  max != 9999.9 # code for missing data
  AND max > 100 # to improve ORDER BY performance
ORDER BY
  max DESC

In standard SQL, an equivalent query can use the entire table name or only a part of the table name. You can use an empty prefix in standard SQL so that your filter operates over the entire table name:

# Standard SQL empty prefix
FROM
  `bigquery-public-data.noaa_gsod.*`

However, longer prefixes perform better than empty prefixes, so the following example uses a longer prefix, which means that the value of _TABLE_SUFFIX is only part of the table name.

#standardSQL
SELECT
  max,
  ROUND((max-32)*5/9,1) celsius,
  year
FROM
  `bigquery-public-data.noaa_gsod.gsod*`
WHERE
  max != 9999.9 # code for missing data
  AND max > 100 # to improve ORDER BY performance
  AND REGEXP_CONTAINS(_TABLE_SUFFIX, r"\d{3}0")
ORDER BY
  max DESC

Monitor your resources on the go

Get the Google Cloud Console app to help you manage your projects.

Send feedback about...