Wildcard tables
A wildcard table enables you to query multiple tables using concise SQL statements. A wildcard table represents a union of all the tables that match the wildcard expression.
Wildcard tables are available only in Google Standard SQL. For equivalent functionality in legacy SQL, see Table wildcard functions.
For an overview of wildcard tables, see Querying multiple tables using a wildcard table.
Wildcard table syntax
Wildcard table syntax:
SELECT FROM `<project-id>.<dataset-id>.<table-prefix>*` WHERE bool_expression
- <project-id>
- Cloud Platform project ID. Optional if you use your default project ID.
- <dataset-id>
- BigQuery dataset ID.
- <table-prefix>
- A string that is common across all tables that are matched by the wildcard character. The table prefix is optional. Omitting the table prefix matches all tables in the dataset.
- * (wildcard character)
- The wildcard character, "*", represents one or more characters of a table name. The wildcard character can appear only as the final character of a wildcard table name.
Queries with wildcard tables support the _TABLE_SUFFIX
pseudo column in the
WHERE
clause. This column contains the values matched by the wildcard
character, so that queries can filter which tables are accessed. For example,
the following WHERE
clauses use
comparison operators
to filter the matched tables:
WHERE
_TABLE_SUFFIX BETWEEN '29' AND '40'
WHERE
_TABLE_SUFFIX = '1929'
WHERE
_TABLE_SUFFIX < '1941'
For more information about the _TABLE_SUFFIX
pseudo column, see
Filtering selected tables using _TABLE_SUFFIX.
Enclose table names with wildcards in backticks
The wildcard table name contains the special character (*), which means that you must enclose the wildcard table name in backtick (`) characters. For example, the following query is valid because it uses backticks:
#standardSQL /* Valid SQL query */ SELECT max FROM `bigquery-public-data.noaa_gsod.gsod*` WHERE max != 9999.9 # code for missing data AND _TABLE_SUFFIX = '1929' ORDER BY max DESC
The following query is NOT valid because it isn't properly quoted with backticks:
#standardSQL /* Syntax error: Expected end of statement but got "-" at [4:11] */ SELECT max FROM # missing backticks bigquery-public-data.noaa_gsod.gsod* WHERE max != 9999.9 # code for missing data AND _TABLE_SUFFIX = '1929' ORDER BY max DESC
Quotation marks do not work:
#standardSQL /* Syntax error: Unexpected string literal: 'bigquery-public-data.noaa_gsod.gsod*' at [4:3] */ SELECT max FROM # quotes are not backticks 'bigquery-public-data.noaa_gsod.gsod*' WHERE max != 9999.9 # code for missing data AND _TABLE_SUFFIX = '1929' ORDER BY max DESC
Migrating legacy SQL table wildcard functions
In legacy SQL, you can use the following table wildcard functions to query multiple tables.
TABLE_DATE_RANGE()
andTABLE_DATE_RANGE_STRICT()
TABLE_QUERY()
The TABLE_DATE_RANGE() functions
The legacy SQL TABLE_DATE_RANGE()
functions work on tables that conform to a
specific naming scheme: <prefix>YYYYMMDD
, where the <prefix>
represents the
first part of a table name and YYYYMMDD
represents the date associated with
that table's data.
For example, the following legacy SQL query finds the average temperature from a set of daily tables that contain Seattle area weather data:
#legacySQL
SELECT
ROUND(AVG(TemperatureF),1) AS AVG_TEMP_F
FROM
TABLE_DATE_RANGE([mydataset.sea_weather_],
TIMESTAMP("2016-05-01"),
TIMESTAMP("2016-05-09"))
In Google Standard SQL, an equivalent query uses a table wildcard and the BETWEEN
clause.
#standardSQL
SELECT
ROUND(AVG(TemperatureF),1) AS AVG_TEMP_F
FROM
`mydataset.sea_weather_*`
WHERE
_TABLE_SUFFIX BETWEEN '20160501' AND '20160509'
The TABLE_QUERY() function
The legacy SQL TABLE_QUERY()
function enables you to find table names based
on patterns. When migrating a TABLE_QUERY()
function to Google Standard SQL, which
does not support the TABLE_QUERY()
function, you can instead filter using
the _TABLE_SUFFIX
pseudo column. Keep the following differences in mind when
migrating:
In legacy SQL, you place the
TABLE_QUERY()
function in theFROM
clause, whereas in Google Standard SQL, you filter using the_TABLE_SUFFIX
pseudo column in theWHERE
clause.In legacy SQL, the
TABLE_QUERY()
function operates on the entire table name (ortable_id
), whereas in Google Standard SQL, the_TABLE_SUFFIX
pseudo column contains part or all of the table name, depending on how you use the wildcard character.
Filter in the WHERE clause
When migrating from legacy SQL to Google Standard SQL, move the filter to the WHERE
clause. For example, the following query finds the maximum temperatures across
all years that end in the number 0
:
#legacySQL
SELECT
max,
ROUND((max-32)*5/9,1) celsius,
year
FROM
TABLE_QUERY([bigquery-public-data:noaa_gsod],
'REGEXP_MATCH(table_id, r"0$")')
WHERE
max != 9999.9 # code for missing data
AND max > 100 # to improve ORDER BY performance
ORDER BY
max DESC
In Google Standard SQL, an equivalent query uses a table wildcard and places the
regular expression function, REGEXP_CONTAINS()
, in the WHERE
clause:
#standardSQL
SELECT
max,
ROUND((max-32)*5/9,1) celsius,
year
FROM
`bigquery-public-data.noaa_gsod.gsod*`
WHERE
max != 9999.9 # code for missing data
AND max > 100 # to improve ORDER BY performance
AND REGEXP_CONTAINS(_TABLE_SUFFIX, r"0$")
ORDER BY
max DESC
Differences between table_id and _TABLE_SUFFIX
In the legacy SQL TABLE_QUERY(dataset, expr)
function, the second parameter
is an expression that operates over the entire table name, using the value
table_id
. When migrating to Google Standard SQL, the filter that you create in the
WHERE
clause operates on the value of _TABLE_SUFFIX
, which can include part
or all of the table name, depending on your use of the wildcard character.
For example, the following legacy SQL query uses the entire table name in a
regular expression to find the maximum temperatures across all years that end
in the number 0
:
#legacySQL
SELECT
max,
ROUND((max-32)*5/9,1) celsius,
year
FROM
TABLE_QUERY([bigquery-public-data:noaa_gsod],
'REGEXP_MATCH(table_id, r"gsod\d{3}0")')
WHERE
max != 9999.9 # code for missing data
AND max > 100 # to improve ORDER BY performance
ORDER BY
max DESC
In Google Standard SQL, an equivalent query can use the entire table name or only a part of the table name. You can use an empty prefix in Google Standard SQL so that your filter operates over the entire table name:
# Empty prefix
FROM
`bigquery-public-data.noaa_gsod.*`
However, longer prefixes perform better than empty prefixes, so the following
example uses a longer prefix, which means that the value of _TABLE_SUFFIX
is only part of the table name.
#standardSQL
SELECT
max,
ROUND((max-32)*5/9,1) celsius,
year
FROM
`bigquery-public-data.noaa_gsod.gsod*`
WHERE
max != 9999.9 # code for missing data
AND max > 100 # to improve ORDER BY performance
AND REGEXP_CONTAINS(_TABLE_SUFFIX, r"\d{3}0")
ORDER BY
max DESC
Limitations
See the limitation section in Querying multiple tables using a wildcard table.