This page provides an overview of how to use federated queries in BigQuery.
A federated query is a way to send a query statement to an external database
and get the result back as a temporary table. Federated queries use the
BigQuery Connection API to establish a connection with the external database.
In your standard SQL query, you use the
EXTERNAL_QUERY function to send a
query statement to the external database, using that database's SQL dialect.
The results are converted to BigQuery standard SQL data types.
You can use federated queries with the following external databases:
After the initial one-time setup, you can write a query with the EXTERNAL_QUERY SQL function.
- Choose a Google Cloud project that includes the data source that you want to query.
bigquery.adminuser creates a connection resource in BigQuery.
- The admin user grants permission to use the connection resource
to user B.
- If the admin and user B are the same person, there is no need to grant permission.
- User B writes a query in BigQuery with the new
Federated query syntax
SQL function to run federated queries.
SELECT * FROM EXTERNAL_QUERY(connection_id, external_database_query[, options]);
Example connection ID
external_database_query (string): A read-only query in the external database's SQL dialect. The query is executed in the external database.
options (string): An optional string of a JSON format map with key-value pairs of option name and value (both are case-sensitive).
Option Name Description "default_type_for_decimal_columns" Can be "float64", "numeric", "bignumeric" or "string". With this option, the MySQL Decimal type or PostgreSQL Numeric type is mapped to the provided BigQuery type. When this option is not provided, the MySQL Decimal type or PostgreSQL Numeric type is mapped to BigQuery NUMERIC type.
EXTERNAL_QUERY executes the query in the external data source and returns results as a
temporary table. The source database data type is
converted to BigQuery data type in the temporary result table. The
conversion depends on the source database. For Cloud SQL data mapping, see
data type mapping.
function is usually used in a
FROM clause. This function is only available in
BigQuery standard SQL.
A BigQuery table.
Suppose that you store a customer table in BigQuery, while
storing a sales table in Cloud SQL, and want to join the two tables in
a single query. The following example makes a federated query to a
Cloud SQL database named
orders and joins the results with a
BigQuery table named
SELECT c.customer_id, c.name, rq.first_order_date FROM mydataset.customers AS c LEFT OUTER JOIN EXTERNAL_QUERY( 'us.connection_id', '''SELECT customer_id, MIN(order_date) AS first_order_date FROM orders GROUP BY customer_id''') AS rq ON rq.customer_id = c.customer_id GROUP BY c.customer_id, c.name, rq.first_order_date;
The example query includes 3 parts:
- Run the external query
SELECT customer_id, MIN(order_date) AS first_order_date FROM orders GROUP BY customer_idin the operational PostgreSQL database to get the first order date for each customer through the
- Join the external query result table with the customers table in
- Select customer information and first order date.
Order is not preserved
EXTERNAL_QUERY() doesn't honor the ordering of the external query result, even
if your external query includes
ORDER BY. The following example query orders
rows by customer id in Cloud SQL, but BigQuery doesn't output
the result rows in that order.
SELECT * FROM EXTERNAL_QUERY( 'connection_id', '''SELECT * FROM customers AS c ORDER BY c.customer_id'');
Federated queries are only supported in regions that support both the external data source and BigQuery.
Locations or region types
There are two types of locations:
A region is a specific geographic place, such as London.
A multi-region is a large geographic area, such as the United States, that contains two or more geographic places.
You can create a connection and execute a federated query across regions according to the following rules.
A BigQuery multi-region can query any data source region in the same large geographic area (US, EU), for example:
- A query that runs in the BigQuery US multi-region can query any
single region in the US geographic area, such as
- A query that runs in the BigQuery EU multi-region can query any
single region in member states
of the European Union, such as
The location where the query runs must be the same as the location of the connection resource. For example, queries executed from the
USmulti-region must use a connection located in the
query processing location
is the multi-region location, either
For more information about regions and multi-regions, see the Dataset locations page.
A BigQuery single region can only query a resource in the same region. For example:
- The BigQuery single region
us-east4can only query Cloud SQL in
In this example, the query processing location is the BigQuery single region.
For more information, see Cloud SQL federated queries.
Data type mappings
When you execute a federated query, the data from the external data source is converted to BigQuery standard SQL types. For more information, see Cloud SQL federated queries.
Quotas and limits
- Cross-region federated querying: if the
BigQuery query processing location
and the external data source location are different, this is a cross-region
query. You can run up to 1 TB in cross-region queries per project per
day. The following is an example of a cross-region query.
- The Cloud SQL instance is in
us-west1while the BigQuery connection is based in the
USmulti-region. The BigQuery query processing location is
- The Cloud SQL instance is in
- Quota: users should control query quota in the external data source, such as Cloud SQL. There is no extra quota setting for federated querying. To achieve workload isolation, it's recommended to only query a database read replica.
- Maximum bytes billed allowed: this field isn't supported for federated queries at this time. Calculating the bytes billed before actually executing the federated queries is not possible at this time.
- Number of connections: a federated query can have at most 10 unique connections.
When executing federated queries from BigQuery, you are charged for the number of bytes returned from the external query. For more information, see On-demand analysis pricing.