Federated queries

This page provides an overview of how to use federated queries in BigQuery.

Overview

A federated query is a way to send a query statement to an external database and get the result back as a temporary table. Federated queries use the BigQuery Connection API to establish a connection with the external database. In your standard SQL query, you use the EXTERNAL_QUERY function to send a query statement to the external database, using that database's SQL dialect. The results are converted to BigQuery standard SQL data types.

You can use federated queries with the following external databases:

After the initial one-time set up, you can write a query with the EXTERNAL_QUERY SQL function.

Workflow

  • Choose a Google Cloud project that includes the data source that you want to query.
  • A bigquery.admin user creates a connection resource in BigQuery.
  • The admin user grants permission to use the connection resource to user B.
    • If the admin and user B are the same person, there is no need to grant permission.
  • User B writes a query in BigQuery with the new EXTERNAL_QUERY function.

Federated query syntax

Use the EXTERNAL_QUERY SQL function to run federated queries.

Syntax

    SELECT * FROM EXTERNAL_QUERY(connection_id, external_database_query[, options]);
  • connection_id (string): The name of the database connection resource which you create in the Cloud Console, bq command-line tool, or BigQuery API.

    Example connection ID bigquery-federation-test.us.test-mysql

  • external_database_query (string): a read-only query in the external database's SQL dialect. The query is executed in the external database.

  • options (string): an optional string of a JSON format map with key value pairs of option name and value (both are case sensitive).

    Example options {"default_type_for_decimal_columns":"numeric"}

    Supported options:

    Option Name Description
    "default_type_for_decimal_columns" Can be "float64", "numeric", "bignumeric" or "string". With this option, the MySQL Decimal type or PostgreSQL Numeric type will be mapped to the provided BigQuery type. When this option is not provided, the MySQL Decimal type or PostgreSQL Numeric type will be mapped to BigQuery NUMERIC type.

Description

EXTERNAL_QUERY executes the query in the external data source and returns results as a temporary table. The source database data type is converted to BigQuery data type in the temporary result table. The conversion depends on the source database. For Cloud SQL data mapping, see data type mapping.

The EXTERNAL_QUERY function is usually used in a FROM clause. This function is only available in BigQuery standard SQL.

Return type

A BigQuery table.

Example query

Suppose that you store a customer table in BigQuery, while storing a sales table in Cloud SQL, and want to join the two tables in a single query. The following example makes a federated query to a Cloud SQL database named orders and joins the results with a BigQuery table named mydataset.customers

SELECT c.customer_id, c.name, rq.first_order_date
FROM mydataset.customers AS c
LEFT OUTER JOIN EXTERNAL_QUERY(
  'us.connection_id',
  '''SELECT customer_id, MIN(order_date) AS first_order_date
  FROM orders
  GROUP BY customer_id''') AS rq ON rq.customer_id = c.customer_id
GROUP BY c.customer_id, c.name, rq.first_order_date;

The example query includes 3 parts:

  1. Run the external query SELECT customer_id, MIN(order_date) AS first_order_date FROM orders GROUP BY customer_id in the operational PostgreSQL database to get the first order date for each customer through the EXTERNAL_QUERY() function.
  2. Join the external query result table with the customers table in BigQuery by customer_id.
  3. Select customer information and first order date.

Order is not preserved

EXTERNAL_QUERY() won't honor the ordering of the external query result, even if your external query includes ORDER BY. The following example query orders rows by customer id in Cloud SQL, but BigQuery will not output the result rows in that order.

SELECT * FROM EXTERNAL_QUERY(
'connection_id',
'''SELECT * FROM customers AS c ORDER BY c.customer_id'');

Supported regions

Federated queries are only supported in regions that support both the external data source and BigQuery.

Locations or region types

There are two types of locations:

  • A region is a specific geographic place, such as London.

  • A multi-region is a large geographic area, such as the United States, that contains two or more geographic places.

You can create a connection and execute a federated query across regions according to the following rules.

Multi-regions

A BigQuery multi-region can query any data source region in the same large geographic area (US, EU), for example:

  • The BigQuery US multi-region can query any single region in the US geographic area, such as us-central1, us-east4, us-west2, and so on.
  • The BigQuery EU multi-region can query any single region in member states of the European Union, such as europe-north1, europe-west3, and so on.
  • The connection used in the query must reside in the same location as the query location. For example queries executed from the US multi-region must reference a connection located in the US multi-region.

The query processing location is the multi-region location, either US or EU.

For more information about regions and multi-regions, see the Dataset locations page.

Single regions

A BigQuery single region can only query a resource in the same region. For example:

  • The BigQuery single region us-east4 can only query Cloud SQL in us-east4.

In this example, the query processing location is the BigQuery single region.

For more information, see Cloud SQL federated queries.

Data type mappings

When you execute a federated query, the data from the external data source is converted to BigQuery standard SQL types. For more information, see Cloud SQL federated queries.

Quotas and limits

  • Cross-region federated querying: if the BigQuery query processing location and the external data source location are different, this is a cross-region query. You can run up to 1 TB in cross-region queries per project per day. The following is an example of a cross-region query.
    • The Cloud SQL instance is in us-west1 while the BigQuery connection is based in the US multi-region. The BigQuery query processing location is US.
  • Quota: users should control query quota in the external data source, such as Cloud SQL. There is no extra quota setting for federated querying. To achieve workload isolation, it's recommended to only query a database read replica.
  • Maximum bytes billed allowed: this field isn't supported for federated queries at this time. Calculating the bytes billed before actually executing the federated queries is not possible at this time.
  • Number of connections: a federated query can have at most 10 unique connections.

Pricing

When executing federated queries from BigQuery, you are charged for the number of bytes returned from the external query. For more information, see On-demand analysis pricing.