Stay organized with collections Save and categorize content based on your preferences.

Introduction to federated queries

This page introduces how to use federated queries and provides guidance on querying Cloud Spanner and Cloud SQL data from BigQuery.

Federated queries let you send a query statement to Cloud Spanner or Cloud SQL databases and get the result back as a temporary table. Federated queries use the BigQuery Connection API to establish a connection with Spanner or Cloud SQL. In your query, you use the EXTERNAL_QUERY function to send a query statement to the external database, using that database's SQL dialect. The results are converted to Google Standard SQL data types.

Supported data stores

You can use federated queries with the following data stores:

Workflow

  • Identify the Google Cloud project that includes the data source that you want to query.
  • A bigquery.admin user creates a connection resource in BigQuery.
  • The admin user grants permission to use the connection resource to user B.
    • If the admin and user B are the same person, there is no need to grant permission.
  • User B writes a query in BigQuery with the new EXTERNAL_QUERY SQL function.

Supported regions

Federated queries are only supported in regions that support both the external data source and BigQuery. For a list of supported locations, see the following sections:

You can create a connection and run a federated query across regions according to the following rules:

Single regions

A BigQuery single region can only query a resource in the same region.

For example, if your dataset in us-east4, you can query Cloud SQL instances or Spanner databases that are only located in us-east4. The query processing location is the BigQuery single region.

Multi-regions

A BigQuery multi-region can query any data source region in the same large geographic area (US, EU). Multi-regional locations are not available for Cloud SQL instances because these are only used for backups. A BigQuery multi-region can also query a Spanner instance in the same multi-region.

  • A query that runs in the BigQuery US multi-region can query any single region in the US geographic area, such as us-central1, us-east4, or us-west2.

  • A query that runs in the BigQuery EU multi-region can query any single region in member states of the European Union, such as europe-north1 or europe-west3.

  • The location where the query runs must be the same as the location of the connection resource. For example, queries executed from the US multi-region must use a connection located in the US multi-region.

The query performance varies based on the proximity between the dataset and the external data source. For example, a federated query between a dataset in the US multi-region and a Cloud SQL instance in us-central1 is fast. However, if you run the same query between the US multi-region and a Cloud SQL instance in us-east4, the performance might be slower.

The query processing location is the multi-region location, either US or EU.

Data type mappings

When you execute a federated query, the data from the external data source is converted to Google Standard SQL types. For more information, see Cloud SQL federated queries.

Quotas and limits

  • Cross-region federated querying: if the BigQuery query processing location and the external data source location are different, this is a cross-region query. You can run up to 1 TB in cross-region queries per project per day. The following is an example of a cross-region query.
    • The Cloud SQL instance is in us-west1 while the BigQuery connection is based in the US multi-region. The BigQuery query processing location is US.
  • Quota: users should control query quota in the external data source, such as Cloud SQL. There is no extra quota setting for federated querying. To achieve workload isolation, it's recommended to only query a database read replica.
  • Maximum bytes billed allowed: this field isn't supported for federated queries at this time. Calculating the bytes billed before actually executing the federated queries is not possible at this time.
  • Number of connections: a federated query can have at most 10 unique connections.
  • Cloud SQL MySQL and PostgreSQL quotas and limitations apply.

Limitations

Federated queries are subject to the following limitations:

  • Performance. A federated query is likely to not be as fast as querying only BigQuery storage. BigQuery needs to wait for the source database to execute the external query and temporarily move data from the external data source to BigQuery. Also, the source database might not be optimized for complex analytical queries.

    The query performance also varies based on the proximity between the dataset and the external data source. For more information, see Supported regions.

  • Federated queries are read-only. The external query that is executed in the source database must be read-only. Therefore, DML or DDL statements are not supported.

  • Unsupported data types. If your external query contains a data type that is unsupported in BigQuery, the query fails immediately. You can cast the unsupported data type to a different supported data type.

  • Limited Cloud SQL instances. Federated querying is only supported by Cloud SQL instances with public IP (versus private IP).

  • Project. You must create the connection resource in the same project as the Cloud SQL instance.

Pricing

When executing federated queries from BigQuery, you are charged for the number of bytes returned from the external query. For more information, see On-demand analysis pricing.

What's next