Cloud SQL federated queries

This page describes how to query data in BigQuery and Cloud SQL with a federated query.

Overview

Data is often scattered in many places. You may store a customer table in BigQuery, while storing a sales table in Cloud SQL, and want to join the two tables in a single query.

BigQuery Cloud SQL federation enables BigQuery to query data residing in Cloud SQL in real-time, without copying or moving data. It supports both MySQL (2nd generation) and PostgreSQL instances in Cloud SQL.

After the initial one-time set up, you can write a query with the new SQL function EXTERNAL_QUERY().

Workflow

Federated query syntax

Federated querying introduces a new function: EXTERNAL_QUERY.

Syntax

    SELECT * FROM EXTERNAL_QUERY(connection_id, external_database_query);
  • connection_id (string): The name of the database connection resource which you create in the Cloud Console, bq command-line tool, or BigQuery API.

    Example connection ID bigquery-federation-test.us.test-mysql

  • external_database_query (string): a read-only query in the external database's SQL dialect (MySQL or PostgreSQL). The query is executed in the external database in Cloud SQL.

Description

EXTERNAL_QUERY executes the query in Cloud SQL and returns results as a temporary table. The source database (MySQL or PostgreSQL) data type is converted to BigQuery data type in the temporary result table with the following data type mapping. The EXTERNAL_QUERY function is usually used in a FROM clause. This function is only available in BigQuery standard SQL.

Return type

A BigQuery table.

Example query

Suppose you need the date of the first order for each of your customers to include in the report we described in the Overview. This data is not currently in BigQuery but is available in your operational PostgreSQL database in Cloud SQL. The following federated query example accomplishes this.

SELECT c.customer_id, c.name, SUM(t.amount) AS total_revenue,
rq.first_order_date
FROM customers AS c
INNER JOIN transaction_fact AS t ON c.customer_id = t.customer_id
LEFT OUTER JOIN EXTERNAL_QUERY(
  'connection_id',
  '''SELECT customer_id, MIN(order_date) AS first_order_date
  FROM orders
  GROUP BY customer_id''') AS rq ON rq.customer_id = c.customer_id
GROUP BY c.customer_id, c.name, rq.first_order_date;

The example query includes 3 parts:

  1. Run the external query SELECT customer_id, MIN(order_date) AS first_order_date FROM orders GROUP BY customer_id in the operational PostgreSQL database to get the first order date for each customer through the EXTERNAL_QUERY() function.
  2. Join external query result table with customers table in BigQuery by customer_id.
  3. Select customer information and first order date.

Order is not preserved

EXTERNAL_QUERY() won't honor the ordering of the external query result, even if your external query includes ORDER BY. The following example query orders rows by customer id in Cloud SQL, but BigQuery will not output the result rows in that order.

SELECT * FROM EXTERNAL_QUERY(
'connection_id',
'''SELECT * FROM customers AS c ORDER BY c.customer_id'');

Before you begin

Enable the BigQuery connection service

  1. Open the BigQuery Connection API page in the API library.
  2. From the drop-down menu, select the project that contains your Cloud SQL instance.
  3. Click the ENABLE button.

    BigQuery Connection API

Service account

BigQuery uses a service account to connect to your Cloud SQL instance. When you enable the BigQuery Connection API, a Google Cloud–managed Identity and Access Management (IAM) service account is automatically created on your behalf. The service account has the following roles:

Role Description
cloudsql.client Connect to a Cloud SQL instance
logging.logWriter Write to cloud-logging
metrics.metricWriter Write to cloud-monitoring

Permissions

  • To create and maintain a connection resource, the user must have the bigquery.admin predefined IAM role.

  • The bigquery.admin role includes the following BigQuery connection service permissions:

    • bigquery.connections.create
    • bigquery.connections.get
    • bigquery.connections.list
    • bigquery.connections.update
    • bigquery.connections.use
    • bigquery.connections.delete

To grant permissions to another user so they can use the connection resource for Cloud SQL queries, see Granting permissions to another user.

Granting bigquery.admin access

To grant the bigquery.admin role:

Console

  1. Open the IAM page in the Cloud Console

    Open the IAM page

  2. Click Select a project.

  3. Select a project and click Open.

  4. Click Add to add new members to the project and set their permissions.

  5. In the Add members dialog:

    • For Members, enter the email address of the user or group.
    • In the Select a role drop-down, click BigQuery > BigQuery Admin.
    • Click Add.

      Grant admin

gcloud

You can use the gcloud command-line tool to grant a user or group the bigquery.admin role.

To add a single binding to your project's IAM policy, type the following command. To add a user, supply the --member flag in the format user:user@example.com. To add a group, supply the --member flag in the format group:group@example.com.

gcloud projects add-iam-policy-binding project_id \
--member group/user:address \
--role roles/bigquery.admin

Where:

  • project_id is your project ID.
  • group/user is either group or user.
  • address is the user or group's email address.

For example:

gcloud projects add-iam-policy-binding myproject \
--member group:group@example.com \
--role roles/bigquery.admin

The command outputs the updated policy:

    bindings:
    - members:
      - group:group@example.com
        role: roles/bigquery.admin
    

For more information on IAM roles in BigQuery, see Predefined roles and permissions.

Public IP

BigQuery Cloud SQL federation only supports Cloud SQL instances with public IP connectivity. Please configure public IP connectivity for your Cloud SQL instance.

Setting up Cloud SQL database connections

To avoid writing database credentials as cleartext in a federated query, you need to first create a database connection resource per database in BigQuery and then reference the connection resource in your federated query.

The connection resource has a set of IAM permissions, which you can grant to other users. The connection resource is encrypted and stored securely in the BigQuery connection service, and it can only be used for federated queries.

For more information about IAM roles and permissions in BigQuery, see Predefined roles and permissions.

Create a connection resource

Console

  1. To create a connection resource, go to the BigQuery page in the Cloud Console.

    Go to the BigQuery page

  2. In the Add data menu, select External data source.

    Create connection resource.

  3. In the External data source pane, enter the following information:

    • For Connection type, select MySQL or Postgres.
    • For Connection ID, enter an identifier for the connection resource. Letter, numbers, and underscores are allowed.
    • For Connection location, select a BigQuery location (or region) that is compatible with your Cloud SQL instance region.
    • (Optional): For Friendly name, enter a user-friendly name for the connection, such as My connection resource. The friendly name can be any value that helps you identify the connection resource if you need to modify it later.
    • (Optional): For Description, enter a description for this connection resource.
    • For Cloud SQL instance ID, enter the full name of the Cloud SQL instance, usually in the format project-id:location-id:instance-id. You can find the instance ID on the detail page of the Cloud SQL instance you want to query.
    • For Database name, enter the name of the database.
    • For Username, enter the username for the database.
    • For Password, enter the password for the database.

      • (Optional) Check Show password to reveal the password.

      New connection resource.

  4. Click Create connection.

bq

Enter the bq mk command and supply the connection flag: --connection. The following flags are also required:

  • --connection_type
    • --connection_type is always CLOUD_SQL
  • --properties
  • --connection_credential
  • --project_id
  • --location
  • A name for the connection. Enter a name to identify the connection, using letters, numbers, and underscores only.

    bq mk --connection --connection_type='CLOUD_SQL' --properties=PROPERTIES --connection_credential=CREDENTIALS --project_id=PROJECT_ID --location=LOCATION new_connection
    

Replace the following:

  • PROPERTIES: the parameters for the created connection in JSON format. For example: --properties='{"param":"param_value"}'. For creating a connection resource, you must supply the instanceID, database, and type parameters. The params friendly_name and description are optional.
  • CREDENTIALS: the parameters username and password.
  • PROJECT_ID: your project ID.
  • LOCATION: the region your Cloud SQL instance is located in.

For example, the following command creates a new connection resource named my_new_connection (friendly name: "My new connection") in a project with the ID federation-test.

bq mk --connection --connection_type='CLOUD_SQL' --properties='{"instanceId":"federation-test:us-central1:mytestsql","database":"mydatabase","type":"MYSQL"}' --connection_credential='{"username":"myusername", "password":"mypassword"}' --project_id=federation-test --location=us my_new_connection

API

Within the BigQuery Connection API, you can invoke CreateConnection within the ConnectionService to instantiate a connection. See the client library page for more details.

Troubleshooting

This section helps you troubleshoot issues you might encounter when setting up a new connection. This section does not encompass all possible error messages or issues.

When diagnosing general connection issues, verify the following:

  • You have completed all the steps in the Before you begin section.
  • The connection configuration properties are correct.
  • You have the appropriate permissions to create a connection.

If your connection properties are correct, and the appropriate permissions are granted, refer to the following for solutions to common issues.

Issue: BigQuery and Cloud SQL are not colocated.
Resolution: Cloud SQL federated querying is only supported in regions that support both Cloud SQL and BigQuery. The BigQuery dataset and the Cloud SQL instance must be in the same region, or, the BigQuery dataset must be in a multi-region location such as US and EU in the same geographic area that is compatible with the Cloud SQL region. See Supported regions for more information about regions and region compatibility.
Issue: Newly created connection is not showing up in the project.
Resolution: There might be a slight delay before new connections appear in the Cloud Console. Toggle to a different project and wait for up to 30 seconds, then toggle back. Your new connection should appear.
Issue: Performance is slower than expected.
Resolution: Performance of federated queries is not as high as querying data stored in BigQuery because the federated query has to externally query Cloud SQL, return the data to a temporary BigQuery table, map the data to a BigQuery data type, then execute the query in BigQuery. Although the query performance is not as high, the data does not need to be copied, moved, or stored again.
Issue: How to format the connection name.
Resolution: The connection name should include project, location, and connection ID. The connection ID should conform to this pattern: project_id.location_id.connection_id, for example, federation-test.us.my_new_connection

Managing connection resources

For information about viewing, listing, sharing, updating, and deleting connection resources, see Working with connections.

Audit logging

For audit logging of connection resources, see BigQuery audit logs overview.

Supported regions

Cloud SQL federated queries are only supported in regions that support both Cloud SQL and BigQuery.

Locations or region types

There are two types of locations:

  • A region is a specific geographic place, such as London.

  • A multi-region is a large geographic area, such as the United States, that contains two or more geographic places.

You can create a connection and execute a federated query across regions according to the following rules.

Multi-regions

A BigQuery multi-region can query any Cloud SQL region in the same large geographic area (US, EU), for example:

  • The BigQuery US multi-region can query any Cloud SQL single region in the US geographic area, such as us-central1, us-east4, us-west2, and so on.
  • The BigQuery EU multi-region can query any Cloud SQL single region in member states of the European Union, such as europe-north1, europe-west3, and so on.

The query processing location is the multi-region location, either US or EU.

For more information about regions and multi-regions, see the Dataset locations page.

Single regions

A BigQuery single region can only query Cloud SQL in the same region. For example:

  • The BigQuery single region us-east4 can only query Cloud SQL in us-east4.

In this example, the query processing location is the BigQuery single region.

See the following table for a detailed mapping of which regions are supported for each product.

Regional locations

Region description Cloud SQL region Compatible BigQuery region Compatible BigQuery multi-region
Americas
Iowa us-central Not supported: This region of Cloud SQL instance is V1.
Federated queries only support V2 instances of Cloud SQL.
Iowa us-central1 Not supported US
Las Vegas us-west4 us-west4 US
Los Angeles us-west2 us-west2 US
Montréal northamerica-northeast1 northamerica-northeast1 US
Northern Virginia us-east4 us-east4 US
Oregon us-west1 us-west1 US
Salt Lake City us-west3 us-west3 US
São Paulo southamerica-east1 southamerica-east1
South Carolina us-east1 us-east1 US
Europe
Belgium europe-west1 europe-west1 EU
Finland europe-north1 europe-north1 EU
Frankfurt europe-west3 europe-west3 EU
London europe-west2 europe-west2 EU
Netherlands europe-west4 europe-west4 EU
Zürich europe-west6 europe-west6 EU
Asia Pacific
Hong Kong asia-east2 asia-east2
Jakarta asia-southeast2 asia-southeast2
Mumbai asia-south1 asia-south1
Osaka asia-northeast2 asia-northeast2
Seoul asia-northeast3 asia-northeast3
Singapore asia-southeast1 asia-southeast1
Sydney australia-southeast1 australia-southeast1
Taiwan asia-east1 asia-east1
Tokyo asia-northeast1 asia-northeast1

Multi-regional locations

Multi-regional locations are not available for Cloud SQL instances. Cloud SQL multi-regions cannot be used for federated querying.

Data located in the EU multi-region is not stored in the europe-west2 (London) or europe-west6 (Zürich) data centers.

Limitations

Limited Cloud SQL instances

Federated querying is only supported by the Cloud SQL V2 instance with public IP (versus private IP).

Performance

Federated query is likely to not be as fast as querying only BigQuery storage. BigQuery needs to wait for the source database to execute the external query and temporarily move data from Cloud SQL to BigQuery. Source databases like MySQL or PostgreSQL are not usually optimized for complex analytical queries.

External querying is read-only

The external query that will be executed in the source database must be read-only. Therefore, DML or DDL statements are not supported.

Unsupported data types

If your external query contains a data type that is unsupported in BigQuery, the query will fail immediately. You can cast the unsupported data type to a different supported MySQL or PostgreSQL data type.

  • Unsupported MySQL data type

    • Error message: Invalid table-valued function external_query Found unsupported MySQL type in BigQuery at [1:15]
    • Unsupported type: GEOMETRY, BIT
    • Resolution: Cast the unsupported data type to STRING.
    • Example: SELECT ST_AsText(ST_GeomFromText('POINT(1 1)')); This command casts the unsupported data type GEOMETRY to STRING.
  • Unsupported PostgreSQL data type

    • Error message: Invalid table-valued function external_query Postgres type (OID = 790) is not supported now at [1:15]
    • Unsupported type: money, time with time zone, inet, path, pg_lsn, point, polygon, tsquery, tsvector, txid_snapshot, uuid, box, cidr, circle, interval, jsonb, line, lseg, macaddr, macaddr8
    • Resolution: Cast the unsupported data type to STRING.
    • Example: SELECT CAST('12.34'::float8::numeric::money AS varchar(30)); This command casts the unsupported data type money to STRING.

Quotas and limits

  • Cross-region federated querying: if the BigQuery query processing location and the Cloud SQL instance location are different, this is a cross-region query. You can run up to 1 TB in cross-region queries per project per day.
    • Here is an example of a cross-region query:
      • The Cloud SQL instance is in us-west1 while the BigQuery connection is based in the US multi-region. The BigQuery query processing location is US.
  • Quota: users should control query quota in Cloud SQL. There is no extra quota setting for federated querying. To achieve workload isolation, it's recommended to only query a database read replica.
  • Maximum bytes billed allowed: this field isn't supported for federated queries at this time. Calculating the bytes billed before actually executing the federated queries is not possible at this time.
  • Cloud SQL MySQL and PostgreSQL quotas and limitations apply.
  • Number of connections: a federated query can have at most 10 unique connections.

Pricing

When querying Cloud SQL from BigQuery, you are charged for the number of bytes read by the query. For more information, see Query pricing.

Reference

View a Cloud SQL table schema

You can use the EXTERNAL_QUERY() function to query information_schema tables to access database metadata, such as list all tables in the database or show table schema. The following example information_schema queries work in both MySQL and PostgreSQL. You can learn more from MySQL information_schema tables and PostgreSQL information_schema tables.

-- List all tables in a database.
SELECT * FROM EXTERNAL_QUERY("connection_id",
"select * from information_schema.tables;");
-- List all columns in a table.
SELECT * FROM EXTERNAL_QUERY("connection_id",
"select * from information_schema.columns where table_name='x';");

Connection resource detail

Property name Value Description
name string Name of the connection resource in the format: project_id.location_id.connection_id.
location string Location of the connection, which is the same as the Cloud SQL instance location with the following exceptions: Cloud SQL us-central1 maps to BigQuery US, Cloud SQL europe-west1 maps to BigQuery EU.
friendlyName string A user-friendly display name for the connection.
description string Description of the connection.
cloudSql.type string Can be "POSTGRES" or "MYSQL".
cloudSql.instanceId string Name of the Cloud SQL instance, usually in the format of:

Project-id:location-id:instance-id

You can find the instance ID in the Cloud SQL instance detail page.
cloudSql.database string The Cloud SQL database that you want to connect to.

Connection credential resource detail

Property name Value Description
username string Database username
password string Database password

Data type mappings

When you execute a Cloud SQL federated query, the data from Cloud SQL (in MySQL or PostgreSQL data types) is converted to BigQuery standard SQL types. The following data type mappings are from MySQL to BigQuery and PostgreSQL to BigQuery.

Things to know about mapping:

  • Most MySQL data types can be matched to the same BigQuery data type, with a few exceptions such as decimal, timestamp, and time.
  • PostgreSQL supports many non-standard data types that are not supported in BigQuery, for example money, path, uuid, boxer, and others.
  • The BigQuery numeric value range is smaller than in MySQL and PostgreSQL.

Error handling

If your external query contains a data type that is unsupported in BigQuery, the query fails immediately. You can cast the unsupported data type to a different MySQL / PostgreSQL data type that is supported. For more information about how to cast, see Unsupported data types under Troubleshooting.

MySQL to BigQuery type mapping

MySQL type MySQL description BigQuery type Type difference
Integer
INT 4 bytes, 2^32 - 1 INT64
TINYINT 1 byte, 2^8 - 1 INT64
SMALLINT 2 bytes, 2^16 - 1 INT64
MEDIUMINT 3 bytes, 2^24 - 1 INT64
BIGINT 8 bytes, 2^64 - 1 INT64
UNSIGNED BIGINT 8 bytes, 2^64 - 1 NUMERIC
Exact numeric
DECIMAL (M,D) A decimal represented by (M,D) where M is the total number of digits and D is the number of decimals.
M <= 65
NUMERIC

BigQuery NUMERIC range is smaller, supporting only 38 decimal digits of precision and 9 decimal digits of scale.
Approximate numeric
FLOAT (M,D) 4 bytes, M <= 23 FLOAT64
DOUBLE (M,D) 8 bytes, M <= 53 FLOAT64
Date and time
TIMESTAMP '1970-01-01 00:00:01'UTC to '2038-01-19 03:14:07' UTC. TIMESTAMP MySQL TIMESTAMP is retrieved as UTC timezone no matter where you call BigQuery from.
DATETIME '1000-01-01 00:00:00' to '9999-12-31 23:59:59'. DATETIME
DATE '1000-01-01' to '9999-12-31'. DATE
TIME Time in 'HH:MM:SS' format
'-838:59:59' to '838:59:59'.
TIME
BigQuery TIME range is smaller, from 00:00:00 to 23:59:59.
YEAR INT64
Character and strings
ENUM String object with a value chosen from a list of permitted values. STRING
CHAR (M) A fixed-length string between 1 and 255 characters. STRING
VARCHAR (M) A variable-length string between 1 and 255 characters. STRING
TEXT A field with a maximum length of 65,535 characters. STRING
TINYTEXT A TEXT column with a maximum length of 255 characters. STRING
MEDIUMTEXT A TEXT column with a maximum length of 16,777,215 characters. STRING
LONGTEXT A TEXT column with a maximum length of 4,294,967,295 characters. STRING
Binary
BLOB A binary large object with a maximum length of 65,535 characters. BYTE
MEDIUM_BLOB A BLOB with a maximum length of 16,777,215 characters. BYTE
LONG_BLOB A BLOB with a maximum length of 4,294,967,295 characters. BYTE
TINY_BLOB A BLOB with a maximum length of 255 characters. BYTE
Other
SET When you declare the SET column, predefine some values. Then use INSERT to add any set of predefined values to this column. STRING
GEOMETRY GEOMETRY NOT YET SUPPORTED
BIT INT64 NOT YET SUPPORTED

PostgreSQL to BigQuery type mapping

Name Description BigQuery type Type difference
Integer
smallint 2 bytes, -32,768 to +32,767. INT64
smallserial See smallint. INT64
integer 4 bytes, -2,147,483,648 to +2,147,483,647. INT64
serial See integer. INT64
bigint 8 bytes, -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807. INT64
bigserial See bigint. INT64
Exact numeric
numeric [ (p, s) ] Precision up to 1,000. NUMERIC BigQuery NUMERIC range is smaller, supporting only 38 decimal digits of precision and 9 decimal digits of scale.
Decimal [ (p, s) ] See numeric. NUMERIC See numeric.
money 8 bytes, 2 digit scale, -92,233,720,368,547,758.08 to +92,233,720,368,547,758.07 NOT SUPPORTED.
Approximate numeric.
real 4 bytes, single precision floating-point number. FLOAT64
double precision 8 bytes, double precision floating-point number. FLOAT64
Date and time
date Calendar date (year, month, day). DATE
time [ (p) ] [ without time zone ] Time of day (no time zone). TIME
time [ (p) ] with time zone Time of day, including time zone. NOT SUPPORTED
timestamp [ (p) ] [ without time zone ] Date and time (no time zone). DATETIME
timestamp [ (p) ] with time zone Date and time, including time zone. TIMESTAMP PostgreSQL TIMESTAMP is retrieved as UTC timezone no matter where you call BigQuery from.
interval A time duration. NOT SUPPORTED
Character and strings
character [ (n) ] Fixed-length character string. STRING
character varying [ (n) ] Variable-length character string. STRING
text Variable-length character string. STRING
Binary
bytea Binary data ("byte array"). BYTES
bit [ (n) ] Fixed-length bit string. BYTES
bit varying [ (n) ] Variable-length bit string. BYTES
Other
boolean Logical Boolean (true/false). BOOL
inet IPv4 or IPv6 host address. NOT SUPPORTED
path Geometric path on a plane. NOT SUPPORTED
pg_lsn PostgreSQL Log Sequence Number. NOT SUPPORTED
point Geometric point on a plane. NOT SUPPORTED
polygon Closed geometric path on a plane. NOT SUPPORTED
tsquery Text search query. NOT SUPPORTED
tsvector Text search document. NOT SUPPORTED
txid_snapshot User-level transaction ID snapshot. NOT SUPPORTED
uuid Universally unique identifier. NOT SUPPORTED
xml XML data. STRING
box Rectangular box on a plane. NOT SUPPORTED
cidr IPv4 or IPv6 network address. NOT SUPPORTED
circle Circle on a plane. NOT SUPPORTED
interval [ fields ] [ (p) ] Time span. NOT SUPPORTED
json Textual JSON data. STRING
jsonb Binary JSON data, decomposed. NOT SUPPORTED
line Infinite line on a plane. NOT SUPPORTED
lseg Line segment on a plane. NOT SUPPORTED
macaddr MAC (Media Access Control) address. NOT SUPPORTED
macaddr8 MAC (Media Access Control) address (EUI-64 format). NOT SUPPORTED