Cloud SQL federated queries

This page describes how to query data in BigQuery and Cloud SQL with a federated query.

Overview

Data is often scattered in many places. You may store a customer table in BigQuery, while storing a sales table in Cloud SQL, and want to join the two tables in a single query.

BigQuery Cloud SQL federation enables BigQuery to query data residing in Cloud SQL in real-time, without copying or moving data. It supports both MySQL (2nd generation) and PostgreSQL instances in Cloud SQL.

After the initial one-time set up, you can write a query with the new SQL function EXTERNAL_QUERY().

Workflow

Federated query syntax

Federated query introduces a new function: EXTERNAL_QUERY.

Syntax

SELECT * FROM EXTERNAL_QUERY(connection_id, external_database_query);
  • connection_id (string): The name of the database connection resource which you create in the Web UI, CLI or API.

    Example connection ID bigquery-federation-test.us.test-mysql

  • external_database_query (string): a read-only query in the external database's SQL dialect (MySQL or PostgreSQL). The query is executed in the external database in Cloud SQL.

Description

EXTERNAL_QUERY executes the query in Cloud SQL and returns results as a temporary table. The source database (MySQL or PostgreSQL) data type is converted to BigQuery data type in the temporary result table with the following data type mapping. The EXTERNAL_QUERY function is usually used in a FROM clause. This function is only available in BigQuery standard SQL.

Return type

A BigQuery table.

Example query

Suppose you need the date of the first order for each of your customers to include in the report we described in the Overview. This data is not currently in BigQuery but is available in your operational PostgreSQL database in Cloud SQL. The following federated query example accomplishes this.

SELECT c.customer_id, c.name, SUM(t.amount) AS total_revenue,
rq.first_order_date
FROM customers AS c
INNER JOIN transaction_fact AS t ON c.customer_id = t.customer_id
LEFT OUTER JOIN EXTERNAL_QUERY(
  'connection_id',
  '''SELECT customer_id, MIN(order_date) AS first_order_date
  FROM orders
  GROUP BY customer_id''') AS rq ON rq.customer_id = c.customer_id
GROUP BY c.customer_id, c.name, rq.first_order_date;

The example query includes 3 parts:

  1. Run the external query SELECT customer_id, MIN(order_date) AS first_order_date FROM orders GROUP BY customer_id in the operational PostgreSQL database to get the first order date for each customer through the EXTERNAL_QUERY() function.
  2. Join external query result table with customers table in BigQuery by customer_id.
  3. Select customer information and first order date.

Order is not preserved

EXTERNAL_QUERY() won't honor the ordering of the external query result, even if your external query includes ORDER BY. The following example query orders rows by customer id in Cloud SQL, but BigQuery will not output the result rows in that order.

SELECT * EXTERNAL_QUERY(
'connection_id',
'''SELECT * FROM customers AS c ORDER BY c.customer_id'');

Before you begin

Enable the BigQuery connection service

  • Open the BigQuery connection API page in API library.
  • From the dropdown menu, select the project that contains your Cloud SQL instance.
  • Click the ENABLE button.

Service account

BigQuery uses a service account to connect to your Cloud SQL instance. When you enable the BigQuery connection API, a GCP-managed IAM service account will automatically be created on your behalf. The service account has the following roles:

Role Description
cloudsql.client Connect to a Cloud SQL instance
logging.logWriter Write to cloud-logging
metrics.metricWriter Write to cloud-monitoring

Permissions

  • To create and maintain a connection resource, the user must have the bigquery.admin predefined IAM role.

  • The bigquery.admin role includes the following BigQuery connection service permissions:

    • bigquery.connections.create
    • bigquery.connections.get
    • bigquery.connections.list
    • bigquery.connections.update
    • bigquery.connections.use
    • bigquery.connections.delete

To grant permissions to another user so they can use the connection resource for Cloud SQL queries, see Granting permissions to another user.

Granting bigquery.admin access

To grant the bigquery.admin role:

Console

  1. Open the Cloud IAM page in the GCP Console

    Open the IAM page

  2. Click Select a project.

  3. Select a project and click Open.

  4. Click Add to add new members to the project and set their permissions.

  5. In the Add members dialog:

    • For Members, enter the email address of the user or group.
    • In the Select a role drop-down, click BigQuery > BigQuery Admin.
    • Click Add.

      Grant admin

CLI

You can use the gcloud command-line tool to grant a user or group the bigquery.admin role.

To add a single binding to your project's Cloud IAM policy, type the following command. To add a user, supply the --member flag in the format user:user@example.com. To add a group, supply the --member flag in the format group:group@example.com.

gcloud projects add-iam-policy-binding project_id \
--member group/user:address \
--role roles/bigquery.admin

Where:

  • project_id is your project ID.
  • group/user is either group or user.
  • address is the user or group's email address.

For example:

gcloud projects add-iam-policy-binding myproject \
--member group:group@example.com \
--role roles/bigquery.admin

The command outputs the updated policy:

    bindings:
    - members:
      - group:group@example.com
        role: roles/bigquery.admin
    

For more information on Cloud IAM roles in BigQuery, see Predefined roles and permissions.

Public IP

BigQuery Cloud SQL federation only supports Cloud SQL instances with public IP connectivity. Please configure public IP connectivity for your Cloud SQL instance.

Setting up Cloud SQL database connections

To avoid writing database credentials as cleartext in a federated query, you need to first create a database connection resource per database in BigQuery and then reference the connection resource in your federated query.

The connection resource has a set of IAM permissions, which you can grant to other users. The connection resource is encrypted and stored securely in the BigQuery connection service, and it can only be used for federated queries.

For more information on Cloud IAM roles and permissions in BigQuery, see Predefined roles and permissions.

Create a connection resource

Console

  1. To create a connection resource, go to the BigQuery web UI in the GCP Console.

    Go to the GCP Console

  2. Under +ADD DATA, select Create connection from the menu.

    Create connection resource

  3. On the Create connection pane:

    • For Connection type, select MySQL or PostgresSQL.
    • For Connection ID, enter an identifier for the connection resource. Letter, numbers, and underscores are allowed.
    • For Connection location, select Cloud SQL instance location with below exceptions: for Cloud SQL instance in us-central1 region, select US location, for europe-west1 region, select EU location.
    • (Optional): For Friendly name, enter a user-friendly name for the connection, such as My connection resource. The friendly name can be any value that allows you to easily identify the connection resource if you need to modify it later.
    • (Optional): For Description, enter a description for this connection resource.
    • For Cloud SQL instance ID, enter the full name of the Cloud SQL instance, usually in the format project-id:location-id:instance-id. You can find the instance ID on the detail page of the Cloud SQL Instance you want to query.
    • For Database name, enter the name of the database.
    • For Username, enter the username for the database.
    • For Password, enter the password for the database.

      • (Optional) Check Show password to reveal the password.

      New connection resource

  4. Click Create connection.

COMMAND-LINE

Enter the bq mk command and supply the connection flag: --connection. The following flags are also required:

  • --connection_type
  • --properties
  • --connection_credential
  • --project_id
  • --location
  • (name of the connection)

    bq mk --connection --connection_type='CLOUD_SQL' --properties=[PROPERTIES] --connection_credential=[CREDENTIALS] --project_id=[PROJECT_ID] --location=[LOCATION] new_connection
    

Where:

  • --connection_type is always CLOUD_SQL
  • --properties contains the parameters for the created connection in JSON format. For example: --properties='{"param":"param_value"}'. For creating a connection resource, you must supply the instanceID, database and type parameters. The params `friendly_name and description are optional.
  • --connection_credential must contain the parameters username and password.
  • --project_id is your project ID.
  • --location is the region your Cloud SQL instance is located in.
  • Enter a name to identify the connection, using letters, numbers, and underscores only.

For example, the following command creates a new connection resource named my_new_connection (friendly name: "My new connection") in a project with the ID federation-test.

bq mk --connection --connection_type='CLOUD_SQL' --properties='{"instanceId":"federation-test:us-central1:mytestsql","database":"mydatabase","type":"MYSQL"}' --connection_credential='{"username":"myusername", "password":"mypassword"}' --project_id=federation-test --location=us my_new_connection

API

Use the projects.locations.transferConfigs.create method and supply an instance of the TransferConfig resource.

Viewing connection resources

Console

  1. To view the status of your connection resources, go to the BigQuery web UI.

    Go to the BigQuery web UI

  2. Connection resources will be listed at the top level under your project, under a group called External connections.

  3. Click on a connection to see information about that connection resource, such as connection ID and Cloud SQL instance ID.

    View connection resources

COMMAND-LINE

Enter the bq show command and supply the connection flag — --connection. The fully qualified connection_id is required.

    bq show --connection project:location.connection_id

For example, the following command creates a new connection resource named my_new_connection in a project with the ID federation-test located in the region us.

    bq show --connection federation-test:us.my_new_connection

API

Use the projects.locations.connections.list method and supply an instance of the list resource. See the reference section REST API.

Granting permissions to another user

For another user to use the connection resource for Cloud SQL federated queries, the bigquery.admin user should grant the following role in IAM: BigQuery Connection User. This IAM role includes the following permissions:

  • bigquery.connections.get
  • bigquery.connections.list
  • bigquery.connections.use
  • bigquery.connections.getIamPolicy

The bigquery.admin user can also grant another role to other users, called BigQuery Connection Admin which includes all the same permissions as BigQuery Connection User plus additional permissions to create, update, and delete existing connections, as well as set IAM policy on connections.

  • bigquery.connections.create
  • bigquery.connections.update
  • bigquery.connections.setIamPolicy
  • bigquery.connections.delete

Pricing

When querying Cloud SQL from BigQuery, you are charged for the number of bytes read by the query. For more information, see Query pricing. There are no extra charges or quota for Cloud SQL federated queries during the Beta period.

Reference

View a Cloud SQL table schema

You can use the EXTERNAL_QUERY() function to query information_schema tables to access database metadata, such as list all tables in the database or show table schema. The following example information_schema queries work in both MySQL and PostgreSQL. You can learn more from MySQL information_schema tables and PostgreSQL information_schema tables.

// List all tables in a database
SELECT * FROM EXTERNAL_QUERY("connection_id",
"select * from information_schema.tables;");
// List all columns in a table.
SELECT * FROM EXTERNAL_QUERY("connection_id",
"select * from information_schema.columns where table_name='x';");

Connection resource detail

Property name Value Description
name string Name of the connection resource in the format: project_id.location_id.connection_id
location string Location of the connection, which is the same as Cloud SQL instance location with following exceptions: Cloud SQL us-central1 maps to BigQuery US, Cloud SQL europe-west1 maps to BigQuery EU.
friendlyName string A user-friendly display name for the connection
description string Description of the connection
cloudSql.type string Can be "POSTGRES" or "MYSQL"
cloudSql.instanceId string Name of the Cloud SQL instance, usually in the format of:

Project-id:location-id:instance-id

You can find the instance ID in Cloud SQL Instance detail page.
cloudSql.database string The Cloud SQL database you want to connect to

Connection credential resource detail

Property name Value Description
username string Database username
password string Database password

Data type mappings

When you execute a Cloud SQL federated query, the data from Cloud SQL (in MySQL or PostgreSQL data types) are converted to BigQuery standard SQL types. Below are the data type mappings from MySQL to BigQuery and PostgreSQL to BigQuery.

Things to know about mapping:

  • Most MySQL data types can be matched to the same BigQuery data type, with a few exceptions such as decimal, timestamp, and time.
  • PostgreSQL supports many non-standard data types which are not supported in BigQuery, for example money, path, uuid, boxer, and others.
  • The BigQuery numeric value range is smaller than in MySQL and PostgreSQL.

Error handling

If your external query contains a data type that is unsupported in BigQuery, the query will fail immediately. You can cast the unsupported data type to a different MySQL / PostgreSQL data type that is supported. See Unsupported data types under Troubleshooting for more information on how to cast.

MySQL to BigQuery type mapping

MySQL type MySQL Description BigQuery type Type difference
Integer
INT 4 bytes, 2^32 - 1 INT64
TINYINT 1 byte, 2^8 - 1 INT64
SMALLINT 2 bytes, 2^16 - 1 INT64
MEDIUMINT 3 bytes, 2^24 - 1 INT64
BIGINT 8 bytes, 2^64 - 1 INT64
UNSIGNED BIGINT 8 bytes, 2^64 - 1 NUMERIC
Exact numeric
DECIMAL (M,D) A decimal represents by (M,D) where M is the total number of digits and D is the number of decimals. M <= 65 NUMERIC

BigQuery NUMERIC range is smaller, only supports 38 decimal digits of precision and 9 decimal digits of scale.
Approximate numeric
FLOAT (M,D) 4 bytes, M <= 23 FLOAT64
DOUBLE (M,D) 8 bytes, M <= 53 FLOAT64
Date and time
TIMESTAMP '1970-01-01 00:00:01'UTC to '2038-01-19 03:14:07' UTC. TIMESTAMP MySQL TIMESTAMP is retrieved as UTC timezone no matter where user call BigQuery
DATETIME '1000-01-01 00:00:00' to '9999-12-31 23:59:59' DATETIME
DATE '1000-01-01' to '9999-12-31'. DATE
TIME Time in 'HH:MM:SS' format
'-838:59:59' to '838:59:59'.
TIME
BigQuery TIME range is smaller, from 00:00:00 to 23:59:59
YEAR INT64
Character and strings
ENUM string object with a value chosen from a list of permitted values STRING
CHAR (M) A fixed-length string between 1 and 255 characters STRING
VARCHAR (M) A variable-length string between 1 and 255 characters in length. STRING
TEXT A field with a maximum length of 65535 characters. STRING
TINYTEXT TEXT column with a maximum length of 255 characters. STRING
MEDIUMTEXT TEXT column with a maximum length of 16777215 characters. STRING
LONGTEXT TEXT column with a maximum length of 4294967295 characters. STRING
Binary
BLOB A binary large object with a maximum length of 65535 characters. BYTE
MEDIUM_BLOB A BLOB with a maximum length of 16777215 characters. BYTE
LONG_BLOB A BLOB with a maximum length of 4294967295 characters. BYTE
TINY_BLOB A BLOB with a maximum length of 255 characters. BYTE
Other
SET when declare SET column, predefine some values. Then INSERT any set of predefined values into this column STRING
GEOMETRY GEOMETRY NOT YET SUPPORTED
BIT INT64 NOT YET SUPPORTED

PostgresSQL to BigQuery type mapping

Name Description BigQuery type Type difference
Integer
smallint 2 bytes, -32768 to +32767 INT64
smallserial See smallint INT64
integer 4 bytes, -2147483648 to +2147483647 INT64
serial See integer INT64
bigint 8 bytes, -9223372036854775808 to 9223372036854775807 INT64
bigserial See bigint INT64
Exact numeric
numeric [ (p, s) ] Precision up to 1,000. NUMERIC BigQuery NUMERIC range is smaller, only supports 38 decimal digits of precision and 9 decimal digits of scale.
Decimal [ (p, s) ] See numeric NUMERIC See numeric
money 8 bytes, 2 digit scale, -92233720368547758.08 to +92233720368547758.07 NOT SUPPORTED
Approximate numeric
real 4 bytes, single precision floating-point number FLOAT64
double precision 8 bytes, double precision floating-point number FLOAT64
Date and time
date calendar date (year, month, day) DATE
time [ (p) ] [ without time zone ] time of day (no time zone) TIME
time [ (p) ] with time zone time of day, including time zone NOT SUPPORTED
timestamp [ (p) ] [ without time zone ] date and time (no time zone) DATETIME
timestamp [ (p) ] with time zone date and time, including time zone TIMESTAMP PostgreSQL TIMESTAMP is retrieved as UTC timezone no matter where user call BigQuery
interval A time duration NOT SUPPORTED
Character and strings
character [ (n) ] fixed-length character string STRING
character varying [ (n) ] variable-length character string STRING
text variable-length character string STRING
Binary
bytea binary data ("byte array") BYTES
bit [ (n) ] fixed-length bit string BYTES
bit varying [ (n) ] variable-length bit string BYTES
Other
boolean logical Boolean (true/false) BOOL
inet IPv4 or IPv6 host address NOT SUPPORTED
path geometric path on a plane NOT SUPPORTED
pg_lsn PostgreSQL Log Sequence Number NOT SUPPORTED
point geometric point on a plane NOT SUPPORTED
polygon closed geometric path on a plane NOT SUPPORTED
tsquery text search query NOT SUPPORTED
tsvector text search document NOT SUPPORTED
txid_snapshot user-level transaction ID snapshot NOT SUPPORTED
uuid universally unique identifier NOT SUPPORTED
xml XML data STRING
box rectangular box on a plane NOT SUPPORTED
cidr IPv4 or IPv6 network address NOT SUPPORTED
circle circle on a plane NOT SUPPORTED
interval [ fields ] [ (p) ] time span NOT SUPPORTED
json textual JSON data STRING
jsonb binary JSON data, decomposed NOT SUPPORTED
line infinite line on a plane NOT SUPPORTED
lseg line segment on a plane NOT SUPPORTED
macaddr MAC (Media Access Control) address NOT SUPPORTED
macaddr8 MAC (Media Access Control) address (EUI-64 format) NOT SUPPORTED

Known issues and limitations

Regions

Cloud SQL federated queries are only supported in regions that support both Cloud SQL and BigQuery. (Query federation is supported in all BigQuery regions, but not all Cloud SQL regions.)

You can execute a federated query across regions, according to the following rules.

Multi-regions

A BigQuery multi-region can query any Cloud SQL region in the same location (US, EU). For example:

  • The BigQuery US multi-region can query Cloud SQL us-central1, us-east4, us-west2, and so on.
  • The BigQuery EU multi-region can query Cloud SQL europe-north1, europe-west2, and so on.

Single regions

A BigQuery single region can only query Cloud SQL in the same region. For example:

  • BigQuery us-east4 can only query Cloud SQL us-east4.

Although BigQuery and Cloud SQL are available in the same single regions as of this Beta, some regions are not supported. See the following table for a detailed mapping.

Cloud SQL region BigQuery multi-region/region
northamerica-northeast1 US / Northamerica-northeast1
us-central NOT SUPPORTED: This region of Cloud SQL instance is V1.
Federated querying is only supported with the V2 instance.
us-central1 US
us-east1 NOT SUPPORTED
us-east4 US / us-east4
us-west1 NOT SUPPORTED
us-west2 US / us-west2
southamerica-east1 southamerica-east1
europe-north1 EU / europe-north1
europe-west1 EU
europe-west2 EU / europe-west2
europe-west3 NOT SUPPORTED
europe-west4 NOT SUPPORTED
europe-west6 EU / europe-west6
asia-east1 asia-east1
asia-east2 asia-east2
asia-northeast1 asia-northeast1
asia-south1 asia-south1
asia-southeast1 asia-southeast1
australia-southeast1 australia-southeast1

Quotas and other limits

  • Performance: a federated query will most likely not be as fast as only querying BigQuery storage. BigQuery needs to wait for the source database to execute the external query and temporarily move data from Cloud SQL to BigQuery. Source databases like MySQL or PostgreSQL are not usually optimized for complex analytical queries.
  • Quota: users should control query quota in Cloud SQL. There is no extra quota setting for federated querying. To achieve workload isolation, it's recommended to only query a database read replica.
  • Maximum bytes billed allowed: this field isn't supported for federated queries at this time. Calculating the bytes billed before actually executing the federated queries is not possible at this time.
  • Limited Cloud SQL instances: federated querying is only supported by the Cloud SQL v2 instance with public IP (vs private IP).
  • Subject to Cloud SQL MySQL and PostgreSQL quotas and limitations.
  • Number of connections: a federated query can have at most 10 unique connections.

Troubleshooting

This section is intended to help you troubleshoot the most common issues encountered when setting up a connection. This section does not encompass all possible error messages or issues.

General issues

When diagnosing general connection issues, verify the following:

  • Verify that you have completed all the steps in the "Before You Begin" section of the documentation page for your connection.
  • The connection configuration properties are correct.

If your connection configuration is correct, and the appropriate permissions are granted, refer to the following for solutions to commonly encountered issues.

Issue: BigQuery and Cloud SQL are not co-located.
Resolution: Cloud SQL federated querying is only supported in regions that support both Cloud SQL and BigQuery. The BigQuery dataset and the Cloud SQL instance must be in the same region, or same location if the dataset is in a multi-region location such as US and EU. See Known issues and limitations for more information about regions.
Issue: performance is slower than expected.
Resolution: Performance of federated queries is not as high as querying data stored in BigQuery, because the federated query has to externally query Cloud SQL, return the data to a temporary BigQuery table, map the data to BigQuery data type, then execute the query in BigQuery. The benefit is that although the query is not as high in performance, the data does not need to be copied, moved or stored again.
Issue: how to format the connection name?
Resolution: The connection name should include project, location and connection_id. The connection name should conform to this pattern: project_id.location_id.connection_id For example: federation-test.us.my_new_connection

Unsupported data types

If your external query contains a data type that is unsupported in BigQuery, the query will fail immediately. You can cast the unsupported data type to a different supported MySQL / PostgreSQL data type.

  • Unsupported MySQL data type

    • Error message: Invalid table-valued function external_query Found unsupported MySQL type in BigQuery. at [1:15]
    • Unsupported type: GEOMETRY, BIT
    • Resolution: Cast the unsupported data type to STRING.
    • Example: SELECT ST_AsText(ST_GeomFromText('POINT(1 1)')); This command casts the unsupported data type GEOMETRY to STRING.
  • Unsupported PostgresSQL data type

    • Error message: Invalid table-valued function external_query Postgres type (OID = 790) is not supported now at [1:15]
    • Unsupported type: money, time with time zone, inet, path, pg_lsn, point, polygon, tsquery, tsvector, txid_snapshot, uuid, box, cidr, circle, interval, jsonb, line, lseg, macaddr, macaddr8
    • Resolution: Cast the unsupported data type to STRING.
    • Example: SELECT CAST('12.34'::float8::numeric::money AS varchar(30)); This command casts the unsupported data type money to string.
Oliko tästä sivusta apua? Kerro mielipiteesi

Palautteen aihe:

Tämä sivu
BigQuery
Tarvitsetko apua? Siirry tukisivullemme.