Cloud SQL federated queries

This page describes how to query data in Cloud SQL from BigQuery using federated queries.

Overview

BigQuery Cloud SQL federation enables BigQuery to query data residing in Cloud SQL in real-time, without copying or moving data. Query federation supports both MySQL (2nd generation) and PostgreSQL instances in Cloud SQL.

After the initial one-time set up, you can write a query with the SQL function EXTERNAL_QUERY.

Before you begin

Enable the BigQuery connection service

  1. Open the BigQuery Connection API page in the API library.
  2. From the drop-down menu, select the project that contains your external data source.
  3. Click the ENABLE button.

    BigQuery Connection API

Service account

A service account is automatically created when you enable the BigQuery Connection API. When you enable the BigQuery Connection API in a project that has your Cloud SQL data source, the following roles are applied:

Role Description
cloudsql.client Connect to a Cloud SQL instance
logging.logWriter Write to cloud-logging
metrics.metricWriter Write to cloud-monitoring

For more information about service accounts, see Service agents.

Public IP

BigQuery Cloud SQL federation only supports Cloud SQL instances with public IP connectivity. Please configure public IP connectivity for your Cloud SQL instance.

Setting up Cloud SQL database connections

Once the BigQuery Connection API is enabled, create a connection to the Cloud SQL database.

Console

  1. To create a connection resource, go to the BigQuery page in the Cloud Console.

    Go to the BigQuery page

  2. In the Add data menu, select External data source.

    Create connection resource.

  3. In the External data source pane, enter the following information:

    • For Connection type, select the type of source, for example MySQL or Postgres.
    • For Connection ID, enter an identifier for the connection resource. Letter, numbers, and underscores are allowed.
    • For Connection location, select a BigQuery location (or region) that is compatible with your external data source region.
    • (Optional): For Friendly name, enter a user-friendly name for the connection, such as My connection resource. The friendly name can be any value that helps you identify the connection resource if you need to modify it later.
    • (Optional): For Description, enter a description for this connection resource.
    • If you chose Cloud SQL MySQL or Postgres for the connection type, for Cloud SQL instance ID, enter the full name of the Cloud SQL instance, usually in the format project-id:location-id:instance-id. You can find the instance ID on the detail page of the Cloud SQL instance you want to query.
    • For Database name, enter the name of the database.
    • For Username, enter the username for the database.
    • For Password, enter the password for the database.

      • (Optional) Check Show password to reveal the password.

      New connection resource.

  4. Click Create connection.

bq

Enter the bq mk command and supply the connection flag: --connection. The following flags are also required:

  • --connection_type
  • --properties
  • --connection_credential
  • --project_id
  • --location

The following flags are optional:

  • --display_name The friendly name for the connection.
  • --description A description of the connection.

The connection id is an optional parameter that can be added as the last argument of the command which is used for storage internally. If a connection id is not provided a unique id is automatically generated. The connection id can contain letters, numbers and underscores.

    bq mk --connection --display_name='friendly name' --connection_type=TYPE \
      --properties=PROPERTIES --connection_credential=CREDENTIALS \
      --project_id=PROJECT_ID --location=LOCATION \
      CONNECTION_ID

Replace the following:

  • TYPE: the type of the external data source.
  • PROPERTIES: the parameters for the created connection in JSON format. For example: --properties='{"param":"param_value"}'. For creating a connection resource, you must supply the instanceID, database, and type parameters.
  • CREDENTIALS: the parameters username and password.
  • PROJECT_ID: your project ID.
  • LOCATION: the region your Cloud SQL instance is located in.
  • CONNECTION_ID: the connection identifier.

For example, the following command creates a new connection resource named my_new_connection (friendly name: "My new connection") in a project with the ID federation-test.

bq mk --connection --display_name='friendly name' --connection_type='CLOUD_SQL' \
  --properties='{"instanceId":"federation-test:us-central1:mytestsql","database":"mydatabase","type":"MYSQL"}' \
  --connection_credential='{"username":"myusername", "password":"mypassword"}' \
  --project_id=federation-test --location=us my_connection_id

API

Within the BigQuery Connection API, you can invoke CreateConnection within the ConnectionService to instantiate a connection. See the client library page for more details.

Java

Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.

import com.google.cloud.bigquery.connection.v1.CloudSqlCredential;
import com.google.cloud.bigquery.connection.v1.CloudSqlProperties;
import com.google.cloud.bigquery.connection.v1.Connection;
import com.google.cloud.bigquery.connection.v1.CreateConnectionRequest;
import com.google.cloud.bigquery.connection.v1.LocationName;
import com.google.cloud.bigqueryconnection.v1.ConnectionServiceClient;
import java.io.IOException;

// Sample to create a connection with cloud MySql database
public class CreateConnection {

  public static void main(String[] args) throws IOException {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "MY_PROJECT_ID";
    String location = "MY_LOCATION";
    String connectionId = "MY_CONNECTION_ID";
    String database = "MY_DATABASE";
    String instance = "MY_INSTANCE";
    String instanceLocation = "MY_INSTANCE_LOCATION";
    String username = "MY_USERNAME";
    String password = "MY_PASSWORD";
    String instanceId = String.format("%s:%s:%s", projectId, instanceLocation, instance);
    CloudSqlCredential cloudSqlCredential =
        CloudSqlCredential.newBuilder().setUsername(username).setPassword(password).build();
    CloudSqlProperties cloudSqlProperties =
        CloudSqlProperties.newBuilder()
            .setType(CloudSqlProperties.DatabaseType.MYSQL)
            .setDatabase(database)
            .setInstanceId(instanceId)
            .setCredential(cloudSqlCredential)
            .build();
    Connection connection = Connection.newBuilder().setCloudSql(cloudSqlProperties).build();
    createConnection(projectId, location, connectionId, connection);
  }

  public static void createConnection(
      String projectId, String location, String connectionId, Connection connection)
      throws IOException {
    try (ConnectionServiceClient client = ConnectionServiceClient.create()) {
      LocationName parent = LocationName.of(projectId, location);
      CreateConnectionRequest request =
          CreateConnectionRequest.newBuilder()
              .setParent(parent.toString())
              .setConnection(connection)
              .setConnectionId(connectionId)
              .build();
      Connection response = client.createConnection(request);
      System.out.println("Connection created successfully :" + response.getName());
    }
  }
}

For information about viewing, listing, sharing, updating, and deleting connection resources, see Working with connections.

Example

Suppose that you store a customer table in BigQuery, while storing a sales table in Cloud SQL, and want to join the two tables in a single query. The following example makes a federated query to a Cloud SQL database named orders and joins the results with a BigQuery table named mydataset.customers.

SELECT c.customer_id, c.name, rq.first_order_date
FROM mydataset.customers AS c
LEFT OUTER JOIN EXTERNAL_QUERY(
  'us.connection_id',
  '''SELECT customer_id, MIN(order_date) AS first_order_date
  FROM orders
  GROUP BY customer_id''') AS rq ON rq.customer_id = c.customer_id
GROUP BY c.customer_id, c.name, rq.first_order_date;

The example query includes 3 parts:

  1. Run the external query SELECT customer_id, MIN(order_date) AS first_order_date FROM orders GROUP BY customer_id in the operational PostgreSQL database to get the first order date for each customer through the EXTERNAL_QUERY() function.
  2. Join the external query result table with the customers table in BigQuery by customer_id.
  3. Select customer information and first order date.

Supported regions

The following table shows which regions are supported for BigQuery and Cloud SQL.

Regional locations

Region description Cloud SQL region Compatible BigQuery region Compatible BigQuery multi-region
Americas
Iowa us-central Not supported: This region of Cloud SQL instance is V1.
Federated queries only support V2 instances of Cloud SQL.
Iowa us-central1 us-central1 US
Las Vegas us-west4 us-west4 US
Los Angeles us-west2 us-west2 US
Montréal northamerica-northeast1 northamerica-northeast1 US
Northern Virginia us-east4 us-east4 US
Oregon us-west1 us-west1 US
Salt Lake City us-west3 us-west3 US
São Paulo southamerica-east1 southamerica-east1
Santiago southamerica-west1 southamerica-west1
South Carolina us-east1 us-east1 US
Toronto northamerica-northeast2 northamerica-northeast2 US
Europe
Belgium europe-west1 europe-west1 EU
Finland europe-north1 europe-north1 EU
Frankfurt europe-west3 europe-west3 EU
London europe-west2 europe-west2 EU
Netherlands europe-west4 europe-west4 EU
Warsaw europe-central2 europe-central2 EU
Zürich europe-west6 europe-west6 EU
Asia Pacific
Delhi asia-south2 asia-south2
Hong Kong asia-east2 asia-east2
Jakarta asia-southeast2 asia-southeast2
Melbourne australia-southeast2 australia-southeast2
Mumbai asia-south1 asia-south1
Osaka asia-northeast2 asia-northeast2
Seoul asia-northeast3 asia-northeast3
Singapore asia-southeast1 asia-southeast1
Sydney australia-southeast1 australia-southeast1
Taiwan asia-east1 asia-east1
Tokyo asia-northeast1 asia-northeast1

Multi-regional locations

Multi-regional locations are not available for Cloud SQL instances. Cloud SQL multi-regions cannot be used for federated querying.

Data located in the EU multi-region is not stored in the europe-west2 (London) or europe-west6 (Zürich) data centers.

Limitations

Cloud SQL federated queries are subject to the following limitations:

  • Performance. A federated query is likely to not be as fast as querying only BigQuery storage. BigQuery needs to wait for the source database to execute the external query and temporarily move data from the external data source to BigQuery. Also, the source database might not be optimized for complex analytical queries.

  • Federated queries are read-only. The external query that will be executed in the source database must be read-only. Therefore, DML or DDL statements are not supported.

  • Unsupported data types. If your external query contains a data type that is unsupported in BigQuery, the query fails immediately. You can cast the unsupported data type to a different supported data type.

  • Limited Cloud SQL instances. Federated querying is only supported by the Cloud SQL V2 instance with public IP (versus private IP).

Quotas and limits

In addition to the general quotas and limits for federated queries, Cloud SQL databases have the following limitations.

  • Cross-region federated querying: if the BigQuery query processing location and the external data source location are different, this is a cross-region query. You can run up to 1 TB in cross-region queries per project per day. The following is an example of a cross-region query.
    • The Cloud SQL instance is in us-west1 while the BigQuery connection is based in the US multi-region. The BigQuery query processing location is US.
  • Quota: users should control query quota in the external data source, Cloud SQL. There is no extra quota setting for federated querying. To achieve workload isolation, it's recommended to only query a database read replica.
  • Cloud SQL MySQL and PostgreSQL quotas and limitations apply.

Reference

View a Cloud SQL table schema

You can use the EXTERNAL_QUERY() function to query information_schema tables to access database metadata, such as list all tables in the database or show table schema. The following example information_schema queries work in both MySQL and PostgreSQL. You can learn more from MySQL information_schema tables and PostgreSQL information_schema tables.

-- List all tables in a database.
SELECT * FROM EXTERNAL_QUERY("connection_id",
"select * from information_schema.tables;");
-- List all columns in a table.
SELECT * FROM EXTERNAL_QUERY("connection_id",
"select * from information_schema.columns where table_name='x';");

Connection resource detail

Property name Value Description
name string Name of the connection resource in the format: project_id.location_id.connection_id.
location string Location of the connection, which is the same as the Cloud SQL instance location with the following exceptions: Cloud SQL us-central1 maps to BigQuery US, Cloud SQL europe-west1 maps to BigQuery EU.
friendlyName string A user-friendly display name for the connection.
description string Description of the connection.
cloudSql.type string Can be "POSTGRES" or "MYSQL".
cloudSql.instanceId string Name of the Cloud SQL instance, usually in the format of:

Project-id:location-id:instance-id

You can find the instance ID in the Cloud SQL instance detail page.
cloudSql.database string The Cloud SQL database that you want to connect to.

Connection credential resource detail

Property name Value Description
username string Database username
password string Database password

Data type mappings

When you execute a Cloud SQL federated query, the data from Cloud SQL (in MySQL or PostgreSQL data types) is converted to BigQuery standard SQL types. The following data type mappings are from MySQL to BigQuery and PostgreSQL to BigQuery.

Things to know about mapping:

  • Most MySQL data types can be matched to the same BigQuery data type, with a few exceptions such as decimal, timestamp, and time.
  • PostgreSQL supports many non-standard data types that are not supported in BigQuery, for example money, path, uuid, boxer, and others.
  • The numeric data types in MySQL and PostgreSQL will be mapped to BigQuery NUMERIC value by default. The BigQuery NUMERIC value range is smaller than in MySQL and PostgreSQL. It can also be mapped to BIGNUMERIC, FLOAT64, or STRING with "default_type_for_decimal_columns" in EXTERNAL_QUERY options.

Error handling

If your external query contains a data type that is unsupported in BigQuery, the query fails immediately. You can cast the unsupported data type to a different MySQL / PostgreSQL data type that is supported.

You can cast the unsupported data type to a different supported MySQL or PostgreSQL data type.

  • Unsupported MySQL data type

    • Error message: Invalid table-valued function external_query Found unsupported MySQL type in BigQuery at [1:15]
    • Unsupported type: GEOMETRY, BIT
    • Resolution: Cast the unsupported data type to STRING.
    • Example: SELECT ST_AsText(ST_GeomFromText('POINT(1 1)')); This command casts the unsupported data type GEOMETRY to STRING.
  • Unsupported PostgreSQL data type

    • Error message: Invalid table-valued function external_query Postgres type (OID = 790) is not supported now at [1:15]
    • Unsupported type: money, time with time zone, inet, path, pg_lsn, point, polygon, tsquery, tsvector, txid_snapshot, uuid, box, cidr, circle, interval, jsonb, line, lseg, macaddr, macaddr8
    • Resolution: Cast the unsupported data type to STRING.
    • Example: SELECT CAST('12.34'::float8::numeric::money AS varchar(30)); This command casts the unsupported data type money to STRING.

MySQL to BigQuery type mapping

MySQL type MySQL description BigQuery type Type difference
Integer
INT 4 bytes, 2^32 - 1 INT64
TINYINT 1 byte, 2^8 - 1 INT64
SMALLINT 2 bytes, 2^16 - 1 INT64
MEDIUMINT 3 bytes, 2^24 - 1 INT64
BIGINT 8 bytes, 2^64 - 1 INT64
UNSIGNED BIGINT 8 bytes, 2^64 - 1 NUMERIC
Exact numeric
DECIMAL (M,D) A decimal represented by (M,D) where M is the total number of digits and D is the number of decimals.
M <= 65
NUMERIC, BIGNUMERIC, FLOAT64, or STRING

DECIMAL (M,D) will to mapped to NUMERIC by default, or can be mapped to BIGNUMERIC, FLOAT64, or STRING with default_type_for_decimal_columns.
Approximate numeric
FLOAT (M,D) 4 bytes, M <= 23 FLOAT64
DOUBLE (M,D) 8 bytes, M <= 53 FLOAT64
Date and time
TIMESTAMP '1970-01-01 00:00:01'UTC to '2038-01-19 03:14:07' UTC. TIMESTAMP MySQL TIMESTAMP is retrieved as UTC timezone no matter where you call BigQuery from.
DATETIME '1000-01-01 00:00:00' to '9999-12-31 23:59:59'. DATETIME
DATE '1000-01-01' to '9999-12-31'. DATE
TIME Time in 'HH:MM:SS' format
'-838:59:59' to '838:59:59'.
TIME
BigQuery TIME range is smaller, from 00:00:00 to 23:59:59.
YEAR INT64
Character and strings
ENUM String object with a value chosen from a list of permitted values. STRING
CHAR (M) A fixed-length string between 1 and 255 characters. STRING
VARCHAR (M) A variable-length string between 1 and 255 characters. STRING
TEXT A field with a maximum length of 65,535 characters. STRING
TINYTEXT A TEXT column with a maximum length of 255 characters. STRING
MEDIUMTEXT A TEXT column with a maximum length of 16,777,215 characters. STRING
LONGTEXT A TEXT column with a maximum length of 4,294,967,295 characters. STRING
Binary
BLOB A binary large object with a maximum length of 65,535 characters. BYTES
MEDIUM_BLOB A BLOB with a maximum length of 16,777,215 characters. BYTES
LONG_BLOB A BLOB with a maximum length of 4,294,967,295 characters. BYTES
TINY_BLOB A BLOB with a maximum length of 255 characters. BYTES
BINARY A fixed-length binary string between 1 and 255 characters. BYTES
VARBINARY A variable-length binary string between 1 and 255 characters. BYTES
Other
SET When you declare the SET column, predefine some values. Then use INSERT to add any set of predefined values to this column. STRING
GEOMETRY GEOGRAPHY NOT YET SUPPORTED
BIT INT64 NOT YET SUPPORTED

PostgreSQL to BigQuery type mapping

Name Description BigQuery type Type difference
Integer
smallint 2 bytes, -32,768 to +32,767. INT64
smallserial See smallint. INT64
integer 4 bytes, -2,147,483,648 to +2,147,483,647. INT64
serial See integer. INT64
bigint 8 bytes, -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807. INT64
bigserial See bigint. INT64
Exact numeric
numeric [ (p, s) ] Precision up to 1,000. NUMERIC, BIGNUMERIC, FLOAT64, or STRING Numeric [ (p, s) ] will to mapped to NUMERIC by default, or can be mapped to BIGNUMERIC, FLOAT64, or STRING with default_type_for_decimal_columns.
decimal [ (p, s) ] See numeric. NUMERIC See numeric.
money 8 bytes, 2 digit scale, -92,233,720,368,547,758.08 to +92,233,720,368,547,758.07 NOT SUPPORTED.
Approximate numeric.
real 4 bytes, single precision floating-point number. FLOAT64
double precision 8 bytes, double precision floating-point number. FLOAT64
Date and time
date Calendar date (year, month, day). DATE
time [ (p) ] [ without time zone ] Time of day (no time zone). TIME
time [ (p) ] with time zone Time of day, including time zone. NOT SUPPORTED
timestamp [ (p) ] [ without time zone ] Date and time (no time zone). DATETIME
timestamp [ (p) ] with time zone Date and time, including time zone. TIMESTAMP PostgreSQL TIMESTAMP is retrieved as UTC timezone no matter where you call BigQuery from.
interval A time duration. NOT SUPPORTED
Character and strings
character [ (n) ] Fixed-length character string. STRING
character varying [ (n) ] Variable-length character string. STRING
text Variable-length character string. STRING
Binary
bytea Binary data ("byte array"). BYTES
bit [ (n) ] Fixed-length bit string. BYTES
bit varying [ (n) ] Variable-length bit string. BYTES
Other
boolean Logical Boolean (true/false). BOOL
inet IPv4 or IPv6 host address. NOT SUPPORTED
path Geometric path on a plane. NOT SUPPORTED
pg_lsn PostgreSQL Log Sequence Number. NOT SUPPORTED
point Geometric point on a plane. NOT SUPPORTED
polygon Closed geometric path on a plane. NOT SUPPORTED
tsquery Text search query. NOT SUPPORTED
tsvector Text search document. NOT SUPPORTED
txid_snapshot User-level transaction ID snapshot. NOT SUPPORTED
uuid Universally unique identifier. NOT SUPPORTED
xml XML data. STRING
box Rectangular box on a plane. NOT SUPPORTED
cidr IPv4 or IPv6 network address. NOT SUPPORTED
circle Circle on a plane. NOT SUPPORTED
interval [ fields ] [ (p) ] Time span. NOT SUPPORTED
json Textual JSON data. STRING
jsonb Binary JSON data, decomposed. NOT SUPPORTED
line Infinite line on a plane. NOT SUPPORTED
lseg Line segment on a plane. NOT SUPPORTED
macaddr MAC (Media Access Control) address. NOT SUPPORTED
macaddr8 MAC (Media Access Control) address (EUI-64 format). NOT SUPPORTED