Cloud Spanner federated queries

This page describes how to query data in Cloud Spanner from BigQuery using federated queries.

Overview

BigQuery Spanner federation enables BigQuery to query data residing in Spanner in real-time, without copying or moving data.

After the initial one-time set up, you can write a query with the SQL EXTERNAL_QUERY function.

Permissions

To create the connection to the Spanner database, you must have the permissions described in Working with connections.

To query the Spanner database, you must have the following permissions:

  • spanner.databases.select
  • spanner.instances.get
  • spanner.sessions.create

To perform parallel reads, you must have the spanner.databases.partitionQuery permission.

Before you begin

Enable the BigQuery connection service

  1. Open the BigQuery Connection API page in the API library.
  2. From the drop-down menu, select the project that contains your external data source.
  3. Click the ENABLE button.

    BigQuery Connection API

Setting up Spanner database connections

Once the BigQuery Connection API is enabled, create a connection to the Spanner database.

Console

  1. To create a connection resource, go to the BigQuery page in the Cloud Console.

    Go to the BigQuery page

  2. In the Add data menu, select External data source.

    Create connection resource.

  3. In the External data source pane, enter the following information:

    • For Connection type, select Spanner.
    • For Connection ID, enter an identifier for the connection resource. Letter, numbers, and underscores are allowed.
    • For Connection location, select a BigQuery location (or region) that is compatible with your external data source region.
    • Optional: For Friendly name, enter a user-friendly name for the connection, such as My connection resource. The friendly name can be any value that helps you identify the connection resource if you need to modify it later.
    • Optional: For Description, enter a description for this connection resource.
    • For Database name, enter the name of the Spanner in the following format: "projects/PROJECT_ID/instances/INSTANCE/databases/DATABASE"
    • Optional: To perform parallel reads, select Read data in parallel. For more information, see Read data in parallel
  4. Click Create connection.

bq

To create the connection, use the bq mk command with the --connection flag.

bq mk --connection \
  --connection_type=CLOUD_SPANNER \
  --properties='PROPERTIES' \
  --location=LOCATION \
  --display_name='FRIENDLY_NAME' \
  --description 'DESCRIPTION' \
  CONNECTION_ID

Replace the following:

  • PROPERTIES: A JSON object with the following fields:

    • "database": The Spanner database for the connection. Specify as a string with the following format: "projects/PROJECT_ID/instances/INSTANCE/databases/DATABASE".
    • "use_parallelism": Optional. If true, this connection performs parallel reads. The default value is false. For more information, see Read data in parallel
  • LOCATION: A BigQuery location that is compatible with your external data source region.

  • FRIENDLY_NAME: Optional. A user-friendly name for the connection.

  • DESCRIPTION: Optional. A description for this connection.

  • CONNECTION_ID: Optional. An identifier for the connection resource. The connection ID can contain letters, numbers and underscores. If you don't provide a connection ID, BigQuery automatically generates a unique ID.

The following example creates a new connection resource named my_connection_id.

bq mk --connection \
  --connection_type='CLOUD_SPANNER' \
  --properties='{"database":"projects/my_project/instances/my_instance/databases/database1"}' \
  --project_id=federation-test \
  --location=us \
  my_connection_id

API

Within the BigQuery Connection API, call the CreateConnection method within the ConnectionService service to create a connection resource.

Query data in Spanner

To send a federated query to Spanner from a BigQuery standard SQL query, use the EXTERNAL_QUERY function.

The following example makes a federated query to a Spanner database named orders and joins the results with a BigQuery table named mydataset.customers.

SELECT c.customer_id, c.name, rq.first_order_date
FROM mydataset.customers AS c
LEFT OUTER JOIN EXTERNAL_QUERY(
  'my-project.us.example-db',
  '''SELECT customer_id, MIN(order_date) AS first_order_date
  FROM orders
  GROUP BY customer_id''') AS rq ON rq.customer_id = c.customer_id
GROUP BY c.customer_id, c.name, rq.first_order_date;

Read data in parallel

Spanner can divide certain queries into smaller pieces, or partitions, and fetch the partitions in parallel. For more information, see Read data in parallel in the Spanner documentation.

To enable parallel reads in federated queries, configure this setting when you create the connection resource. This option divides the SQL query into smaller partitions and fetches each partition in parallel. However, this option is restricted to queries whose first operator in the execution plan is a distributed union operator. Other queries return an error. To view the query execution plan for a Spanner query, see Understand how Cloud Spanner executes queries.

View a Spanner table schema

You can use the EXTERNAL_QUERY function to query information_schema views to access database metadata, such as listing all tables in the database or showing a table schema. The following example returns information about the columns in the table MyTable:

SELECT * from EXTERNAL_QUERY(
  'my-project.us.example-db',
  '''SELECT t.column_name, t.spanner_type, t.is_nullable
     FROM information_schema.columns AS t
     WHERE
       t.table_catalog = '' AND t.table_schema = '' AND t.table_name = 'MyTable'
     ORDER BY t.ordinal_position
  ''');

For more information, see Information schema in the Spanner documentation.

Data type mappings

When you execute a Spanner federated query, the data from Spanner is converted to BigQuery standard SQL types.

Spanner type BigQuery type
ARRAY ARRAY
BOOL BOOL
BYTES BYTE
DATE DATE
FLOAT64 FLOAT64
INT64 INT64
NUMERIC NUMERIC
STRING STRING
STRUCT Not supported
TIMESTAMP TIMESTAMP with nanoseconds truncated

If your external query contains a data type that is unsupported in BigQuery, the query fails immediately. You can cast the unsupported data type data type that is supported.

Supported regions

Federated queries are only supported in locations that support both Spanner and BigQuery.

There are two types of locations:

  • A region is a specific geographic place, such as London.

  • A multi-region is a large geographic area, such as the United States, that contains two or more geographic places.

You can create a connection and execute a federated query across regions according to the following rules.

Multi-regions

A BigQuery multi-region can query any data source region in the same large geographic area (US, EU), for example:

  • The BigQuery US multi-region can query any single region in the US geographic area, such as us-central1, us-east4, us-west2, and so on.
  • The BigQuery EU multi-region can query any single region in member states of the European Union, such as europe-north1, europe-west3, and so on.
  • The connection used in the query must reside in the same location as the query location. For example queries executed from the US multi-region must reference a connection located in the US multi-region.

The query processing location is the multi-region location, either US or EU.

Single regions

A BigQuery single region can only query a resource in the same region. For example:

  • The BigQuery single region us-east4 can only query Spanner in us-east4.

In this example, the query processing location is the BigQuery single region.

Limitations

Spanner federated queries are subject to the following limitations:

  • Performance. A federated query is likely to not be as fast as querying only BigQuery storage. BigQuery needs to wait for the source database to execute the external query and temporarily move data from the external data source to BigQuery. Also, the source database might not be optimized for complex analytical queries.

  • Federated queries are read-only. The external query that is executed in the source database must be read-only. Therefore, DML or DDL statements are not supported.

  • Unsupported data types. If your external query contains a data type that is unsupported in BigQuery, the query fails immediately. You can cast the unsupported data type to a different supported data type.

Troubleshooting

This section helps you troubleshoot issues you might encounter when sending a federated query to Spanner.

Issue: Query is not root partitionable.
Resolution: If you configure the connection to read data in parallel, the first operator in the query execution plan must be Distributed Union. To resolve this error, view the query execution plan and rewrite the query. For more information, see Understand how Cloud Spanner executes queries.
Issue: Deadline exceeded.
Resolution: Select the option to read data in parallel and rewrite the query to be root partitionable. For more information, see Understand how Cloud Spanner executes queries.