Run federated queries with Data Boost

This page explains how to use Spanner Data Boost when running federated queries from BigQuery to Spanner. With Data Boost, federated queries run with near-zero impact to existing workloads on the provisioned Spanner instance.

Spanner federation lets BigQuery query data residing in Spanner in real time, without copying or moving data.

To learn about Spanner federated queries, see Spanner federated queries.

To learn about Data Boost, see Data Boost overview.

Before you begin

Complete the following tasks before attempting to run federated queries with Data Boost.

Enable the BigQuery connection API

The BigQuery connection API lets you manage BigQuery connections to external data sources.

  • Enable the BigQuery connection API.

    Enable the API

For more information, see BigQuery connection API.

Grant IAM permissions for Data Boost to principals

Principals need the spanner.instances.get and spanner.databases.useDataBoost Identity and Access Management (IAM) permissions to run queries and exports with Data Boost.

We recommend that you create a custom IAM role based on Spanner Database Reader (roles/spanner.databaseReader) and add spanner.instances.get and spanner.databases.useDataBoost to it.

For more information, see Predefined roles.

Create BigQuery connections for Spanner with Data Boost

BigQuery connections let you query data that's stored outside of BigQuery. To establish a connection between BigQuery and Spanner, you create an external data connection. You can then run queries that join BigQuery data with Spanner data.

To create an external data connection to Spanner that uses Data Boost, select one of the following options:

Console

  1. Go to Create Spanner connections in the BigQuery documentation and follow the Console instructions.

  2. In the External data source pane, select both the Read data in parallel and Use Spanner Data Boost checkboxes.

bq

  1. Go to Create Spanner connections in the BigQuery documentation and follow the bq instructions.

  2. Set the following connection properties to true:

  • useParallelism
  • useDataBoost

The following example uses the bq mk command to create a new connection named my_connection with the two required properties for Data Boost:

bq mk --connection --connection_type='CLOUD_SPANNER' --location='us' \
--properties='{"database":"projects/my-project/instances/my-instance/databases/my-database", "useParallelism":true, "useDataBoost": true}' my_connection

Run a federated query

To run a federated query with Data Boost, use a BigQuery connection that specifies the use of Data Boost. For more information, see Create BigQuery connections for Spanner with Data Boost.

You can start from either the Spanner page of the Google Cloud console, or from BigQuery.

Start from the Spanner page of the console

  1. Go to the Spanner Instances page in the Google Cloud console.

    Go to the Instances page

    The console shows a list of your Spanner instances.

  2. Select a Spanner instance, and then select a database.

  3. On the Database overview page, in the navigation menu, click Spanner Studio.

  4. Click the View in BiqQuery tab.

  5. In the View in BigQuery dialog, enter a connection ID.

    This creates a new connection. An error occurs if the ID already exists.

  6. Fill in the rest of the dialog, and select the Read data in parallel and Use Spanner Data Boost checkboxes.

  7. Click View in BigQuery.

    BigQuery Studio opens. Enter and run your federated query there.

    The following example makes a federated query to a Spanner database named orders and joins the results with a BigQuery table named mydataset.customers.

    SELECT c.customer_id, c.name, rq.first_order_date
    FROM mydataset.customers AS c
    LEFT OUTER JOIN EXTERNAL_QUERY(
      'my-project.us.example-db',
      '''SELECT customer_id, MIN(order_date) AS first_order_date
      FROM orders
      GROUP BY customer_id''') AS rq
      ON rq.customer_id = c.customer_id
    GROUP BY c.customer_id, c.name, rq.first_order_date;
    

Start from BigQuery

  • Enter the following URL in your browser:

    https://console.cloud.google.com/bigquery

    BigQuery opens in your most recently accessed project and displays BigQuery Studio. Run your federated query here.

What's next