This page explains how to use Spanner Data Boost when running federated queries from BigQuery to Spanner. With Data Boost, federated queries run with near-zero impact to existing workloads on the provisioned Spanner instance.
Spanner federation lets BigQuery query data residing in Spanner in real time, without copying or moving data.
To learn about Spanner federated queries, see Spanner federated queries.
To learn about Data Boost, see Data Boost overview.
Before you begin
Complete the following tasks before attempting to run federated queries with Data Boost.
Enable the BigQuery connection API
The BigQuery connection API lets you manage BigQuery connections to external data sources.
-
Enable the BigQuery connection API.
For more information, see BigQuery connection API.
Grant IAM permissions for Data Boost to principals
Principals need the spanner.instances.get
and spanner.databases.useDataBoost
Identity and Access Management (IAM) permissions to run
queries and exports with Data Boost.
We recommend that you create a custom IAM role based on
Spanner Database Reader
(roles/spanner.databaseReader
) and add
spanner.instances.get
and spanner.databases.useDataBoost
to it.
For more information, see Predefined roles.
Create BigQuery connections for Spanner with Data Boost
BigQuery connections let you query data that's stored outside of BigQuery. To establish a connection between BigQuery and Spanner, you create an external data connection. You can then run queries that join BigQuery data with Spanner data.
To create an external data connection to Spanner that uses Data Boost, select one of the following options:
Console
Go to Create Spanner connections in the BigQuery documentation and follow the Console instructions.
In the External data source pane, select both the Read data in parallel and Use Spanner Data Boost checkboxes.
bq
Go to Create Spanner connections in the BigQuery documentation and follow the bq instructions.
Set the following connection properties to
true
:
useParallelism
useDataBoost
The following example uses the bq
mk
command to create a new connection named my_connection
with the two
required properties for Data Boost:
bq mk --connection --connection_type='CLOUD_SPANNER' --location='us' \
--properties='{"database":"projects/my-project/instances/my-instance/databases/my-database", "useParallelism":true, "useDataBoost": true}' my_connection
Run a federated query
To run a federated query with Data Boost, use a BigQuery connection that specifies the use of Data Boost. For more information, see Create BigQuery connections for Spanner with Data Boost.
You can start from either the Spanner page of the Google Cloud console, or from BigQuery.
Start from the Spanner page of the console
Go to the Spanner Instances page in the Google Cloud console.
The console shows a list of your Spanner instances.
Select a Spanner instance, and then select a database.
On the Database overview page, in the navigation menu, click Spanner Studio.
Click the View in BiqQuery tab.
In the View in BigQuery dialog, enter a connection ID.
This creates a new connection. An error occurs if the ID already exists.
Fill in the rest of the dialog, and select the Read data in parallel and Use Spanner Data Boost checkboxes.
Click View in BigQuery.
BigQuery Studio opens. Enter and run your federated query there.
The following example makes a federated query to a Spanner database named
orders
and joins the results with a BigQuery table namedmydataset.customers
.SELECT c.customer_id, c.name, rq.first_order_date FROM mydataset.customers AS c LEFT OUTER JOIN EXTERNAL_QUERY( 'my-project.us.example-db', '''SELECT customer_id, MIN(order_date) AS first_order_date FROM orders GROUP BY customer_id''') AS rq ON rq.customer_id = c.customer_id GROUP BY c.customer_id, c.name, rq.first_order_date;
Start from BigQuery
Enter the following URL in your browser:
https://console.cloud.google.com/bigquery
BigQuery opens in your most recently accessed project and displays BigQuery Studio. Run your federated query here.
What's next
- Learn about Data Boost in Data Boost overview.
- Use Data Boost in your applications
- Monitor Data Boost usage
- Monitor and manage Data Boost quota usage