This page explains how to use Spanner Data Boost when you run federated queries from BigQuery to a Spanner database. With Data Boost, federated queries run with minimal impact to existing workloads on the provisioned Spanner instance. The Data Boost queries from BigQuery to a Spanner database can join BigQuery data with Spanner data.
Spanner federation lets BigQuery query data residing in Spanner in real time, without copying or moving data. To learn more about Spanner federated queries, see Spanner federated queries. To learn about Data Boost, see Data Boost overview.
Before you begin
Before you can run federated queries with Data Boost, you need to complete the following tasks:
Create a Spanner instance and database
If you don't have a Spanner instance and database, follow the steps in Create and query a database using the Google Cloud console to create them.
Enable the BigQuery connection API
The BigQuery connection API lets you manage BigQuery connections to external data sources such as a Spanner database.
-
Enable the BigQuery connection API.
For more information, see BigQuery connection API in the BigQuery documentation.
Grant IAM permissions for Data Boost to principals
A principal must be granted the following permissions to run federated queries with Data Boost:
spanner.instances.get
- lets you get the configuration of an instance.spanner.databases.useDataBoost
- lets you use the compute resources of Spanner Data Boost to process partitioned queries
For more information about Spanner permissions, see Identity and Access Management (IAM) permissions.
To grant these required permissions, we recommend that you use the
Cloud Spanner Database Reader With DataBoost
(roles/spanner.databaseReaderWithDataBoost
) IAM role.
You can add that role to any principal that needs to be able to run federated queries with Data Boost. To learn more
about predefined roles in Spanner, see
Predefined roles. To learn how to create a
custom IAM role, see
Create a custom role.
Run a federated Data Boost query
To run a Data Boost query from BigQuery to an external source, you need a BigQuery connection to the external source and the ID of the connection. When you run a federated Spanner query with Data Boost, the external source is a Spanner database. After you create your connection ID, it's used by BigQuery to run a Data Boost query of a Spanner database.
Use one of the following options to create a BigQuery connection ID, and then use the connection ID to run a Data Boost query from BigQuery:
Start in Spanner - Create the BigQuery external connection ID in the Spanner console. After your connection ID is created in the Spanner console, you're redirected to the BigQuery console to run a federated Data Boost query to a Spanner database.
Start in BigQuery - Create the Data Boost external connection ID in the BigQuery console or using the
bq
command-line tool. After you create the connection ID, you stay in the BigQuery console to run a federated Data Boost query to a Spanner database.
Start in Spanner to run a Data Boost query
To run a federated Data Boost query starting in the Spanner Studio, do the following:
Go to the Spanner Instances page in the Google Cloud console.
The console shows a list of your Spanner instances.
Select a Spanner instance, and then select a database.
On the Database overview page, in the navigation menu, click Spanner Studio.
Click View in BiqQuery.
In the View in BigQuery dialog, enter a connection ID.
The connection ID is used to create a new BigQuery external connection to your Spanner database. You reference your external connection using the following pattern:
PROJECT-ID.LOCATION.CONNECTION-ID
An error occurs if the ID already exists.
Fill in the rest of the dialog and do the following:
- Select Read data in parallel.
- Select Use Spanner Data Boost.
Click View in BigQuery.
BigQuery Studio opens with the following query:
SELECT * FROM EXTERNAL_QUERY("PROJECT-ID.LOCATION.CONNECTION-ID", "SELECT * FROM INFORMATION_SCHEMA.TABLES;");
You can replace this with your federated query. For example, you might make a query that's similar to the following example. This example makes a federated query from a table named
orders
in a Spanner database and joins the results with a BigQuery table namedmydataset.customers
.SELECT c.customer_id, c.name, rq.first_order_date FROM mydataset.customers AS c LEFT OUTER JOIN EXTERNAL_QUERY( 'my-project.us.example-db', '''SELECT customer_id, MIN(order_date) AS first_order_date FROM orders GROUP BY customer_id''') AS rq ON rq.customer_id = c.customer_id GROUP BY c.customer_id, c.name, rq.first_order_date;
Start in BigQuery to run a Data Boost query
To create an external data connection from BigQuery to a Spanner database and use that connection to run a federated Data Boost query from BigQuery, select one of the following options:
Console
Go to Create Spanner connections in the BigQuery documentation and follow the instructions on the Console tab.
In the External data source pane, do the following:
- Select Read data in parallel.
- Select Use Spanner Data Boost.
bq
Go to Create Spanner connections in the BigQuery documentation and follow instructions on the bq* tab.
Set the following connection properties to
true
:useParallelism
useDataBoost
The following example uses the
bq mk
command to
create a new connection named my_connection
with the two required
properties for Data Boost:
bq mk --connection --connection_type='CLOUD_SPANNER' --location='us' \
--properties='{"database":"projects/my-project/instances/my-instance/databases/my-database", "useParallelism":true, "useDataBoost": true}' my_connection
What's next
- Learn about Data Boost in Data Boost overview
- Use Data Boost in your applications
- Monitor Data Boost usage
- Monitor and manage Data Boost quota usage