Real-time analytics made easy with Cloud Spanner federated queries with BigQuery
As enterprises compete for market share, their need for real-time insights has given rise to increased demand for transactional databases to support data analytics. Whether it's to provide dashboards that inform rapid decision-making, to perform analysis on a lookup table stored in a transactional database, or to conduct complex hybrid transactional analytical workloads, there is a growing demand for analyzing large volumes of data in transactional databases.
Cloud Spanner is Google Cloud's fully managed relational database for transactional workloads, and today, with the general availability of Spanner federated queries with BigQuery, it gets even more powerful. BigQuery is Google Cloud’s market-leading serverless, highly scalable, multi-cloud data warehouse that makes analytics easy by bringing together data from multiple sources. With Spanner’s BigQuery federation, you can query data residing in Spanner in real time without moving or copying the data, bridging the gap between operational data and analytics and creating a unified data lifecycle.
It’s already been possible to use tools like Dataflow to copy data from Spanner over to BigQuery, but if you haven’t set up these ETL workflows -- or you simply need to do a quick lookup on data that’s in Spanner -- you can now take advantage of BigQuery’s query federation support to run real-time queries on data that’s stored in Spanner.
In this post, we’ll look at how to set up a federated query in BigQuery that accesses data stored in Spanner.
How to run a federated query
Suppose you’re an online retailer that uses Spanner to store your shopping transactions, and you have a Customer 360 application built in BigQuery. You can now use federated queries to include this customer’s shopping transactions in your Customer 360 application without needing to copy the data over to BigQuery from Spanner.
To run a Customer 360 query in BigQuery that includes the shopping transactions that are stored in Spanner, follow these steps:
Launch BigQuery and choose the Google Cloud project that contains the Spanner instance that includes the shopping transactions database.
Set up an external data source for the Spanner shopping database in BigQuery. You’ll need to have bigquery.admin permissions to set this up.
Write a query in BigQuery that accesses the shopping data in the Spanner data source. If you’d like other users to access this external data source in BigQuery, simply grant them permission to use the connection resource you just created.
Setting up an external data source
To setup a Spanner external data source, select “Add data” and choose “External Data Source”
From here, add the connection settings for your Spanner database.
Writing a query that accesses the Spanner data source
Once you’ve created the external data source, it will be listed as one of the external connections in your project in the BigQuery Explorer.
Now, you simply use the
EXTERNAL_QUERY function to send a query statement to Spanner, using Spanner’s native SQL dialect. The results are converted to BigQuery standard SQL data types and can be joined with other data in BigQuery.
Here’s the syntax for using
SELECT * FROM EXTERNAL_QUERY(connection_id, external_database_query);
Connection_id (string): The name of the Spanner database connection resource. It is of the form: projects/projectID/locations/us/connections/ConnectionID. ConnectionID is the connection ID you created when you set up the external data source.
external_database_query (string): a read-only query in Spanner’s SQL dialect. The query is executed in Spanner and returned as a table in BigQuery.
SELECT * FROM EXTERNAL_QUERY(
"SELECT * FROM orders where custID = 182748387"
And it’s as simple as that. There’s no need to copy data from Spanner over to BigQuery --
EXTERNAL_QUERY takes care of everything. Moreover,
EXTERNAL_QUERY returns a table that is no different from any other table in BigQuery, so you can JOIN to it, add it to materialized and authorized views, populate a dashboard with it, and even schedule the query using BigQuery’s scheduler.
Advantages of using Spanner federated queries with BigQuery
When you use query federation, the query is run by Spanner, and the results are returned to BigQuery as a table.
The data is accessed in real-time; there’s no need to wait for an ETL job to complete for the freshest of data.
The query will be executed by Spanner, so you’ll be able to take advantage of the same query optimizer and SQL capabilities that you’ll find in Spanner. You’ll also have the option to take advantage of Spanner’s PartitionQuery capabilities.
You’ll likely find
EXTERNAL_QUERYis useful for lookups and simple analytics in Spanner tables rather than intensive analytics that would benefit from BigQuery’s strong analytic capabilities.
For more intensive analytics, you can still use
EXTERNAL_QUERY to copy the data over to BigQuery rather than writing ETL jobs, and continue to run the analytics in BigQuery, such as the following:
CREATE TABLE abc as SELECT * FROM EXTERNAL_QUERY(
"SELECT * FROM orders"