Query a Bigtable change log in BigQuery
This page provides guidance and example queries to help you process a Bigtable change log in BigQuery.
This page is intended for users who have completed the following:
- Set up a Bigtable change stream-enabled table.
- Run the Dataflow template that writes a change log to BigQuery. You can follow the quickstart to learn how to set this up.
This guide assumes some knowledge of BigQuery. To learn more, you can follow the quickstart that shows how to load and query data.
Open the change log table
In the Google Cloud console, go to the BigQuery page.
In the Explorer pane, expand your project.
Expand your dataset.
Click the table with the suffix:
_changelog
.
Table format
The entire output schema contains several columns. This guide focuses on connecting the rows to their columns and values, and parsing values into analyzable formats.
Basic queries
The examples in this section use a Bigtable table for tracking
credit card sales. The table has one column family (cf
) and the following
columns:
- Row key with the format
credit card number
#transaction timestamp
- Merchant
- Amount
- Category
- Transaction date
Query one column
Filter the results to just one column family and one column using a WHERE
clause.
SELECT row_key, column_family, column, value, timestamp,
FROM your_dataset.your_table
WHERE
mod_type="SET_CELL"
AND column_family="cf"
AND column="merchant"
LIMIT 1000
Parse values
All values are stored as strings or byte strings. You can cast a value to its intended type with conversion functions.
SELECT row_key, column_family, column, value, CAST(value AS NUMERIC) AS amount
FROM your_dataset.your_table
WHERE
mod_type="SET_CELL"
AND column_family="cf"
AND column="amount"
LIMIT 1000
Perform aggregations
You can perform more operations, such as aggregations on numeric values.
SELECT SUM(CAST(value AS NUMERIC)) as total_amount
FROM your_dataset.your_table
WHERE
mod_type="SET_CELL"
AND column_family="cf"
AND column="amount"
Pivot the data
To perform queries that involve multiple Bigtable columns, you need to pivot the table. Each new BigQuery row includes one data change record returned by the change stream from its corresponding row in your Bigtable table. Depending on your schema, you can use a combination of the row key and timestamp to group the data.
SELECT * FROM (
SELECT row_key, timestamp, column, value
FROM your_dataset.your_table
)
PIVOT (
MAX(value)
FOR column in ("merchant", "amount", "category", "transaction_date")
)
Pivoting with a dynamic columnset
If you have a dynamic set of columns, you can do some additional processing to get all the columns and put them into the query programmatically.
DECLARE cols STRING;
SET cols = (
SELECT CONCAT('("', STRING_AGG(DISTINCT column, '", "'), '")'),
FROM your_dataset.your_table
);
EXECUTE IMMEDIATE format("""
SELECT * FROM (
SELECT row_key, timestamp, column, value
FROM your_dataset.your_table
)
PIVOT (
MAX(value)
FOR column in %s
)""", cols);
JSON Data
If you are setting all values with JSON, you need to parse them and extract the values based on the keys. You can use parsing functions once you have derived the value from the JSON object. These examples use the credit card sales data introduced earlier, but instead of writing data to multiple columns, the data is written as a single column as a JSON object.
SELECT
row_key,
JSON_VALUE(value, "$.category") as category,
CAST(JSON_VALUE(value, "$.amount") AS NUMERIC) as amount
FROM your_dataset.your_table
LIMIT 1000
Aggregation queries with JSON
You can perform aggregation queries with JSON values.
SELECT
JSON_VALUE(value, "$.category") as category,
SUM(CAST(JSON_VALUE(value, "$.amount") AS NUMERIC)) as total_amount
FROM your_dataset.your_table
GROUP BY category
What's next
- Learn more about Bigtable change streams.
- Refer to the Bigtable change stream to BigQuery template reference page.
- Learn how to set up the Bigtable change stream to BigQuery template.