Query a Bigtable change log in BigQuery

This page provides guidance and example queries to help you process a Bigtable change log in BigQuery.

This page is intended for users who have completed the following:

Set up a Bigtable change stream-enabled table.
Run the Dataflow template that writes a change log to BigQuery. You can follow the quickstart to learn how to set this up.

This guide assumes some knowledge of BigQuery. To learn more, you can follow the quickstart that shows how to load and query data.

Open the change log table

In the Google Cloud console, go to the BigQuery page.

Go to BigQuery
In the Explorer pane, expand your project.
Expand your dataset.
Click the table with the suffix: _changelog.

Table format

The entire output schema contains several columns. This guide focuses on connecting the rows to their columns and values, and parsing values into analyzable formats.

Basic queries

The examples in this section use a Bigtable table for tracking credit card sales. The table has one column family (cf) and the following columns:

Row key with the format credit card number#transaction timestamp
Merchant
Amount
Category
Transaction date

Query one column

Filter the results to just one column family and one column using a WHERE clause.

SELECT row_key, column_family, column, value, timestamp,
FROM your_dataset.your_table
WHERE
  mod_type="SET_CELL"
  AND column_family="cf"
  AND column="merchant"
LIMIT 1000

Parse values

All values are stored as strings or byte strings. You can cast a value to its intended type with conversion functions.

SELECT row_key, column_family, column, value, CAST(value AS NUMERIC) AS amount
FROM your_dataset.your_table
WHERE
  mod_type="SET_CELL"
  AND column_family="cf"
  AND column="amount"
LIMIT 1000

Perform aggregations

You can perform more operations, such as aggregations on numeric values.

SELECT SUM(CAST(value AS NUMERIC)) as total_amount
FROM your_dataset.your_table
WHERE
  mod_type="SET_CELL"
  AND column_family="cf"
  AND column="amount"

Pivot the data

To perform queries that involve multiple Bigtable columns, you need to pivot the table. Each new BigQuery row includes one data change record returned by the change stream from its corresponding row in your Bigtable table. Depending on your schema, you can use a combination of the row key and timestamp to group the data.

SELECT * FROM (
  SELECT row_key, timestamp, column, value
  FROM your_dataset.your_table
)
PIVOT (
  MAX(value)
  FOR column in ("merchant", "amount", "category", "transaction_date")
)

Pivoting with a dynamic columnset

If you have a dynamic set of columns, you can do some additional processing to get all the columns and put them into the query programmatically.

DECLARE cols STRING;
SET cols = (
  SELECT CONCAT('("', STRING_AGG(DISTINCT column, '", "'), '")'),
  FROM your_dataset.your_table
);

EXECUTE IMMEDIATE format("""
SELECT * FROM (
  SELECT row_key, timestamp, column, value
  FROM your_dataset.your_table
)
PIVOT (
  MAX(value)
  FOR column in %s
)""", cols);

JSON Data

If you are setting all values with JSON, you need to parse them and extract the values based on the keys. You can use parsing functions once you have derived the value from the JSON object. These examples use the credit card sales data introduced earlier, but instead of writing data to multiple columns, the data is written as a single column as a JSON object.

SELECT
  row_key,
  JSON_VALUE(value, "$.category") as category,
  CAST(JSON_VALUE(value, "$.amount") AS NUMERIC) as amount
FROM your_dataset.your_table
LIMIT 1000

Aggregation queries with JSON

You can perform aggregation queries with JSON values.

SELECT
  JSON_VALUE(value, "$.category") as category,
  SUM(CAST(JSON_VALUE(value, "$.amount") AS NUMERIC)) as total_amount
FROM your_dataset.your_table
GROUP BY category

What's next

Learn more about Bigtable change streams.
Refer to the Bigtable change stream to BigQuery template reference page.
Learn how to set up the Bigtable change stream to BigQuery template.