Query protobuf data
This document provides examples of common query patterns for reading and querying protocol buffer (protobuf) data stored in Bigtable.
Before you read this page, familiarize yourself with the following:
Example data
The following examples use a Music table that stores information about albums
and artists. The data is stored in a column family named album_details, with a
column qualifier album. The column qualifier contains protobuf messages.
The protobuf schema is defined in the following proto files:
artist.proto:syntax = "proto3"; package package_name; message Artist { string name = 1; }album.proto:syntax = "proto3"; package package_name; import "artist.proto"; message Album { string title = 1; Artist artist = 2; int32 release_year = 3; }
As a result, Bigtable creates a schema bundle for this table that contains the following descriptor set for these protobuf definitions:
file {
name: "artist.proto"
package: "package_name"
message_type {
name: "Artist"
field {
name: "name"
number: 1
label: LABEL_OPTIONAL
type: TYPE_STRING
json_name: "name"
}
}
syntax: "proto3"
}
file {
name: "album.proto"
package: "package_name"
dependency: "artist.proto"
message_type {
name: "Album"
field {
name: "title"
number: 1
label: LABEL_OPTIONAL
type: TYPE_STRING
json_name: "title"
}
field {
name: "artist"
number: 2
label: LABEL_OPTIONAL
type: TYPE_MESSAGE
type_name: ".package_name.Artist"
json_name: "artist"
}
field {
name: "release_year"
number: 3
label: LABEL_OPTIONAL
type: TYPE_INT32
json_name: "releaseYear"
}
}
syntax: "proto3"
}
Example queries
The following examples show how to query protobuf data using GoogleSQL for Bigtable and BigQuery external tables.
Cast a column to a protobuf message
You can use the CAST operator to interpret a BYTES value as a protobuf message.
To do this, you must provide the full name of the protobuf message in the
following format:
SCHEMA_BUNDLE_ID.FULLY_QUALIFIED_MESSAGE_NAME.
Replace the following:
SCHEMA_BUNDLE_ID: The unique ID you assigned to your schema bundle when you created it.FULLY_QUALIFIED_MESSAGE_NAME: The full name of the message, which must include the package name defined in your proto file–for example,package_name.message_name.
The following sample query casts the album column to the Album protobuf
message. This message is defined in the package_name package and is part of a
schema bundle named bundle_name:
SELECT
CAST(album_details['album'] AS bundle_name.package_name.Album).title
FROM
Music;
Access nested fields
You can access nested fields within a protobuf message using dot notation.
The following query retrieves the name of the artist from the nested Artist
message within the Album message:
SELECT
CAST(album_details['album'] AS bundle_name.package_name.Album).artist.name
FROM
Music;
Filter based on protobuf fields
You can use the WHERE clause to filter rows based on the values of fields
within a protobuf message.
The following query selects all albums by the artist Dana A.:
SELECT
*
FROM
Music
WHERE
CAST(album_details['album'] AS bundle_name.package_name.Album).artist.name = 'Dana A.';
Aggregate protobuf fields
You can use aggregate functions like SUM, AVG, MIN, MAX, and COUNT on
numeric fields within your protobuf messages.
The following query calculates the average release year of all albums in the table:
SELECT
AVG(CAST(album_details['album'] AS bundle_name.package_name.Album).release_year)
FROM
Music;
Order by protobuf fields
You can use the ORDER BY clause to sort the result set based on a field in
your protobuf message.
The following query retrieves all albums and sorts them by their release year in descending order:
SELECT
*
FROM
Music
ORDER BY
CAST(album_details['album'] AS bundle_name.package_name.Album).release_year DESC;
Use with BigQuery external tables
You can query protobuf data stored in Bigtable from BigQuery by
creating an external table. When creating the external table, you specify the
column type as JSON, its encoding as PROTO_BINARY, and you associate it with a
schema bundle. This process converts the protobuf message bytes into the
equivalent JSON data, allowing you to query its fields directly.
The following is an example of a table definition file for
creating a BigQuery external table
over the Music table:
{
"sourceFormat": "BIGTABLE",
"sourceUris": [
"https://googleapis.com/bigtable/projects/PROJECT_ID/instances/INSTANCE_ID/tables/Music"
],
"bigtableOptions": {
"columnFamilies" : [
{
"familyId": "album_details",
"columns": [
{
"qualifierString": "album",
"type": "JSON",
"encoding": "PROTO_BINARY",
"protoConfig": {
"schemaBundleId": "bundle_name",
"protoMessageName": "package_name.Album"
}
}
]
}
]
}
}
Once the external table is created, you can query the protobuf data as a JSON column in BigQuery.
The following query retrieves the title of all albums released in 2022 from the BigQuery external table:
SELECT JSON_VALUE(value, '$.title') AS title
FROM
`PROJECT_ID.DATASET.TABLE_NAME` AS t,
UNNEST(t.album_details.album.cell)
WHERE INT64(JSON_EXTRACT(value, '$.releaseYear')) = 2022;
What's next
- Read about schema design best practices for protobufs.