BigQuery subscriptions

A BigQuery subscription writes messages to an existing BigQuery table as they are received. You're not required to configure a subscriber client separately. Use the Google Cloud console, the Google Cloud CLI, the client libraries, or the Pub/Sub API to create, update, list, detach, or delete a BigQuery subscription.

For simple data ingestion pipelines that would otherwise use Dataflow to write to BigQuery, a BigQuery subscription has the following advantages:

  • Simple deployment. You can set up a BigQuery subscription through a single workflow in the console, Google Cloud CLI, client library, or Pub/Sub API, as shown in the sketch after this list.

  • Low cost. A BigQuery subscription removes the additional cost and latency of similar Pub/Sub pipelines that include Dataflow jobs. This cost optimization is useful for messaging systems that do not require additional processing before storage.

  • Minimal monitoring. BigQuery subscriptions are part of the multi-tenant Pub/Sub service and do not require you to run separate monitoring jobs.
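
For example, the following is a minimal sketch of creating a BigQuery subscription with the Python client library. It assumes that the google-cloud-pubsub package is installed and that the project, topic, subscription, and table names are placeholders for your own resources:

```python
# Minimal sketch: create a BigQuery subscription with the Python client
# library. All resource names below are placeholders.
from google.cloud import pubsub_v1

project_id = "my-project"
topic_id = "my-topic"
subscription_id = "my-bigquery-sub"
bigquery_table_id = "my-project.my_dataset.my_table"

subscriber = pubsub_v1.SubscriberClient()
topic_path = subscriber.topic_path(project_id, topic_id)
subscription_path = subscriber.subscription_path(project_id, subscription_id)

# BigQueryConfig selects the destination table and the write behavior.
bigquery_config = pubsub_v1.types.BigQueryConfig(
    table=bigquery_table_id,
    write_metadata=True,  # also write message metadata columns
)

with subscriber:
    subscription = subscriber.create_subscription(
        request={
            "name": subscription_path,
            "topic": topic_path,
            "bigquery_config": bigquery_config,
        }
    )
    print(f"Created BigQuery subscription: {subscription.name}")
```

The fields on BigQueryConfig correspond to the subscription properties described later in this document.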

Before you begin

Before reading this document, ensure that you're familiar with the following:

  • How Pub/Sub works and the different Pub/Sub terms.

  • The different kinds of subscriptions that Pub/Sub supports and why you might want to use a BigQuery subscription.

  • How BigQuery works and how to configure and manage BigQuery tables.

In addition to your familiarity with Pub/Sub and BigQuery, ensure that you meet the following prerequisites before you create a BigQuery subscription:

  • A BigQuery table exists, or you create one when you create the BigQuery subscription. For more information, see BigQuery subscription.

  • The schema of the Pub/Sub topic is compatible with the schema of the BigQuery table. If you attach a BigQuery table with an incompatible schema, you get a compatibility-related error message. For more information, see the Schema compatibility section.

Properties of a BigQuery subscription

The properties that you configure for a BigQuery subscription determine the BigQuery table to which Pub/Sub writes messages and the type of schema of that table.

When you set the subscription delivery type to Write to BigQuery, you can specify the following additional properties:

  • Use topic schema. This option lets Pub/Sub use the schema of the Pub/Sub topic to which the subscription is attached. In addition, Pub/Sub writes the fields in messages to the corresponding columns in the BigQuery table. When you use this option, remember to check the following additional requirements:

    • The fields in the topic schema and the BigQuery schema must have the same names and their types must be compatible with each other.

    • Any optional field in the topic schema must also be optional in the BigQuery schema.

    • Required fields in the topic schema do not need to be required in the BigQuery schema.

    • If there are BigQuery fields that are not present in the topic schema, these BigQuery fields must be in mode NULLABLE.

    • If the topic schema has additional fields that are not present in the BigQuery schema and these fields can be dropped, select the option Drop unknown fields.

    If you do not select the Use topic schema option, ensure that the BigQuery table has a column called data of type BYTES, STRING, or JSON. Pub/Sub writes the message body to this column. For a table-creation sketch, see the example after the property list.

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| subscription_name | STRING | Name of the subscription. |
| message_id | STRING | ID of the message. |
| publish_time | TIMESTAMP | The time when the message was published. |
| data | BYTES, STRING, or JSON | The message body. The data field is required for all destination BigQuery tables. |
| attributes | STRING or JSON | A JSON object that contains all message attributes. It also contains additional fields that are part of the Pub/Sub message, including the ordering key, if present. |

  • Drop unknown fields. This option is used with the Use topic schema option. It lets Pub/Sub drop any field that is present in the topic schema but not in the BigQuery schema. If Drop unknown fields is not set, messages with extra fields are not written to BigQuery and remain in the subscription backlog, and the subscription ends up in an error state.
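
If you create the destination table yourself, the following minimal sketch uses the google-cloud-bigquery client library to create a table with the data column and the metadata columns from the preceding table. All resource names are placeholders:

```python
# Minimal sketch: create a destination table with the required `data`
# column plus the metadata columns listed above. Assumes the
# google-cloud-bigquery package; all names are placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")

schema = [
    # The data column must exist in every destination table.
    bigquery.SchemaField("data", "BYTES", mode="REQUIRED"),
    # Metadata columns, kept NULLABLE here.
    bigquery.SchemaField("subscription_name", "STRING", mode="NULLABLE"),
    bigquery.SchemaField("message_id", "STRING", mode="NULLABLE"),
    bigquery.SchemaField("publish_time", "TIMESTAMP", mode="NULLABLE"),
    bigquery.SchemaField("attributes", "JSON", mode="NULLABLE"),
]

table = bigquery.Table("my-project.my_dataset.my_table", schema=schema)
table = client.create_table(table)  # raises if the table already exists
print(f"Created table {table.full_table_id}")
```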

For the list of properties common to all subscription types, and for instructions to create a BigQuery subscription, see Create and use subscriptions.

Schema compatibility

Pub/Sub and BigQuery define their schemas in different ways. Pub/Sub schemas are defined in Apache Avro or Protocol Buffer format, whereas BigQuery schemas are defined in ZetaSQL format. The following list contains important information about schema compatibility between a Pub/Sub topic and a BigQuery table.

  • Any message that contains an improperly formatted field is not written to BigQuery.

  • In the BigQuery schema, INT, SMALLINT, INTEGER, BIGINT, TINYINT, and BYTEINT are aliases for INT64.

  • BigQuery does not have any unsigned type and so uint64 is not representable. Therefore, any Pub/Sub schema that contains a uint64 or fixed64 type cannot be connected to a BigQuery table.

  • When the type in the topic schema is a string and the type in the BigQuery table is TIMESTAMP, DATETIME, DATE, or TIME, then any value for this field in a Pub/Sub message must adhere to the format specified for the BigQuery data type, as shown in the sketch after this list.

  • Some Avro logical types are supported, as specified in the following table. Any logical types not listed only match the equivalent Avro type that they annotate, as detailed in the Avro specification.
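
For the string-to-TIMESTAMP case, for example, the publisher must produce a string that BigQuery can parse as a timestamp. The following is a minimal sketch; the authoritative set of accepted formats is defined by the BigQuery TIMESTAMP type:

```python
# Minimal sketch: format a string field so that BigQuery can parse it as
# a TIMESTAMP value, for example "2024-01-15 10:30:00.123456+00:00".
from datetime import datetime, timezone

event_time = datetime.now(timezone.utc)
timestamp_field = event_time.isoformat(sep=" ")
print(timestamp_field)
```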

The following tables show the mappings between the different schema formats.

Avro to ZetaSQL

| Avro Type | ZetaSQL Type |
| --- | --- |
| null | Any NULLABLE type |
| boolean | BOOLEAN |
| int | INTEGER |
| long | INTEGER |
| float | FLOAT |
| double | FLOAT64 |
| bytes | BYTES |
| string | STRING or JSON |
| record | RECORD/STRUCT |
| array | ARRAY |
| map | ARRAY<STRUCT<key STRING, value ValueType>> |
| union | NULLABLE nested types |
| fixed | BYTES |
| enum | INT64 |

Protobuf to ZetaSQL

| Protocol Buffer Type | ZetaSQL Type |
| --- | --- |
| double | FLOAT64 |
| float | FLOAT |
| int32 | INT64 |
| int64 | INT64 |
| uint32 | INT64 |
| uint64 | Unmappable |
| sint32 | INT64 |
| sint64 | INT64 |
| fixed32 | INT64 |
| fixed64 | Unmappable |
| sfixed32 | INT64 |
| sfixed64 | INT64 |
| bool | BOOLEAN |
| string | STRING or JSON |
| bytes | BYTES |
| enum | INT64 |
| message | RECORD/STRUCT |
| oneof | NULLABLE nested types |
| map | ARRAY<STRUCT<key KeyType, value ValueType>> |
| repeated/array | ARRAY |

Avro logical to ZetaSQL

| Avro Logical Type | ZetaSQL Type |
| --- | --- |
| timestamp-micros | TIMESTAMP |
| date | DATE |
| time-micros | TIME |

Pub/Sub service account permissions

To create a BigQuery subscription, the Pub/Sub service account must have permission to write to the specific BigQuery table and to read the table metadata. For more information, see Assign BigQuery roles to the Pub/Sub service account.
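
As one possible approach, the following minimal sketch grants table-level access with the google-cloud-bigquery client library. The project number, resource names, and the choice of the roles/bigquery.dataEditor role are assumptions for illustration; the exact roles required are listed in the linked page:

```python
# Minimal sketch: grant the Pub/Sub service account access to the
# destination table. The project number and names are placeholders; the
# Pub/Sub service account follows the pattern
# service-<project-number>@gcp-sa-pubsub.iam.gserviceaccount.com.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")
table = client.get_table("my-project.my_dataset.my_table")

pubsub_service_account = (
    "serviceAccount:service-123456789012@gcp-sa-pubsub.iam.gserviceaccount.com"
)

policy = client.get_iam_policy(table)
policy.bindings.append(
    {"role": "roles/bigquery.dataEditor", "members": {pubsub_service_account}}
)
client.set_iam_policy(table, policy)
```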

Handle message failures

When a Pub/Sub message cannot be written to BigQuery, the message cannot be acknowledged. To forward such undeliverable messages, configure a dead-letter topic on the BigQuery subscription.
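
The following is a minimal sketch of creating a BigQuery subscription with a dead-letter topic, using the Python client library. It assumes that the dead-letter topic already exists and that all resource names are placeholders:

```python
# Minimal sketch: attach a dead-letter topic to a BigQuery subscription so
# that messages that cannot be written to BigQuery are forwarded instead of
# accumulating in the backlog. All resource names are placeholders.
from google.cloud import pubsub_v1

project_id = "my-project"
subscriber = pubsub_v1.SubscriberClient()

dead_letter_policy = pubsub_v1.types.DeadLetterPolicy(
    dead_letter_topic=subscriber.topic_path(project_id, "my-dead-letter-topic"),
    max_delivery_attempts=5,  # allowed range is 5 to 100
)

bigquery_config = pubsub_v1.types.BigQueryConfig(
    table="my-project.my_dataset.my_table",
)

with subscriber:
    subscriber.create_subscription(
        request={
            "name": subscriber.subscription_path(project_id, "my-bigquery-sub"),
            "topic": subscriber.topic_path(project_id, "my-topic"),
            "bigquery_config": bigquery_config,
            "dead_letter_policy": dead_letter_policy,
        }
    )
```

The Pub/Sub service account also needs permission to publish to the dead-letter topic for forwarding to succeed.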

Quotas

There are quota limitations on the BigQuery subscriber throughput per region. For more information, see Pub/Sub quotas and limits.

BigQuery subscriptions write data by using the BigQuery Storage Write API. For information about the quotas and limits for the Storage Write API, see BigQuery Storage Write API requests. BigQuery subscriptions only consume the throughput quota for the Storage Write API. You can ignore the other Storage Write API quota considerations in this instance.

Pricing

For the pricing for BigQuery subscriptions, see the Pub/Sub pricing page.

What's next