BigQuery subscriptions

A BigQuery subscription writes messages to an existing BigQuery table as they are received. You're not required to configure a subscriber client separately. Use the console, the Google Cloud CLI, the client libraries, or the Pub/Sub API to create, update, list, detach, or delete a BigQuery subscription.
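For example, the following is a minimal sketch that uses the Python client library. The project, topic, subscription, and table names are placeholders, and the BigQuery table is assumed to already exist.

    # Minimal sketch: create a BigQuery subscription with the Python client library.
    # Project, topic, subscription, and table names are placeholders.
    from google.cloud import pubsub_v1

    subscriber = pubsub_v1.SubscriberClient()
    topic_path = subscriber.topic_path("my-project", "my-topic")
    subscription_path = subscriber.subscription_path("my-project", "my-bigquery-sub")

    # The destination table must already exist, given as project.dataset.table.
    bigquery_config = pubsub_v1.types.BigQueryConfig(
        table="my-project.my_dataset.my_table"
    )

    with subscriber:
        subscription = subscriber.create_subscription(
            request={
                "name": subscription_path,
                "topic": topic_path,
                "bigquery_config": bigquery_config,
            }
        )
        print(f"Created BigQuery subscription: {subscription.name}")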

Compared with simple data ingestion pipelines that use Dataflow to write to BigQuery, a BigQuery subscription has the following advantages:

  • Simple deployment. You can set up a BigQuery subscription through a single workflow in the console, Google Cloud CLI, client library, or Pub/Sub API.

  • Low costs. A BigQuery subscription removes the additional cost and latency of similar Pub/Sub pipelines that include Dataflow jobs. This cost optimization is useful for messaging systems that do not require additional processing before storage.

  • Minimal monitoring. BigQuery subscriptions are part of the multi-tenant Pub/Sub service and do not require you to run separate monitoring jobs.

Before you begin

Before reading this document, ensure that you're familiar with the following:

  • How Pub/Sub works and the different Pub/Sub terms.

  • The different kinds of subscriptions that Pub/Sub supports and why you might want to use a BigQuery subscription.

  • How BigQuery works and how to configure and manage BigQuery tables.

Properties of a BigQuery subscription

The properties that you configure for a BigQuery subscription determine the BigQuery table to which Pub/Sub writes messages and the type of schema of that table.

When you select Write to BigQuery as the subscription delivery type, you can specify the following additional properties:

  • Use topic schema. This option lets Pub/Sub use the schema of the Pub/Sub topic to which the subscription is attached. In addition, Pub/Sub writes the fields in messages to the corresponding columns in the BigQuery table. When you use this option, remember to check the following additional requirements:

    • The fields in the topic schema and the BigQuery schema must have the same names and their types must be compatible with each other.

    • Any optional field in the topic schema must also be optional in the BigQuery schema.

    • Required fields in the topic schema do not need to be required in the BigQuery schema.

    • If there are BigQuery fields that are not present in the topic schema, these BigQuery fields must be in mode NULLABLE.

    • If the topic schema has additional fields that are not present in the BigQuery schema and these fields can be dropped, select the option Drop unknown fields.

    If you do not select the Use topic schema option, ensure that the BigQuery table has a column called data of type BYTES or STRING. Pub/Sub writes the message to this BigQuery column.

  • Write metadata. This option lets Pub/Sub write the metadata of each message to additional columns in the BigQuery table. Otherwise, the metadata is not written to the BigQuery table.

    If you select the Write metadata option, ensure that the BigQuery table has the fields described in the following table.

    If you do not select the Write metadata option, then the destination BigQuery table only requires the data field unless use_topic_schema is true.

    Parameter          Type             Description
    subscription_name  STRING           Name of the subscription.
    message_id         STRING           ID of the message.
    publish_time       TIMESTAMP        The time at which the message was published.
    data               BYTES or STRING  The message body. The data field is required for all destination BigQuery tables.
    attributes         STRING           A JSON object containing all message attributes. It also contains additional fields that are part of the Pub/Sub message, including the ordering key, if present.

  • Drop unknown fields. This option is used with the Use topic schema option. This option lets Pub/Sub drop any field that is present in the topic schema but not in the BigQuery schema. Without Drop unknown fields set, messages with extra fields are not written to BigQuery and remain in the subscription backlog. The subscription ends up in an error state.

For the list of properties common to all subscription types, and to learn how to create a BigQuery subscription, see Create and use subscriptions.
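The following sketch illustrates these properties. It uses the google-cloud-bigquery and google-cloud-pubsub Python client libraries with placeholder names, defines a destination table that contains the metadata columns listed earlier, and builds a BigQueryConfig that enables Write metadata.

    # Sketch: a destination table with the metadata columns listed earlier, plus
    # a BigQueryConfig that enables them. Names are placeholders and the dataset
    # my_dataset is assumed to already exist.
    from google.cloud import bigquery, pubsub_v1

    bq = bigquery.Client(project="my-project")
    table = bigquery.Table(
        "my-project.my_dataset.my_table",
        schema=[
            bigquery.SchemaField("subscription_name", "STRING"),
            bigquery.SchemaField("message_id", "STRING"),
            bigquery.SchemaField("publish_time", "TIMESTAMP"),
            bigquery.SchemaField("data", "BYTES"),  # STRING also works
            bigquery.SchemaField("attributes", "STRING"),
        ],
    )
    bq.create_table(table)

    bigquery_config = pubsub_v1.types.BigQueryConfig(
        table="my-project.my_dataset.my_table",
        write_metadata=True,        # populate the metadata columns above
        use_topic_schema=False,     # write each message body to the data column
        drop_unknown_fields=False,  # only relevant when use_topic_schema is enabled
    )
    # Pass bigquery_config to create_subscription as shown in the earlier sketch.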

Schema compatibility

Before you create a BigQuery subscription, ensure that the BigQuery table exists and that its schema is compatible with the schema of the Pub/Sub topic. If you specify an incompatible BigQuery table, you get a compatibility-related error message. If you do not use the topic schema, either because the topic has no schema or because you choose not to use it, Pub/Sub writes messages to a column called data in the BigQuery table. The data column must be of type BYTES or STRING.

Pub/Sub and BigQuery use different ways to define their schemas. Pub/Sub schemas are defined in Apache Avro or Protocol Buffer format while BigQuery schemas are defined in ZetaSQL format.

In the BigQuery schema, INT, SMALLINT, INTEGER, BIGINT, TINYINT, and BYTEINT are aliases for INT64.

BigQuery does not have an unsigned integer type, so uint64 values are not representable. Therefore, any Pub/Sub schema that contains a uint64 or fixed64 type cannot be connected to a BigQuery table.

The following tables map Avro and Protocol Buffer types to their ZetaSQL equivalents.

Avro to ZetaSQL

Avro Type  ZetaSQL Type
null       Any NULLABLE
boolean    BOOLEAN
int        INTEGER
long       INTEGER
float      FLOAT
double     FLOAT64
bytes      BYTES
string     STRING
record     RECORD/STRUCT
array      ARRAY
map        ARRAY<STRUCT<key STRING, value ValueType>>
union      NULLABLE nested types
fixed      BYTES
enum       INT64
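As an illustration of the Avro mapping, the following sketch creates an Avro topic schema with the google-cloud-pubsub Python client library. The project and schema IDs are placeholders, and the comment notes the BigQuery column types that the table above implies for each field.

    # Sketch: create an Avro topic schema. Per the mapping above, the fields
    # imply BigQuery columns: sensor_id STRING, reading FLOAT64, count INTEGER,
    # active BOOLEAN. Project and schema IDs are placeholders.
    from google.cloud import pubsub_v1

    avro_definition = """
    {
      "type": "record",
      "name": "Reading",
      "fields": [
        {"name": "sensor_id", "type": "string"},
        {"name": "reading", "type": "double"},
        {"name": "count", "type": "long"},
        {"name": "active", "type": "boolean"}
      ]
    }
    """

    schema_client = pubsub_v1.SchemaServiceClient()
    schema = schema_client.create_schema(
        request={
            "parent": schema_client.common_project_path("my-project"),
            "schema": pubsub_v1.types.Schema(
                name=schema_client.schema_path("my-project", "reading-avro-schema"),
                type_=pubsub_v1.types.Schema.Type.AVRO,
                definition=avro_definition,
            ),
            "schema_id": "reading-avro-schema",
        }
    )
    print(f"Created schema: {schema.name}")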

Protobuf to ZetaSQL

Protocol Buffer Type  ZetaSQL Type
double                FLOAT64
float                 FLOAT
int32                 INT64
int64                 INT64
uint32                INT64
uint64                Unmappable
sint32                INT64
sint64                INT64
fixed32               INT64
fixed64               Unmappable
sfixed32              INT64
sfixed64              INT64
bool                  BOOLEAN
string                STRING
bytes                 BYTES
enum                  INT64
message               RECORD/STRUCT
oneof                 NULLABLE nested types
map                   ARRAY<STRUCT<key KeyType, value ValueType>>
repeated/array        ARRAY
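Similarly, a sketch of a Protocol Buffer topic schema, with placeholder project and schema IDs and an illustrative message definition:

    # Sketch: create a Protocol Buffer topic schema. Per the mapping above, the
    # fields imply BigQuery columns: sensor_id STRING, reading FLOAT64,
    # count INT64, active BOOLEAN. Project and schema IDs are placeholders.
    from google.cloud import pubsub_v1

    proto_definition = """
    syntax = "proto3";
    message Reading {
      string sensor_id = 1;
      double reading = 2;
      int64 count = 3;
      bool active = 4;
    }
    """

    schema_client = pubsub_v1.SchemaServiceClient()
    schema_client.create_schema(
        request={
            "parent": schema_client.common_project_path("my-project"),
            "schema": pubsub_v1.types.Schema(
                name=schema_client.schema_path("my-project", "reading-proto-schema"),
                type_=pubsub_v1.types.Schema.Type.PROTOCOL_BUFFER,
                definition=proto_definition,
            ),
            "schema_id": "reading-proto-schema",
        }
    )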

Pub/Sub service account permissions

To create a BigQuery subscription, the Pub/Sub service account must have permission to write to the specific BigQuery table and to read the table metadata. For more information, see Assign BigQuery roles to the Pub/Sub service account.
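For example, here is a sketch that grants table-level access with the google-cloud-bigquery Python client library. The project number, table name, and role shown are placeholders and assumptions; confirm the required roles in Assign BigQuery roles to the Pub/Sub service account.

    # Sketch: grant the Pub/Sub service account access to the destination table.
    # The Pub/Sub service account has the form
    # service-PROJECT_NUMBER@gcp-sa-pubsub.iam.gserviceaccount.com.
    # The role used here is an assumption; check the linked guide for the
    # authoritative roles.
    from google.cloud import bigquery

    client = bigquery.Client(project="my-project")
    table_id = "my-project.my_dataset.my_table"
    pubsub_sa = "serviceAccount:service-123456789012@gcp-sa-pubsub.iam.gserviceaccount.com"

    policy = client.get_iam_policy(table_id)
    policy.bindings.append({"role": "roles/bigquery.dataEditor", "members": {pubsub_sa}})
    client.set_iam_policy(table_id, policy)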

Quotas

There are quota limitations on the BigQuery subscriber throughput per region. For more information, see Pub/Sub quotas and limits.

BigQuery subscriptions write data by using the BigQuery Storage Write API. For information about the quotas and limits for the Storage Write API, see BigQuery Storage Write API requests. BigQuery subscriptions only consume the throughput quota for the Storage Write API. You can ignore the other Storage Write API quota considerations in this instance.

Pricing

For the pricing for BigQuery subscriptions, see the Pub/Sub pricing page.

What's next