BigQuery subscriptions


A BigQuery subscription writes messages to an existing BigQuery table as they are received. You're not required to configure a subscriber client separately. Use the Google Cloud console, the Google Cloud CLI, the client libraries, or the Pub/Sub API to create, update, list, detach, or delete a BigQuery subscription.

For simple data-ingestion pipelines that would otherwise use Dataflow to write to BigQuery, a BigQuery subscription offers the following advantages:

  • Simple deployment. You can set up a BigQuery subscription through a single workflow in the console, Google Cloud CLI, client library, or Pub/Sub API.

  • Low cost. A BigQuery subscription removes the additional cost and latency of similar Pub/Sub pipelines that include Dataflow jobs. This cost optimization is useful for messaging systems that do not require additional processing before storage.

  • Minimizes monitoring. BigQuery subscriptions are part of the multi-tenant Pub/Sub service and do not require you to run separate monitoring jobs.
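As a sketch of the single-workflow setup, a BigQuery subscription can be created with one Google Cloud CLI command. SUBSCRIPTION_ID, TOPIC_ID, and the table path are placeholders:

```shell
# Create a subscription that writes messages directly to a BigQuery table.
# The table is referenced as PROJECT_ID:DATASET_ID.TABLE_ID.
gcloud pubsub subscriptions create SUBSCRIPTION_ID \
  --topic=TOPIC_ID \
  --bigquery-table=PROJECT_ID:DATASET_ID.TABLE_ID
```

No separate subscriber client or Dataflow job is needed; Pub/Sub writes received messages to the table on your behalf.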

Before you begin

Before reading this document, ensure that you're familiar with the following:

  • How Pub/Sub works and the different Pub/Sub terms.

  • The different kinds of subscriptions that Pub/Sub supports and why you might want to use a BigQuery subscription.

  • How BigQuery works and how to configure and manage BigQuery tables.

In addition to your familiarity with Pub/Sub and BigQuery, ensure that you meet the following prerequisites before you create a BigQuery subscription:

  • A BigQuery table exists, or you create one when you create the BigQuery subscription. For more information, see Create and use subscriptions.

  • Compatibility between the schema of the Pub/Sub topic and the BigQuery table. If you attach a BigQuery table with an incompatible schema, you get a compatibility-related error message. For more information, see Schema compatibility.

Properties of a BigQuery subscription

The properties that you configure for a BigQuery subscription determine the BigQuery table to which Pub/Sub writes messages and how Pub/Sub uses the schema of that table.

When you set the subscription delivery type to Write to BigQuery, you can specify the following additional properties:

  • Use topic schema. This option lets Pub/Sub use the schema of the Pub/Sub topic to which the subscription is attached. In addition, Pub/Sub writes the fields in messages to the corresponding columns in the BigQuery table. When you use this option, remember to check the following additional requirements:

    • The fields in the topic schema and the BigQuery schema must have the same names and their types must be compatible with each other.

    • Any optional field in the topic schema must also be optional in the BigQuery schema.

    • Required fields in the topic schema do not need to be required in the BigQuery schema.

    • If there are BigQuery fields that are not present in the topic schema, these BigQuery fields must be in mode NULLABLE.

    • If the topic schema has additional fields that are not present in the BigQuery schema and these fields can be dropped, select the option Drop unknown fields.

    If you do not select the Use topic schema option, ensure that the BigQuery table has a column called data of type BYTES, STRING, or JSON. Pub/Sub writes the message to this BigQuery column.



  • Write metadata. This option lets Pub/Sub write the metadata of each message to additional columns in the BigQuery table:

    • subscription_name. Name of the subscription.

    • message_id. ID of a message.

    • publish_time. The time of publishing a message.

    • data. The message body. The data field is required for all destination BigQuery tables.

    • attributes. A JSON object containing all message attributes. It also contains additional fields that are part of the Pub/Sub message, including the ordering key, if present.

  • Drop unknown fields. This option is used with the Use topic schema option. This option lets Pub/Sub drop any field that is present in the topic schema but not in the BigQuery schema. Without Drop unknown fields set, messages with extra fields are not written to BigQuery and remain in the subscription backlog. The subscription ends up in an error state.
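The effect of Drop unknown fields can be sketched as a filter applied before a row is written. The column names below are hypothetical, chosen only for illustration:

```python
# Sketch of the "Drop unknown fields" behavior: message fields that have no
# corresponding column in the BigQuery table are removed before the write.
BQ_COLUMNS = {"order_id", "amount"}  # hypothetical table schema

def drop_unknown_fields(message: dict) -> dict:
    """Keep only the keys that map to columns in the BigQuery table."""
    return {k: v for k, v in message.items() if k in BQ_COLUMNS}

row = drop_unknown_fields({"order_id": "a1", "amount": 9.99, "debug_tag": "x"})
print(row)  # {'order_id': 'a1', 'amount': 9.99}
```

Without this option, a message carrying the extra `debug_tag` field would stay in the subscription backlog instead of being written.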

For the list of properties common to all types of subscriptions and to create the BigQuery subscription, see Create and use subscriptions.

Schema compatibility

Pub/Sub and BigQuery use different ways to define their schemas. Pub/Sub schemas are defined in Apache Avro or Protocol Buffer format, while BigQuery schemas are defined in ZetaSQL format. The following considerations are important for schema compatibility between a Pub/Sub topic and a BigQuery table:

  • Any message that contains an improperly formatted field is not written to BigQuery.

  • In the BigQuery schema, INT, SMALLINT, INTEGER, BIGINT, TINYINT, and BYTEINT are aliases for INT64.

  • BigQuery does not have an unsigned integer type, so uint64 is not representable. Therefore, any Pub/Sub schema that contains a uint64 or fixed64 field cannot be connected to a BigQuery table.

  • When the type in the topic schema is a string and the type in the BigQuery table is TIMESTAMP, DATETIME, DATE, or TIME, then any value for this field in a Pub/Sub message must adhere to the format specified for the BigQuery data type.

  • Some Avro logical types are supported, as specified in the following table. Any logical types not listed only match the equivalent Avro type that they annotate, as detailed in the Avro specification.
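The string-to-date rule above can be sketched with a quick format check. This is a hypothetical helper, not part of Pub/Sub; it validates the canonical YYYY-MM-DD format that a BigQuery DATE column expects:

```python
from datetime import datetime

def is_valid_bq_date(value: str) -> bool:
    """Return True if the string matches BigQuery's DATE format (YYYY-MM-DD)."""
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False

print(is_valid_bq_date("2024-07-01"))  # True
print(is_valid_bq_date("07/01/2024"))  # False
```

A message whose string field fails a check like this one cannot be written to the corresponding DATE column.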

The following tables map types between the different schema formats.

Avro to ZetaSQL

Avro Type ZetaSQL Type
boolean BOOLEAN
float FLOAT64
double FLOAT64
bytes BYTES
string STRING or JSON
array ARRAY
map ARRAY<STRUCT<key KeyType, value ValueType>>
union NULLABLE nested types
fixed BYTES
enum INT64

Protobuf to ZetaSQL

Protocol Buffer Type ZetaSQL Type
double FLOAT64
float FLOAT64
int32 INT64
int64 INT64
uint32 INT64
uint64 Unmappable
sint32 INT64
sint64 INT64
fixed32 INT64
fixed64 Unmappable
sfixed32 INT64
sfixed64 INT64
string STRING or JSON
bytes BYTES
enum INT64
oneof NULLABLE nested types
map ARRAY<STRUCT<key KeyType, value ValueType>>
repeated/array ARRAY

Avro logical to ZetaSQL

Avro Logical Type ZetaSQL Type
timestamp-micros TIMESTAMP
date DATE
time-micros TIME
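The scalar mappings in the Protocol Buffer table above can be encoded as a simple lookup. This is a sketch for illustration, not part of Pub/Sub itself; note that uint64 and fixed64 have no ZetaSQL equivalent:

```python
# Protocol Buffer scalar types to ZetaSQL types, following the table above.
# uint64 and fixed64 are unmappable because ZetaSQL has no unsigned integer.
PROTOBUF_TO_ZETASQL = {
    "double": "FLOAT64", "float": "FLOAT64",
    "int32": "INT64", "int64": "INT64", "uint32": "INT64",
    "sint32": "INT64", "sint64": "INT64",
    "fixed32": "INT64", "sfixed32": "INT64", "sfixed64": "INT64",
    "string": "STRING", "bytes": "BYTES", "enum": "INT64",
}

def zetasql_type(proto_type: str) -> str:
    """Resolve a protobuf scalar type, rejecting unmappable unsigned types."""
    if proto_type in ("uint64", "fixed64"):
        raise ValueError(f"{proto_type} is not representable in ZetaSQL")
    return PROTOBUF_TO_ZETASQL[proto_type]

print(zetasql_type("sint64"))  # INT64
```

A schema containing any field that resolves to an unmappable type cannot be connected to a BigQuery table at all.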

Pub/Sub service account permissions

To create a BigQuery subscription, the Pub/Sub service account must have permission to write to the specific BigQuery table and to read the table metadata. For more information, see Assign BigQuery roles to the Pub/Sub service account.
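As a sketch, the write and metadata-read permissions can be granted with the Google Cloud CLI. PROJECT_ID and PROJECT_NUMBER are placeholders for your project's values:

```shell
# Grant the Pub/Sub service account permission to write rows to the table.
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member="serviceAccount:service-PROJECT_NUMBER@gcp-sa-pubsub.iam.gserviceaccount.com" \
  --role="roles/bigquery.dataEditor"

# Grant the service account permission to read the table metadata.
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member="serviceAccount:service-PROJECT_NUMBER@gcp-sa-pubsub.iam.gserviceaccount.com" \
  --role="roles/bigquery.metadataViewer"
```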

Handle message failures

When a Pub/Sub message cannot be written to BigQuery, the message cannot be acknowledged. To forward such undeliverable messages, configure a dead-letter topic on the BigQuery subscription.
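A dead-letter topic can be attached to an existing subscription with a command like the following sketch, where SUBSCRIPTION_ID and DEAD_LETTER_TOPIC_ID are placeholders:

```shell
# Route messages that repeatedly fail to write to BigQuery to a dead-letter
# topic after the configured number of delivery attempts.
gcloud pubsub subscriptions update SUBSCRIPTION_ID \
  --dead-letter-topic=DEAD_LETTER_TOPIC_ID \
  --max-delivery-attempts=5
```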


Quotas and limits

There are quota limitations on BigQuery subscriber throughput per region. For more information, see Pub/Sub quotas and limits.

BigQuery subscriptions write data by using the BigQuery Storage Write API. For information about the quotas and limits for the Storage Write API, see BigQuery Storage Write API requests. BigQuery subscriptions only consume the throughput quota for the Storage Write API. You can ignore the other Storage Write API quota considerations in this instance.


Pricing

For pricing information for BigQuery subscriptions, see the Pub/Sub pricing page.

What's next