Stream data using the Storage Write API
This document describes how to use the BigQuery Storage Write API to stream data into BigQuery.
In streaming scenarios, data arrives continuously and should be available for reads with minimal latency. When using the BigQuery Storage Write API for streaming workloads, consider what guarantees you need:
- If your application only needs at-least-once semantics, then use the default stream.
- If you need exactly-once semantics, then create one or more streams in committed type and use stream offsets to guarantee exactly-once writes.
In committed type, data written to the stream is available for query as soon as the server acknowledges the write request. The default stream also uses committed type, but does not provide exactly-once guarantees.
Use the default stream for at-least-once semantics
If your application can accept the possibility of duplicate records appearing in the destination table, then we recommend using the default stream for streaming scenarios.
The following code shows how to write data to the default stream:
Java
To learn how to install and use the client library for BigQuery, see BigQuery client libraries. For more information, see the BigQuery Java API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
Node.js
To learn how to install and use the client library for BigQuery, see BigQuery client libraries.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
Use multiplexing
You enable multiplexing at the stream writer level, and only for the default stream. To enable multiplexing in Java, call the setEnableConnectionPool method when you construct a StreamWriter or JsonStreamWriter object:
    // One possible way for constructing StreamWriter
    StreamWriter.newBuilder(streamName)
        .setWriterSchema(protoSchema)
        .setEnableConnectionPool(true)
        .build();

    // One possible way for constructing JsonStreamWriter
    JsonStreamWriter.newBuilder(tableName, bigqueryClient)
        .setEnableConnectionPool(true)
        .build();
To enable multiplexing in Go, see Connection Sharing (Multiplexing).
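As a rough illustration of what connection sharing means, the following in-memory model has several writers, each targeting a different table, sending their requests over one shared connection. It is only a sketch: the class names (SharedConnection, TableWriter) and the destination strings are hypothetical and are not part of the BigQuery client library, and the model assumes that each request is tagged with its destination so the shared connection can carry traffic for more than one table.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of one connection shared by many writers.
class SharedConnection {
    // Each entry records "destination <- payload" in arrival order.
    final List<String> sent = new ArrayList<>();

    synchronized void send(String destination, String payload) {
        sent.add(destination + " <- " + payload);
    }
}

// Hypothetical writer bound to the default stream of one destination table.
class TableWriter {
    private final String destination;
    private final SharedConnection connection;

    TableWriter(String destination, SharedConnection connection) {
        this.destination = destination;
        this.connection = connection;
    }

    void append(String row) {
        // The request is tagged with its destination so that a single
        // connection can carry traffic for several tables.
        connection.send(destination, row);
    }
}

class MultiplexingDemo {
    public static void main(String[] args) {
        SharedConnection shared = new SharedConnection();
        TableWriter orders = new TableWriter("tables/orders/_default", shared);
        TableWriter clicks = new TableWriter("tables/clicks/_default", shared);

        orders.append("order-1");
        clicks.append("click-1");
        orders.append("order-2");

        // All three requests travelled over the same connection object.
        shared.sent.forEach(System.out::println);
    }
}
```

In the real client library you do not manage the shared connection yourself; the model only shows why one connection can serve writers for several tables.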
Use committed type for exactly-once semantics
If you need exactly-once write semantics, create a write stream in committed type. In committed type, records are available for query as soon as the client receives acknowledgement from the back end.
Committed type provides exactly-once delivery within a stream through the use of record offsets. By using record offsets, the application specifies the next append offset in each call to AppendRows. The write operation is only performed if the offset value matches the next append offset. For more information, see Manage stream offsets to achieve exactly-once semantics.
If you don't provide an offset, then records are appended to the current end of the stream. In that case, if an append request returns an error, retrying it could result in the record appearing more than once in the stream.
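The difference between the two cases can be seen in a small in-memory model. This is a sketch under stated assumptions, not the client library: ModelStream, its append method, and the NO_OFFSET sentinel are hypothetical names, and the model reduces the server's behavior to the single rule described above (a write with an offset is performed only if the offset equals the next append offset).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for a committed-type write stream.
class ModelStream {
    static final long NO_OFFSET = -1; // hypothetical sentinel: caller gave no offset
    private final List<String> rows = new ArrayList<>();

    // Appends a batch. With an offset, the write happens only if the offset
    // equals the current end of the stream; without one, it always appends.
    // Returns true if the rows were written.
    boolean append(long offset, List<String> batch) {
        if (offset != NO_OFFSET && offset != rows.size()) {
            return false; // offset mismatch: nothing is written
        }
        rows.addAll(batch);
        return true;
    }

    long size() {
        return rows.size();
    }
}

class OffsetRetryDemo {
    public static void main(String[] args) {
        List<String> batch = List.of("r0", "r1");

        // With offsets: the first attempt succeeds, but suppose the client
        // never sees the acknowledgement and retries with the same offset.
        ModelStream withOffsets = new ModelStream();
        withOffsets.append(0, batch);
        boolean retryWritten = withOffsets.append(0, batch);
        System.out.println("retry written: " + retryWritten + ", rows: " + withOffsets.size());

        // Without offsets: the same retry appends the batch a second time.
        ModelStream withoutOffsets = new ModelStream();
        withoutOffsets.append(ModelStream.NO_OFFSET, batch);
        withoutOffsets.append(ModelStream.NO_OFFSET, batch);
        System.out.println("rows without offsets: " + withoutOffsets.size());
    }
}
```

In this model the retried batch is rejected when offsets are used, so the stream holds two rows, whereas the stream written without offsets ends up with four.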
To use committed type, perform the following steps:
Java
- Call CreateWriteStream to create one or more streams in committed type.
- For each stream, call AppendRows in a loop to write batches of records.
- Call FinalizeWriteStream for each stream to release the stream. After you call this method, you cannot write any more rows to the stream. This step is optional in committed type, but helps to prevent exceeding the limit on active streams. For more information, see Limit the rate of stream creation.
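The steps above can be sketched with an in-memory model of the stream lifecycle. The LifecycleStream class and its methods are hypothetical stand-ins for CreateWriteStream, AppendRows, and FinalizeWriteStream; they do not call BigQuery, and the exception types are assumptions chosen for the illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of a committed-type stream's lifecycle.
class LifecycleStream {
    private final List<String> rows = new ArrayList<>();
    private boolean finalized = false;

    // Models AppendRows with an explicit offset; returns the next append offset.
    long appendRows(long offset, List<String> batch) {
        if (finalized) {
            throw new IllegalStateException("stream is finalized");
        }
        if (offset != rows.size()) {
            throw new IllegalArgumentException("unexpected offset " + offset);
        }
        rows.addAll(batch);
        return rows.size();
    }

    // Models FinalizeWriteStream; returns the number of rows in the stream.
    long finalizeStream() {
        finalized = true;
        return rows.size();
    }
}

class LifecycleDemo {
    public static void main(String[] args) {
        // Step 1: create the stream.
        LifecycleStream stream = new LifecycleStream();

        // Step 2: append batches in a loop, tracking the next offset.
        List<List<String>> batches =
            List.of(List.of("a", "b"), List.of("c"), List.of("d", "e"));
        long nextOffset = 0;
        for (List<String> batch : batches) {
            nextOffset = stream.appendRows(nextOffset, batch);
        }

        // Step 3: finalize the stream; further appends are rejected.
        System.out.println("rows at finalize: " + stream.finalizeStream());
        try {
            stream.appendRows(nextOffset, List.of("f"));
        } catch (IllegalStateException e) {
            System.out.println("append after finalize rejected: " + e.getMessage());
        }
    }
}
```

Tracking the next offset on the client side, as the loop does here, is what lets each append state where it expects to land.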
Node.js
- Call createWriteStreamFullResponse to create one or more streams in committed type.
- For each stream, call appendRows in a loop to write batches of records.
- Call finalize for each stream to release the stream. After you call this method, you cannot write any more rows to the stream. This step is optional in committed type, but helps to prevent exceeding the limit on active streams. For more information, see Limit the rate of stream creation.
You cannot delete a stream explicitly. Streams follow the system-defined time to live (TTL):
- A committed stream has a TTL of three days if there is no traffic on the stream.
- A buffered stream by default has a TTL of seven days if there is no traffic on the stream.
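If an application wants to estimate when an idle stream will reach its TTL, the arithmetic is straightforward. The following is a minimal sketch: the StreamTtl class and its method names are hypothetical, the TTL values come from the list above, and treating the stream as expired exactly at the TTL boundary is an assumption of the example rather than documented behavior.

```java
import java.time.Duration;
import java.time.Instant;

class StreamTtl {
    // TTL values taken from the list above; the constant names are hypothetical.
    static final Duration COMMITTED_TTL = Duration.ofDays(3);
    static final Duration BUFFERED_DEFAULT_TTL = Duration.ofDays(7);

    // Returns the instant at which a stream with no further traffic reaches its TTL.
    static Instant expiresAt(Instant lastTraffic, Duration ttl) {
        return lastTraffic.plus(ttl);
    }

    // Returns true if 'now' is at or after the computed expiry instant.
    static boolean isExpired(Instant lastTraffic, Duration ttl, Instant now) {
        return !now.isBefore(expiresAt(lastTraffic, ttl));
    }
}

class StreamTtlDemo {
    public static void main(String[] args) {
        Instant lastTraffic = Instant.parse("2024-01-01T00:00:00Z");
        Instant now = Instant.parse("2024-01-05T00:00:00Z"); // four days later

        System.out.println("committed expired: "
            + StreamTtl.isExpired(lastTraffic, StreamTtl.COMMITTED_TTL, now));
        System.out.println("buffered expired: "
            + StreamTtl.isExpired(lastTraffic, StreamTtl.BUFFERED_DEFAULT_TTL, now));
    }
}
```

With four idle days, the committed stream in this example is past its three-day TTL while the buffered stream is still within its seven-day default.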
The following code shows how to use committed type:
Java
To learn how to install and use the client library for BigQuery, see BigQuery client libraries. For more information, see the BigQuery Java API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
Node.js
To learn how to install and use the client library for BigQuery, see BigQuery client libraries.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.