Index
BigQueryRead (interface)
BigQueryWrite (interface)
AppendRowsRequest (message)
AppendRowsRequest.ProtoData (message)
AppendRowsResponse (message)
AppendRowsResponse.AppendResult (message)
ArrowRecordBatch (message)
ArrowSchema (message)
ArrowSerializationOptions (message)
ArrowSerializationOptions.CompressionCodec (enum)
AvroRows (message)
AvroSchema (message)
BatchCommitWriteStreamsRequest (message)
BatchCommitWriteStreamsResponse (message)
CreateReadSessionRequest (message)
CreateWriteStreamRequest (message)
DataFormat (enum)
FinalizeWriteStreamRequest (message)
FinalizeWriteStreamResponse (message)
FlushRowsRequest (message)
FlushRowsResponse (message)
GetWriteStreamRequest (message)
ProtoRows (message)
ProtoSchema (message)
ReadRowsRequest (message)
ReadRowsResponse (message)
ReadSession (message)
ReadSession.TableModifiers (message)
ReadSession.TableReadOptions (message)
ReadStream (message)
SplitReadStreamRequest (message)
SplitReadStreamResponse (message)
StorageError (message)
StorageError.StorageErrorCode (enum)
StreamStats (message)
StreamStats.Progress (message)
TableFieldSchema (message)
TableFieldSchema.Mode (enum)
TableFieldSchema.Type (enum)
TableSchema (message)
ThrottleState (message)
WriteStream (message)
WriteStream.Type (enum)
BigQueryRead
BigQuery Read API.
The Read API can be used to read data from BigQuery.
CreateReadSession |
---|
Creates a new read session. A read session divides the contents of a BigQuery table into one or more streams, which can then be used to read data from the table. The read session also specifies properties of the data to be read, such as a list of columns or a push-down filter describing the rows to be returned. A particular row can be read by at most one stream. When the caller has reached the end of each stream in the session, then all the data in the table has been read. Data is assigned to each stream such that roughly the same number of rows can be read from each stream. Because the server-side unit for assigning data is collections of rows, the API does not guarantee that each stream will return the same number of rows. Additionally, the limits are enforced based on the number of pre-filtered rows, so some filters can lead to lopsided assignments. Read sessions automatically expire 6 hours after they are created and do not require manual clean-up by the caller. |
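The lopsided-assignment effect described above can be illustrated with a small simulation. This is a toy model, not the server's actual algorithm: it assumes the table is stored in fixed-size blocks and that whole blocks are dealt out to streams round-robin. The names `assign_blocks` and `count_matching` are hypothetical helpers introduced only for this sketch.

```python
from typing import Callable, List


def assign_blocks(rows: List[int], block_size: int, stream_count: int) -> List[List[int]]:
    """Toy model: cut the table into fixed-size blocks and deal whole blocks
    out to streams round-robin, so each row lands in exactly one stream."""
    blocks = [rows[i:i + block_size] for i in range(0, len(rows), block_size)]
    streams: List[List[int]] = [[] for _ in range(stream_count)]
    for index, block in enumerate(blocks):
        streams[index % stream_count].extend(block)
    return streams


def count_matching(streams: List[List[int]], keep: Callable[[int], bool]) -> List[int]:
    """Count the rows each stream would return once the filter is applied."""
    return [sum(1 for row in stream if keep(row)) for stream in streams]


table = list(range(100))  # 100 rows, identified by row number
streams = assign_blocks(table, block_size=25, stream_count=2)
pre_filter = [len(s) for s in streams]  # balanced before filtering
post_filter = count_matching(streams, lambda row: row < 30)
print(pre_filter, post_filter)
```

In this model both streams hold 50 pre-filter rows, but a filter that keeps only rows below 30 leaves 25 rows in one stream and 5 in the other, because the balance is computed before the filter is applied.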
ReadRows |
---|
Reads rows from the stream in the format prescribed by the ReadSession. Each response contains one or more table rows, up to a maximum of 100 MiB per response; read requests which attempt to read individual rows larger than 100 MiB will fail. Each request also returns a set of stream statistics reflecting the current state of the stream. |
SplitReadStream |
---|
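Splits a given ReadStream into two ReadStream objects, referred to as the primary and the remainder streams of the split. Moreover, the two child streams will be allocated back-to-back in the original ReadStream. |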
BigQueryWrite
BigQuery Write API.
The Write API can be used to write data to BigQuery.
For supplementary information about the Write API, see: https://cloud.google.com/bigquery/docs/write-api
AppendRows |
---|
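Appends data to the given stream. If an offset is specified, the append is only performed when the offset matches the current end of the stream; if no offset is specified, the append happens at the end of the stream. The response contains an optional offset at which the append happened. No offset information will be returned for appends to a default stream. Responses are received in the same order in which requests are sent. There will be one response for each successfully inserted request. Responses may optionally embed error information if the originating AppendRowsRequest was not successfully processed. The specifics of when successfully appended data is made visible to the table are governed by the type of stream: data in COMMITTED streams (including the default stream) is visible as soon as the append is acknowledged, data in BUFFERED streams becomes visible only up to the offset flushed via FlushRows, and data in PENDING streams is not visible until the stream is finalized and committed via BatchCommitWriteStreams.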
Note: For users coding against the gRPC API directly, it may be necessary to supply the x-goog-request-params system parameter. More information about system parameters: https://cloud.google.com/apis/docs/system-parameters |
BatchCommitWriteStreams |
---|
Atomically commits a group of PENDING streams that belong to the same parent table. Streams must be finalized before commit and cannot be committed multiple times. Once a stream is committed, data in the stream becomes available for read operations. |
CreateWriteStream |
---|
Creates a write stream to the given table. Additionally, every table has a special stream named '_default' to which data can be written. This stream doesn't need to be created using CreateWriteStream. It is a stream that can be used simultaneously by any number of clients. Data written to this stream is considered committed as soon as an acknowledgement is received. |
FinalizeWriteStream |
---|
Finalize a write stream so that no new data can be appended to the stream. Finalize is not supported on the '_default' stream. |
FlushRows |
---|
Flushes rows to a BUFFERED stream. If users are appending rows to a BUFFERED stream, a flush operation is required in order for the rows to become available for reading. A flush operation flushes from any previously flushed offset in a BUFFERED stream up to the offset specified in the request. Flush is not supported on the _default stream, since it is not BUFFERED. |
GetWriteStream |
---|
Gets information about a write stream. |
AppendRowsRequest
Request message for AppendRows.
Due to the nature of AppendRows being a bidirectional streaming RPC, certain parts of the AppendRowsRequest need only be specified for the first request sent each time the gRPC network connection is opened/reopened.
Fields | |
---|---|
write_stream |
Required. The write_stream identifies the target of the append operation, and only needs to be specified as part of the first request on the gRPC connection. If provided for subsequent requests, it must match the value of the first request. For explicitly created write streams, the format is:
For the special default stream, the format is:
Authorization requires the following IAM permission on the specified resource
|
offset |
If present, the write is only performed if the next append offset is the same as the provided value. If not present, the write is performed at the current end of stream. Specifying a value for this field is not allowed when calling AppendRows for the '_default' stream. |
trace_id |
ID set by the client to annotate its identity. Only the value set in the initial request is respected. |
proto_rows |
Rows in proto format. |
ProtoData
ProtoData contains the data rows and schema when constructing append requests.
Fields | |
---|---|
writer_schema |
Proto schema used to serialize the data. This value only needs to be provided as part of the first request on a gRPC network connection, and will be ignored for subsequent requests on the connection. |
rows |
Serialized row data in protobuf message format. Currently, the backend expects the serialized rows to adhere to proto2 semantics when appending rows, particularly with respect to how default values are encoded. |
AppendRowsResponse
Response message for AppendRows.
Fields | |
---|---|
updated_schema |
If the backend detects a schema update, it is passed to the user so that the user can use it to send messages of the new type. It will be empty when no schema updates have occurred. |
Union field. Only one of the following can be set: |
|
append_result |
Result if the append is successful. |
error |
Error returned when problems were encountered. If present, it indicates rows were not accepted into the system. Users can retry or continue with other append requests within the same connection.
Additional information about error signalling:
ALREADY_EXISTS: Happens when an append specified an offset, and the backend already has received data at this offset. Typically encountered in retry scenarios, and can be ignored.
OUT_OF_RANGE: Returned when the specified offset in the stream is beyond the current end of the stream.
INVALID_ARGUMENT: Indicates a malformed request or data.
ABORTED: Request processing is aborted because of prior failures. The request can be retried if previous failure is addressed.
INTERNAL: Indicates server side error(s) that can be retried. |
AppendResult
AppendResult is returned for successful append requests.
Fields | |
---|---|
offset |
The row offset at which the last append occurred. The offset will not be set if appending using default streams. |
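The offset check and the ALREADY_EXISTS / OUT_OF_RANGE signalling described above can be sketched with an in-memory stand-in for an explicitly created write stream. `ToyWriteStream` is a hypothetical class written for this illustration and is not part of any client library; it only models the offset rules stated in this reference.

```python
from typing import List, Optional, Tuple


class ToyWriteStream:
    """In-memory model of the offset rules for a non-default write stream."""

    def __init__(self) -> None:
        self.rows: List[bytes] = []

    def append(self, batch: List[bytes], offset: Optional[int] = None) -> Tuple[str, Optional[int]]:
        """Return (status, offset at which the append happened)."""
        end = len(self.rows)  # the next append offset
        if offset is not None:
            if offset < end:
                # Data already exists at this offset, typically a retry.
                return ("ALREADY_EXISTS", None)
            if offset > end:
                # The offset is beyond the current end of the stream.
                return ("OUT_OF_RANGE", None)
        self.rows.extend(batch)
        return ("OK", end)


stream = ToyWriteStream()
first = stream.append([b"a", b"b"], offset=0)   # accepted at offset 0
retry = stream.append([b"a", b"b"], offset=0)   # duplicate of the first append
gap = stream.append([b"z"], offset=5)           # skips past the end of the stream
tail = stream.append([b"c"])                    # no offset: written at the end
print(first, retry, gap, tail)
```

Supplying an offset lets a writer retry safely: a retried append that was in fact already accepted comes back as ALREADY_EXISTS instead of duplicating rows.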
ArrowRecordBatch
Arrow RecordBatch.
Fields | |
---|---|
serialized_record_batch |
IPC-serialized Arrow RecordBatch. |
row_count |
[Deprecated] The count of rows in serialized_record_batch. Please use the format-independent ReadRowsResponse.row_count instead. |
ArrowSchema
Arrow schema as specified in https://arrow.apache.org/docs/python/api/datatypes.html and serialized to bytes using IPC: https://arrow.apache.org/docs/format/Columnar.html#serialization-and-interprocess-communication-ipc
See code samples on how this message can be deserialized.
Fields | |
---|---|
serialized_schema |
IPC serialized Arrow schema. |
ArrowSerializationOptions
Contains options specific to Arrow Serialization.
Fields | |
---|---|
buffer_compression |
The compression codec to use for Arrow buffers in serialized record batches. |
CompressionCodec
Compression codecs supported by Arrow.
Enums | |
---|---|
COMPRESSION_UNSPECIFIED |
If unspecified, no compression will be used. |
LZ4_FRAME |
LZ4 Frame (https://github.com/lz4/lz4/blob/dev/doc/lz4_Frame_format.md) |
ZSTD |
Zstandard compression. |
AvroRows
Avro rows.
Fields | |
---|---|
serialized_binary_rows |
Binary serialized rows in a block. |
row_count |
[Deprecated] The count of rows in the returning block. Please use the format-independent ReadRowsResponse.row_count instead. |
AvroSchema
Avro schema.
Fields | |
---|---|
schema |
JSON-serialized schema, as described at https://avro.apache.org/docs/1.8.1/spec.html. |
BatchCommitWriteStreamsRequest
Request message for BatchCommitWriteStreams.
Fields | |
---|---|
parent |
Required. Parent table that all the streams should belong to, in the form of Authorization requires the following IAM permission on the specified resource
|
write_streams[] |
Required. The group of streams that will be committed atomically. |
BatchCommitWriteStreamsResponse
Response message for BatchCommitWriteStreams.
Fields | |
---|---|
commit_time |
The time at which streams were committed, in microsecond granularity. This field will only exist when there are no stream errors. Note: if this field is not set, the commit was not successful. |
stream_errors[] |
Stream-level errors if the commit failed. Only streams with errors will be in the list. If empty, there is no error and all streams are committed successfully. If non-empty, certain streams have errors and no stream is committed, due to the atomicity guarantee. |
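The all-or-nothing behaviour of the commit can be sketched as follows. This is an illustrative model only: `ToyCommitResponse` and `toy_batch_commit` are hypothetical names, the stream states are plain strings, and the commit time is a caller-supplied placeholder rather than a server timestamp. The error codes are taken from StorageError.StorageErrorCode.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ToyCommitResponse:
    commit_time: Optional[str] = None              # set only on success
    stream_errors: List[str] = field(default_factory=list)


def toy_batch_commit(streams: Dict[str, str], names: List[str], now: str) -> ToyCommitResponse:
    """streams maps stream name -> 'OPEN', 'FINALIZED', or 'COMMITTED'."""
    errors: List[str] = []
    for name in names:
        state = streams.get(name)
        if state is None:
            errors.append(f"{name}: STREAM_NOT_FOUND")
        elif state == "COMMITTED":
            errors.append(f"{name}: STREAM_ALREADY_COMMITTED")
        elif state != "FINALIZED":
            errors.append(f"{name}: INVALID_STREAM_STATE")
    if errors:
        # Atomicity: a single failing stream means nothing is committed.
        return ToyCommitResponse(stream_errors=errors)
    for name in names:
        streams[name] = "COMMITTED"
    return ToyCommitResponse(commit_time=now)


streams = {"s1": "FINALIZED", "s2": "OPEN"}
failed = toy_batch_commit(streams, ["s1", "s2"], now="t0")
states_after_failure = dict(streams)               # s1 is still only FINALIZED
streams["s2"] = "FINALIZED"
succeeded = toy_batch_commit(streams, ["s1", "s2"], now="t1")
print(failed.stream_errors, succeeded.commit_time)
```

A caller can therefore treat an unset commit_time, or equivalently a non-empty stream_errors list, as "no stream was committed" and retry the whole group once the listed streams are fixed.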
CreateReadSessionRequest
Request message for CreateReadSession.
Fields | |
---|---|
parent |
Required. The request project that owns the session, in the form of Authorization requires the following IAM permission on the specified resource
|
read_session |
Required. Session to be created. Authorization requires the following IAM permission on the specified resource
|
max_stream_count |
Max initial number of streams. If unset or zero, the server will choose a number of streams so as to produce reasonable throughput. Must be non-negative. The number of streams may be lower than the requested number, depending on the amount of parallelism that is reasonable for the table. An error will be returned if the max count is greater than the current system max limit of 1,000. Streams must be read starting from offset 0. |
CreateWriteStreamRequest
Request message for CreateWriteStream.
Fields | |
---|---|
parent |
Required. Reference to the table to which the stream belongs, in the format of Authorization requires the following IAM permission on the specified resource
|
write_stream |
Required. Stream to be created. |
DataFormat
Data format for input or output data.
Enums | |
---|---|
DATA_FORMAT_UNSPECIFIED |
|
AVRO |
Avro is a standard open source row based file format. See https://avro.apache.org/ for more details. |
ARROW |
Arrow is a standard open source column-based message format. See https://arrow.apache.org/ for more details. |
FinalizeWriteStreamRequest
Request message for invoking FinalizeWriteStream.
Fields | |
---|---|
name |
Required. Name of the stream to finalize, in the form of Authorization requires the following IAM permission on the specified resource
|
FinalizeWriteStreamResponse
Response message for FinalizeWriteStream.
Fields | |
---|---|
row_count |
Number of rows in the finalized stream. |
FlushRowsRequest
Request message for FlushRows.
Fields | |
---|---|
write_stream |
Required. The stream that is the target of the flush operation. Authorization requires the following IAM permission on the specified resource
|
offset |
Ending offset of the flush operation. Rows before this offset (including this offset) will be flushed. |
FlushRowsResponse
Response message for FlushRows.
Fields | |
---|---|
offset |
The rows before this offset (including this offset) are flushed. |
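Because the flush offset is inclusive, flushing to offset N makes N + 1 rows readable. A minimal sketch of that rule follows. `ToyBufferedStream` is a hypothetical in-memory model; two behaviours in it are assumptions of the sketch rather than documented API behaviour: flushing past the appended rows raises an error, and flushing to an earlier offset leaves the flushed position unchanged.

```python
from typing import List


class ToyBufferedStream:
    """In-memory model of BUFFERED visibility with an inclusive flush offset."""

    def __init__(self) -> None:
        self.rows: List[str] = []
        self.flushed_offset = -1  # -1 means nothing has been flushed yet

    def append(self, batch: List[str]) -> None:
        self.rows.extend(batch)

    def flush(self, offset: int) -> int:
        """Flush rows up to and including `offset`; return the flushed offset."""
        if offset >= len(self.rows):
            # Assumption of this sketch: rows must be appended before flushing.
            raise ValueError("offset is beyond the appended rows")
        # Assumption of this sketch: the flushed position never moves backwards.
        self.flushed_offset = max(self.flushed_offset, offset)
        return self.flushed_offset

    def visible_rows(self) -> List[str]:
        return self.rows[: self.flushed_offset + 1]


buffered = ToyBufferedStream()
buffered.append(["r0", "r1", "r2", "r3"])
before = buffered.visible_rows()     # nothing readable until a flush
flushed = buffered.flush(1)          # inclusive: rows at offsets 0 and 1
after = buffered.visible_rows()
print(before, flushed, after)
```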
GetWriteStreamRequest
Request message for GetWriteStream.
Fields | |
---|---|
name |
Required. Name of the stream to get, in the form of Authorization requires the following IAM permission on the specified resource
|
ProtoRows
Fields | |
---|---|
serialized_rows[] |
A sequence of rows serialized as a Protocol Buffer. See https://developers.google.com/protocol-buffers/docs/overview for more information on deserializing this field. |
ProtoSchema
ProtoSchema describes the schema of the serialized protocol buffer data rows.
Fields | |
---|---|
proto_descriptor |
Descriptor for the input message. The descriptor has to be self-contained, including all the nested types, except for protocol buffer well-known types. See: https://developers.google.com/protocol-buffers/docs/reference/google.protobuf |
ReadRowsRequest
Request message for ReadRows.
Fields | |
---|---|
read_stream |
Required. Stream to read rows from. Authorization requires the following IAM permission on the specified resource
|
offset |
The offset requested must be less than the last row read from ReadRows. Requesting a larger offset is undefined. If not specified, reading starts from offset zero. |
ReadRowsResponse
Response from calling ReadRows may include row data, progress, and throttling information.
Fields | |
---|---|
row_count |
Number of serialized rows in the rows block. |
stats |
Statistics for the stream. |
throttle_state |
Throttling state. If unset, the latest response still describes the current throttling status. |
Union field rows. Row data is returned in the format specified during session creation. rows can be only one of the following: |
|
avro_rows |
Serialized row data in AVRO format. |
arrow_record_batch |
Serialized row data in Arrow RecordBatch format. |
Union field schema. The schema for the read. If read_options.selected_fields is set, the schema may be different from the table schema as it will only contain the selected fields. This schema is equivalent to the one returned by CreateReadSession. This field is only populated in the first ReadRowsResponse RPC. schema can be only one of the following: |
|
avro_schema |
Output only. Avro schema. |
arrow_schema |
Output only. Arrow schema. |
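The offset field of ReadRowsRequest together with the row_count of each ReadRowsResponse allows a reader to resume an interrupted stream without re-reading rows. The sketch below assumes the common pattern of resuming at an offset equal to the number of rows already received. It runs against a simulated reader: `make_flaky_reader` and `read_all` are hypothetical helpers, and responses are modelled as plain dictionaries rather than real response messages.

```python
from typing import Callable, Dict, Iterator, List

Response = Dict[str, object]


def make_flaky_reader(table: List[str], batch_size: int, fail_after: int) -> Callable[[int], Iterator[Response]]:
    """Simulated ReadRows: drops the connection once, after `fail_after` responses."""
    state = {"failures_left": 1}

    def read_rows(offset: int) -> Iterator[Response]:
        sent = 0
        for start in range(offset, len(table), batch_size):
            if state["failures_left"] and sent == fail_after:
                state["failures_left"] -= 1
                raise ConnectionError("simulated dropped stream")
            batch = table[start:start + batch_size]
            sent += 1
            yield {"row_count": len(batch), "rows": batch}

    return read_rows


def read_all(read_rows: Callable[[int], Iterator[Response]], max_retries: int = 3) -> List[str]:
    """Read a stream to completion, resuming at the count of rows already received."""
    rows: List[str] = []
    offset = 0
    retries = 0
    while True:
        try:
            for response in read_rows(offset):
                rows.extend(response["rows"])
                offset += response["row_count"]  # advance past the rows just received
            return rows
        except ConnectionError:
            retries += 1
            if retries > max_retries:
                raise


table = [f"row{i}" for i in range(10)]
result = read_all(make_flaky_reader(table, batch_size=3, fail_after=2))
print(len(result))
```

Tracking the offset from row_count rather than from the decoded payload keeps the resume logic independent of whether the session uses Avro or Arrow.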
ReadSession
Information about the ReadSession.
Fields | |
---|---|
name |
Output only. Unique identifier for the session, in the form |
expire_time |
Output only. Time at which the session becomes invalid. After this time, subsequent requests to read this Session will return errors. The expire_time is automatically assigned and currently cannot be specified or updated. |
data_format |
Immutable. Data format of the output data. |
table |
Immutable. Table that this ReadSession is reading from, in the form Authorization requires one or more of the following IAM permissions on the specified resource
|
table_modifiers |
Optional. Any modifiers which are applied when reading from the specified table. |
read_options |
Optional. Read options for this session (e.g. column selection, filters). |
streams[] |
Output only. A list of streams created with the session. At least one stream is created with the session. In the future, larger request_stream_count values may result in this list being unpopulated; in that case, the user will need to use a List method to get the streams instead, which is not yet available. |
estimated_total_bytes_scanned |
Output only. An estimate of the number of bytes this session will scan when all streams are completely consumed. This estimate is based on metadata from the table which might be incomplete or stale. |
trace_id |
Optional. ID set by the client to annotate a session identity. This does not need to be strictly unique; instead, the same ID should be used to group logically connected sessions (for example, using the same ID for all sessions needed to complete a Spark SQL query is reasonable). Maximum length is 256 bytes. |
Union field schema. The schema for the read. If read_options.selected_fields is set, the schema may be different from the table schema as it will only contain the selected fields. schema can be only one of the following: |
|
avro_schema |
Output only. Avro schema. |
arrow_schema |
Output only. Arrow schema. |
TableModifiers
Additional attributes when reading a table.
Fields | |
---|---|
snapshot_time |
The snapshot time of the table. If not set, interpreted as now. |
TableReadOptions
Options dictating how we read a table.
Fields | |
---|---|
selected_fields[] |
Names of the fields in the table that should be read. If empty, all fields will be read. If the specified field is a nested field, all the sub-fields in the field will be selected. The output field order is unrelated to the order of fields in selected_fields. |
row_restriction |
SQL text filtering statement, similar to a WHERE clause in a query. Aggregates are not supported.
Examples:
"int_field > 5"
"date_field = CAST('2014-9-27' as DATE)"
"nullable_field is not NULL"
"st_equals(geo_field, st_geofromtext("POINT(2, 2)"))"
"numeric_field BETWEEN 1.0 AND 5.0"
Restricted to a maximum length of 1 MB. |
arrow_serialization_options |
Optional. Options specific to the Apache Arrow output format. |
ReadStream
Information about a single stream that gets data out of the storage system. Most of the information about ReadStream instances is aggregated, making ReadStream lightweight.
Fields | |
---|---|
name |
Output only. Name of the stream, in the form |
SplitReadStreamRequest
Request message for SplitReadStream.
Fields | |
---|---|
name |
Required. Name of the stream to split. Authorization requires the following IAM permission on the specified resource
|
fraction |
A value in the range (0.0, 1.0) that specifies the fractional point at which the original stream should be split. The actual split point is evaluated on pre-filtered rows, so if a filter is provided, then there is no guarantee that the division of the rows between the new child streams will be proportional to this fractional value. Additionally, because the server-side unit for assigning data is collections of rows, this fraction will always map to a data storage boundary on the server side. |
SplitReadStreamResponse
Response message for SplitReadStream.
Fields | |
---|---|
primary_stream |
Primary stream, which contains the beginning portion of original_stream. An empty value indicates that the original stream can no longer be split. |
remainder_stream |
Remainder stream, which contains the tail of original_stream. An empty value indicates that the original stream can no longer be split. |
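The interaction between the requested fraction and server-side storage boundaries can be sketched with a toy split function. This is not the service's algorithm: it assumes a stream is a list of row blocks, that a split may only fall between blocks, and that the boundary closest to the requested fraction is chosen. `toy_split` is a hypothetical name, and `None` stands in for the empty values returned when no further split is possible.

```python
from typing import List, Optional, Tuple

Blocks = List[List[int]]


def toy_split(blocks: Blocks, fraction: float) -> Tuple[Optional[Blocks], Optional[Blocks]]:
    """Split a stream of row blocks at the block boundary nearest to `fraction`."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be in the open range (0.0, 1.0)")
    if len(blocks) < 2:
        return None, None  # a single block cannot be split any further
    total_rows = sum(len(block) for block in blocks)
    target = fraction * total_rows
    best_cut, best_gap, rows_seen = 1, float("inf"), 0
    for cut in range(1, len(blocks)):
        rows_seen += len(blocks[cut - 1])
        gap = abs(rows_seen - target)
        if gap < best_gap:
            best_cut, best_gap = cut, gap
    # Primary holds the beginning of the original stream, remainder holds the tail.
    return blocks[:best_cut], blocks[best_cut:]


original = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
primary, remainder = toy_split(original, fraction=0.5)
print(primary, remainder)
```

With ten rows in blocks of 4, 4, and 2, a request for 0.5 yields a 4 / 6 division rather than 5 / 5, which mirrors why the resulting split is not guaranteed to be proportional to the requested fraction.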
StorageError
Structured custom BigQuery Storage error message. The error can be attached as error details in the returned rpc Status. In particular, the use of error codes allows more structured error handling, and reduces the need to evaluate unstructured error text strings.
Fields | |
---|---|
code |
BigQuery Storage specific error code. |
entity |
Name of the failed entity. |
error_message |
Message that describes the error. |
StorageErrorCode
Error code for StorageError.
Enums | |
---|---|
STORAGE_ERROR_CODE_UNSPECIFIED |
Default error. |
TABLE_NOT_FOUND |
Table is not found in the system. |
STREAM_ALREADY_COMMITTED |
Stream is already committed. |
STREAM_NOT_FOUND |
Stream is not found. |
INVALID_STREAM_TYPE |
Invalid Stream type. For example, you try to commit a stream that is not pending. |
INVALID_STREAM_STATE |
Invalid Stream state. For example, you try to commit a stream that is not finalized or has been garbage collected. |
STREAM_FINALIZED |
Stream is finalized. |
SCHEMA_MISMATCH_EXTRA_FIELDS |
There is a schema mismatch caused by the user schema having fields that are not present in the BigQuery schema. |
OFFSET_ALREADY_EXISTS |
Offset already exists. |
OFFSET_OUT_OF_RANGE |
Offset out of range. |
StreamStats
Estimated stream statistics for a given read Stream.
Fields | |
---|---|
progress |
Represents the progress of the current stream. |
Progress
Fields | |
---|---|
at_response_start |
The fraction of rows assigned to the stream that have been processed by the server so far, not including the rows in the current response message. This value, along with Note that if a filter is provided, the |
at_response_end |
Similar to |
TableFieldSchema
TableFieldSchema defines a single field/column within a table schema.
Fields | |
---|---|
name |
Required. The field name. The name must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_), and must start with a letter or underscore. The maximum length is 128 characters. |
type |
Required. The field data type. |
mode |
Optional. The field mode. The default value is NULLABLE. |
fields[] |
Optional. Describes the nested schema fields if the type property is set to STRUCT. |
description |
Optional. The field description. The maximum length is 1,024 characters. |
max_length |
Optional. Maximum length of values of this field for STRINGS or BYTES. If max_length is not specified, no maximum length constraint is imposed on this field. If type = "STRING", then max_length represents the maximum UTF-8 length of strings in this field. If type = "BYTES", then max_length represents the maximum number of bytes in this field. It is invalid to set this field if type is not "STRING" or "BYTES". |
precision |
Optional. Precision (maximum number of total digits in base 10) and scale (maximum number of digits in the fractional part in base 10) constraints for values of this field for NUMERIC or BIGNUMERIC. It is invalid to set precision or scale if type is not "NUMERIC" or "BIGNUMERIC". If precision and scale are not specified, no value range constraint is imposed on this field insofar as values are permitted by the type. Values of this NUMERIC or BIGNUMERIC field must be in this range when:
Acceptable values for precision and scale if both are specified:
Acceptable values for precision if only precision is specified but not scale (and thus scale is interpreted to be equal to zero):
If scale is specified but not precision, then it is invalid. |
scale |
Optional. See documentation for precision. |
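The constraints stated for name, max_length, precision, and scale can be checked before a schema is sent. The following is a minimal sketch, assuming plain strings for the type; `check_field` is a hypothetical helper and covers only the rules spelled out above, not the numeric ranges for precision and scale.

```python
import re
from typing import List, Optional

# Letters, digits, or underscores, starting with a letter or underscore.
_FIELD_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def check_field(
    name: str,
    field_type: str,
    max_length: Optional[int] = None,
    precision: Optional[int] = None,
    scale: Optional[int] = None,
) -> List[str]:
    """Return a list of rule violations; an empty list means no rule was broken."""
    problems: List[str] = []
    if len(name) > 128 or not _FIELD_NAME.fullmatch(name):
        problems.append("invalid field name")
    if max_length is not None and field_type not in ("STRING", "BYTES"):
        problems.append("max_length is only valid for STRING or BYTES")
    if (precision is not None or scale is not None) and field_type not in ("NUMERIC", "BIGNUMERIC"):
        problems.append("precision and scale are only valid for NUMERIC or BIGNUMERIC")
    if scale is not None and precision is None:
        problems.append("scale requires precision")
    return problems


print(check_field("user_id", "INT64"))
print(check_field("9lives", "STRING"))
print(check_field("amount", "NUMERIC", scale=2))
```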
Mode
Enums | |
---|---|
MODE_UNSPECIFIED |
Illegal value |
NULLABLE |
|
REQUIRED |
|
REPEATED |
|
Type
Enums | |
---|---|
TYPE_UNSPECIFIED |
Illegal value |
STRING |
64K, UTF8 |
INT64 |
64-bit signed |
DOUBLE |
64-bit IEEE floating point |
STRUCT |
Aggregate type |
BYTES |
64K, Binary |
BOOL |
2-valued |
TIMESTAMP |
64-bit signed usec since UTC epoch |
DATE |
Civil date - Year, Month, Day |
TIME |
Civil time - Hour, Minute, Second, Microseconds |
DATETIME |
Combination of civil date and civil time |
GEOGRAPHY |
Geography object |
NUMERIC |
Numeric value |
BIGNUMERIC |
BigNumeric value |
INTERVAL |
Interval |
JSON |
JSON, String |
TableSchema
Schema of a table.
Fields | |
---|---|
fields[] |
Describes the fields in a table. |
ThrottleState
Information on whether the current connection is being throttled.
Fields | |
---|---|
throttle_percent |
How much this connection is being throttled. Zero means no throttling, 100 means fully throttled. |
WriteStream
Information about a single stream that gets data into the storage system.
Fields | |
---|---|
name |
Output only. Name of the stream, in the form |
type |
Immutable. Type of the stream. |
create_time |
Output only. Create time of the stream. For the _default stream, this is the creation_time of the table. |
commit_time |
Output only. Commit time of the stream. If a stream is of |
table_schema |
Output only. The schema of the destination table. It is only returned in |
Type
Type enum of the stream.
Enums | |
---|---|
TYPE_UNSPECIFIED |
Unknown type. |
COMMITTED |
Data will commit automatically and appear as soon as the write is acknowledged. |
PENDING |
Data is invisible until the stream is committed. |
BUFFERED |
Data is only visible up to the offset to which it was flushed. |
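The three visibility rules can be condensed into one function. This is a sketch for reasoning about the stream types, not client-library code: `StreamType` and `visible_row_count` are hypothetical names, and row counts stand in for offsets.

```python
from enum import Enum


class StreamType(Enum):
    COMMITTED = "COMMITTED"
    PENDING = "PENDING"
    BUFFERED = "BUFFERED"


def visible_row_count(
    stream_type: StreamType,
    acknowledged_rows: int,
    flushed_rows: int = 0,
    committed: bool = False,
) -> int:
    """How many acknowledged rows a reader of the table can see."""
    if stream_type is StreamType.COMMITTED:
        return acknowledged_rows  # visible as soon as the write is acknowledged
    if stream_type is StreamType.PENDING:
        return acknowledged_rows if committed else 0  # all or nothing
    if stream_type is StreamType.BUFFERED:
        return min(flushed_rows, acknowledged_rows)  # only the flushed prefix
    raise ValueError(f"unsupported stream type: {stream_type}")


print(visible_row_count(StreamType.COMMITTED, 10))
print(visible_row_count(StreamType.PENDING, 10))
print(visible_row_count(StreamType.BUFFERED, 10, flushed_rows=4))
```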