Index
BigQueryRead (interface)
ArrowRecordBatch (message)
ArrowSchema (message)
AvroRows (message)
AvroSchema (message)
CreateReadSessionRequest (message)
DataFormat (enum)
ReadRowsRequest (message)
ReadRowsResponse (message)
ReadSession (message)
ReadSession.TableModifiers (message)
ReadSession.TableReadOptions (message)
ReadStream (message)
SplitReadStreamRequest (message)
SplitReadStreamResponse (message)
StreamStats (message)
StreamStats.Progress (message)
ThrottleState (message)
BigQueryRead
BigQuery Read API.
The Read API can be used to read data from BigQuery.
CreateReadSession | |
---|---|
Creates a new read session. A read session divides the contents of a BigQuery table into one or more streams, which can then be used to read data from the table. The read session also specifies properties of the data to be read, such as a list of columns or a push-down filter describing the rows to be returned. A particular row can be read by at most one stream. When the caller has reached the end of each stream in the session, all the data in the table has been read. Data is assigned to each stream such that roughly the same number of rows can be read from each stream. Because the server-side unit for assigning data is collections of rows, the API does not guarantee that each stream will return the same number of rows. Additionally, the limits are enforced based on the number of pre-filtered rows, so some filters can lead to lopsided assignments. Read sessions automatically expire 24 hours after they are created and do not require manual clean-up by the caller. |
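
The note about lopsided assignments can be illustrated with a small simulation. This is a sketch only: the greedy balancing strategy, the block sizes, and the filter match counts are all hypothetical and are not taken from the service. It shows streams that are balanced on pre-filter rows returning very different numbers of rows once a filter is applied.

```python
# Illustrative simulation only; this is not how the server is implemented.

def assign_blocks(block_sizes, stream_count):
    """Greedily assign row blocks to streams, balancing pre-filter row counts."""
    streams = [[] for _ in range(stream_count)]
    totals = [0] * stream_count
    for block_id, size in enumerate(block_sizes):
        target = totals.index(min(totals))  # least-loaded stream so far
        streams[target].append(block_id)
        totals[target] += size
    return streams


def rows_after_filter(streams, matching_rows_per_block):
    """Count the rows each stream returns once the filter is applied."""
    return [sum(matching_rows_per_block[b] for b in blocks) for blocks in streams]


# Hypothetical table: four blocks of 100 rows each.
block_sizes = [100, 100, 100, 100]
# Hypothetical filter: most matching rows are clustered in the first block.
matching = [90, 5, 0, 5]

streams = assign_blocks(block_sizes, stream_count=2)
pre_filter = [sum(block_sizes[b] for b in blocks) for blocks in streams]
post_filter = rows_after_filter(streams, matching)

print(streams)      # [[0, 2], [1, 3]]
print(pre_filter)   # [200, 200]
print(post_filter)  # [90, 10]
```

Each block belongs to exactly one stream, so no row is read twice, yet the two streams return 90 and 10 rows respectively.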
ReadRows | |
---|---|
Reads rows from the stream in the format prescribed by the ReadSession. Each response contains one or more table rows, up to a maximum of 100 MiB per response; read requests which attempt to read individual rows larger than 100 MiB will fail. Each request also returns a set of stream statistics reflecting the current state of the stream. |
SplitReadStream | |
---|---|
Splits a given ReadStream into two ReadStream objects, a primary stream and a remainder stream. The two child streams are allocated back-to-back in the original ReadStream. |
ArrowRecordBatch
Arrow RecordBatch.
Fields | |
---|---|
serialized_record_batch |
IPC-serialized Arrow RecordBatch. |
row_count |
The count of rows in the returned record batch. |
ArrowSchema
Arrow schema as specified in https://arrow.apache.org/docs/python/api/datatypes.html and serialized to bytes using IPC: https://arrow.apache.org/docs/format/Columnar.html#serialization-and-interprocess-communication-ipc
See code samples on how this message can be deserialized.
Fields | |
---|---|
serialized_schema |
IPC-serialized Arrow schema. |
AvroRows
Avro rows.
Fields | |
---|---|
serialized_binary_rows |
Binary serialized rows in a block. |
row_count |
The count of rows in the returning block. |
AvroSchema
Avro schema.
Fields | |
---|---|
schema |
JSON-serialized schema, as described at https://avro.apache.org/docs/1.8.1/spec.html. |
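
Because the schema field is a JSON string, it can be inspected with any JSON parser before the rows are decoded. The sketch below uses a hypothetical schema string (the record name and field names are invented for illustration) and only lists the top-level field names; decoding the binary rows themselves would need an Avro library, which is not shown here.

```python
import json

# Hypothetical schema string standing in for the AvroSchema.schema field.
schema_json = """
{
  "type": "record",
  "name": "__root__",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "title", "type": ["null", "string"]}
  ]
}
"""


def field_names(avro_schema_json):
    """Return the top-level field names of an Avro record schema."""
    schema = json.loads(avro_schema_json)
    if schema.get("type") != "record":
        raise ValueError("expected an Avro record schema")
    return [field["name"] for field in schema["fields"]]


names = field_names(schema_json)
print(names)  # ['id', 'title']
```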
CreateReadSessionRequest
Request message for CreateReadSession.
Fields | |
---|---|
parent |
Required. The request project that owns the session. Authorization requires an IAM permission on the specified resource. |
read_session |
Required. Session to be created. Authorization requires an IAM permission on the specified resource. |
max_stream_count |
Max initial number of streams. If unset or zero, the server will choose a number of streams that produces reasonable throughput. Must be non-negative. The number of streams may be lower than the requested number, depending on the amount of parallelism that is reasonable for the table. An error is returned if the max count is greater than the current system max limit of 1,000. Streams must be read starting from offset 0. |
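
A client may want to check max_stream_count locally before sending the request. The following is a minimal sketch of such a check; the helper name and the constant are hypothetical, and the limit of 1,000 is simply the current value stated above, which may change.

```python
SYSTEM_MAX_STREAMS = 1000  # current system limit stated in this reference


def check_max_stream_count(max_stream_count):
    """Validate a requested stream count; 0 lets the server choose."""
    if max_stream_count < 0:
        raise ValueError("max_stream_count must be non-negative")
    if max_stream_count > SYSTEM_MAX_STREAMS:
        raise ValueError(
            f"max_stream_count must not exceed {SYSTEM_MAX_STREAMS}"
        )
    return max_stream_count


print(check_max_stream_count(0))   # 0, meaning the server picks a value
print(check_max_stream_count(16))  # 16
```

Even when the check passes, the session may contain fewer streams than requested, so callers should iterate over the streams actually returned rather than assume the requested count.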
DataFormat
Data format for input or output data.
Enums | |
---|---|
DATA_FORMAT_UNSPECIFIED |
|
AVRO |
Avro is a standard open source row-based file format. See https://avro.apache.org/ for more details. |
ARROW |
Arrow is a standard open source column-based message format. See https://arrow.apache.org/ for more details. |
ReadRowsRequest
Request message for ReadRows.
Fields | |
---|---|
read_stream |
Required. Stream to read rows from. Authorization requires an IAM permission on the specified resource. |
offset |
The offset requested must be less than the last row read from ReadRows. Requesting a larger offset is undefined. If not specified, reading starts from offset zero. |
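
The offset is what allows a reader to resume a stream after a dropped connection. The sketch below simulates this with a hypothetical read_rows function standing in for the ReadRows call; it assumes offsets count rows from zero, so resuming at the number of rows already received continues with the next unread row. None of the names below belong to a real client library.

```python
def make_read_rows(total_rows, batch_size, failures):
    """Build a hypothetical read_rows(offset) function that simulates ReadRows.

    Each call consumes one entry of `failures`: the number of responses to
    yield before raising ConnectionError (None means no failure).
    """
    def read_rows(offset=0):
        fail_after = failures.pop(0) if failures else None
        sent = 0
        position = offset
        while position < total_rows:
            if fail_after is not None and sent == fail_after:
                raise ConnectionError("simulated dropped connection")
            count = min(batch_size, total_rows - position)
            yield count  # stands in for ReadRowsResponse.row_count
            position += count
            sent += 1
    return read_rows


def read_all(read_rows, max_attempts=5):
    """Read a stream to completion, resuming from the rows already received."""
    rows_read = 0
    offsets_used = []
    for _ in range(max_attempts):
        offsets_used.append(rows_read)
        try:
            for row_count in read_rows(offset=rows_read):
                rows_read += row_count
            return rows_read, offsets_used
        except ConnectionError:
            continue  # reconnect at the next unread row
    raise RuntimeError("stream could not be read to completion")


# First connection fails after two responses; the second one succeeds.
read_rows = make_read_rows(total_rows=25, batch_size=10, failures=[2, None])
total, offsets = read_all(read_rows)
print(total, offsets)  # 25 [0, 20]
```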
ReadRowsResponse
Response from calling ReadRows may include row data, progress, and throttling information.
Fields | ||
---|---|---|
row_count |
Number of serialized rows in the rows block. |
|
stats |
Statistics for the stream. |
|
throttle_state |
Throttling state. If unset, the latest response still describes the current throttling status. |
|
Union field rows. Row data is returned in the format specified during session creation. rows can be only one of the following: |
||
avro_rows |
Serialized row data in Avro format. |
|
arrow_record_batch |
Serialized row data in Arrow RecordBatch format. |
ReadSession
Information about the ReadSession.
Fields | ||
---|---|---|
name |
Output only. Unique identifier for the session. |
|
expire_time |
Output only. Time at which the session becomes invalid. After this time, subsequent requests to read this Session will return errors. The expire_time is automatically assigned and currently cannot be specified or updated. |
|
data_format |
Immutable. Data format of the output data. |
|
table |
Immutable. Table that this ReadSession is reading from. Authorization requires one or more IAM permissions on the specified resource. |
|
table_modifiers |
Optional. Any modifiers which are applied when reading from the specified table. |
|
read_options |
Optional. Read options for this session (e.g. column selection, filters). |
|
streams[] |
Output only. A list of streams created with the session. At least one stream is created with the session. In the future, larger request_stream_count values may result in this list being unpopulated. In that case, the user will need to use a List method to get the streams instead, which is not yet available. |
|
Union field schema . The schema for the read. If read_options.selected_fields is set, the schema may be different from the table schema as it will only contain the selected fields. schema can be only one of the following: |
||
avro_schema |
Output only. Avro schema. |
|
arrow_schema |
Output only. Arrow schema. |
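
Since requests fail once expire_time has passed, a long-running reader may want to check the remaining session lifetime before starting on another stream. The sketch below is one way to do that; the helper name, the safety margin, and the example timestamps are hypothetical, and the 24-hour lifetime is the value stated for CreateReadSession.

```python
from datetime import datetime, timedelta, timezone

SESSION_LIFETIME = timedelta(hours=24)  # expiry window stated in this reference


def is_session_usable(expire_time, now=None, safety_margin=timedelta(minutes=5)):
    """Return True if the session will stay valid for at least safety_margin."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now + safety_margin < expire_time


# Hypothetical creation time; a real expire_time comes from the ReadSession.
created = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
expire_time = created + SESSION_LIFETIME

print(is_session_usable(expire_time, now=created))                        # True
print(is_session_usable(expire_time, now=created + timedelta(hours=24)))  # False
```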
TableModifiers
Additional attributes when reading a table.
Fields | |
---|---|
snapshot_time |
The snapshot time of the table. If not set, interpreted as now. |
TableReadOptions
Options dictating how we read a table.
Fields | |
---|---|
selected_fields[] |
Names of the fields in the table that should be read. If empty, all fields will be read. If the specified field is a nested field, all the sub-fields in the field will be selected. The output field order is unrelated to the order of fields in selected_fields. |
row_restriction |
SQL text filtering statement, similar to a WHERE clause in a query. Aggregates are not supported. Examples: "int_field > 5"; "date_field = CAST('2014-9-27' as DATE)"; "nullable_field is not NULL"; "st_equals(geo_field, st_geofromtext("POINT(2, 2)"))"; "numeric_field BETWEEN 1.0 AND 5.0" |
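
The two options can be pictured as a small record attached to the session being created. The dataclass below is a local stand-in that mirrors the two fields described above; it is not the generated message class, and the column names are taken from the examples in the table.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TableReadOptions:
    """Local mirror of the fields described above (illustrative only)."""
    selected_fields: List[str] = field(default_factory=list)
    row_restriction: str = ""

    def reads_all_fields(self):
        """An empty selection means every field in the table is read."""
        return not self.selected_fields


# Read two columns and keep only rows matching the filter.
options = TableReadOptions(
    selected_fields=["int_field", "date_field"],
    row_restriction="int_field > 5",
)
defaults = TableReadOptions()

print(options.reads_all_fields())   # False
print(defaults.reads_all_fields())  # True
```

Because the output field order is unrelated to the order in selected_fields, code that consumes the rows should look fields up by name in the returned schema rather than by position.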
ReadStream
Information about a single stream that gets data out of the storage system. Most of the information about ReadStream instances is aggregated, making ReadStream lightweight.
Fields | |
---|---|
name |
Output only. Name of the stream. |
SplitReadStreamRequest
Request message for SplitReadStream.
Fields | |
---|---|
name |
Required. Name of the stream to split. Authorization requires an IAM permission on the specified resource. |
fraction |
A value in the range (0.0, 1.0) that specifies the fractional point at which the original stream should be split. The actual split point is evaluated on pre-filtered rows, so if a filter is provided, then there is no guarantee that the division of the rows between the new child streams will be proportional to this fractional value. Additionally, because the server-side unit for assigning data is collections of rows, this fraction will always map to a data storage boundary on the server side. |
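
The effect of mapping the fraction to a storage boundary can be sketched with a list of block sizes. Everything here is an assumption made for illustration: the block sizes are invented, and the choice of the nearest boundary is a guess, since this reference does not say which boundary the server picks.

```python
def split_blocks(block_sizes, fraction):
    """Split a stream's blocks at the boundary nearest to the requested fraction.

    Returns (primary, remainder), or (None, None) if the stream consists of a
    single block and therefore cannot be split any further.
    """
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be in the open range (0.0, 1.0)")
    if len(block_sizes) < 2:
        return None, None
    target = fraction * sum(block_sizes)
    best_index, best_distance, running = 1, None, 0
    # Candidate split points are the interior block boundaries.
    for index in range(1, len(block_sizes)):
        running += block_sizes[index - 1]
        distance = abs(running - target)
        if best_distance is None or distance < best_distance:
            best_index, best_distance = index, distance
    return block_sizes[:best_index], block_sizes[best_index:]


original = [100, 300, 100, 500]  # hypothetical pre-filter rows per block
primary, remainder = split_blocks(original, 0.3)

print(primary, remainder)            # [100, 300] [100, 500]
print(sum(primary) / sum(original))  # 0.4
```

A requested fraction of 0.3 lands at 0.4 because no block boundary exists at 300 rows, and the two children concatenated in order reproduce the original stream.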
SplitReadStreamResponse
Fields | |
---|---|
primary_stream |
Primary stream, which contains the beginning portion of the original stream. An empty value indicates that the original stream can no longer be split. |
remainder_stream |
Remainder stream, which contains the tail of the original stream. An empty value indicates that the original stream can no longer be split. |
StreamStats
Estimated stream statistics for a given Stream.
Fields | |
---|---|
progress |
Represents the progress of the current stream. |
Progress
Fields | |
---|---|
at_response_start |
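The fraction of rows assigned to the stream that have been processed by the server so far, not including the rows in the current response message. This value, along with at_response_end, can be used to interpolate the progress made as the rows in the message are being processed. Note that if a filter is provided, the at_response_end value of the previous response may not necessarily be equal to the at_response_start value of the current response. |
at_response_end |
Similar to at_response_start, except that this value includes the rows in the current response. |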
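
The two progress values can be combined by linear interpolation to estimate progress partway through a response. The helper below is a minimal sketch of that calculation; the function and parameter names are hypothetical, and the handling of an empty response is an assumption.

```python
def interpolate_progress(at_response_start, at_response_end,
                         rows_processed, rows_in_response):
    """Estimate stream progress after processing part of one response."""
    if rows_in_response <= 0:
        # Assumption: with no rows to process, report the end-of-response value.
        return at_response_end
    span = at_response_end - at_response_start
    return at_response_start + span * rows_processed / rows_in_response


# A response covering 25% to 50% of the stream, with 50 of 200 rows handled.
progress = interpolate_progress(0.25, 0.5, rows_processed=50, rows_in_response=200)
print(progress)  # 0.3125
```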
ThrottleState
Information on whether the current connection is being throttled.
Fields | |
---|---|
throttle_percent |
How much this connection is being throttled. Zero means no throttling, 100 means fully throttled. |