ReadRowsStream(client, name, offset, read_rows_kwargs, retry_delay_callback=None)
A stream of results from a read rows request.
This stream is an iterable of ReadRowsResponse. Iterate over it to fetch all row messages.
If the fastavro library is installed, use the rows() method to parse all messages into a stream of row dictionaries.
If the pandas and fastavro libraries are installed, use the to_dataframe() method to parse all messages into a pandas.DataFrame.
This object should not be created directly, but is returned by other methods in this library.
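As a rough illustration of iterating over the stream, the sketch below sums the `row_count` field of each ReadRowsResponse. The helper name `total_row_count` is hypothetical and not part of this library. The commented usage assumes a `BigQueryReadClient` whose `read_rows()` method returns the stream, and `stream_name` is a placeholder.

```python
from typing import Iterable


def total_row_count(stream: Iterable) -> int:
    """Count rows by iterating over the response messages in a stream.

    `stream` is assumed to be a ReadRowsStream, or any iterable of
    ReadRowsResponse-like objects exposing a `row_count` attribute.
    """
    total = 0
    for response in stream:
        # Each response message reports how many rows it carries.
        total += response.row_count
    return total


# Hypothetical usage (needs credentials and an existing read session):
#   from google.cloud.bigquery_storage import BigQueryReadClient
#   client = BigQueryReadClient()
#   stream = client.read_rows(stream_name)  # stream_name is a placeholder
#   print(total_row_count(stream))
```

Because the helper only relies on iteration, it works with any iterable of messages, which also makes it easy to exercise without a live connection.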
Inheritance
builtins.object > ReadRowsStream
Methods
ReadRowsStream
ReadRowsStream(client, name, offset, read_rows_kwargs, retry_delay_callback=None)
Construct a ReadRowsStream.
Name | Type | Description |
client | google.cloud.bigquery_storage_v1.services.big_query_read.BigQueryReadClient | A GAPIC client used to reconnect to a ReadRows stream. This must be the GAPIC client to avoid a circular dependency on this class. |
name | str | Required. Stream ID from which rows are being read. |
offset | int | Required. Position in the stream to start reading from. The offset requested must be less than the last row read from ReadRows. Requesting a larger offset is undefined. |
read_rows_kwargs | dict | Keyword arguments to use when reconnecting to a ReadRows stream. |
retry_delay_callback | Optional[Callable[[float], None]] | If the client receives a retryable error that asks the client to delay its next attempt and retry_delay_callback is not None, ReadRowsStream will call retry_delay_callback with the delay duration (in seconds) before it starts sleeping until the next attempt. |
Type | Description |
Iterable[ google.cloud.bigquery_storage_v1.types.ReadRowsResponse ] | A sequence of row messages. |
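The `retry_delay_callback` parameter accepts any callable that takes the delay in seconds. A minimal sketch of such a callback follows. The names `make_delay_recorder` and `on_retry_delay` are hypothetical, and the commented usage assumes the stream is obtained through a client `read_rows()` call that forwards `retry_delay_callback`.

```python
from typing import Callable, List, Tuple


def make_delay_recorder() -> Tuple[Callable[[float], None], List[float]]:
    """Return a callback together with the list it records delays in."""
    delays: List[float] = []

    def on_retry_delay(delay_seconds: float) -> None:
        # Invoked with the delay (in seconds) before the stream sleeps.
        delays.append(delay_seconds)

    return on_retry_delay, delays


# Hypothetical usage, assuming `client` is a BigQueryReadClient:
#   callback, delays = make_delay_recorder()
#   stream = client.read_rows(stream_name, retry_delay_callback=callback)
#   ... after reading, `delays` holds every delay that was requested.
```

Recording the delays rather than printing them keeps the callback cheap and lets the caller decide later whether to log or aggregate them.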
__iter__
__iter__()
An iterable of messages.
Type | Description |
Iterable[ ReadRowsResponse ] | A sequence of row messages. |
rows
rows(read_session=None)
Iterate over all rows in the stream.
This method requires the fastavro library to parse row messages in Avro format. For Arrow format messages, the pyarrow library is required.
Warning: DATETIME columns are not supported. The fastavro library currently parses them as strings.
Name | Type | Description |
read_session | Optional[ReadSession] | DEPRECATED. This argument was used to specify the schema of the rows in the stream, but now the first message in a read stream contains this information. |
Type | Description |
Iterable[Mapping] | A sequence of rows, represented as dictionaries. |
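Since DATETIME values arrive as strings, one possible workaround is to convert them after the fact. The sketch below wraps any iterable of row dictionaries, such as the result of `rows()`. The helper name `parse_datetime_columns` is hypothetical, and the code assumes the strings are ISO 8601 text that `datetime.fromisoformat` accepts, which should be verified against real data.

```python
from datetime import datetime
from typing import Any, Dict, Iterable, Iterator, Mapping, Sequence


def parse_datetime_columns(
    rows: Iterable[Mapping[str, Any]], columns: Sequence[str]
) -> Iterator[Dict[str, Any]]:
    """Yield a copy of each row with the named string columns parsed."""
    for row in rows:
        parsed = dict(row)  # Copy so the original row is left untouched.
        for column in columns:
            value = parsed.get(column)
            if isinstance(value, str):
                # Assumption: ISO 8601 text such as "2024-01-31T08:15:00".
                parsed[column] = datetime.fromisoformat(value)
        yield parsed


# Hypothetical usage, assuming `stream` is a ReadRowsStream and
# "created_at" is a DATETIME column:
#   for row in parse_datetime_columns(stream.rows(), ["created_at"]):
#       print(row["created_at"].year)
```

Null values and columns that are not strings pass through unchanged, so the wrapper is safe to apply to nullable columns.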
to_arrow
to_arrow(read_session=None)
Create a pyarrow.Table of all rows in the stream.
This method requires the pyarrow library and a stream using the Arrow format.
Name | Type | Description |
read_session | ReadSession | DEPRECATED. This argument was used to specify the schema of the rows in the stream, but now the first message in a read stream contains this information. |
Type | Description |
pyarrow.Table | A table of all rows in the stream. |
to_dataframe
to_dataframe(read_session=None, dtypes=None)
Create a pandas.DataFrame of all rows in the stream.
This method requires the pandas library to create a data frame and the fastavro library to parse row messages.
Warning: DATETIME columns are not supported. They are currently parsed as strings.
Name | Type | Description |
read_session | ReadSession | DEPRECATED. This argument was used to specify the schema of the rows in the stream, but now the first message in a read stream contains this information. |
dtypes | Map[str, Union[str, pandas.Series.dtype]] | Optional. A dictionary mapping column names to pandas dtypes. |
Type | Description |
pandas.DataFrame | A data frame of all rows in the stream. |
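The `dtypes` argument can be passed as a plain dictionary keyed by column name. The following is a minimal sketch of such a call. The wrapper `stream_to_dataframe`, the constant `EXAMPLE_DTYPES`, and the column names are all hypothetical, and the dtype strings are ordinary pandas dtype names chosen for illustration.

```python
from typing import Any, Dict, Optional

# Placeholder column names and dtypes, chosen only for illustration.
EXAMPLE_DTYPES: Dict[str, str] = {"station_id": "int32", "status": "category"}


def stream_to_dataframe(stream: Any, dtypes: Optional[Dict[str, str]] = None) -> Any:
    """Parse all messages in `stream` into a single pandas.DataFrame.

    `stream` is assumed to be a ReadRowsStream. When no mapping is
    given, the illustrative EXAMPLE_DTYPES mapping is used instead.
    """
    if dtypes is None:
        dtypes = EXAMPLE_DTYPES
    # Only the documented `dtypes` keyword is passed; the deprecated
    # `read_session` argument is left at its default.
    return stream.to_dataframe(dtypes=dtypes)


# Hypothetical usage, assuming `stream` is a ReadRowsStream:
#   frame = stream_to_dataframe(stream, {"station_id": "int32"})
#   print(frame.dtypes)
```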