Class RowIterator (3.38.0)

RowIterator(
    client,
    api_request,
    path,
    schema,
    page_token=None,
    max_results=None,
    page_size=None,
    extra_params=None,
    table=None,
    selected_fields=None,
    total_rows=None,
    first_page_response=None,
    location: typing.Optional[str] = None,
    job_id: typing.Optional[str] = None,
    query_id: typing.Optional[str] = None,
    project: typing.Optional[str] = None,
    num_dml_affected_rows: typing.Optional[int] = None,
    query: typing.Optional[str] = None,
    total_bytes_processed: typing.Optional[int] = None,
    slot_millis: typing.Optional[int] = None,
    created: typing.Optional[datetime.datetime] = None,
    started: typing.Optional[datetime.datetime] = None,
    ended: typing.Optional[datetime.datetime] = None,
)

A class for iterating through HTTP/JSON API row list responses.

Parameters
Name	Description
`query`	`Optional[str]` The query text used.
`total_bytes_processed`	`Optional[int]` If representing query results, the total bytes processed by the associated query.
`slot_millis`	`Optional[int]` If representing query results, the number of slot ms billed for the associated query.
`created`	`Optional[datetime.datetime]` If representing query results, the creation time of the associated query.
`started`	`Optional[datetime.datetime]` If representing query results, the start time of the associated query.
`ended`	`Optional[datetime.datetime]` If representing query results, the end time of the associated query.
`client`	`Optional[google.cloud.bigquery.Client]` The API client instance. This should always be non-`None`, except for subclasses that do not use it, namely the `_EmptyRowIterator`.
`api_request`	`Callable[google.cloud._http.JSONConnection.api_request]` The function to use to make API requests.
`path`	`str` The method path to query for the list of items.
`schema`	`Sequence[Union[SchemaField, Mapping[str, Any]]]` The table's schema. If any item is a mapping, its content must be compatible with `SchemaField.from_api_repr`.
`page_token`	`str` A token identifying a page in a result set to start fetching results from.
`max_results`	`Optional[int]` The maximum number of results to fetch.
`page_size`	`Optional[int]` The maximum number of rows in each page of results from this request. Non-positive values are ignored. Defaults to a sensible value set by the API.
`extra_params`	`Optional[Dict[str, object]]` Extra query string parameters for the API call.
`table`	`Optional[Union[google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference]]` The table which these rows belong to, or a reference to it. Used to call the BigQuery Storage API to fetch rows.
`selected_fields`	`Optional[Sequence[google.cloud.bigquery.schema.SchemaField]]` A subset of columns to select from this table.
`total_rows`	`Optional[int]` Total number of rows in the table.
`first_page_response`	`Optional[dict]` API response for the first page of results. If supplied, this response is returned when the first page is requested.
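
Applications rarely call this constructor directly. A `RowIterator` is typically returned by `Client.list_rows()` or `QueryJob.result()`, which forward options such as `max_results` and `page_size`. The sketch below is a minimal illustration, assuming `client` is a `google.cloud.bigquery.Client`; the helper name `page_sizes` is hypothetical. It walks the result page by page through the `pages` attribute that `RowIterator` inherits from `google.api_core.page_iterator`.

```python
from typing import Any, List


def page_sizes(client: Any, table_id: str, page_size: int = 500, max_results: int = 2000) -> List[int]:
    """Return the number of rows in each page fetched for ``table_id``.

    Hypothetical helper: ``client`` is assumed to be a
    ``google.cloud.bigquery.Client``, whose ``list_rows`` method returns a
    RowIterator configured with the given ``page_size`` and ``max_results``.
    """
    rows = client.list_rows(table_id, page_size=page_size, max_results=max_results)
    sizes: List[int] = []
    # ``pages`` yields one page per API response; iterating a page yields rows.
    for page in rows.pages:
        sizes.append(sum(1 for _row in page))
    return sizes
```

For example, `page_sizes(bigquery.Client(), "my-project.my_dataset.my_table", page_size=100)`, where the table ID is a placeholder, reports how the server split the first `max_results` rows into pages.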

Properties

created

If representing query results, the creation time of the associated query.

ended

If representing query results, the end time of the associated query.

job_id

ID of the query job (if applicable).

To get the job metadata, call `job = client.get_job(rows.job_id, location=rows.location)`.
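
The call above can be wrapped so that it also handles iterators without a job. This is a minimal sketch, assuming `client` is a `google.cloud.bigquery.Client` and `rows` is a `RowIterator`; the helper name `job_for_rows` is hypothetical.

```python
from typing import Any, Optional


def job_for_rows(client: Any, rows: Any) -> Optional[Any]:
    """Fetch the query job behind ``rows``, or return ``None`` if there is none.

    Hypothetical helper: ``client`` is assumed to be a
    ``google.cloud.bigquery.Client`` and ``rows`` a RowIterator.
    """
    # job_id is only set "if applicable"; for example, rows listed directly
    # from a table with Client.list_rows() have no associated job.
    if rows.job_id is None:
        return None
    # Pass the location where the query executed, as recommended above.
    return client.get_job(rows.job_id, location=rows.location)
```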

location

Location where the query executed (if applicable).

See: https://cloud.google.com/bigquery/docs/locations

num_dml_affected_rows

If this RowIterator is the result of a DML query, the number of rows that were affected.

See: https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/query#body.QueryResponse.FIELDS.num_dml_affected_rows
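
A caller can use `num_dml_affected_rows` to tell DML results apart from ordinary query results. The sketch below assumes `rows` is the `RowIterator` returned by `client.query(sql).result()`; the helper name `describe_dml_result` is hypothetical, and it treats `None` as "not a DML result".

```python
from typing import Any


def describe_dml_result(rows: Any) -> str:
    """Return a short, human-readable summary of a query result.

    Hypothetical helper: ``rows`` is assumed to be a RowIterator produced
    by running a query.
    """
    affected = rows.num_dml_affected_rows
    # Assumption: None means the statement was not DML (or no count is available).
    if affected is None:
        return f"query returned {rows.total_rows} row(s)"
    return f"DML statement affected {affected} row(s)"
```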

project

GCP Project ID where these rows are read from.

query

The query text used.

query_id

[Preview] ID of a completed query.

This ID is auto-generated and not guaranteed to be populated.

schema

List[google.cloud.bigquery.schema.SchemaField]: The subset of columns to be read from the table.

slot_millis

Number of slot ms the user is actually billed for.

started

If representing query results, the start time of the associated query.

total_bytes_processed

Total bytes processed, taken from the job statistics, if present.

total_rows

int: The total number of rows in the table or query results.
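
Several of the properties above are only populated when the iterator represents query results, so code that reads them should expect `None`. The following is a minimal sketch, assuming `rows` is a `RowIterator`; the helper name `query_stats` and the output keys are hypothetical.

```python
from typing import Any, Dict, Optional


def query_stats(rows: Any) -> Dict[str, Optional[float]]:
    """Collect the optional query statistics exposed by a RowIterator.

    The query-related properties may be ``None`` (for example, when the rows
    were listed from a table rather than produced by a query), so each one
    is checked before use.
    """
    duration: Optional[float] = None
    if rows.started is not None and rows.ended is not None:
        # started/ended are datetime.datetime values; their difference is a timedelta.
        duration = (rows.ended - rows.started).total_seconds()
    gib: Optional[float] = None
    if rows.total_bytes_processed is not None:
        gib = rows.total_bytes_processed / 2**30
    return {
        "total_rows": rows.total_rows,
        "gib_processed": gib,
        "slot_seconds": None if rows.slot_millis is None else rows.slot_millis / 1000,
        "duration_seconds": duration,
    }
```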

Methods

to_arrow

to_arrow(
    progress_bar_type: typing.Optional[str] = None,
    bqstorage_client: typing.Optional[bigquery_storage.BigQueryReadClient] = None,
    create_bqstorage_client: bool = True,
) -> pyarrow.Table

[Beta] Create a `pyarrow.Table` by loading all pages of a table or query.

Parameters
Name	Description
`progress_bar_type`	`Optional[str]` If set, use the `tqdm` library (https://tqdm.github.io/) to display a progress bar while the data downloads. Install the `tqdm` package to use this feature. Possible values of `progress_bar_type` include: `None` (no progress bar); `'tqdm'` (use the `tqdm.tqdm` function to print a progress bar to `sys.stdout`); `'tqdm_notebook'` (use the `tqdm.notebook.tqdm` function to display a progress bar as a Jupyter notebook widget); `'tqdm_gui'` (use the `tqdm.tqdm_gui` function to display a progress bar as a graphical dialog box).
`bqstorage_client`	`Optional[google.cloud.bigquery_storage_v1.BigQueryReadClient]` A BigQuery Storage API client. If supplied, use the faster BigQuery Storage API to fetch rows from BigQuery. This API is a billable API. This method requires the `google-cloud-bigquery-storage` library. This method only exposes a subset of the capabilities of the BigQuery Storage API. For full access to all features (projections, filters, snapshots), use the Storage API directly.
`create_bqstorage_client`	`Optional[bool]` If `True` (default), create a BigQuery Storage API client using the default API settings. The BigQuery Storage API is a faster way to fetch rows from BigQuery. See the `bqstorage_client` parameter for more information. This argument does nothing if `bqstorage_client` is supplied. New in version 1.24.0.

Exceptions
Type	Description
`ValueError`	If the `pyarrow` library cannot be imported. New in version 1.17.0.
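
Because `to_arrow` loads every page into a single in-memory `pyarrow.Table`, it suits results that fit in memory. This is a minimal sketch, assuming `rows` is a `RowIterator` and `pyarrow` is installed; the helper name `arrow_overview` is hypothetical. It disables the automatic BigQuery Storage client so the download uses the REST API unless the caller opts in.

```python
from typing import Any, List, Tuple


def arrow_overview(rows: Any, use_storage_api: bool = False) -> Tuple[int, List[str]]:
    """Download all rows into a pyarrow.Table and report its shape.

    Hypothetical helper: ``rows`` is assumed to be a RowIterator.
    ``to_arrow`` raises ValueError if pyarrow cannot be imported.
    """
    # create_bqstorage_client=False avoids creating a client for the billable
    # BigQuery Storage API; rows are then fetched over the REST API.
    table = rows.to_arrow(create_bqstorage_client=use_storage_api)
    return table.num_rows, table.column_names
```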

to_arrow_iterable

to_arrow_iterable(
    bqstorage_client: typing.Optional[bigquery_storage.BigQueryReadClient] = None,
    max_queue_size: int = <object object>,
    max_stream_count: typing.Optional[int] = None,
) -> typing.Iterator[pyarrow.RecordBatch]

[Beta] Create an iterable of `pyarrow.RecordBatch`, to process the table as a stream.

Parameters
Name	Description
`bqstorage_client`	`Optional[google.cloud.bigquery_storage_v1.BigQueryReadClient]` A BigQuery Storage API client. If supplied, use the faster BigQuery Storage API to fetch rows from BigQuery. This method requires the `pyarrow` and `google-cloud-bigquery-storage` libraries. This method only exposes a subset of the capabilities of the BigQuery Storage API. For full access to all features (projections, filters, snapshots) use the Storage API directly.
`max_queue_size`	`Optional[int]` The maximum number of result pages to hold in the internal queue when streaming query results over the BigQuery Storage API. Ignored if the Storage API is not used. By default, the maximum queue size is set to the number of BigQuery Storage streams created by the server. If `max_queue_size` is `None`, the queue size is unbounded.
`max_stream_count`	`Optional[int]` The maximum number of parallel download streams when using the BigQuery Storage API. Ignored if the BigQuery Storage API is not used. This setting also has no effect if the query result is deterministically ordered with ORDER BY, in which case the number of download streams is always 1. If set to 0 or `None` (the default), the number of download streams is determined by the BigQuery server. However, this behaviour can require a lot of memory to store temporary download results, especially with very large queries. In that case, setting this parameter to a value greater than 0 can help reduce system resource consumption.

Returns
Type	Description
`pyarrow.RecordBatch`	A generator of `pyarrow.RecordBatch`. New in version 2.31.0.
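
Streaming record batches lets a caller aggregate over a large result without building one large table. The sketch below assumes `rows` is a `RowIterator` and `bqstorage_client`, if given, is a `google.cloud.bigquery_storage_v1.BigQueryReadClient`; the helper name `count_rows_streaming` is hypothetical.

```python
from typing import Any, Optional


def count_rows_streaming(
    rows: Any,
    bqstorage_client: Optional[Any] = None,
    max_stream_count: Optional[int] = None,
) -> int:
    """Count rows batch by batch instead of materializing the whole result.

    Hypothetical helper: ``rows`` is assumed to be a RowIterator.
    """
    total = 0
    # Each item is a pyarrow.RecordBatch. max_stream_count only has an effect
    # when a BigQuery Storage client is supplied; otherwise it is ignored.
    for batch in rows.to_arrow_iterable(
        bqstorage_client=bqstorage_client,
        max_stream_count=max_stream_count,
    ):
        total += batch.num_rows
    return total
```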

to_dataframe

to_dataframe(
    bqstorage_client: typing.Optional[bigquery_storage.BigQueryReadClient] = None,
    dtypes: typing.Optional[typing.Dict[str, typing.Any]] = None,
    progress_bar_type: typing.Optional[str] = None,
    create_bqstorage_client: bool = True,
    geography_as_object: bool = False,
    bool_dtype: typing.Optional[typing.Any] = DefaultPandasDTypes.BOOL_DTYPE,
    int_dtype: typing.Optional[typing.Any] = DefaultPandasDTypes.INT_DTYPE,
    float_dtype: typing.Optional[typing.Any] = None,
    string_dtype: typing.Optional[typing.Any] = None,
    date_dtype: typing.Optional[typing.Any] = DefaultPandasDTypes.DATE_DTYPE,
    datetime_dtype: typing.Optional[typing.Any] = None,
    time_dtype: typing.Optional[typing.Any] = DefaultPandasDTypes.TIME_DTYPE,
    timestamp_dtype: typing.Optional[typing.Any] = None,
    range_date_dtype: typing.Optional[
        typing.Any
    ] = DefaultPandasDTypes.RANGE_DATE_DTYPE,
    range_datetime_dtype: typing.Optional[
        typing.Any
    ] = DefaultPandasDTypes.RANGE_DATETIME_DTYPE,
    range_timestamp_dtype: typing.Optional[
        typing.Any
    ] = DefaultPandasDTypes.RANGE_TIMESTAMP_DTYPE,
) -> pandas.DataFrame

Create a pandas DataFrame by loading all pages of a query.

Parameters
Name	Description
`bqstorage_client`	`Optional[google.cloud.bigquery_storage_v1.BigQueryReadClient]` A BigQuery Storage API client. If supplied, use the faster BigQuery Storage API to fetch rows from BigQuery. This method requires the `google-cloud-bigquery-storage` library. This method only exposes a subset of the capabilities of the BigQuery Storage API. For full access to all features (projections, filters, snapshots), use the Storage API directly.
`dtypes`	`Optional[Map[str, Union[str, pandas.Series.dtype]]]` A dictionary mapping column names to pandas `dtype`s. The provided `dtype` is used when constructing the series for the specified column. Otherwise, the default pandas behavior is used.
`progress_bar_type`	`Optional[str]` If set, use the `tqdm` library (https://tqdm.github.io/) to display a progress bar while the data downloads. Install the `tqdm` package to use this feature. Possible values of `progress_bar_type` include: `None` (no progress bar); `'tqdm'` (use the `tqdm.tqdm` function to print a progress bar to `sys.stdout`); `'tqdm_notebook'` (use the `tqdm.notebook.tqdm` function to display a progress bar as a Jupyter notebook widget); `'tqdm_gui'` (use the `tqdm.tqdm_gui` function to display a progress bar as a graphical dialog box). New in version 1.11.0.
`create_bqstorage_client`	`Optional[bool]` If `True` (default), create a BigQuery Storage API client using the default API settings. The BigQuery Storage API is a faster way to fetch rows from BigQuery. See the `bqstorage_client` parameter for more information. This argument does nothing if `bqstorage_client` is supplied. New in version 1.24.0.
`geography_as_object`	`Optional[bool]` If `True`, convert GEOGRAPHY data to `shapely` geometry objects. If `False` (default), don't cast geography data to `shapely` geometry objects. New in version 2.24.0.
`bool_dtype`	`Optional[pandas.Series.dtype, None]` If set, the pandas ExtensionDtype (e.g. `pandas.BooleanDtype()`) used to convert the BigQuery Boolean type, instead of the default `pandas.BooleanDtype()`. If you explicitly set the value to `None`, the data type will be `numpy.dtype("bool")`. The BigQuery Boolean type is described at: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#boolean_type New in version 3.8.0.
`int_dtype`	`Optional[pandas.Series.dtype, None]` If set, the pandas ExtensionDtype (e.g. `pandas.Int64Dtype()`) used to convert BigQuery integer types, instead of the default `pandas.Int64Dtype()`. If you explicitly set the value to `None`, the data type will be `numpy.dtype("int64")`. A list of BigQuery integer types is available at: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#integer_types New in version 3.8.0.
`float_dtype`	`Optional[pandas.Series.dtype, None]` If set, the pandas ExtensionDtype (e.g. `pandas.Float32Dtype()`) used to convert the BigQuery Float type, instead of the default `numpy.dtype("float64")`. If you explicitly set the value to `None`, the data type will be `numpy.dtype("float64")`. The BigQuery Float type is described at: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#floating_point_types New in version 3.8.0.
`string_dtype`	`Optional[pandas.Series.dtype, None]` If set, the pandas ExtensionDtype (e.g. `pandas.StringDtype()`) used to convert the BigQuery String type, instead of the default `numpy.dtype("object")`. If you explicitly set the value to `None`, the data type will be `numpy.dtype("object")`. The BigQuery String type is described at: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#string_type New in version 3.8.0.
`date_dtype`	`Optional[pandas.Series.dtype, None]` If set, the pandas ExtensionDtype (e.g. `pandas.ArrowDtype(pyarrow.date32())`) used to convert the BigQuery Date type, instead of the default `db_dtypes.DateDtype()`. If you explicitly set the value to `None`, the data type will be `numpy.dtype("datetime64[ns]")`, or `object` if values are out of bounds. The BigQuery Date type is described at: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#date_type New in version 3.10.0.
`datetime_dtype`	`Optional[pandas.Series.dtype, None]` If set, the pandas ExtensionDtype (e.g. `pandas.ArrowDtype(pyarrow.timestamp("us"))`) used to convert the BigQuery Datetime type, instead of the default `numpy.dtype("datetime64[ns]")`. If you explicitly set the value to `None`, the data type will be `numpy.dtype("datetime64[ns]")`, or `object` if values are out of bounds. The BigQuery Datetime type is described at: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#datetime_type New in version 3.10.0.
`time_dtype`	`Optional[pandas.Series.dtype, None]` If set, the pandas ExtensionDtype (e.g. `pandas.ArrowDtype(pyarrow.time64("us"))`) used to convert the BigQuery Time type, instead of the default `db_dtypes.TimeDtype()`. If you explicitly set the value to `None`, the data type will be `numpy.dtype("object")`. The BigQuery Time type is described at: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#time_type New in version 3.10.0.
`timestamp_dtype`	`Optional[pandas.Series.dtype, None]` If set, the pandas ExtensionDtype (e.g. `pandas.ArrowDtype(pyarrow.timestamp("us", tz="UTC"))`) used to convert the BigQuery Timestamp type, instead of the default `numpy.dtype("datetime64[ns, UTC]")`. If you explicitly set the value to `None`, the data type will be `numpy.dtype("datetime64[ns, UTC]")`, or `object` if values are out of bounds. The BigQuery Timestamp type is described at: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#timestamp_type New in version 3.10.0.
`range_date_dtype`	`Optional[pandas.Series.dtype, None]` If set, a pandas ExtensionDtype, such as `pandas.ArrowDtype(pyarrow.struct([("start", pyarrow.date32()), ("end", pyarrow.date32())]))`, used to convert the BigQuery `RANGE<DATE>` type.
`range_datetime_dtype`	`Optional[pandas.Series.dtype, None]` If set, a pandas ExtensionDtype, such as `pandas.ArrowDtype(pyarrow.struct([("start", pyarrow.timestamp("us")), ("end", pyarrow.timestamp("us"))]))`, used to convert the BigQuery `RANGE<DATETIME>` type.
`range_timestamp_dtype`	`Optional[pandas.Series.dtype, None]` If set, a pandas ExtensionDtype, such as `pandas.ArrowDtype(pyarrow.struct([("start", pyarrow.timestamp("us", tz="UTC")), ("end", pyarrow.timestamp("us", tz="UTC"))]))`, used to convert the BigQuery `RANGE<TIMESTAMP>` type.

Exceptions
Type	Description
`ValueError`	If the `pandas` library cannot be imported, or the `bigquery_storage_v1` module is required but cannot be imported. Also raised if `geography_as_object` is `True` but the `shapely` library cannot be imported, or if `bool_dtype`, `int_dtype`, or another dtype parameter is not a supported dtype.

Returns
Type	Description
`pandas.DataFrame`	A `pandas.DataFrame` populated with row data and column headers from the query results. The column headers are derived from the destination table's schema.
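
The dtype parameters above can be combined to opt out of the nullable pandas defaults. This is a minimal sketch, assuming `rows` is a `RowIterator`; the helper name `load_with_numpy_dtypes` is hypothetical. NumPy's `bool` and `int64` dtypes cannot hold missing values, so this choice is best suited to columns without NULLs.

```python
from typing import Any, Dict


def load_with_numpy_dtypes(rows: Any) -> Any:
    """Load all rows into a pandas DataFrame using plain NumPy dtypes.

    Hypothetical helper: ``rows`` is assumed to be a RowIterator.
    """
    options: Dict[str, Any] = {
        # Explicit None selects numpy.dtype("bool") / numpy.dtype("int64")
        # instead of the defaults pandas.BooleanDtype() / pandas.Int64Dtype().
        "bool_dtype": None,
        "int_dtype": None,
        # Do not create a BigQuery Storage API client; download over REST.
        "create_bqstorage_client": False,
    }
    return rows.to_dataframe(**options)
```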

to_dataframe_iterable

to_dataframe_iterable(
    bqstorage_client: typing.Optional[bigquery_storage.BigQueryReadClient] = None,
    dtypes: typing.Optional[typing.Dict[str, typing.Any]] = None,
    max_queue_size: int = <object object>,
    max_stream_count: typing.Optional[int] = None,
) -> pandas.DataFrame

Create an iterable of pandas DataFrames, to process the table as a stream.

Parameters
Name	Description
`bqstorage_client`	`Optional[google.cloud.bigquery_storage_v1.BigQueryReadClient]` A BigQuery Storage API client. If supplied, use the faster BigQuery Storage API to fetch rows from BigQuery. This method requires the `google-cloud-bigquery-storage` library. This method only exposes a subset of the capabilities of the BigQuery Storage API. For full access to all features (projections, filters, snapshots), use the Storage API directly.
`dtypes`	`Optional[Map[str, Union[str, pandas.Series.dtype]]]` A dictionary mapping column names to pandas `dtype`s. The provided `dtype` is used when constructing the series for the specified column. Otherwise, the default pandas behavior is used.
`max_queue_size`	`Optional[int]` The maximum number of result pages to hold in the internal queue when streaming query results over the BigQuery Storage API. Ignored if the Storage API is not used. By default, the maximum queue size is set to the number of BigQuery Storage streams created by the server. If `max_queue_size` is `None`, the queue size is unbounded. New in version 2.14.0.
`max_stream_count`	`Optional[int]` The maximum number of parallel download streams when using the BigQuery Storage API. Ignored if the BigQuery Storage API is not used. This setting also has no effect if the query result is deterministically ordered with ORDER BY, in which case the number of download streams is always 1. If set to 0 or `None` (the default), the number of download streams is determined by the BigQuery server. However, this behaviour can require a lot of memory to store temporary download results, especially with very large queries. In that case, setting this parameter to a value greater than 0 can help reduce system resource consumption.

Exceptions
Type	Description
`ValueError`	If the `pandas` library cannot be imported.

Returns
Type	Description
`pandas.DataFrame`	A generator of `pandas.DataFrame`.
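
Processing one DataFrame chunk at a time keeps memory use closer to the size of a page than to the size of the whole result. The sketch below assumes `rows` is a `RowIterator` and `handle_chunk` is a hypothetical callback supplied by the caller (for example, one that appends each chunk to a file); the helper name `process_in_chunks` is also hypothetical.

```python
from typing import Any, Callable, Optional


def process_in_chunks(
    rows: Any,
    handle_chunk: Callable[[Any], None],
    bqstorage_client: Optional[Any] = None,
    max_stream_count: Optional[int] = 1,
) -> int:
    """Pass each pandas DataFrame chunk to ``handle_chunk``; return the chunk count.

    Hypothetical helper: ``rows`` is assumed to be a RowIterator.
    """
    chunks = 0
    # max_stream_count=1 limits parallel downloads (and memory) when a
    # BigQuery Storage client is supplied; it is ignored otherwise.
    for frame in rows.to_dataframe_iterable(
        bqstorage_client=bqstorage_client,
        max_stream_count=max_stream_count,
    ):
        handle_chunk(frame)
        chunks += 1
    return chunks
```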

to_geodataframe

to_geodataframe(
    bqstorage_client: typing.Optional[bigquery_storage.BigQueryReadClient] = None,
    dtypes: typing.Optional[typing.Dict[str, typing.Any]] = None,
    progress_bar_type: typing.Optional[str] = None,
    create_bqstorage_client: bool = True,
    geography_column: typing.Optional[str] = None,
    bool_dtype: typing.Optional[typing.Any] = DefaultPandasDTypes.BOOL_DTYPE,
    int_dtype: typing.Optional[typing.Any] = DefaultPandasDTypes.INT_DTYPE,
    float_dtype: typing.Optional[typing.Any] = None,
    string_dtype: typing.Optional[typing.Any] = None,
) -> geopandas.GeoDataFrame

Create a GeoPandas GeoDataFrame by loading all pages of a query.

Parameters
Name	Description
`bqstorage_client`	`Optional[google.cloud.bigquery_storage_v1.BigQueryReadClient]` A BigQuery Storage API client. If supplied, use the faster BigQuery Storage API to fetch rows from BigQuery. This method requires the `pyarrow` and `google-cloud-bigquery-storage` libraries. This method only exposes a subset of the capabilities of the BigQuery Storage API. For full access to all features (projections, filters, snapshots) use the Storage API directly.
`dtypes`	`Optional[Map[str, Union[str, pandas.Series.dtype]]]` A dictionary mapping column names to pandas `dtype`s. The provided `dtype` is used when constructing the series for the specified column. Otherwise, the default pandas behavior is used.
`progress_bar_type`	`Optional[str]` If set, use the `tqdm` library (https://tqdm.github.io/) to display a progress bar while the data downloads. Install the `tqdm` package to use this feature. Possible values of `progress_bar_type` include: `None` (no progress bar); `'tqdm'` (use the `tqdm.tqdm` function to print a progress bar to `sys.stdout`); `'tqdm_notebook'` (use the `tqdm.notebook.tqdm` function to display a progress bar as a Jupyter notebook widget); `'tqdm_gui'` (use the `tqdm.tqdm_gui` function to display a progress bar as a graphical dialog box).
`create_bqstorage_client`	`Optional[bool]` If `True` (default), create a BigQuery Storage API client using the default API settings. The BigQuery Storage API is a faster way to fetch rows from BigQuery. See the `bqstorage_client` parameter for more information. This argument does nothing if `bqstorage_client` is supplied.
`geography_column`	`Optional[str]` If there is more than one GEOGRAPHY column, identifies which one to use to construct a GeoPandas GeoDataFrame. This option can be omitted if there is only one GEOGRAPHY column.
`bool_dtype`	`Optional[pandas.Series.dtype, None]` If set, the pandas ExtensionDtype (e.g. `pandas.BooleanDtype()`) used to convert the BigQuery Boolean type, instead of the default `pandas.BooleanDtype()`. If you explicitly set the value to `None`, the data type will be `numpy.dtype("bool")`. The BigQuery Boolean type is described at: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#boolean_type
`int_dtype`	`Optional[pandas.Series.dtype, None]` If set, the pandas ExtensionDtype (e.g. `pandas.Int64Dtype()`) used to convert BigQuery integer types, instead of the default `pandas.Int64Dtype()`. If you explicitly set the value to `None`, the data type will be `numpy.dtype("int64")`. A list of BigQuery integer types is available at: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#integer_types
`float_dtype`	`Optional[pandas.Series.dtype, None]` If set, the pandas ExtensionDtype (e.g. `pandas.Float32Dtype()`) used to convert the BigQuery Float type, instead of the default `numpy.dtype("float64")`. If you explicitly set the value to `None`, the data type will be `numpy.dtype("float64")`. The BigQuery Float type is described at: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#floating_point_types
`string_dtype`	`Optional[pandas.Series.dtype, None]` If set, the pandas ExtensionDtype (e.g. `pandas.StringDtype()`) used to convert the BigQuery String type, instead of the default `numpy.dtype("object")`. If you explicitly set the value to `None`, the data type will be `numpy.dtype("object")`. The BigQuery String type is described at: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#string_type

Exceptions
Type	Description
`ValueError`	If the `geopandas` library cannot be imported, or the `bigquery_storage_v1` module is required but cannot be imported. New in version 2.24.0.

Returns
Type	Description
`geopandas.GeoDataFrame`	A `geopandas.GeoDataFrame` populated with row data and column headers from the query results. The column headers are derived from the destination table's schema.
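
When a result has several GEOGRAPHY columns, `geography_column` selects the one used to build the GeoDataFrame. This is a minimal sketch, assuming `rows` is a `RowIterator` and `geopandas` is installed; the helper name `load_geodataframe` is hypothetical, and it assumes the selected column becomes the frame's active geometry column, which it reads back through the standard `GeoDataFrame.geometry` attribute.

```python
from typing import Any, Optional, Tuple


def load_geodataframe(rows: Any, geography_column: Optional[str] = None) -> Tuple[Any, str]:
    """Load rows into a geopandas.GeoDataFrame and report its geometry column.

    Hypothetical helper: ``rows`` is assumed to be a RowIterator.
    ``geography_column`` is only needed when the result has more than one
    GEOGRAPHY column.
    """
    gdf = rows.to_geodataframe(
        geography_column=geography_column,
        # Download over REST rather than creating a BigQuery Storage client.
        create_bqstorage_client=False,
    )
    # Assumption: the selected GEOGRAPHY column is the active geometry column.
    return gdf, gdf.geometry.name
```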