Class RowIterator (3.22.0)

RowIterator(
    client,
    api_request,
    path,
    schema,
    page_token=None,
    max_results=None,
    page_size=None,
    extra_params=None,
    table=None,
    selected_fields=None,
    total_rows=None,
    first_page_response=None,
    location: typing.Optional[str] = None,
    job_id: typing.Optional[str] = None,
    query_id: typing.Optional[str] = None,
    project: typing.Optional[str] = None,
    num_dml_affected_rows: typing.Optional[int] = None,
)

A class for iterating through HTTP/JSON API row list responses.

Parameters

Name Description
client Optional[google.cloud.bigquery.Client]

The API client instance. This should always be non-None, except for subclasses that do not use it, namely the _EmptyRowIterator.

api_request Callable[google.cloud._http.JSONConnection.api_request]

The function to use to make API requests.

path str

The method path to query for the list of items.

schema Sequence[Union[ SchemaField, Mapping[str, Any] ]]

The table's schema. If any item is a mapping, its content must be compatible with from_api_repr.

page_token str

A token identifying a page in a result set to start fetching results from.

max_results Optional[int]

The maximum number of results to fetch.

page_size Optional[int]

The maximum number of rows in each page of results from this request. Non-positive values are ignored. Defaults to a sensible value set by the API.

extra_params Optional[Dict[str, object]]

Extra query string parameters for the API call.

table Optional[Union[ google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, ]]

The table which these rows belong to, or a reference to it. Used to call the BigQuery Storage API to fetch rows.

selected_fields Optional[Sequence[google.cloud.bigquery.schema.SchemaField]]

A subset of columns to select from this table.

total_rows Optional[int]

Total number of rows in the table.

first_page_response Optional[dict]

API response for the first page of results. These are returned when the first page is requested.

Properties

job_id

ID of the query job (if applicable).

To get the job metadata, call job = client.get_job(rows.job_id, location=rows.location).
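A minimal sketch of that lookup, assuming `client` is a google.cloud.bigquery.Client and `rows` is a RowIterator. The helper name `fetch_job_metadata` is hypothetical. Because job_id is only set "if applicable", the sketch returns None rather than calling get_job without a job ID.

```python
from typing import Any, Optional


def fetch_job_metadata(client: Any, rows: Any) -> Optional[Any]:
    """Return the job behind `rows`, or None when no job ID is available."""
    # job_id is documented as populated only "if applicable", so guard first.
    if rows.job_id is None:
        return None
    # Pass the location along so the job is looked up where it executed.
    return client.get_job(rows.job_id, location=rows.location)
```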

location

Location where the query executed (if applicable).

See: https://cloud.google.com/bigquery/docs/locations

num_dml_affected_rows

If this RowIterator is the result of a DML query, the number of rows that were affected.

See: https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/query#body.QueryResponse.FIELDS.num_dml_affected_rows

pages

Iterator of pages in the response.

Exceptions
Type Description
ValueError If the iterator has already been started.
Returns
Type Description
types.GeneratorType[google.api_core.page_iterator.Page] A generator of page instances.
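As a rough illustration of page-level iteration, the sketch below counts the rows in each page. It assumes `rows` is a RowIterator that has not been iterated yet and that each page is itself iterable. The helper name `page_sizes` is hypothetical.

```python
from typing import Any, List


def page_sizes(rows: Any) -> List[int]:
    """Return the number of rows in each page, in arrival order."""
    sizes = []
    # Accessing rows.pages raises ValueError if iteration already started,
    # so call this on a fresh iterator only.
    for page in rows.pages:
        sizes.append(sum(1 for _ in page))
    return sizes
```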

project

GCP Project ID where these rows are read from.

query_id

[Preview] ID of a completed query.

This ID is auto-generated and not guaranteed to be populated.

schema

List[google.cloud.bigquery.schema.SchemaField]: The subset of columns to be read from the table.

total_rows

int: The total number of rows in the table or query results.

Methods

__iter__

__iter__()

Iterator for each item returned.

Exceptions
Type Description
ValueError If the iterator has already been started.
Returns
Type Description
types.GeneratorType[Any] A generator of items from the API.
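A small sketch of row-level iteration, assuming `rows` is a fresh RowIterator. The helper name `take_rows` is hypothetical. It stops consuming after `limit` rows. Since the iterator is then started, a later attempt to iterate it again is expected to raise the ValueError listed above.

```python
from itertools import islice
from typing import Any, List


def take_rows(rows: Any, limit: int) -> List[Any]:
    """Return at most `limit` rows from an iterator that has not been started."""
    # islice calls iter(rows) once and stops after `limit` items.
    return list(islice(rows, limit))
```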

to_arrow

to_arrow(
    progress_bar_type: typing.Optional[str] = None,
    bqstorage_client: typing.Optional[bigquery_storage.BigQueryReadClient] = None,
    create_bqstorage_client: bool = True,
) -> pyarrow.Table

[Beta] Create a pyarrow.Table by loading all pages of a table or query.

Parameters
Name Description
progress_bar_type Optional[str]

If set, use the tqdm library (https://tqdm.github.io/) to display a progress bar while the data downloads. Install the tqdm package to use this feature. Possible values of progress_bar_type include:

None: No progress bar.
'tqdm': Use the tqdm.tqdm function to print a progress bar to sys.stdout.
'tqdm_notebook': Use the tqdm.notebook.tqdm function to display a progress bar as a Jupyter notebook widget.
'tqdm_gui': Use the tqdm.tqdm_gui function to display a progress bar as a graphical dialog box.

bqstorage_client Optional[google.cloud.bigquery_storage_v1.BigQueryReadClient]

A BigQuery Storage API client. If supplied, use the faster BigQuery Storage API to fetch rows from BigQuery. This API is a billable API. This method requires the google-cloud-bigquery-storage library. This method only exposes a subset of the capabilities of the BigQuery Storage API. For full access to all features (projections, filters, snapshots), use the Storage API directly.

create_bqstorage_client Optional[bool]

If True (default), create a BigQuery Storage API client using the default API settings. The BigQuery Storage API is a faster way to fetch rows from BigQuery. See the bqstorage_client parameter for more information. This argument does nothing if bqstorage_client is supplied.

New in version 1.24.0.

Exceptions
Type Description
ValueError If the pyarrow library cannot be imported.

New in version 1.17.0.
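A minimal sketch of calling to_arrow with an optional progress bar, assuming `rows` is a RowIterator and pyarrow is installed. The helper name `download_as_arrow` and its `show_progress` flag are hypothetical.

```python
from typing import Any, Optional


def download_as_arrow(rows: Any, show_progress: bool = False) -> Any:
    """Load every page of `rows` into a single pyarrow.Table."""
    # None disables the progress bar; "tqdm" uses tqdm.tqdm and
    # requires the tqdm package to be installed.
    progress_bar_type: Optional[str] = "tqdm" if show_progress else None
    return rows.to_arrow(progress_bar_type=progress_bar_type)
```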

to_arrow_iterable

to_arrow_iterable(bqstorage_client: typing.Optional[bigquery_storage.BigQueryReadClient] = None, max_queue_size: int = <object object>) -> typing.Iterator[pyarrow.RecordBatch]

[Beta] Create an iterable of pyarrow.RecordBatch, to process the table as a stream.

Parameters
Name Description
bqstorage_client Optional[google.cloud.bigquery_storage_v1.BigQueryReadClient]

A BigQuery Storage API client. If supplied, use the faster BigQuery Storage API to fetch rows from BigQuery. This method requires the pyarrow and google-cloud-bigquery-storage libraries. This method only exposes a subset of the capabilities of the BigQuery Storage API. For full access to all features (projections, filters, snapshots) use the Storage API directly.

max_queue_size Optional[int]

The maximum number of result pages to hold in the internal queue when streaming query results over the BigQuery Storage API. Ignored if the Storage API is not used. By default, the max queue size is set to the number of BQ Storage streams created by the server. If max_queue_size is None, the queue size is infinite.

Returns
Type Description
pyarrow.RecordBatch A generator of pyarrow.RecordBatch.

New in version 2.31.0.
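To illustrate stream-style processing, the sketch below totals the rows across record batches without building one large table. It assumes `rows` is a RowIterator and relies on the `num_rows` attribute of pyarrow.RecordBatch. The helper name `count_streamed_rows` is hypothetical.

```python
from typing import Any


def count_streamed_rows(rows: Any, bqstorage_client: Any = None) -> int:
    """Sum the row counts of the record batches yielded by `rows`."""
    total = 0
    # Each batch is handled and released before the next one is requested,
    # so only one batch needs to be held by this loop at a time.
    for batch in rows.to_arrow_iterable(bqstorage_client=bqstorage_client):
        total += batch.num_rows
    return total
```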

to_dataframe

to_dataframe(
    bqstorage_client: typing.Optional[bigquery_storage.BigQueryReadClient] = None,
    dtypes: typing.Optional[typing.Dict[str, typing.Any]] = None,
    progress_bar_type: typing.Optional[str] = None,
    create_bqstorage_client: bool = True,
    geography_as_object: bool = False,
    bool_dtype: typing.Optional[typing.Any] = DefaultPandasDTypes.BOOL_DTYPE,
    int_dtype: typing.Optional[typing.Any] = DefaultPandasDTypes.INT_DTYPE,
    float_dtype: typing.Optional[typing.Any] = None,
    string_dtype: typing.Optional[typing.Any] = None,
    date_dtype: typing.Optional[typing.Any] = DefaultPandasDTypes.DATE_DTYPE,
    datetime_dtype: typing.Optional[typing.Any] = None,
    time_dtype: typing.Optional[typing.Any] = DefaultPandasDTypes.TIME_DTYPE,
    timestamp_dtype: typing.Optional[typing.Any] = None,
    range_date_dtype: typing.Optional[
        typing.Any
    ] = DefaultPandasDTypes.RANGE_DATE_DTYPE,
    range_datetime_dtype: typing.Optional[
        typing.Any
    ] = DefaultPandasDTypes.RANGE_DATETIME_DTYPE,
    range_timestamp_dtype: typing.Optional[
        typing.Any
    ] = DefaultPandasDTypes.RANGE_TIMESTAMP_DTYPE,
) -> pandas.DataFrame

Create a pandas DataFrame by loading all pages of a query.

Parameters
Name Description
bqstorage_client Optional[google.cloud.bigquery_storage_v1.BigQueryReadClient]

A BigQuery Storage API client. If supplied, use the faster BigQuery Storage API to fetch rows from BigQuery. This method requires the google-cloud-bigquery-storage library. This method only exposes a subset of the capabilities of the BigQuery Storage API. For full access to all features (projections, filters, snapshots), use the Storage API directly.

dtypes Optional[Map[str, Union[str, pandas.Series.dtype]]]

A dictionary mapping column names to pandas dtypes. The provided dtype is used when constructing the series for the specified column. Otherwise, the default pandas behavior is used.

progress_bar_type Optional[str]

If set, use the tqdm library (https://tqdm.github.io/) to display a progress bar while the data downloads. Install the tqdm package to use this feature. Possible values of progress_bar_type include:

None: No progress bar.
'tqdm': Use the tqdm.tqdm function to print a progress bar to sys.stdout.
'tqdm_notebook': Use the tqdm.notebook.tqdm function to display a progress bar as a Jupyter notebook widget.
'tqdm_gui': Use the tqdm.tqdm_gui function to display a progress bar as a graphical dialog box.

New in version 1.11.0.

create_bqstorage_client Optional[bool]

If True (default), create a BigQuery Storage API client using the default API settings. The BigQuery Storage API is a faster way to fetch rows from BigQuery. See the bqstorage_client parameter for more information. This argument does nothing if bqstorage_client is supplied.

New in version 1.24.0.

geography_as_object Optional[bool]

If True, convert GEOGRAPHY data to shapely geometry objects. If False (default), don't cast geography data to shapely geometry objects.

New in version 2.24.0.

bool_dtype Optional[pandas.Series.dtype, None]

If set, indicate a pandas ExtensionDtype (e.g. pandas.BooleanDtype()) to convert BigQuery Boolean type, instead of relying on the default pandas.BooleanDtype(). If you explicitly set the value to None, then the data type will be numpy.dtype("bool"). BigQuery Boolean type can be found at: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#boolean_type

New in version 3.8.0.

int_dtype Optional[pandas.Series.dtype, None]

If set, indicate a pandas ExtensionDtype (e.g. pandas.Int64Dtype()) to convert BigQuery Integer types, instead of relying on the default pandas.Int64Dtype(). If you explicitly set the value to None, then the data type will be numpy.dtype("int64"). A list of BigQuery Integer types can be found at: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#integer_types

New in version 3.8.0.

float_dtype Optional[pandas.Series.dtype, None]

If set, indicate a pandas ExtensionDtype (e.g. pandas.Float32Dtype()) to convert BigQuery Float type, instead of relying on the default numpy.dtype("float64"). If you explicitly set the value to None, then the data type will be numpy.dtype("float64"). BigQuery Float type can be found at: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#floating_point_types

New in version 3.8.0.

string_dtype Optional[pandas.Series.dtype, None]

If set, indicate a pandas ExtensionDtype (e.g. pandas.StringDtype()) to convert BigQuery String type, instead of relying on the default numpy.dtype("object"). If you explicitly set the value to None, then the data type will be numpy.dtype("object"). BigQuery String type can be found at: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#string_type

New in version 3.8.0.

date_dtype Optional[pandas.Series.dtype, None]

If set, indicate a pandas ExtensionDtype (e.g. pandas.ArrowDtype(pyarrow.date32())) to convert BigQuery Date type, instead of relying on the default db_dtypes.DateDtype(). If you explicitly set the value to None, then the data type will be numpy.dtype("datetime64[ns]") or object if out of bound. BigQuery Date type can be found at: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#date_type

New in version 3.10.0.

datetime_dtype Optional[pandas.Series.dtype, None]

If set, indicate a pandas ExtensionDtype (e.g. pandas.ArrowDtype(pyarrow.timestamp("us"))) to convert BigQuery Datetime type, instead of relying on the default numpy.dtype("datetime64[ns]"). If you explicitly set the value to None, then the data type will be numpy.dtype("datetime64[ns]") or object if out of bound. BigQuery Datetime type can be found at: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#datetime_type

New in version 3.10.0.

time_dtype Optional[pandas.Series.dtype, None]

If set, indicate a pandas ExtensionDtype (e.g. pandas.ArrowDtype(pyarrow.time64("us"))) to convert BigQuery Time type, instead of relying on the default db_dtypes.TimeDtype(). If you explicitly set the value to None, then the data type will be numpy.dtype("object"). BigQuery Time type can be found at: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#time_type

New in version 3.10.0.

timestamp_dtype Optional[pandas.Series.dtype, None]

If set, indicate a pandas ExtensionDtype (e.g. pandas.ArrowDtype(pyarrow.timestamp("us", tz="UTC"))) to convert BigQuery Timestamp type, instead of relying on the default numpy.dtype("datetime64[ns, UTC]"). If you explicitly set the value to None, then the data type will be numpy.dtype("datetime64[ns, UTC]") or object if out of bound. BigQuery Timestamp type can be found at: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#timestamp_type

New in version 3.10.0.

range_date_dtype Optional[pandas.Series.dtype, None]

If set, indicate a pandas ExtensionDtype, such as pandas.ArrowDtype(pyarrow.struct([("start", pyarrow.date32()), ("end", pyarrow.date32())])), to convert BigQuery RANGE

range_datetime_dtype Optional[pandas.Series.dtype, None]

If set, indicate a pandas ExtensionDtype, such as pandas.ArrowDtype(pyarrow.struct([("start", pyarrow.timestamp("us")), ("end", pyarrow.timestamp("us"))])), to convert BigQuery RANGE

range_timestamp_dtype Optional[pandas.Series.dtype, None]

If set, indicate a pandas ExtensionDtype, such as pandas.ArrowDtype(pyarrow.struct([("start", pyarrow.timestamp("us", tz="UTC")), ("end", pyarrow.timestamp("us", tz="UTC"))])), to convert BigQuery RANGE

Exceptions
Type Description
ValueError If the pandas library cannot be imported, or the bigquery_storage_v1 module is required but cannot be imported. Also raised if geography_as_object is True but the shapely library cannot be imported, or if bool_dtype, int_dtype, or another dtype parameter is not a supported dtype.
Returns
Type Description
pandas.DataFrame A pandas.DataFrame populated with row data and column headers from the query results. The column headers are derived from the destination table's schema.
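As a hedged example of the dtype parameters, the sketch below opts out of the pandas extension defaults for booleans and integers by passing None explicitly, which the parameter descriptions above say yields numpy.dtype("bool") and numpy.dtype("int64"). It assumes `rows` is a RowIterator and pandas is installed. The helper name `to_numpy_typed_dataframe` is hypothetical.

```python
from typing import Any, Dict, Optional


def to_numpy_typed_dataframe(
    rows: Any, dtypes: Optional[Dict[str, Any]] = None
) -> Any:
    """Build a DataFrame using NumPy dtypes for BOOL and INTEGER columns."""
    return rows.to_dataframe(
        dtypes=dtypes,  # optional per-column overrides, keyed by column name
        bool_dtype=None,  # None selects numpy.dtype("bool")
        int_dtype=None,  # None selects numpy.dtype("int64")
    )
```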

to_dataframe_iterable

to_dataframe_iterable(bqstorage_client: typing.Optional[bigquery_storage.BigQueryReadClient] = None, dtypes: typing.Optional[typing.Dict[str, typing.Any]] = None, max_queue_size: int = <object object>) -> pandas.DataFrame

Create an iterable of pandas DataFrames, to process the table as a stream.

Parameters
Name Description
bqstorage_client Optional[google.cloud.bigquery_storage_v1.BigQueryReadClient]

A BigQuery Storage API client. If supplied, use the faster BigQuery Storage API to fetch rows from BigQuery. This method requires the google-cloud-bigquery-storage library. This method only exposes a subset of the capabilities of the BigQuery Storage API. For full access to all features (projections, filters, snapshots), use the Storage API directly.

dtypes Optional[Map[str, Union[str, pandas.Series.dtype]]]

A dictionary mapping column names to pandas dtypes. The provided dtype is used when constructing the series for the specified column. Otherwise, the default pandas behavior is used.

max_queue_size Optional[int]

The maximum number of result pages to hold in the internal queue when streaming query results over the BigQuery Storage API. Ignored if the Storage API is not used. By default, the max queue size is set to the number of BQ Storage streams created by the server. If max_queue_size is None, the queue size is infinite.

New in version 2.14.0.

Exceptions
Type Description
ValueError If the pandas library cannot be imported.
Returns
Type Description
pandas.DataFrame A generator of pandas.DataFrame.
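A minimal sketch of consuming the DataFrame stream one chunk at a time, assuming `rows` is a RowIterator and pandas is installed. The helper name `count_rows_in_frames` is hypothetical. Here each chunk is only measured with len(), but any per-chunk aggregation could take its place.

```python
from typing import Any, Dict, Optional


def count_rows_in_frames(rows: Any, dtypes: Optional[Dict[str, Any]] = None) -> int:
    """Total the lengths of the DataFrames yielded by `rows`."""
    total = 0
    # Each frame is processed and dropped before the next is requested.
    for frame in rows.to_dataframe_iterable(dtypes=dtypes):
        total += len(frame)
    return total
```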

to_geodataframe

to_geodataframe(
    bqstorage_client: typing.Optional[bigquery_storage.BigQueryReadClient] = None,
    dtypes: typing.Optional[typing.Dict[str, typing.Any]] = None,
    progress_bar_type: typing.Optional[str] = None,
    create_bqstorage_client: bool = True,
    geography_column: typing.Optional[str] = None,
) -> geopandas.GeoDataFrame

Create a GeoPandas GeoDataFrame by loading all pages of a query.

Parameters
Name Description
bqstorage_client Optional[google.cloud.bigquery_storage_v1.BigQueryReadClient]

A BigQuery Storage API client. If supplied, use the faster BigQuery Storage API to fetch rows from BigQuery. This method requires the pyarrow and google-cloud-bigquery-storage libraries. This method only exposes a subset of the capabilities of the BigQuery Storage API. For full access to all features (projections, filters, snapshots) use the Storage API directly.

dtypes Optional[Map[str, Union[str, pandas.Series.dtype]]]

A dictionary mapping column names to pandas dtypes. The provided dtype is used when constructing the series for the specified column. Otherwise, the default pandas behavior is used.

progress_bar_type Optional[str]

If set, use the tqdm library (https://tqdm.github.io/) to display a progress bar while the data downloads. Install the tqdm package to use this feature. Possible values of progress_bar_type include:

None: No progress bar.
'tqdm': Use the tqdm.tqdm function to print a progress bar to sys.stdout.
'tqdm_notebook': Use the tqdm.notebook.tqdm function to display a progress bar as a Jupyter notebook widget.
'tqdm_gui': Use the tqdm.tqdm_gui function to display a progress bar as a graphical dialog box.

create_bqstorage_client Optional[bool]

If True (default), create a BigQuery Storage API client using the default API settings. The BigQuery Storage API is a faster way to fetch rows from BigQuery. See the bqstorage_client parameter for more information. This argument does nothing if bqstorage_client is supplied.

geography_column Optional[str]

If there is more than one GEOGRAPHY column, identifies which one to use to construct a geopandas GeoDataFrame. This option can be omitted if there's only one GEOGRAPHY column.

Exceptions
Type Description
ValueError If the geopandas library cannot be imported, or the bigquery_storage_v1 module is required but cannot be imported.

New in version 2.24.0.
Returns
Type Description
geopandas.GeoDataFrame A geopandas.GeoDataFrame populated with row data and column headers from the query results. The column headers are derived from the destination table's schema.
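The sketch below shows one way to decide whether geography_column is needed before calling to_geodataframe. It assumes `rows` is a RowIterator whose schema entries are SchemaField objects exposing `name` and `field_type`. The helper names `geography_column_names` and `load_geodataframe`, and the early ValueError, are hypothetical conveniences rather than library behavior.

```python
from typing import Any, List, Optional


def geography_column_names(rows: Any) -> List[str]:
    """List the names of GEOGRAPHY columns in the iterator's schema."""
    return [
        field.name
        for field in rows.schema
        if field.field_type.upper() == "GEOGRAPHY"
    ]


def load_geodataframe(rows: Any, geography_column: Optional[str] = None) -> Any:
    """Call to_geodataframe, insisting on an explicit column when ambiguous."""
    names = geography_column_names(rows)
    # With several GEOGRAPHY columns the caller has to pick one.
    if len(names) > 1 and geography_column is None:
        raise ValueError(f"Pick one of the GEOGRAPHY columns: {names}")
    return rows.to_geodataframe(geography_column=geography_column)
```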