QueryJob(job_id, query, client, job_config=None)
Asynchronous job: query tables.
Parameters
Name | Type | Description |
job_id | str | the job's ID, within the project belonging to |
query | str | SQL query string. |
client | google.cloud.bigquery.client.Client | A client which holds credentials and project configuration for the dataset (which requires a project). |
job_config | Optional[google.cloud.bigquery.job.QueryJobConfig] | Extra configuration options for the query job. |
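In practice a QueryJob is usually created by Client.query() rather than by calling this constructor directly. A minimal sketch (the public sample table and query below are illustrative, not part of this reference):

```python
from google.cloud import bigquery

client = bigquery.Client()  # uses default credentials and project

# client.query() returns a QueryJob that begins running immediately.
query_job = client.query(
    "SELECT name, SUM(number) AS total "
    "FROM `bigquery-public-data.usa_names.usa_1910_2013` "
    "GROUP BY name ORDER BY total DESC LIMIT 10"
)

print(query_job.job_id, query_job.state)
```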
Inheritance
builtins.object > google.api_core.future.base.Future > google.api_core.future.polling.PollingFuture > google.cloud.bigquery.job.base._AsyncJob > QueryJob
Properties
allow_large_results
See allow_large_results.
billing_tier
Return billing tier from job statistics, if present.
See: https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobStatistics2.FIELDS.billing_tier
Type | Description |
Optional[int] | Billing tier used by the job, or None if job is not yet complete. |
cache_hit
Return whether or not query results were served from cache.
See: https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobStatistics2.FIELDS.cache_hit
Type | Description |
Optional[bool] | whether the query results were returned from cache, or None if job is not yet complete. |
clustering_fields
See clustering_fields.
connection_properties
See connection_properties.
.. versionadded:: 2.29.0
create_disposition
See create_disposition.
create_session
See create_session.
.. versionadded:: 2.29.0
created
Datetime at which the job was created.
Type | Description |
Optional[datetime.datetime] | the creation time (None until set from the server). |
ddl_operation_performed
Optional[str]: Return the DDL operation performed.
ddl_target_routine
Optional[google.cloud.bigquery.routine.RoutineReference]: Return the DDL target routine, present for CREATE/DROP FUNCTION/PROCEDURE queries.
ddl_target_table
Optional[google.cloud.bigquery.table.TableReference]: Return the DDL target table, present for CREATE/DROP TABLE/VIEW queries.
See: https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobStatistics2.FIELDS.ddl_target_table
default_dataset
See default_dataset.
destination
See destination.
destination_encryption_configuration
google.cloud.bigquery.encryption_configuration.EncryptionConfiguration: Custom encryption configuration for the destination table.
Custom encryption configuration (e.g., Cloud KMS keys) or None if using default encryption.
dry_run
See dry_run.
ended
Datetime at which the job finished.
Type | Description |
Optional[datetime.datetime] | the end time (None until set from the server). |
error_result
Error information about the job as a whole.
Type | Description |
Optional[Mapping] | the error information (None until set from the server). |
errors
Information about individual errors generated by the job.
Type | Description |
Optional[List[Mapping]] | the error information (None until set from the server). |
estimated_bytes_processed
Return the estimated number of bytes processed by the query.
Type | Description |
Optional[int] | estimated number of bytes processed by the query, or None if job is not yet complete. |
etag
ETag for the job resource.
Type | Description |
Optional[str] | the ETag (None until set from the server). |
flatten_results
See flatten_results.
job_id
str: ID of the job.
job_type
Type of job.
Type | Description |
str | one of 'load', 'copy', 'extract', 'query'. |
labels
Dict[str, str]: Labels for the job.
location
str: Location where the job runs.
maximum_billing_tier
See maximum_billing_tier.
maximum_bytes_billed
See maximum_bytes_billed.
num_child_jobs
The number of child jobs executed.
See: https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobStatistics.FIELDS.num_child_jobs
num_dml_affected_rows
Return the number of DML rows affected by the job.
Type | Description |
Optional[int] | number of DML rows affected by the job, or None if job is not yet complete. |
parent_job_id
Return the ID of the parent job.
See: https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobStatistics.FIELDS.parent_job_id
Type | Description |
Optional[str] | parent job id. |
path
URL path for the job's APIs.
Type | Description |
str | the path based on project and job ID. |
priority
See priority.
project
Project bound to the job.
Type | Description |
str | the project (derived from the client). |
query
str: The query text used in this query job.
See: https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobConfigurationQuery.FIELDS.query
query_parameters
See query_parameters.
query_plan
Return query plan from job statistics, if present.
See: https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobStatistics2.FIELDS.query_plan
Type | Description |
List[google.cloud.bigquery.job.QueryPlanEntry] | mappings describing the query plan, or an empty list if the query has not yet completed. |
range_partitioning
See range_partitioning.
referenced_tables
Return referenced tables from job statistics, if present.
See: https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobStatistics2.FIELDS.referenced_tables
Type | Description |
List[Dict] | mappings describing the tables referenced by the query, or an empty list if the query has not yet completed. |
reservation_usage
Job resource usage breakdown by reservation.
Type | Description |
List[google.cloud.bigquery.job.ReservationUsage] | Reservation usage stats. Can be empty if not set from the server. |
schema
The schema of the results.
Present only for successful dry run of non-legacy SQL queries.
schema_update_options
See schema_update_options.
script_statistics
Statistics for a child job of a script.
self_link
URL for the job resource.
Type | Description |
Optional[str] | the URL (None until set from the server). |
session_info
[Preview] Information about the session if this job is part of one.
.. versionadded:: 2.29.0
slot_millis
Union[int, None]: Slot-milliseconds used by this query job.
started
Datetime at which the job was started.
Type | Description |
Optional[datetime.datetime] | the start time (None until set from the server). |
state
Status of the job.
Type | Description |
Optional[str] | the state (None until set from the server). |
statement_type
Return statement type from job statistics, if present.
See: https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobStatistics2.FIELDS.statement_type
Type | Description |
Optional[str] | type of statement used by the job, or None if job is not yet complete. |
table_definitions
See table_definitions.
time_partitioning
See time_partitioning.
timeline
List(TimelineEntry): Return the query execution timeline from job statistics.
total_bytes_billed
Return total bytes billed from job statistics, if present.
Type | Description |
Optional[int] | Total bytes billed for the job, or None if job is not yet complete. |
total_bytes_processed
Return total bytes processed from job statistics, if present.
Type | Description |
Optional[int] | Total bytes processed by the job, or None if job is not yet complete. |
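As a rough sketch of how these statistics properties are typically read (continuing the hypothetical query_job from the earlier example); most of them remain None or empty until the job completes:

```python
query_job.result()  # block until the job finishes

print("cache hit:", query_job.cache_hit)
print("bytes processed:", query_job.total_bytes_processed)
print("bytes billed:", query_job.total_bytes_billed)
print("slot milliseconds:", query_job.slot_millis)
print("statement type:", query_job.statement_type)
```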
transaction_info
Information about the multi-statement transaction if this job is part of one.
Since a scripting query job can execute multiple transactions, this property is only expected on child jobs. Use the list_jobs method with the parent_job parameter to iterate over child jobs.
.. versionadded:: 2.24.0
udf_resources
See udf_resources.
undeclared_query_parameters
Return undeclared query parameters from job statistics, if present.
Type | Description |
List[Union[ google.cloud.bigquery.query.ArrayQueryParameter, google.cloud.bigquery.query.ScalarQueryParameter, google.cloud.bigquery.query.StructQueryParameter ]] | Undeclared parameters, or an empty list if the query has not yet completed. |
use_legacy_sql
See use_legacy_sql.
use_query_cache
See use_query_cache.
user_email
E-mail address of user who submitted the job.
Type | Description |
Optional[str] | the e-mail address (None until set from the server). |
write_disposition
See write_disposition.
bi_engine_stats
API documentation for bigquery.job.QueryJob.bi_engine_stats property.
dml_stats
API documentation for bigquery.job.QueryJob.dml_stats property.
Methods
add_done_callback
add_done_callback(fn)
Add a callback to be executed when the operation is complete.
If the operation is not already complete, this will start a helper thread to poll for the status of the operation in the background.
Name | Type | Description |
fn | Callable[Future] | The callback to execute when the operation is complete. |
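A minimal sketch of attaching a completion callback (continuing the hypothetical query_job above); the callback receives the job itself as its only argument:

```python
def on_done(job):
    # Called from a helper thread once the job reaches a terminal state.
    print("finished:", job.job_id, job.state)

query_job.add_done_callback(on_done)
```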
cancel
cancel(client=None, retry: retries.Retry = <google.api_core.retry.Retry object>, timeout: float = None)
API call: cancel job via a POST request
See https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/cancel
Name | Type | Description |
timeout | Optional[float] | The number of seconds to wait for the underlying HTTP transport before using |
client | Optional[google.cloud.bigquery.client.Client] | the client to use. If not passed, falls back to the |
retry | Optional[google.api_core.retry.Retry] | How to retry the RPC. |
Type | Description |
bool | Boolean indicating that the cancel request was sent. |
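A short sketch of requesting cancellation; the return value only indicates that the cancel request was sent, not that the job actually stopped:

```python
if query_job.cancel():
    query_job.reload()  # refresh job state from the server
    print("state after cancel request:", query_job.state)
```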
cancelled
cancelled()
Check if the job has been cancelled.
This always returns False. It's not possible to check if a job was cancelled in the API. This method is here to satisfy the interface for google.api_core.future.Future.
Type | Description |
bool | False |
done
done(retry: retries.Retry = <google.api_core.retry.Retry object>, timeout: float = None, reload: bool = True)
Checks if the job is complete.
Name | Type | Description |
timeout | Optional[float] | The number of seconds to wait for the underlying HTTP transport before using |
reload | Optional[bool] | If |
retry | Optional[google.api_core.retry.Retry] | How to retry the RPC. If the job state is |
Type | Description |
bool | True if the job is complete, False otherwise. |
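A sketch of polling with done() instead of blocking on result(); the one-second sleep interval is an arbitrary choice:

```python
import time

while not query_job.done():
    time.sleep(1)

if query_job.error_result:
    print("query failed:", query_job.error_result)
```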
exception
exception(timeout=None)
Get the exception from the operation, blocking if necessary.
Name | Type | Description |
timeout | int | How long to wait for the operation to complete. If None, wait indefinitely. |
Type | Description |
Optional[google.api_core.GoogleAPICallError] | The operation's error. |
exists
exists(client=None, retry: retries.Retry = <google.api_core.retry.Retry object>, timeout: float = None)
API call: test for the existence of the job via a GET request
See https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/get
Name | Type | Description |
timeout | Optional[float] | The number of seconds to wait for the underlying HTTP transport before using |
client | Optional[google.cloud.bigquery.client.Client] | the client to use. If not passed, falls back to the |
retry | Optional[google.api_core.retry.Retry] | How to retry the RPC. |
Type | Description |
bool | Boolean indicating existence of the job. |
from_api_repr
from_api_repr(resource: dict, client: Client)
Factory: construct a job given its API representation
Name | Type | Description |
resource | Dict | dataset job representation returned from the API |
client | google.cloud.bigquery.client.Client | Client which holds credentials and project configuration for the dataset. |
Type | Description |
google.cloud.bigquery.job.QueryJob | Job parsed from ``resource``. |
reload
reload(client=None, retry: retries.Retry = <google.api_core.retry.Retry object>, timeout: float = None)
API call: refresh job properties via a GET request.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/get
Name | Type | Description |
timeout | Optional[float] | The number of seconds to wait for the underlying HTTP transport before using |
client | Optional[google.cloud.bigquery.client.Client] | the client to use. If not passed, falls back to the |
retry | Optional[google.api_core.retry.Retry] | How to retry the RPC. |
result
result(page_size: int = None, max_results: int = None, retry: retries.Retry = <google.api_core.retry.Retry object>, timeout: float = None, start_index: int = None, job_retry: retries.Retry = <google.api_core.retry.Retry object>)
Start the job, wait for it to complete, and return the result.
Name | Type | Description |
page_size | Optional[int] | The maximum number of rows in each page of results from this request. Non-positive values are ignored. |
max_results | Optional[int] | The maximum total number of rows from this request. |
timeout | Optional[float] | The number of seconds to wait for the underlying HTTP transport before using |
start_index | Optional[int] | The zero-based index of the starting row to read. |
retry | Optional[google.api_core.retry.Retry] | How to retry the call that retrieves rows. This only applies to making RPC calls. It isn't used to retry failed jobs. This has a reasonable default that should only be overridden with care. If the job state is |
job_retry | Optional[google.api_core.retry.Retry] | How to retry failed jobs. The default retries rate-limit-exceeded errors. Passing |
Type | Description |
google.cloud.exceptions.GoogleAPICallError | If the job failed and retries aren't successful. |
concurrent.futures.TimeoutError | If the job did not complete in the given timeout. |
TypeError | If Non-``None`` and non-default ``job_retry`` is provided and the job is not retryable. |
Type | Description |
google.cloud.bigquery.table.RowIterator | Iterator of row data Row-s. During each page, the iterator will have the ``total_rows`` attribute set, which counts the total number of rows **in the result set** (this is distinct from the total number of rows in the current page: ``iterator.page.num_items``). If the query is a special query that produces no results, e.g. a DDL query, an ``_EmptyRowIterator`` instance is returned. |
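A sketch of the usual pattern: wait for the job and iterate the returned RowIterator (the column names assume the hypothetical query shown earlier):

```python
rows = query_job.result(page_size=1000)

print("rows in result set:", rows.total_rows)
for row in rows:
    # Row values can be read by field name or by index.
    print(row["name"], row["total"])
```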
running
running()
True if the operation is currently running.
set_exception
set_exception(exception)
Set the Future's exception.
set_result
set_result(result)
Set the Future's result.
to_api_repr
to_api_repr()
Generate a resource for _begin.
to_arrow
to_arrow(
progress_bar_type: str = None,
bqstorage_client: Optional[bigquery_storage.BigQueryReadClient] = None,
create_bqstorage_client: bool = True,
max_results: Optional[int] = None,
)
[Beta] Create a pyarrow.Table by loading all pages of a table or query.
Name | Type | Description |
progress_bar_type | Optional[str] | If set, use the |
bqstorage_client | Optional[google.cloud.bigquery_storage_v1.BigQueryReadClient] | A BigQuery Storage API client. If supplied, use the faster BigQuery Storage API to fetch rows from BigQuery. This API is a billable API. This method requires |
create_bqstorage_client | Optional[bool] | If |
max_results | Optional[int] | Maximum number of rows to include in the result. No limit by default. .. versionadded:: 2.21.0 |
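A minimal sketch of materializing the results as a pyarrow.Table; this assumes the pyarrow extra is installed and disables the BigQuery Storage API client to keep the example dependency-free:

```python
arrow_table = query_job.to_arrow(create_bqstorage_client=False)
print(arrow_table.num_rows)
print(arrow_table.schema)
```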
to_dataframe
to_dataframe(
bqstorage_client: Optional[bigquery_storage.BigQueryReadClient] = None,
dtypes: Dict[str, Any] = None,
progress_bar_type: str = None,
create_bqstorage_client: bool = True,
max_results: Optional[int] = None,
geography_as_object: bool = False,
)
Return a pandas DataFrame from a QueryJob
Name | Type | Description |
bqstorage_client | Optional[google.cloud.bigquery_storage_v1.BigQueryReadClient] | A BigQuery Storage API client. If supplied, use the faster BigQuery Storage API to fetch rows from BigQuery. This API is a billable API. This method requires the |
dtypes | Optional[Map[str, Union[str, pandas.Series.dtype]]] | A dictionary of column names pandas |
progress_bar_type | Optional[str] | If set, use the |
create_bqstorage_client | Optional[bool] | If |
max_results | Optional[int] | Maximum number of rows to include in the result. No limit by default. .. versionadded:: 2.21.0 |
geography_as_object | Optional[bool] | If |
Type | Description |
ValueError | If the `pandas` library cannot be imported, or the bigquery_storage_v1 module is required but cannot be imported. Also if `geography_as_object` is `True`, but the `shapely` library cannot be imported. |
Type | Description |
pandas.DataFrame | A `pandas.DataFrame` populated with row data and column headers from the query results. The column headers are derived from the destination table's schema. |
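A sketch of loading the results into a pandas DataFrame; the pandas extra is assumed to be installed, and the dtype override uses a hypothetical column name from the earlier example query:

```python
df = query_job.to_dataframe(
    create_bqstorage_client=False,
    dtypes={"total": "int32"},  # hypothetical column name
)
print(df.head())
```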
to_geodataframe
to_geodataframe(
bqstorage_client: bigquery_storage.BigQueryReadClient = None,
dtypes: Dict[str, Any] = None,
progress_bar_type: str = None,
create_bqstorage_client: bool = True,
max_results: Optional[int] = None,
geography_column: Optional[str] = None,
)
Return a GeoPandas GeoDataFrame from a QueryJob
Name | Type | Description |
dtypes | Optional[Map[str, Union[str, pandas.Series.dtype]]] | A dictionary of column names pandas |
progress_bar_type | Optional[str] | If set, use the |
create_bqstorage_client | Optional[bool] | If |
max_results | Optional[int] | Maximum number of rows to include in the result. No limit by default. .. versionadded:: 2.21.0 |
geography_column | Optional[str] | If there is more than one GEOGRAPHY column, identifies which one to use to construct a GeoPandas GeoDataFrame. This option can be omitted if there's only one GEOGRAPHY column. |
bqstorage_client | Optional[google.cloud.bigquery_storage_v1.BigQueryReadClient] | A BigQuery Storage API client. If supplied, use the faster BigQuery Storage API to fetch rows from BigQuery. This API is a billable API. This method requires the |
Type | Description |
ValueError | If the `geopandas` library cannot be imported, or the bigquery_storage_v1 module is required but cannot be imported. .. versionadded:: 2.24.0 |
Type | Description |
geopandas.GeoDataFrame | A `geopandas.GeoDataFrame` populated with row data and column headers from the query results. The column headers are derived from the destination table's schema. |
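A sketch of loading GEOGRAPHY results into a GeoDataFrame; geopandas is assumed to be installed, and the public table and column names below are illustrative:

```python
geo_job = client.query(
    "SELECT state_name, state_geom "
    "FROM `bigquery-public-data.geo_us_boundaries.states`"
)
gdf = geo_job.to_geodataframe(geography_column="state_geom")
print(len(gdf), gdf.crs)
```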
__init__
__init__(job_id, query, client, job_config=None)
Initialize self. See help(type(self)) for accurate signature.