The BigQuery Storage API provides fast access to data stored in BigQuery. Use the BigQuery Storage API to download data stored in BigQuery for use in analytics tools such as the pandas library for Python.
Objectives
In this tutorial, you:
- Download query results to a pandas DataFrame by using the BigQuery Storage API from the IPython magics for BigQuery in a Jupyter notebook.
- Download query results to a pandas DataFrame by using the BigQuery client library for Python.
- Download BigQuery table data to a pandas DataFrame by using the BigQuery client library for Python.
- Download BigQuery table data to a pandas DataFrame by using the BigQuery Storage API client library for Python.
Costs
BigQuery is a paid product, and you incur BigQuery usage costs for the queries you run. The first 1 TB of query data processed per month is free. For more information, see the BigQuery Pricing page.
The BigQuery Storage API is a paid product, and you incur usage costs for the table data you scan when downloading a DataFrame. For more information, see the BigQuery Pricing page.
Before you begin
Before you begin this tutorial, use the Google Cloud Console to create or select a project and enable billing.
- Sign in to your Google Account. If you don't already have one, sign up for a new account.
- In the Google Cloud Console, on the project selector page, select or create a Google Cloud project.
- Make sure that billing is enabled for your Cloud project. Learn how to confirm that billing is enabled for your project.
- BigQuery is automatically enabled in new projects. To activate BigQuery in a preexisting project, enable the BigQuery and BigQuery Storage APIs.
- Set up a Python development environment.
- Set up authentication for your development environment.
You should also be familiar with the IPython magics for BigQuery, the BigQuery client library, and how to use the client library with pandas before completing this tutorial.
Install the client libraries
Install the BigQuery Python client library version 1.9.0 or higher and the BigQuery Storage API Python client library.
PIP
Install the google-cloud-bigquery and google-cloud-bigquery-storage packages.
pip install --upgrade 'google-cloud-bigquery[bqstorage,pandas]'
Conda
Install the BigQuery and BigQuery Storage API Conda packages from the community-run conda-forge channel.
conda install -c conda-forge google-cloud-bigquery \
google-cloud-bigquery-storage \
pandas \
pyarrow
Download query results using the IPython magics for BigQuery
Start the Jupyter notebook server and create a new Jupyter notebook. Load the IPython magics for BigQuery using the %load_ext magic.
%load_ext google.cloud.bigquery
Download large query results with the BigQuery Storage API by adding the --use_bqstorage_api argument to the %%bigquery magics. When this argument is used with small query results, the magics use the BigQuery API to download the results.
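A notebook cell like the following (the DataFrame variable name `df` and the public dataset are illustrative) downloads the query results into a pandas DataFrame:

```
%%bigquery df --use_bqstorage_api
SELECT name, SUM(number) AS total
FROM `bigquery-public-data.usa_names.usa_1910_current`
GROUP BY name
ORDER BY total DESC
LIMIT 10
```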
Set the context.use_bqstorage_api property to True to use the BigQuery Storage API by default. After you set the context.use_bqstorage_api property, run the %%bigquery magics without additional arguments to use the BigQuery Storage API to download large results.
Use the Python client libraries
Create Python clients
Use the following code to construct a BigQuery Client object and a BigQueryStorageClient. Use the google-auth Python library to create credentials that are sufficiently scoped for both APIs. Pass a credentials object to each constructor to avoid authenticating twice.
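A minimal sketch of client construction (the cloud-platform scope covers both APIs; `your_project_id` is taken from the default credentials, so running this requires an authenticated environment):

```python
import google.auth
from google.cloud import bigquery
from google.cloud import bigquery_storage_v1beta1

# Create credentials with a scope broad enough for both APIs so that
# each client can reuse them instead of authenticating separately.
credentials, your_project_id = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"]
)

bqclient = bigquery.Client(credentials=credentials, project=your_project_id)
bqstorageclient = bigquery_storage_v1beta1.BigQueryStorageClient(
    credentials=credentials
)
```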
Download query results using the BigQuery client library
Run a query by using the query method. Call the to_dataframe method to wait for the query to finish and download the results by using the BigQuery Storage API.
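For example (a sketch assuming a BigQuery `Client` named `bqclient` and a `BigQueryStorageClient` named `bqstorageclient` have already been constructed; the public dataset is illustrative):

```python
query_string = """
SELECT name, SUM(number) AS total
FROM `bigquery-public-data.usa_names.usa_1910_current`
GROUP BY name
ORDER BY total DESC
LIMIT 10
"""

# result() waits for the query to finish; to_dataframe() downloads the
# rows with the BigQuery Storage API when a bqstorage_client is given.
dataframe = (
    bqclient.query(query_string)
    .result()
    .to_dataframe(bqstorage_client=bqstorageclient)
)
print(dataframe.head())
```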
Download table data using the BigQuery client library
Download all rows in a table by using the list_rows method, which returns a RowIterator object. Download rows with the BigQuery Storage API by calling the to_dataframe method with the bqstorage_client argument.
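For example (a sketch assuming the `bqclient` and `bqstorageclient` objects from the earlier setup; the public table and selected fields are illustrative):

```python
from google.cloud import bigquery

table = bigquery.TableReference.from_string(
    "bigquery-public-data.utility_us.country_code_iso"
)

# Restricting selected_fields limits which columns are downloaded.
rows = bqclient.list_rows(
    table,
    selected_fields=[
        bigquery.SchemaField("country_name", "STRING"),
        bigquery.SchemaField("fips_code", "STRING"),
    ],
)
dataframe = rows.to_dataframe(bqstorage_client=bqstorageclient)
print(dataframe.head())
```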
Download table data using the BigQuery Storage API client library
Use the BigQuery Storage API client library directly for fine-grained control over filters and parallelism. When only simple row filters are needed, a BigQuery Storage API read session may be used in place of a query.
Create a TableReference object that identifies the table to read. Create a TableReadOptions object to select columns or filter rows. Create a read session using the create_read_session method.
If there are streams on the session, begin reading rows from one of them by using the read_rows method. Call the to_dataframe method on the reader to write the entire stream to a pandas DataFrame. For better performance, read from multiple streams in parallel; this code example reads from only a single stream for simplicity.
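Putting the steps together (a sketch assuming the v1beta1 client as `bqstorageclient` and a project ID in `your_project_id` from the earlier setup; the public table and column names are illustrative):

```python
from google.cloud import bigquery_storage_v1beta1

# Identify the table to read.
table_ref = bigquery_storage_v1beta1.types.TableReference()
table_ref.project_id = "bigquery-public-data"
table_ref.dataset_id = "new_york_trees"
table_ref.table_id = "tree_species"

# Select a subset of columns to download.
read_options = bigquery_storage_v1beta1.types.TableReadOptions()
read_options.selected_fields.append("species_common_name")
read_options.selected_fields.append("fall_color")

session = bqstorageclient.create_read_session(
    table_ref,
    "projects/{}".format(your_project_id),
    read_options=read_options,
)

# Read from the first stream only; production code can read the
# session's streams in parallel for better throughput.
stream = session.streams[0]
position = bigquery_storage_v1beta1.types.StreamPosition(stream=stream)
reader = bqstorageclient.read_rows(position)

dataframe = reader.to_dataframe(session)
print(dataframe.head())
```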
Source code for all examples
View the complete source code for all client library examples.
Cleaning up
To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, either delete the project that contains the resources, or keep the project and delete the individual resources.
Delete your project. You didn't create any BigQuery resources in this tutorial, but deleting your project removes any other resources that you created.
- In the Cloud Console, go to the Manage resources page.
- In the project list, select the project that you want to delete, and then click Delete.
- In the dialog, type the project ID, and then click Shut down to delete the project.
What's next
- Explore the Python client libraries reference.