The pandas-gbq library provides a simple interface for running queries and uploading pandas DataFrames to BigQuery. It is a thin wrapper around the BigQuery client library, google-cloud-bigquery. This topic provides code samples comparing google-cloud-bigquery and pandas-gbq.
Key differences in the level of functionality and support between the two libraries include:
| | pandas-gbq | google-cloud-bigquery |
| --- | --- | --- |
| Support | Open source library maintained by PyData and volunteer contributors | Open source library maintained by Google |
| BigQuery API functionality covered | Run queries and save data from pandas DataFrames to tables | Full BigQuery API functionality, with added support for reading and writing pandas DataFrames and a Jupyter magic for running queries |
| Documentation | docs / source | docs / source |
Install the libraries
To use the code samples in this guide, install the pandas-gbq package and the BigQuery Python client libraries.
pip

Install the pandas-gbq and google-cloud-bigquery packages.
pip install --upgrade pandas-gbq 'google-cloud-bigquery[bqstorage,pandas]'
Conda
Install the pandas-gbq and google-cloud-bigquery Conda packages from the community-run conda-forge channel.
conda install -c conda-forge pandas-gbq google-cloud-bigquery
Running queries
Both libraries support querying data stored in BigQuery. Key differences between the libraries include:
| | pandas-gbq | google-cloud-bigquery |
| --- | --- | --- |
| Default SQL syntax | Standard SQL (configurable with pandas_gbq.context.dialect) | Standard SQL |
| Query configurations | Sent as a dictionary in the format specified in the BigQuery REST reference. | Use the QueryJobConfig class, which contains properties for the various API configuration options. |
Querying data with the standard SQL syntax
The following sample shows how to run a standard SQL query with and without explicitly specifying a project. For both libraries, if a project is not specified, the project will be determined from the default credentials.
pandas-gbq:
google-cloud-bigquery:
Querying data with the legacy SQL syntax
The following sample shows how to run a query using legacy SQL syntax. See the Standard SQL migration guide for guidance on updating your queries to standard SQL.
pandas-gbq:
google-cloud-bigquery:
Using the BigQuery Storage API to download large results
Use the BigQuery Storage API to speed up downloads of large results by 15 to 31 times.
pandas-gbq:
google-cloud-bigquery:
Running a query with a configuration
Sending a configuration with a BigQuery API request is required
to perform certain complex operations, such as running a parameterized query or
specifying a destination table to store the query results. In pandas-gbq, the configuration must be sent as a dictionary in the format specified in the BigQuery REST reference. In google-cloud-bigquery, job configuration classes are provided, such as QueryJobConfig, which contain the necessary properties to configure complex jobs.
The following sample shows how to run a query with named parameters.
pandas-gbq:
google-cloud-bigquery:
Loading a pandas DataFrame to a BigQuery table
Both libraries support uploading data from a pandas DataFrame to a new table in BigQuery. Key differences include:
| | pandas-gbq | google-cloud-bigquery |
| --- | --- | --- |
| Type support | Converts the DataFrame to CSV format before sending to the API, which does not support nested or array values. | Converts the DataFrame to Parquet format before sending to the API, which supports nested and array values. Note that pyarrow, which is the parquet engine used to send the DataFrame data to the BigQuery API, must be installed to load the DataFrame to a table. |
| Load configurations | Sent as a dictionary in the format specified in the BigQuery REST reference. | Use the LoadJobConfig class, which contains properties for the various API configuration options. |
pandas-gbq:
google-cloud-bigquery:
The google-cloud-bigquery package requires the pyarrow library to serialize a pandas DataFrame to a Parquet file. Install the pyarrow package:
conda install -c conda-forge pyarrow
or
pip install pyarrow
Features not supported by pandas-gbq
While the pandas-gbq
library provides a useful interface for querying data
and writing data to tables, it does not cover many of the
BigQuery API features, including but not limited to:
- Managing datasets, including creating new datasets, updating dataset properties, and deleting datasets
- Loading data into BigQuery from formats other than pandas DataFrames
- Managing tables, including listing tables in a dataset, copying table data, and deleting tables
- Exporting BigQuery data directly to Cloud Storage