The pandas-gbq library is a community-led project by the pandas community. The BigQuery client library, google-cloud-bigquery, is the official Python library for interacting with BigQuery. If you are using the pandas-gbq library, you are already using the google-cloud-bigquery library: pandas-gbq uses google-cloud-bigquery to make API calls to BigQuery. This topic provides code samples comparing google-cloud-bigquery and pandas-gbq.
Key differences in the level of functionality and support between the two libraries include:
|  | pandas-gbq | google-cloud-bigquery |
| --- | --- | --- |
| Support | Open source library maintained by PyData and volunteer contributors | Open source library maintained by Google |
| BigQuery API functionality covered | Limited to running queries and saving data from pandas DataFrames to tables | Full BigQuery API functionality, with added support for reading/writing pandas DataFrames and a Jupyter magic for running queries |
| Cadence of new features | New features added to the library only if implemented by volunteer contributors | New features implemented as they are released in the BigQuery API |
Running queries
Both libraries support querying data stored in BigQuery. Key differences between the libraries include:
|  | pandas-gbq | google-cloud-bigquery |
| --- | --- | --- |
| Default SQL syntax | Standard SQL (configurable with pandas_gbq.context.dialect) | Standard SQL |
| Query configurations | Sent as a dictionary in the format specified in the BigQuery REST reference. | Use the QueryJobConfig class, which contains properties for the various API configuration options. |
Querying data with the standard SQL syntax
The following sample shows how to run a standard SQL query with and without explicitly specifying a project. For both libraries, if a project is not specified, the project will be determined from the default credentials.
pandas-gbq:
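A minimal sketch using pandas_gbq.read_gbq; your-project-id is a placeholder, and the query reads a BigQuery public dataset:

```python
import pandas_gbq

sql = """
    SELECT name
    FROM `bigquery-public-data.usa_names.usa_1910_current`
    WHERE state = 'TX'
    LIMIT 100
"""

# Run a query with the project inferred from the default credentials.
df = pandas_gbq.read_gbq(sql, dialect="standard")

# Run the same query with the project set explicitly.
df = pandas_gbq.read_gbq(sql, project_id="your-project-id", dialect="standard")
```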
google-cloud-bigquery:
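An equivalent sketch with the BigQuery client library; your-project-id is again a placeholder:

```python
from google.cloud import bigquery

sql = """
    SELECT name
    FROM `bigquery-public-data.usa_names.usa_1910_current`
    WHERE state = 'TX'
    LIMIT 100
"""

# Run a query with the project inferred from the default credentials.
client = bigquery.Client()
df = client.query(sql).to_dataframe()

# Run the same query against an explicitly specified project.
client = bigquery.Client(project="your-project-id")
df = client.query(sql).to_dataframe()
```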
Querying data with the legacy SQL syntax
The following sample shows how to run a query using legacy SQL syntax. See the Standard SQL migration guide for guidance on updating your queries to standard SQL.
pandas-gbq:
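A sketch of the same query in legacy SQL; the bracketed table reference is the legacy-style name for the same public dataset:

```python
import pandas_gbq

sql = """
    SELECT name
    FROM [bigquery-public-data:usa_names.usa_1910_current]
    WHERE state = 'TX'
    LIMIT 100
"""

# dialect="legacy" switches read_gbq to the legacy SQL syntax.
df = pandas_gbq.read_gbq(sql, dialect="legacy")
```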
google-cloud-bigquery:
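A sketch using QueryJobConfig to opt in to legacy SQL, assuming a reasonably recent client library version:

```python
from google.cloud import bigquery

client = bigquery.Client()
sql = """
    SELECT name
    FROM [bigquery-public-data:usa_names.usa_1910_current]
    WHERE state = 'TX'
    LIMIT 100
"""
# use_legacy_sql=True switches the query job to the legacy SQL syntax.
query_config = bigquery.QueryJobConfig(use_legacy_sql=True)

df = client.query(sql, job_config=query_config).to_dataframe()
```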
Using the BigQuery Storage API to download large results
Use the BigQuery Storage API to speed up downloads of large results by 15 to 31 times.
pandas-gbq:
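A sketch assuming a pandas-gbq version that supports the use_bqstorage_api option and that the google-cloud-bigquery-storage package is installed; the public usa_names table stands in for a large result set:

```python
import pandas_gbq

sql = """
    SELECT name, state, year, number
    FROM `bigquery-public-data.usa_names.usa_1910_current`
"""

# use_bqstorage_api=True downloads query results through the
# BigQuery Storage API instead of the slower REST endpoint.
df = pandas_gbq.read_gbq(sql, use_bqstorage_api=True)
```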
google-cloud-bigquery:
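A sketch assuming the google-cloud-bigquery-storage package is installed; recent client versions then use the BigQuery Storage API automatically when materializing results as a DataFrame:

```python
from google.cloud import bigquery

client = bigquery.Client()
sql = """
    SELECT name, state, year, number
    FROM `bigquery-public-data.usa_names.usa_1910_current`
"""

# With google-cloud-bigquery-storage installed, to_dataframe() downloads
# the results through the BigQuery Storage API by default.
df = client.query(sql).to_dataframe()
```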
Running a query with a configuration
Sending a configuration with a BigQuery API request is required to perform certain complex operations, such as running a parameterized query or specifying a destination table to store the query results. In pandas-gbq, the configuration must be sent as a dictionary in the format specified in the BigQuery REST reference. In google-cloud-bigquery, job configuration classes are provided, such as QueryJobConfig, which contain the necessary properties to configure complex jobs.
The following sample shows how to run a query with named parameters.
pandas-gbq:
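A sketch of a parameterized query in pandas-gbq; the configuration dictionary mirrors the REST API's queryParameters format, and the parameter values are examples only:

```python
import pandas_gbq

sql = """
    SELECT name
    FROM `bigquery-public-data.usa_names.usa_1910_current`
    WHERE state = @state
    LIMIT @limit
"""
# Query parameters are passed as a REST-style configuration dictionary.
query_config = {
    "query": {
        "parameterMode": "NAMED",
        "queryParameters": [
            {
                "name": "state",
                "parameterType": {"type": "STRING"},
                "parameterValue": {"value": "TX"},
            },
            {
                "name": "limit",
                "parameterType": {"type": "INTEGER"},
                "parameterValue": {"value": 100},
            },
        ],
    }
}

df = pandas_gbq.read_gbq(sql, configuration=query_config)
```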
google-cloud-bigquery:
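A sketch of the same parameterized query using QueryJobConfig and ScalarQueryParameter:

```python
from google.cloud import bigquery

client = bigquery.Client()
sql = """
    SELECT name
    FROM `bigquery-public-data.usa_names.usa_1910_current`
    WHERE state = @state
    LIMIT @limit
"""
# Named parameters are attached to the job through QueryJobConfig.
query_config = bigquery.QueryJobConfig(
    query_parameters=[
        bigquery.ScalarQueryParameter("state", "STRING", "TX"),
        bigquery.ScalarQueryParameter("limit", "INTEGER", 100),
    ]
)

df = client.query(sql, job_config=query_config).to_dataframe()
```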
Loading a pandas DataFrame to a BigQuery table
Both libraries support uploading data from a pandas DataFrame to a new table in BigQuery. Key differences include:
|  | pandas-gbq | google-cloud-bigquery |
| --- | --- | --- |
| Type support | Converts the DataFrame to CSV format before sending to the API, which does not support nested or array values. | Converts the DataFrame to Parquet format before sending to the API, which supports nested and array values. Note that pyarrow, the Parquet engine used to send the DataFrame data to the BigQuery API, must be installed to load the DataFrame to a table. |
| Load configurations | Sent as a dictionary in the format specified in the BigQuery REST reference. | Use the LoadJobConfig class, which contains properties for the various API configuration options. |
pandas-gbq:
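A minimal sketch with pandas_gbq.to_gbq; my_dataset.new_table and your-project-id are placeholders, and the dataset is assumed to already exist:

```python
import pandas
import pandas_gbq

df = pandas.DataFrame(
    {
        "my_string": ["a", "b", "c"],
        "my_int64": [1, 2, 3],
        "my_float64": [4.0, 5.0, 6.0],
    }
)

# Write the DataFrame to a new table in an existing dataset.
pandas_gbq.to_gbq(df, "my_dataset.new_table", project_id="your-project-id")
```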
google-cloud-bigquery:
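A sketch with load_table_from_dataframe, which serializes the DataFrame to Parquet via pyarrow; my_dataset.new_table is a placeholder:

```python
import pandas
from google.cloud import bigquery

client = bigquery.Client()
table_id = "my_dataset.new_table"

df = pandas.DataFrame(
    {
        "my_string": ["a", "b", "c"],
        "my_int64": [1, 2, 3],
        "my_float64": [4.0, 5.0, 6.0],
    }
)

# Serialize the DataFrame to Parquet and start a load job.
job = client.load_table_from_dataframe(df, table_id)
job.result()  # Wait for the load job to complete.
```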
The google-cloud-bigquery package requires the pyarrow library to serialize a pandas DataFrame to a Parquet file. Install the pyarrow package:
conda install -c conda-forge pyarrow
or
pip install pyarrow
Features not supported by pandas-gbq
While the pandas-gbq library provides a useful interface for querying data and writing data to tables, it does not cover many of the BigQuery API features, including but not limited to:
- Managing datasets, including creating new datasets, updating dataset properties, and deleting datasets
- Loading data into BigQuery from formats other than pandas DataFrames
- Managing tables, including listing tables in a dataset, copying table data, and deleting tables
- Exporting BigQuery data directly to Cloud Storage