Migrating from pandas-gbq

The pandas-gbq library is a community-led project maintained by the pandas community. The BigQuery client library, google-cloud-bigquery, is the official Python library for interacting with BigQuery. If you use pandas-gbq, you are already using google-cloud-bigquery, because pandas-gbq calls it to make API requests to BigQuery. pandas-gbq provides a simple interface from pandas to BigQuery, but it lacks many of the features that google-cloud-bigquery provides.

This topic describes the changes you need to make to your Python code to use google-cloud-bigquery instead of pandas-gbq. The code samples in this topic use the following versions of the two libraries:

google-cloud-bigquery[pandas,pyarrow]==1.7.0
pandas-gbq==0.7.0
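
Because the samples in this topic are tied to the versions listed above, it can help to check what is installed in your environment before comparing behavior. The following is a minimal sketch that uses only the Python standard library (Python 3.8 or later); the helper name installed_versions is hypothetical and not part of either library.

```python
from importlib import metadata


def installed_versions(packages):
    """Return a dict mapping each distribution name to its installed version.

    Packages that are not installed map to None.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Report the versions of the two libraries discussed in this topic.
print(installed_versions(["google-cloud-bigquery", "pandas-gbq"]))
```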

Key differences in the level of functionality and support between the two libraries include:

Support
    pandas-gbq: Open source library maintained by PyData and volunteer contributors.
    google-cloud-bigquery: Open source library maintained by Google and volunteer contributors.

BigQuery API functionality covered
    pandas-gbq: Limited to running queries and saving data from pandas DataFrames to tables.
    google-cloud-bigquery: Full BigQuery API functionality, with added support for reading/writing pandas DataFrames and a Jupyter magic for running queries.

Cadence of new features
    pandas-gbq: New features are added to the library only if implemented by volunteer contributors.
    google-cloud-bigquery: New features are implemented as they are released in the BigQuery API.

Running queries

Both libraries support querying data stored in BigQuery. Key differences between the libraries include:

Default SQL syntax
    pandas-gbq: Legacy SQL.
    google-cloud-bigquery: Standard SQL.

Query configurations
    pandas-gbq: Sent as a dictionary in the format specified in the BigQuery REST reference.
    google-cloud-bigquery: Use the QueryJobConfig class, which contains properties for the various API configuration options.

Querying data with the standard SQL syntax

The following samples show how to run a standard SQL query with and without explicitly specifying a project. In both libraries, if no project is specified, the project is determined from the default credentials.

pandas-gbq:

import pandas

sql = """
    SELECT name
    FROM `bigquery-public-data.usa_names.usa_1910_current`
    WHERE state = 'TX'
    LIMIT 100
"""

# Run a Standard SQL query using the environment's default project
df = pandas.read_gbq(sql, dialect='standard')

# Run a Standard SQL query with the project set explicitly
project_id = 'your-project-id'
df = pandas.read_gbq(sql, project_id=project_id, dialect='standard')

google-cloud-bigquery:

from google.cloud import bigquery

client = bigquery.Client()
sql = """
    SELECT name
    FROM `bigquery-public-data.usa_names.usa_1910_current`
    WHERE state = 'TX'
    LIMIT 100
"""

# Run a Standard SQL query using the environment's default project
df = client.query(sql).to_dataframe()

# Run a Standard SQL query with the project set explicitly
project_id = 'your-project-id'
df = client.query(sql, project=project_id).to_dataframe()

Querying data with the legacy SQL syntax

The following sample shows how to run a query using legacy SQL syntax. See the Standard SQL Migration Guide for guidance on updating your queries to standard SQL.

pandas-gbq:

import pandas

sql = """
    SELECT name
    FROM [bigquery-public-data:usa_names.usa_1910_current]
    WHERE state = 'TX'
    LIMIT 100
"""

df = pandas.read_gbq(sql, dialect='legacy')

google-cloud-bigquery:

from google.cloud import bigquery

client = bigquery.Client()
sql = """
    SELECT name
    FROM [bigquery-public-data:usa_names.usa_1910_current]
    WHERE state = 'TX'
    LIMIT 100
"""
query_config = bigquery.QueryJobConfig(use_legacy_sql=True)

df = client.query(sql, job_config=query_config).to_dataframe()
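
One visible difference between the two dialects in the samples above is how tables are referenced: legacy SQL uses [project:dataset.table], while standard SQL uses `project.dataset.table`. The following is a rough sketch of rewriting only those table references with a regular expression. The helper name legacy_to_standard_table_refs is hypothetical, and the pattern assumes simple project, dataset, and table names. It does not convert any other legacy SQL syntax, and it would also rewrite bracketed text that happens to look like a table reference inside a string literal, so treat it as a starting point rather than a migration tool.

```python
import re

# Matches legacy SQL table references such as
# [bigquery-public-data:usa_names.usa_1910_current] or [my_dataset.my_table].
LEGACY_TABLE_REF = re.compile(
    r"\[(?:(?P<project>[\w\-]+):)?(?P<dataset>\w+)\.(?P<table>\w+)\]"
)


def legacy_to_standard_table_refs(sql):
    """Rewrite [project:dataset.table] references as `project.dataset.table`."""

    def _replace(match):
        parts = (
            match.group("project"),
            match.group("dataset"),
            match.group("table"),
        )
        # Drop the project when the legacy reference did not include one.
        return "`" + ".".join(part for part in parts if part) + "`"

    return LEGACY_TABLE_REF.sub(_replace, sql)


legacy_sql = (
    "SELECT name FROM [bigquery-public-data:usa_names.usa_1910_current] LIMIT 100"
)
print(legacy_to_standard_table_refs(legacy_sql))
```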

Running a query with a configuration

Certain complex operations, such as running a parameterized query or specifying a destination table to store the query results, require sending a configuration with the BigQuery API request. In pandas-gbq, the configuration must be sent as a dictionary in the format specified in the BigQuery REST reference. In google-cloud-bigquery, job configuration classes such as QueryJobConfig provide the properties needed to configure complex jobs.

The following sample shows how to run a query with named parameters.

pandas-gbq:

import pandas

sql = """
    SELECT name
    FROM `bigquery-public-data.usa_names.usa_1910_current`
    WHERE state = @state
    LIMIT @limit
"""
query_config = {
    'query': {
        'parameterMode': 'NAMED',
        'queryParameters': [
            {
                'name': 'state',
                'parameterType': {'type': 'STRING'},
                'parameterValue': {'value': 'TX'}
            },
            {
                'name': 'limit',
                'parameterType': {'type': 'INTEGER'},
                'parameterValue': {'value': 100}
            }
        ]
    }
}

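# Query parameters require standard SQL, which is not the default dialect
# in this version of pandas-gbq, so set the dialect explicitly.
df = pandas.read_gbq(sql, dialect='standard', configuration=query_config)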

google-cloud-bigquery:

from google.cloud import bigquery

client = bigquery.Client()
sql = """
    SELECT name
    FROM `bigquery-public-data.usa_names.usa_1910_current`
    WHERE state = @state
    LIMIT @limit
"""
query_config = bigquery.QueryJobConfig(
    query_parameters=[
        bigquery.ScalarQueryParameter('state', 'STRING', 'TX'),
        bigquery.ScalarQueryParameter('limit', 'INTEGER', 100)
    ]
)

df = client.query(sql, job_config=query_config).to_dataframe()
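
If you already have REST-style configuration dictionaries like the one in the pandas-gbq sample, one possible way to ease the migration is to extract the scalar parameters from the dictionary and pass them on to bigquery.ScalarQueryParameter, which takes a name, a type, and a value as shown in the google-cloud-bigquery sample. The following is a minimal sketch under that assumption. The helper name rest_params_to_tuples is hypothetical, and it handles only scalar parameters, not ARRAY or STRUCT parameters.

```python
def rest_params_to_tuples(configuration):
    """Extract (name, type, value) tuples from a REST-style query configuration.

    Only scalar parameters are handled; ARRAY and STRUCT parameters raise
    ValueError because they need different parameter classes.
    """
    params = configuration.get("query", {}).get("queryParameters", [])
    result = []
    for param in params:
        type_name = param["parameterType"]["type"]
        if type_name in ("ARRAY", "STRUCT"):
            raise ValueError(
                "Unsupported parameter type for this sketch: " + type_name
            )
        result.append((param["name"], type_name, param["parameterValue"]["value"]))
    return result


rest_config = {
    "query": {
        "parameterMode": "NAMED",
        "queryParameters": [
            {
                "name": "state",
                "parameterType": {"type": "STRING"},
                "parameterValue": {"value": "TX"},
            },
            {
                "name": "limit",
                "parameterType": {"type": "INTEGER"},
                "parameterValue": {"value": 100},
            },
        ],
    }
}

param_tuples = rest_params_to_tuples(rest_config)
print(param_tuples)

# With google-cloud-bigquery installed, each tuple could then be unpacked:
#   query_parameters = [bigquery.ScalarQueryParameter(*t) for t in param_tuples]
```

Returning plain tuples keeps the helper free of any dependency on google-cloud-bigquery, so the conversion can be checked separately from the API call.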

Loading a pandas DataFrame to a BigQuery table

Both libraries support uploading data from a pandas DataFrame to a new table in BigQuery. Key differences include:

Type support
    pandas-gbq: Converts the DataFrame to CSV format before sending to the API, which does not support nested or array values.
    google-cloud-bigquery: Converts the DataFrame to Parquet format before sending to the API, which supports nested and array values. Note that pyarrow, the Parquet engine used to send the DataFrame data to the BigQuery API, must be installed to load the DataFrame to a table.

Load configurations
    pandas-gbq: Sent as a dictionary in the format specified in the BigQuery REST reference.
    google-cloud-bigquery: Use the LoadJobConfig class, which contains properties for the various API configuration options.

pandas-gbq:

import pandas

df = pandas.DataFrame(
    {
        'my_string': ['a', 'b', 'c'],
        'my_int64': [1, 2, 3],
        'my_float64': [4.0, 5.0, 6.0],
    }
)
full_table_id = 'my_dataset.new_table'
project_id = 'my-project-id'

df.to_gbq(full_table_id, project_id=project_id)

google-cloud-bigquery:

from google.cloud import bigquery
import pandas

df = pandas.DataFrame(
    {
        'my_string': ['a', 'b', 'c'],
        'my_int64': [1, 2, 3],
        'my_float64': [4.0, 5.0, 6.0],
    }
)
client = bigquery.Client()
dataset_ref = client.dataset('my_dataset')
table_ref = dataset_ref.table('new_table')

client.load_table_from_dataframe(df, table_ref).result()
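
Note that the two samples above identify the destination differently: the pandas-gbq call takes a single 'dataset.table' string, while the google-cloud-bigquery sample builds the reference from a separate dataset ID and table ID. If your existing code stores destinations as 'dataset.table' strings, a small helper can split them. The following is a sketch only; the name split_destination_table is hypothetical, and it assumes the two-part form used in the pandas-gbq sample.

```python
def split_destination_table(destination_table):
    """Split a 'dataset.table' string into a (dataset_id, table_id) tuple."""
    parts = destination_table.split(".")
    if len(parts) != 2 or not all(parts):
        raise ValueError(
            "Expected a destination in the form 'dataset.table', got: "
            + repr(destination_table)
        )
    return parts[0], parts[1]


dataset_id, table_id = split_destination_table("my_dataset.new_table")
print(dataset_id, table_id)

# With google-cloud-bigquery installed, the parts could then be used as in
# the sample above:
#   table_ref = client.dataset(dataset_id).table(table_id)
```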

Features not supported by pandas-gbq

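While the pandas-gbq library provides a useful interface for querying data and writing data to tables, it does not cover many of the BigQuery API features. As noted in the comparison at the start of this topic, it is limited to running queries and saving data from pandas DataFrames to tables.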
