Migrate to BigQuery DataFrames 2.0
On April 16, 2025, the BigQuery DataFrames team plans to release version 2.0 of BigQuery DataFrames. This version makes security and performance improvements to the BigQuery DataFrames API and adds new features. This document describes the changes and provides migration guidance. You can apply these recommendations before the release by using the latest version 1.x of BigQuery DataFrames or by installing pre-release versions of the 2.0 package.
Install BigQuery DataFrames version 2.0
To avoid breaking changes, pin to a specific version of BigQuery DataFrames in your requirements.txt file (for example, bigframes==1.38.0) or in your pyproject.toml file (for example, dependencies = ["bigframes==1.38.0"]).
When you are ready to try the latest version, you can run pip install --upgrade bigframes to upgrade BigQuery DataFrames.
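To confirm which version is installed before and after the upgrade, you can check the package metadata, for example with the standard importlib.metadata module. This is a minimal sketch and isn't part of the BigQuery DataFrames API:
from importlib.metadata import version

# Print the installed BigQuery DataFrames package version, for example "2.0.0".
print(version("bigframes"))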
Use the allow_large_results option
BigQuery has a maximum response size limit for query jobs. Starting in BigQuery DataFrames version 2.0, BigQuery DataFrames enforces this limit by default in methods that return results to the client, such as peek(), to_pandas(), and to_pandas_batches(). If your job returns large results, you can set allow_large_results to True in your BigQueryOptions object to avoid breaking changes. This option is set to False by default in BigQuery DataFrames version 2.0.
import bigframes.pandas as bpd
bpd.options.bigquery.allow_large_results = True
You can override the allow_large_results option by using the allow_large_results parameter in to_pandas() and other methods. For example:
bf_df = bpd.read_gbq(query)
# ... other operations on bf_df ...
pandas_df = bf_df.to_pandas(allow_large_results=True)
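The per-call parameter works the same way for the other methods that return results to the client. The following sketch assumes that peek() accepts the same allow_large_results parameter, as suggested by "other methods" above:
# Inspect a few rows of a potentially large result set.
preview_df = bf_df.peek(allow_large_results=True)
print(preview_df)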
Use the @remote_function decorator
BigQuery DataFrames version 2.0 makes some changes to the default behavior of the @remote_function decorator.
Set a service account
As of version 2.0, BigQuery DataFrames no longer uses the Compute Engine service account by default for the Cloud Run functions it deploys. To limit the permissions of the function you deploy:
- Create a service account with minimal permissions.
- Then supply the service account email to the cloud_function_service_account parameter of the @remote_function decorator.
For example:
@remote_function(
    cloud_function_service_account="my-service-account@my-project.iam.gserviceaccount.com",
    ...
)
def my_remote_function(parameter: int) -> str:
    return str(parameter)
If you would like to use the Compute Engine service account, you can set the cloud_function_service_account parameter of the @remote_function decorator to "default". For example:
# This usage is discouraged. Use only if you have a specific reason to use the
# default Compute Engine service account.
@remote_function(cloud_function_service_account="default", ...)
def my_remote_function(parameter: int) -> str:
    return str(parameter)
Set ingress settings
As of version 2.0, BigQuery DataFrames sets the ingress settings of the Cloud Run functions it deploys to "internal-only". Previously, the ingress settings were set to "all" by default. You can change the ingress settings by setting the cloud_function_ingress_settings parameter of the @remote_function decorator. For example:
@remote_function(cloud_function_ingress_settings="internal-and-gclb", ...)
def my_remote_function(parameter: int) -> str:
    return str(parameter)
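To restore the pre-2.0 behavior of accepting all traffic, you can pass "all" instead. This is a minimal sketch based on the values mentioned above:
# "all" matches the pre-2.0 default and allows traffic from the public internet.
@remote_function(cloud_function_ingress_settings="all", ...)
def my_remote_function(parameter: int) -> str:
    return str(parameter)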
Use custom endpoints
Previously, if a region didn't support regional service endpoints and bigframes.pandas.options.bigquery.use_regional_endpoints = True, then BigQuery DataFrames would fall back to locational endpoints. Version 2.0 of BigQuery DataFrames removes this fallback behavior. To connect to locational endpoints in version 2.0, set the bigframes.pandas.options.bigquery.client_endpoints_override option. For example:
import bigframes.pandas as bpd
bpd.options.bigquery.client_endpoints_override = {
    "bqclient": "https://LOCATION-bigquery.googleapis.com",
    "bqconnectionclient": "LOCATION-bigqueryconnection.googleapis.com",
    "bqstoragereadclient": "LOCATION-bigquerystorage.googleapis.com",
}
Replace LOCATION with the name of the BigQuery location that you want to connect to.
Use partial ordering mode
With BigQuery DataFrames version 2.0, the partial ordering mode is generally available, but it isn't enabled by default. To use partial ordering, set ordering_mode to partial before performing any other operation with BigQuery DataFrames, as shown in the following code sample:
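import bigframes.pandas as bpd

# A minimal sketch: this assumes ordering_mode is set on the same
# bpd.options.bigquery object as the other options shown in this guide.
# Set it before performing any other BigQuery DataFrames operation.
bpd.options.bigquery.ordering_mode = "partial"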
This mode generates more efficient queries in most cases, and identical queries in others, such as those that use the groupby() function.
Some pandas-compatible functions that require ordering, such as .iloc[row_index], are not supported in partial ordering mode. For more information, see Partial ordering mode.
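As an illustration of the difference, the following is a minimal sketch that contrasts an order-independent aggregation with a position-based lookup; the project, table, and column names are hypothetical:
import bigframes.pandas as bpd

# Enable partial ordering before any other BigQuery DataFrames operation.
bpd.options.bigquery.ordering_mode = "partial"

# Hypothetical table and column names, for illustration only.
df = bpd.read_gbq("my-project.my_dataset.my_table")

# Aggregations such as groupby() don't depend on row order, so they produce
# the same queries in partial ordering mode.
totals = df.groupby("category")["amount"].sum()

# Position-based access such as .iloc[row_index] requires ordering, so it
# isn't supported in partial ordering mode.
# first_row = df.iloc[0]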