Migrate to BigQuery DataFrames 2.0

On April 16, 2025, the BigQuery DataFrames team plans to release version 2.0 of BigQuery DataFrames. This version makes security and performance improvements to the BigQuery DataFrames API and adds new features. This document describes the changes and provides migration guidance. You can apply these recommendations before the release by using the latest version 1.x of BigQuery DataFrames or by installing pre-release versions of the 2.0 package.

Install BigQuery DataFrames version 2.0

To avoid breaking changes, pin to a specific version of BigQuery DataFrames in your requirements.txt file (for example, bigframes==1.38.0) or your pyproject.toml file (for example, dependencies = ["bigframes==1.38.0"]). When you are ready to try version 2.0, run pip install --upgrade bigframes to install the latest version of BigQuery DataFrames.

Use the allow_large_results option

BigQuery has a maximum response size limit for query jobs. Starting in BigQuery DataFrames version 2.0, BigQuery DataFrames enforces this limit by default in methods that return results to the client, such as peek(), to_pandas(), and to_pandas_batches(). If your job returns large results, you can set allow_large_results to True in your BigQueryOptions object to avoid breaking changes. This option is set to False by default in BigQuery DataFrames version 2.0.


  import bigframes.pandas as bpd

  bpd.options.bigquery.allow_large_results = True

You can override the allow_large_results option on individual calls by using the allow_large_results parameter of to_pandas() and other methods. For example:


  bf_df = bpd.read_gbq(query)
  # ... other operations on bf_df ...
  pandas_df = bf_df.to_pandas(allow_large_results=True)

Use the @remote_function decorator

BigQuery DataFrames version 2.0 makes some changes to the default behavior of the @remote_function decorator.

Set a service account

As of version 2.0, BigQuery DataFrames no longer uses the Compute Engine service account by default for the Cloud Run functions it deploys. To limit the permissions of the function you deploy:

  1. Create a service account with minimal permissions.
  2. Supply the service account email to the cloud_function_service_account parameter of the @remote_function decorator.

For example:


  @remote_function(
    cloud_function_service_account="my-service-account@my-project.iam.gserviceaccount.com",
    ...
  )
  def my_remote_function(parameter: int) -> str:
    return str(parameter)

If you would like to use the Compute Engine service account, you can set the cloud_function_service_account parameter of the @remote_function decorator to "default". For example:


  # This usage is discouraged. Use only if you have a specific reason to use the
  # default Compute Engine service account.
  @remote_function(cloud_function_service_account="default", ...)
  def my_remote_function(parameter: int) -> str:
    return str(parameter)

Set ingress settings

As of version 2.0, BigQuery DataFrames sets the ingress settings of the Cloud Run functions it deploys to "internal-only". Previously, the ingress settings were set to "all" by default. You can change the ingress settings by setting the cloud_function_ingress_settings parameter of the @remote_function decorator. For example:


  @remote_function(cloud_function_ingress_settings="internal-and-gclb", ...)
  def my_remote_function(parameter: int) -> str:
    return str(parameter)

Use custom endpoints

Previously, if a region didn't support regional service endpoints and bigframes.pandas.options.bigquery.use_regional_endpoints = True, then BigQuery DataFrames would fall back to locational endpoints. Version 2.0 of BigQuery DataFrames removes this fallback behavior. To connect to locational endpoints in version 2.0, set the bigframes.pandas.options.bigquery.client_endpoints_override option. For example:


  import bigframes.pandas as bpd

  bpd.options.bigquery.client_endpoints_override = {
    "bqclient": "https://LOCATION-bigquery.googleapis.com",
    "bqconnectionclient": "LOCATION-bigqueryconnection.googleapis.com",
    "bqstoragereadclient": "LOCATION-bigquerystorage.googleapis.com",
  }

Replace LOCATION with the name of the BigQuery location you want to connect to.
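
Because each entry follows the same LOCATION-prefixed pattern, you can build the override map from the location name. The helper below is a hypothetical convenience, not part of the BigQuery DataFrames API; it simply reproduces the mapping shown above:

```python
def endpoint_overrides(location: str) -> dict[str, str]:
    """Build a client_endpoints_override map for a BigQuery location.

    Hypothetical helper, not part of bigframes. It mirrors the example above,
    including the https:// scheme on the bqclient entry only.
    """
    return {
        "bqclient": f"https://{location}-bigquery.googleapis.com",
        "bqconnectionclient": f"{location}-bigqueryconnection.googleapis.com",
        "bqstoragereadclient": f"{location}-bigquerystorage.googleapis.com",
    }


# Usage (assumes bigframes is installed):
# import bigframes.pandas as bpd
# bpd.options.bigquery.client_endpoints_override = endpoint_overrides("us-central1")
```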

Use partial ordering mode

With BigQuery DataFrames version 2.0, the partial ordering mode is generally available, but it isn't enabled by default. To use partial ordering, set ordering_mode to partial before performing any other operation with BigQuery DataFrames, as shown in the following code sample:

  import bigframes.pandas as bpd

  bpd.options.bigquery.ordering_mode = "partial"

In most cases, this mode generates more efficient queries; in other cases, such as queries that use the groupby() function, the generated queries are identical. Some pandas-compatible functions that require ordering, such as .iloc[row_index], aren't supported in partial ordering mode. For more information, see Partial ordering mode.

What's next