
Power up your data analysis: The Data Science Agent now supports BigQuery ML, DataFrames, and Spark

September 16, 2025
Ellery Berk

Product Manager, Google Cloud

Jeff Nelson

Developer Relations Engineer, Google Cloud


We recently announced an AI-first Colab Enterprise notebook experience in BigQuery and Vertex AI to help you simplify and transform your data science and analytics workflows. Colab Enterprise notebooks come with a built-in Data Science Agent that accelerates data science development with agentic capabilities for data exploration, transformation, and machine learning modeling. From a single prompt, the agent generates a detailed plan for your workflow, from data loading and cleaning to model training and evaluation.

Today, we're introducing powerful new features in the Data Science Agent to further simplify and scale your analytical journeys, especially with large and open-format datasets.

Generate BigQuery ML, BigQuery DataFrames, & Spark

You can now harness the power of BigQuery Machine Learning (ML), BigQuery DataFrames (BigFrames), and Spark for large-scale data processing directly within the Data Science Agent. BigQuery ML and BigQuery DataFrames allow you to scale up data transformation, model training, and inference by running them directly on BigQuery. And with Serverless for Apache Spark, you can perform distributed data processing on large datasets, allowing you to work with data that is too large to fit into memory on a single machine.

To invoke these tools, simply include the following keywords in your prompt:

  • For BigQuery ML: use "BigQuery ML", "BQML", or "SQL"
  • For BigQuery DataFrames: specify "BigQuery DataFrames" or "BigFrames"
  • For PySpark: include "Spark" or "PySpark"
https://storage.googleapis.com/gweb-cloudblog-publish/original_images/1_dsa_bigframes.gif
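
To make the keyword list above concrete, here is a minimal Python sketch that checks which framework a prompt would request. The keywords come from the list above; the matching logic (case-insensitive, whole-word) and the function name `frameworks_in_prompt` are assumptions for illustration only and do not describe how the Data Science Agent actually parses prompts.

```python
import re

# Keywords from the post, grouped by the framework they request.
# The matching rules below are an illustrative assumption, not the agent's behavior.
FRAMEWORK_KEYWORDS = {
    "BigQuery ML": ["BigQuery ML", "BQML", "SQL"],
    "BigQuery DataFrames": ["BigQuery DataFrames", "BigFrames"],
    "PySpark": ["PySpark", "Spark"],
}


def frameworks_in_prompt(prompt: str) -> list[str]:
    """Return the frameworks whose keywords appear in the prompt as whole words."""
    found = []
    for framework, keywords in FRAMEWORK_KEYWORDS.items():
        for keyword in keywords:
            # Lookarounds keep "Spark" from matching inside a longer word.
            pattern = r"(?<!\w)" + re.escape(keyword) + r"(?!\w)"
            if re.search(pattern, prompt, flags=re.IGNORECASE):
                found.append(framework)
                break  # one matching keyword per framework is enough
    return found
```

A quick check like this can help confirm that a prompt names exactly one framework before you send it.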

In the future, the Data Science Agent will be able to pick the relevant framework for your use case — e.g., based on the size of your selected datasets or the contents of your Notebook.

In the meantime, here are some sample prompts to get you started:

  • “Build a high-quality forecasting model using BigQuery SQL on project_id.dataset_id.table_id to predict stock needs. Present the model’s evaluation metrics and visualize the forecast with a 95% confidence interval.”

  • “Using BigQuery DataFrames, train and evaluate a gradient boosted tree model to predict housing prices from the table project_id.dataset_id.table_id. Before training, one-hot encode the neighborhood column.”

  • “I want to group similar customers together for targeted marketing campaigns, but first I need to do dimensionality reduction using a PCA model. Use Spark to do this on table project_id.dataset_id.table_id.”
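
For a sense of what the first prompt might lead to, the sketch below assembles BigQuery ML statements for a time-series forecast. It assumes an `ARIMA_PLUS` model, which is one of several forecasting options in BigQuery ML; the agent may choose differently. The helper name `forecasting_sql`, the model ID, and the column names are hypothetical placeholders, and the code only builds SQL strings without running them.

```python
def forecasting_sql(
    table_id: str,
    model_id: str,
    timestamp_col: str,
    value_col: str,
    horizon: int = 30,
    confidence_level: float = 0.95,
) -> dict[str, str]:
    """Build BigQuery ML statements for an ARIMA_PLUS forecast (illustrative only)."""
    # Train a time-series model directly in BigQuery.
    train = (
        f"CREATE OR REPLACE MODEL `{model_id}`\n"
        f"OPTIONS(\n"
        f"  model_type = 'ARIMA_PLUS',\n"
        f"  time_series_timestamp_col = '{timestamp_col}',\n"
        f"  time_series_data_col = '{value_col}'\n"
        f") AS\n"
        f"SELECT {timestamp_col}, {value_col}\n"
        f"FROM `{table_id}`"
    )
    # Inspect the fitted model's evaluation metrics.
    evaluate = f"SELECT * FROM ML.ARIMA_EVALUATE(MODEL `{model_id}`)"
    # Forecast with prediction intervals at the requested confidence level.
    forecast = (
        f"SELECT * FROM ML.FORECAST(MODEL `{model_id}`,\n"
        f"  STRUCT({horizon} AS horizon, {confidence_level} AS confidence_level))"
    )
    return {"train": train, "evaluate": evaluate, "forecast": forecast}


# Hypothetical table, model, and column names.
statements = forecasting_sql(
    table_id="project_id.dataset_id.table_id",
    model_id="project_id.dataset_id.stock_forecast",
    timestamp_col="order_date",
    value_col="units_sold",
)
print(statements["forecast"])
```

The `confidence_level` of 0.95 corresponds to the 95% confidence interval requested in the prompt; the forecast output includes lower and upper prediction interval bounds that can be plotted around the forecast values.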

Limitation: The Data Science Agent currently generates Spark 4.0 code only. The agent can help you upgrade existing code to Spark 4.0. However, if you need an earlier version of Spark, we recommend not using the Data Science Agent for PySpark for now.
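
If you are unsure whether your environment matches this limitation, a small check like the following can help. It is a minimal sketch: the helper name `is_spark_4_or_later` is hypothetical, and it assumes you pass in a version string taken from your own session, for example `spark.version` or `pyspark.__version__`.

```python
def is_spark_4_or_later(version: str) -> bool:
    """Return True when a version string such as '4.0.1' has major version 4 or higher."""
    major = int(version.split(".")[0])
    return major >= 4


# In a notebook you would pass spark.version or pyspark.__version__ here.
for candidate in ("4.0.0", "3.5.1"):
    status = "matches" if is_spark_4_or_later(candidate) else "predates"
    print(f"Spark {candidate} {status} the Spark 4.0 code the agent generates")
```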

Add data using context and @ mentions

We are also making it easier to bring your data into the conversation. The Data Science Agent can now automatically retrieve your BigQuery tables and their metadata. This means you can describe a table directly in your prompt and let the Data Science Agent search for the most relevant table on your behalf.

https://storage.googleapis.com/gweb-cloudblog-publish/original_images/2_dsa_context_aware.gif

Further, you can now search for BigQuery tables within your current project using an @ mention. This familiar, industry-standard mechanism allows you to build your prompt with the relevant context — without your hands ever leaving the keyboard.

https://storage.googleapis.com/gweb-cloudblog-publish/original_images/3_dsa_at_selection.gif

Limitation: The @ mention currently only searches for BigQuery tables in your current project. For broader searches across projects or to add files from session storage and local uploads, please continue to use the "+" button.
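
The scope rule above can be expressed as a short check. This is a sketch under stated assumptions: the helper name `can_use_at_mention` is hypothetical, table IDs are assumed to follow the `project_id.dataset_id.table_id` form used in the sample prompts, and the function is not part of any Google Cloud API.

```python
def can_use_at_mention(table_id: str, current_project: str) -> bool:
    """Return True if a fully qualified table ID belongs to the current project."""
    parts = table_id.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"Expected project_id.dataset_id.table_id, got {table_id!r}")
    # @ mentions only search the current project; other projects need the "+" button.
    return parts[0] == current_project


# Hypothetical project and table names.
for table in ("my-project.sales.orders", "shared-project.sales.orders"):
    method = "@ mention" if can_use_at_mention(table, "my-project") else '"+" button'
    print(f"{table}: add with the {method}")
```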

Try the Data Science Agent today

Under the hood, we've also optimized the Data Science Agent so it will start up faster after your first message. Less waiting, faster insights. Similar improvements for Colab Enterprise in Vertex AI are coming soon.

We’re committed to evolving the AI-powered data science experience and can’t wait to show you what we’re building next. To get started, check out the resources below:
