Agent Factory Recap: AI Agents for Data Engineering and Data Science

Lucia Subatin
Developer Advocate
Smitha Kolan
Senior Developer Relations
Welcome to another exciting episode of The Agent Factory, the podcast that goes beyond the hype to build production-ready AI agents! In this episode, we were thrilled to host Lucia Subatin, who guided us through the world of data agents and their transformative power for data engineers and scientists. She also showcased some truly innovative applications of graph databases and AI for better access to knowledge.

This post guides you through the key ideas from our conversation. Use it to quickly recap topics or dive deeper into specific segments with links and timestamps.
The Agent Industry Pulse
Timestamp: [01:45]
This week, the agent industry is buzzing with some groundbreaking releases:
- Gemini API's Computer Use Model: A new model that grants AI agents the ability to "see" and interact with your computer screen. It takes screenshots, decides on UI actions (click, scroll, type, open a webpage), and executes them, allowing agents to automate real-world browser tasks such as filling forms or testing user flows. It ships with robust safety layers: every action undergoes a safety check, and risky operations require human confirmation. We even saw a demo of it looking up pricing on a documentation page!
- CodeMender, an AI agent for code security: This agent autonomously patches new vulnerabilities as they arise (reactive) and rewrites existing code to eliminate entire classes of flaws (proactive). Leveraging the reasoning power of Gemini Deep Think and equipped with self-correction tools such as static analysis and fuzzing, CodeMender automates the creation and validation of high-quality security patches at scale. It has already upstreamed 72 security fixes to open-source projects, a significant milestone for software security.
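The screenshot-decide-act loop described for the Computer Use model can be sketched in a few lines. Everything below is a hypothetical stand-in, not the actual Gemini API surface: the function names, the screenshot structure, and the safety rules are all illustrative stubs.

```python
# Hypothetical sketch of a computer-use agent loop. All names here are
# illustrative stubs, not the real Gemini API.

def take_screenshot():
    """Stub: capture the current screen state."""
    return {"url": "docs.example.com/pricing", "text": "Pricing: $10/mo"}

def decide_action(screenshot, goal):
    """Stub: a model would choose the next UI action from the screenshot."""
    if goal.lower() in screenshot["text"].lower():
        return {"type": "done", "answer": screenshot["text"]}
    return {"type": "scroll", "direction": "down"}

def is_risky(action):
    """Safety layer: flag actions that should require human confirmation."""
    return action["type"] in {"purchase", "delete", "submit_form"}

def run_agent(goal, max_steps=10):
    for _ in range(max_steps):
        action = decide_action(take_screenshot(), goal)
        if action["type"] == "done":
            return action["answer"]
        if is_risky(action):
            raise RuntimeError("Human confirmation required")
        # Stub: execute the UI action (click, scroll, type, open webpage)
    return None

print(run_agent("pricing"))
```

The point of the sketch is the shape of the loop: observe, decide, gate risky actions behind a human, act, repeat.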
The Factory Floor
The Factory Floor is our segment for getting hands-on. Here, we moved from high-level concepts to practical code with live demos.
The BigQuery Data Engineering Agent
Timestamp: [06:44]
We dove into the BigQuery Data Engineering Agent, a powerful tool for automating data pipeline creation and management directly within BigQuery.
- Generating Sales Regions: Lucia demonstrated how to use the agent to add a new sales_region field to an accounts table based on the billing_country, leveraging BigQuery's AI_GENERATE function to call Gemini 2.5 Flash from a SQL statement.
- Creating a Time Dimension Table: The agent was then prompted to generate a comprehensive time_dimension table, which aids natural-language-to-SQL queries by providing readily available date components (year, quarter, month name) for easier analysis.
- Automating Data Quality Assertions: Finally, Lucia showed how the agent can automatically generate data quality assertions for all tables, such as ensuring non-null IDs and unique account names, to keep data clean and reliable for agent applications.
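To make the sales-region step concrete, here is a sketch of how the backfill statement might be assembled for the google-cloud-bigquery client. The exact AI_GENERATE signature, the connection and model names, and the project/dataset/table names are all assumptions; check the BigQuery generative AI function docs before running anything like this.

```python
# Sketch only: the AI_GENERATE argument shape and all identifiers below
# are assumptions, not the demo's actual pipeline.

def build_sales_region_sql(project, dataset):
    table = f"`{project}.{dataset}.accounts`"
    return f"""
    UPDATE {table}
    SET sales_region = AI_GENERATE(
      CONCAT('Return only the sales region (AMER, EMEA, or APAC) for the ',
             'country: ', billing_country),
      -- connection and endpoint are placeholders; the demo used Gemini 2.5 Flash
      connection_id => 'my-connection',
      endpoint => 'gemini-2.5-flash'
    ).result
    WHERE sales_region IS NULL
    """

sql = build_sales_region_sql("my-project", "sales")
# To execute: google.cloud.bigquery.Client().query(sql).result()
print(sql)
```

The agent generates statements of roughly this shape for you; the value is that the model call lives inside ordinary SQL, so the backfill runs where the data already is.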
The Data Science Agent
Timestamp: [07:24]
Next, we explored the Data Science Agent, operating within Colab Enterprise, to extract insights and prepare data for agent applications.
- Anomaly Detection: Lucia tasked the agent with detecting anomalies in a Case table. The agent formulated a plan to load and describe the data, preprocess it for anomaly detection, train an isolation forest model, and provide visualizations.
- Identifying Anomalous Records: After executing its plan, the agent identified anomalous records, summarized its findings, and presented a visual confirmation of the separation between normal and anomalous data points. It also offered insights and next steps for understanding the root causes of these anomalies, which is invaluable for improving data collection processes.
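The agent trained an isolation forest for this; as a lighter stand-in that needs only the standard library, the same "flag the records that sit far from the rest" idea can be sketched with a modified z-score based on the median absolute deviation. The field (case resolution time) and the data are illustrative, not from the episode.

```python
# Stand-in for the demo's isolation forest: modified z-score outlier
# detection (median absolute deviation), stdlib only.
from statistics import median

def modified_z_scores(values):
    """Robust z-score: 0.6745 * (x - median) / MAD."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return [0.0] * len(values)
    return [0.6745 * (v - med) / mad for v in values]

def find_anomalies(values, threshold=3.5):
    """Return indices whose modified z-score exceeds the threshold."""
    return [i for i, z in enumerate(modified_z_scores(values))
            if abs(z) > threshold]

# Illustrative resolution times (hours) for ten support cases
resolution_hours = [4, 5, 3, 6, 4, 5, 4, 120, 5, 3]
print(find_anomalies(resolution_hours))  # the 120-hour case stands out
```

An isolation forest generalizes this to many dimensions at once, which is why the agent reached for it; for a single numeric column the robust z-score gives the same intuition.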
Creating Comics from Spanner Concepts with ADK
Timestamp: [26:01]
In a truly unique demonstration, we saw how to combine a graph database with AI for creative content generation.
- Spanner Graph Database: Lucia explained Spanner as a globally distributed, strongly consistent database with graph capabilities. She showcased a graph database built from Spanner's documentation, traversable via GQL.
- Knowledge Traversal and Comic Generation: Using an ADK application, a knowledge agent traversed this Spanner graph to answer "What are regions?" Based on the retrieved information, another agent generated a detailed prompt for Nano Banana, an image generation model, to create a six-panel comic strip explaining Spanner regions in a vibrant tech illustration style. The comic visually explained regional, dual-region, and multi-region configurations.
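The two-agent hand-off can be sketched as plain functions. Everything here is a hypothetical reconstruction: the graph name, the GQL query shape, the schema, and the stubbed agents are assumptions, not the demo's actual ADK code.

```python
# Hypothetical sketch of the demo's pipeline: a knowledge agent queries a
# documentation graph with GQL, and a second agent turns the retrieved
# facts into an image-generation prompt. All names are illustrative.

def build_gql(topic):
    # GQL traversal over an assumed documentation graph schema
    return (
        "GRAPH DocsGraph "
        f"MATCH (c:Concept {{name: '{topic}'}})-[:EXPLAINED_BY]->(d:Doc) "
        "RETURN d.summary"
    )

def knowledge_agent(topic):
    """Stub: would run the GQL query against Spanner and return summaries."""
    _query = build_gql(topic)
    return ["Regions are geographic placements for a Spanner instance."]

def comic_prompt_agent(facts, panels=6):
    """Turn retrieved facts into a prompt for an image model."""
    return (
        f"Create a {panels}-panel comic strip in a vibrant tech "
        f"illustration style explaining: {' '.join(facts)}"
    )

print(comic_prompt_agent(knowledge_agent("regions")))
```

The design point is the separation of concerns: one agent owns precise retrieval from the graph, the other owns creative prompt construction, so each can be improved independently.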
The following is an example of another comic generated by the agent, responding to the question “What is interleaving?”

It was incredible to see how agents could not only retrieve precise information but also transform it into engaging visual content, even with multiple iterations to refine text clarity in the generated images.
Developer Q&A
Timestamp: [38:49]
We wrapped up with some great questions from our developer community:
On the Availability of Data Science and Data Engineering Agents
Timestamp: [38:53]
Both the Data Science Agent and the Data Engineering Agent are currently in preview. The Data Science Agent is in public preview, while access to the Data Engineering Agent requires following a specific link, which we'll provide in the description. This means developers can start experimenting with these powerful tools today!
On the Scalability and Deployment of the Data Engineering Agent
Timestamp: [39:33]
The Data Engineering Agent leverages highly scalable platforms: BigQuery and Dataform. It can perform analysis across multiple tables, datasets, and projects, provided the executing pipeline has the necessary permissions. For deployment to higher environments (staging, production), Dataform supports the data pipeline lifecycle by generating declarative artifacts that can be released and configured for deployment across different project and dataset combinations, giving your data pipelines a robust software delivery lifecycle.
What an incredible journey through the world of data agents and creative AI! We hope this episode inspired you to explore the possibilities of augmenting your data workflows and even generating engaging content with these innovative tools. The power to build cleaner data pipelines, derive deeper insights, and bring complex concepts to life through AI is truly at your fingertips.
Your turn to build
Ready to get hands-on? Dive into the resources linked below and start building your own data agents and AI-powered applications today! Don't forget to watch the full episode for all the practical demonstrations.



