
Agent Factory Recap: Cracking Open an Open Model

November 14, 2025
Amit Maraj

Developer Relations Engineer

Ivan Nardini

Developer Relations Engineer

Welcome back to The Agent Factory! In this episode, we’re joined by Ravin Kumar, a Research Engineer at DeepMind, to tackle one of the biggest topics in AI right now: building and training open-source agentic models. We wanted to go beyond just using agents and understand what it takes to build the entire factory line—from gathering data and supervised fine-tuning to reinforcement learning and evaluations.


This post guides you through the key ideas from our conversation. Use it to quickly recap topics or dive deeper into specific segments with links and timestamps.

The Agent Industry Pulse

Timestamp: 2:00


Before diving into the deep research, we looked at the latest developments in the fast-moving world of AI agents.

  • Gemini 2.5 Computer Use: Google's new model can act as a virtual user, interacting with computer screens, clicking buttons, typing in forms, and scrolling. It’s a shift from agents that just know things to agents that can do tasks directly in a browser.

  • Vibe Coding in AI Studio: A new approach to app building where you describe the "vibe" of the application you want, and the AI handles the boilerplate. It includes an Annotation Mode to refine specific UI elements with simple instructions like "Change this to green."

  • DeepSeek-OCR and Context Compression: DeepSeek introduced a method that treats documents like images to understand layout, compressing 10-20 text tokens into a single visual token. This drastically improves speed and reduces cost for long-context tasks (see the back-of-envelope sketch after this list).

  • Google Veo 3.1 and Flow: The new update to the AI video model adds rich audio generation and powerful editing features. You can now use "Insert" to add characters or "Remove" to erase objects from existing video footage, giving creators iterative control.
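
To make the DeepSeek-OCR numbers concrete, here is a quick back-of-envelope sketch in Python. The 10-20x ratio is the figure quoted above; actual compression depends on the document, and the function is purely illustrative arithmetic.

```python
# Back-of-envelope for visual-token compression: if one visual token stands
# in for `ratio` text tokens, how many tokens does a long document need?
def visual_tokens(text_tokens: int, ratio: int) -> int:
    return -(-text_tokens // ratio)  # ceiling division

for ratio in (10, 20):
    print(f"100,000 text tokens at {ratio}x -> {visual_tokens(100_000, ratio):,} visual tokens")
# 100,000 text tokens at 10x -> 10,000 visual tokens
# 100,000 text tokens at 20x -> 5,000 visual tokens
```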

Ravin Kumar on Building Open Models

We sat down with Ravin to break down the end-to-end process of creating an open model with agent capabilities. It turns out the process mirrors a traditional ML lifecycle but with significantly more complex components.

Defining Agent Data

Timestamp: 14:55

Ravin explained that training data for agents looks vastly different from standard text datasets. It starts with identifying what users actually need. The data itself is a collection of trajectories: multi-step examples of the model making decisions and calling tools. Ravin noted that they use a mix of human-curated data and synthetic data generated by their own internal "teacher" models and APIs to create a playground for the open models to learn in. A sketch of what a single trajectory record might look like follows.
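
The episode doesn't spell out a data schema, so the record below is a minimal sketch of what one tool-use trajectory could look like; every field name is an illustrative assumption, not DeepMind's actual format.

```python
# Illustrative only: the episode does not define a schema, so every field
# below is an assumption about what a tool-use trajectory could record.
trajectory = {
    "user_goal": "Find the cheapest flight from Toronto to Lisbon next month",
    "steps": [
        {
            "thought": "Live prices are needed, so call the flight search tool.",
            "tool_call": {"name": "search_flights",
                          "args": {"origin": "YYZ", "destination": "LIS"}},
            "tool_result": {"cheapest_usd": 412, "airline": "TAP"},
        },
        {
            "thought": "The result answers the goal; summarize for the user.",
            "final_answer": "The cheapest option is about $412 USD on TAP.",
        },
    ],
    "outcome": {"task_completed": True, "source": "synthetic_teacher_model"},
}
```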

Training Techniques: SFT and Reinforcement Learning

Timestamp: 17:14 

Once the data is ready, training follows a two-phase approach. First comes Supervised Fine-Tuning (SFT), where the training framework updates the model's weights to nudge it toward the behaviors shown in the examples. To handle generalization (new situations not in the original training data), they rely on Reinforcement Learning (RL). Ravin highlighted how difficult it is to set rewards in RL, warning that models are prone to "reward hacking," where they collect intermediate rewards without ever completing the final task; the toy sketch below shows the trap and one way to guard against it.
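
To see why reward hacking happens, consider this toy sketch. It is not DeepMind's setup; the trajectory fields and reward values are assumptions chosen to show how a naive per-step reward pays out even when the task never finishes, and how gating on the final outcome closes that loophole.

```python
# Toy illustration of the reward-hacking trap described above. Nothing here
# is DeepMind's actual setup; the trajectory fields are assumptions.
def naive_reward(traj) -> float:
    # Pays 0.1 per tool call, so an agent can farm reward by looping on
    # tool calls and never finishing the task.
    return 0.1 * len(traj["tool_calls"]) + (1.0 if traj["task_completed"] else 0.0)

def gated_reward(traj) -> float:
    # Credit only on completion, with a small step cost to discourage loops.
    if not traj["task_completed"]:
        return 0.0
    return 1.0 - 0.01 * len(traj["tool_calls"])

looping_agent = {"tool_calls": ["search"] * 50, "task_completed": False}
print(naive_reward(looping_agent))  # 5.0 -- reward collected, task never done
print(gated_reward(looping_agent))  # 0.0 -- no credit without success
```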

The Stakes of Evaluation

Timestamp: 20:10

Ravin emphasized that evaluation is the most critical, highest-stakes part of the process. You can't just trust the training process; you need a rigorous "final exam." They use a combination of broad public benchmarks to measure general capability and specific, custom evaluations to ensure the model is safe and effective for its intended use case. The sketch below shows what a minimal harness for such an exam could look like.
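
Here is a minimal sketch of what such a custom "final exam" harness could look like. `run_agent` is a hypothetical stand-in for your model call, and the two sample cases are placeholders.

```python
# A minimal "final exam" harness in the spirit of the episode's advice.
# `run_agent` is a hypothetical stand-in; swap in the model you are testing.
def run_agent(task: str) -> str:
    return ""  # replace with a call to your agent

EXAM = [  # start small and concrete; grow toward ~50 examples
    {"task": "Convert 72 F to Celsius.", "check": lambda out: "22.2" in out},
    {"task": "What is the capital of Kenya?", "check": lambda out: "Nairobi" in out},
]

def pass_rate(exam) -> float:
    passed = sum(1 for case in exam if case["check"](run_agent(case["task"])))
    return passed / len(exam)

print(f"pass rate: {pass_rate(EXAM):.0%}")  # 0% until run_agent is wired up
```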

Conclusion

This conversation with Ravin Kumar made it clear that building open agentic models is a highly structured, rigorous process. It requires high-quality trajectory data, a careful combination of supervised and reinforcement learning, and, crucially, intensive evaluation.

Your turn to build

As Ravin advised, the best place to start is at the end. Before you write a single line of training code, define what success looks like by building a small, 50-example final exam for your agent (the harness sketched above is one starting point). If you can't measure it, you can't improve it. We also encourage you to mix approaches, for example using a powerful API model like Gemini as a router and a specialized open-source model for specific tasks; a sketch of that pattern follows.
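
Here is a hedged sketch of that router pattern, assuming the google-genai Python SDK for the API side. The model id and the local `run_open_model` function are illustrative assumptions; swap in whatever open model you actually serve.

```python
# Router pattern sketch: an API model classifies the request, then either a
# locally hosted open model or the API model itself handles it.
from google import genai

client = genai.Client()  # assumes GEMINI_API_KEY is set in the environment

def run_open_model(prompt: str) -> str:
    # Hypothetical stand-in: replace with a call to your hosted open model
    # (for example, a Gemma endpoint you serve yourself).
    return f"[open-model answer to: {prompt}]"

def route(user_prompt: str) -> str:
    decision = client.models.generate_content(
        model="gemini-2.5-flash",  # assumed model id; pick any capable model
        contents=("Reply with exactly CODE or GENERAL.\n"
                  f"Which specialist should handle this request? {user_prompt}"),
    ).text.strip()
    if decision == "CODE":
        return run_open_model(user_prompt)  # specialized open model
    return client.models.generate_content(  # fall back to the API model
        model="gemini-2.5-flash", contents=user_prompt).text

print(route("Write a function that reverses a linked list."))
```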

Check out the full episode for more details, and catch us next time!
