Security & Identity

How Google Does It: Building an effective AI red team

March 17, 2026

https://storage.googleapis.com/gweb-cloudblog-publish/images/GettyImages-2193460418.max-2600x2600.jpg

Daniel Fabian

Director, Red Teaming

Seth Rosenblatt

Security Editor, Google Cloud

Get original CISO insights in your inbox

The latest on security from Google Cloud's Office of the CISO, twice a month.

Ever wondered how Google does security? As part of our “How Google Does It” series, we share insights, observations, and top tips about how Google approaches some of today's most pressing security topics, challenges, and concerns — straight from Google experts. In this edition, Daniel Fabian, director, Red Teaming, shares lessons from the Google AI Red Team on uncovering vulnerabilities in AI systems before attackers do.

Since its inception, the Google Red Team has become an integral part of our security approach, a reliable sparring partner for our defense teams to better protect our employees, users, and customers. We continuously evolve our red teaming practices to meet the latest innovations in technology, including creating a dedicated AI Red Team at Google.

Similar to our traditional red team, the AI Red Team conducts real-world attacks to simulate adversaries targeting Google, from nation states and advanced persistent threat (APT) groups to cybercriminals, hacktivists, and malicious insiders. However, we have a singular mission: Get inside the minds of the threat actors going after AI deployments.

For this reason, the AI Red Team has specialized AI subject matter expertise necessary to carry out complex technical attacks that emulate what threat intelligence teams see today, and also prepare Google for what novel attacks adversaries might attempt in the future. These engagements play a crucial role in helping us identify potential vulnerabilities and weaknesses, improving our ability to anticipate attacks and develop stronger, faster defenses.

While classic security defenses are often built upon a wealth of historical breach data, the fortunate rarity (so far) of real-world AI attacks makes this type of red teaming one of our most vital tools for preparedness. Here’s an inside look at the key aspects of how we red team our way to safer, more secure AI.

1. Build realistic attack scenarios

When it comes to simulating realistic attack scenarios for AI, the AI Red Team has to think like an adversary. To start an exercise, we define who an attacker is, what their capabilities are, and the goals they want to achieve. We then come up with ideas about how this attacker would accomplish their objectives, considering what they might target and the steps they would need to take to be successful.

We review the latest adversarial research, and where we are integrating AI across Google, to help us understand what attacks are practical and realistic compared to those that are still too theoretical to execute. For example, prompt injection attacks now pose significantly more risk as AI agents have progressed from handling basic tasks like answering questions to performing complex, multi-step business workflows, simultaneously ingesting sensitive data and performing critical actions.

The vast majority of security issues related to AI only materialize once the model is integrated into products and given the ability to act, including accessing sensitive information.

Tweet this quote

These threats change constantly over time as AI systems and capabilities become more powerful. It’s imperative for us to know what’s feasible now for our real products and capabilities that rely on AI, but also what kind of attacks will be possible in the future. To help us keep up, we use the latest insights from Google threat intelligence teams including Mandiant, the Google Threat Intelligence Group (GTIG), content abuse red teaming from our Trust and Safety team, and the latest adversarial research from Google DeepMind.

2. Shift from a deterministic to a probabilistic mindset

One of the most counterintuitive lessons we have learned about attacking AI is that it’s closer to social engineering than the deterministic, reproducible exploits we are familiar with from traditional cybersecurity. AI systems operate probabilistically, which makes them great at recognizing patterns and resilient to random noise and uncertainty.

However when it comes to attacks, this probabilistic nature can work in favor of attackers, allowing them to intentionally probe AI models to discover the specific point where they start behaving incorrectly.

Instead of looking for flaws in the code, attacks against AI increasingly center on persuading a model to violate its guardrails or act against the interest of the product or the user. This is also why from a security perspective, a model in isolation is generally not an attractive target for an adversary (except attempting to steal the model weights).

The vast majority of security issues related to AI only materialize once the model is integrated into products and given the ability to act, including accessing sensitive information. In the context of agents, their knowledge of sensitive information such as business documents and their ability to interact with the "real world" (for example unlocking the front door, or ordering food for the user) makes them a very interesting target for attackers.

In response to this shift, the AI Red Team has changed its methods of attack to test a range of system defenses, using the tactics, techniques, and procedures (TTPs) we consider most relevant and realistic for real-world adversaries. We simulate a wide range of attacks, including prompt attacks, training data extraction, backdooring the model, adversarial examples, data poisoning, and exfiltration.

We also refine this list as capabilities, threats, and motivations change. Regularly evaluating and updating our methods is particularly important in the AI era, where the technology is maturing at an incredible pace. An attack that is relatively harmless today could become devastating tomorrow.

3. Combine traditional security and AI expertise

We believe that for realistic adversarial simulation, it’s critical to combine both classic security and AI subject matter expertise whenever possible. Real-world threat actors don't care about organizational boundaries, and they will use whatever means are necessary to achieve their goals. In some cases, targeted attacks on AI might be the path of least resistance, and in other instances, traditional security exploits are.

We take this into account for our red team exercises and regularly collaborate with our traditional red team, sharing ideas and skill sets to pull off realistic end-to-end adversarial operations. For example, some of the TTPs we use to target AI might require specific internal access, such as compromising an internal system, lateral movement, or gaining access to relevant AI pipelines.

In these scenarios, the two teams might work together to get the AI Red Team in position to execute a successful attack. Overall, we have found that incorporating emerging attack patterns and methods into standard threat actor operating procedures is very effective at helping identify and resolve potential issues, and preparing our defense teams for what the future might look like.

4. Never forget the rules of engagement

At Google, the number one priority is always the security and privacy of our users. While we don’t have many constraints around what we can target, we do have strict rules of engagement that outline exactly what we can and can’t do.

For example, our engagements are limited to systems, services, and devices fully owned and managed by Alphabet. We also can’t coerce, bribe, or threaten any targets. Most importantly, none of our exercises ever have access to real customer data. Even if we discover an issue that might result in us gaining access, we take steps to make sure no actual customer data is touched under any circumstances.

To keep our exercises realistic — real threat actors don't abide by our rules of engagement after all — we set up authentic simulations, such as creating synthetic accounts that are okay to target.

In addition, our rules of engagement also require us to keep a detailed activity log of everything that happens in the course of an exercise. This log serves three purposes:

It provides an auditable trail, protecting live systems and data, and also the red teamers working on the exercise.
It helps the blue team confidently distinguish red team activity from actual attacks.
It’s used after an exercise to compare notes with what the blue team detected and what they missed.

What’s next: Don’t be afraid of AI

For many red teams, attacking complex AI systems can be daunting and intimidating. However, we have found that the most critical asset is something most red teams already have: a strong attacker mindset.

While it can be helpful to understand as much as possible about how an AI system works, many attacks (such as prompt injection) don't require a PhD in CS and Math. Being able to think like a threat actor, imagining the most likely attack paths, strategies, tools and approaches is what leads to the most realistic exercises, and the best lessons on how to stop them.

Using AI for attacks presents new challenges for blue teams, as adversaries move at machine speed on the network, and sensitive information may have already been exfiltrated by the time the detection pipeline has raised an attack to the SOC. We're putting a lot of effort into using AI for red teaming in part because we know that real-world attackers are also using AI: They’re making their attacks faster, more sophisticated, and on a larger scale than what we've seen before.

Advancing our mission means we never stop learning. We rigorously assess the impact of all simulated attacks, analyzing their impact and the resilience of our detection and prevention capabilities. These results are documented and shared as attack narratives with the relevant stakeholders and teams, helping us to improve our security approaches, fuel our research, and inform our development efforts and security investments as we present novel but no less exciting challenges.

This article includes insights from the episodes, “AI Red Teaming: Surprises, Strategies, and Lessons from Google” and “How We Attack AI? Learn More at Our RSA Conference Panel!” of the Cloud Security Podcast.

Posted in

Security & Identity

https://storage.googleapis.com/gweb-cloudblog-publish/images/GettyImages-2195455476.max-700x700.jpg

Security & Identity

5 lessons from red teaming AI applications

By Brice Daniels • 4-minute read

https://storage.googleapis.com/gweb-cloudblog-publish/images/GettyImages-1312787030.max-700x700.jpg

Security & Identity

How to stop AI voice clones from bypassing your security perimeter

By Tom McWalters • 3-minute read

https://storage.googleapis.com/gweb-cloudblog-publish/images/GettyImages-1389508110.max-700x700.jpg

Security & Identity

Leadership lessons from The Cyber-Savvy Boardroom podcast

By David Homovich • 2-minute read

https://storage.googleapis.com/gweb-cloudblog-publish/images/GettyImages-672053694.max-700x700.jpg

Security & Identity

Why AI-powered cyber fraud is winning — and how we fight back

By Marina Kaganovich • 7-minute read

How Google Does It: Building an effective AI red team

Daniel Fabian

Seth Rosenblatt

Get original CISO insights in your inbox

1. Build realistic attack scenarios

The vast majority of security issues related to AI only materialize once the model is integrated into products and given the ability to act, including accessing sensitive information.

2. Shift from a deterministic to a probabilistic mindset

3. Combine traditional security and AI expertise

4. Never forget the rules of engagement

What’s next: Don’t be afraid of AI

Related articles

5 lessons from red teaming AI applications

How to stop AI voice clones from bypassing your security perimeter

Leadership lessons from The Cyber-Savvy Boardroom podcast

Why AI-powered cyber fraud is winning — and how we fight back