5 lessons from red teaming AI applications

Brice Daniels
Head of Northeast Offensive Security Services, Mandiant, Google Cloud
Muhammad Muneer
Principal Security Consultant, Mandiant, Google Cloud
Get original CISO insights in your inbox
The latest on security from Google Cloud's Office of the CISO, twice a month.
SubscribeThe rapid pace of AI innovation has created immense pressure for businesses to prioritize feature development and time-to-market over security hardening. While it may seem that these are mutually exclusive, we strongly believe that building AI applications securely is a critical component of the development and time-to-market process.
"AI trust and the responsible AI (RAI) practices that enable trust are no longer a tangential concern but a foundational requirement for realizing the full potential of the technology,” said McKinsey in its 2026 AI Trust Maturity Survey.
Adhering to the right guardrails can give development a competitive boost. To help you build AI securely, Mandiant has developed a proactive, risk-based approach centered on the Good AI Assessment (GAIA) Top 10, outlined in our new report, Secure Development of Generative AI Applications: A Proactive Approach.
Mandiant offensive security (red) teams are on the front lines, stress-testing AI systems to understand their unique vulnerabilities. This hands-on experience has shaped our comprehensive roadmap for building secure AI applications from the ground up. We've distilled these recent findings into five critical lessons to help you securely develop and deploy AI applications.
Case study: A chatbot relying on past dialogue
As part of Mandiant’s efforts to better understand the challenges you face, the Mandiant offensive security team conducted adversarial assessments on AI applications to demonstrate vulnerabilities in real-world scenarios.
One particularly insightful case involved a pre-production banking chatbot. This chatbot, typical of many rapidly-prototyped AI applications, provided its users with advanced feature capabilities but offered only a low-to-moderate level security posture.
The Mandiant offensive security team quickly uncovered an exposed API endpoint, allowing for a SQL injection and remote control of the database server. By intercepting the chat history data, which had been sent back to the server in easily-modifiable JSON, the offensive team was able to inject fake "system" messages.
The generative AI pipeline introduces distinct points of vulnerability beyond the model itself. Security isn't a final layer; it should be integrated across all phases of the SDLC, from data ingestion and processing through to deployment.
The large language model, relying heavily on past dialogue, accepted this falsified history as fact, bypassed its primary instructions, and allowed the Mandiant offensive team to make unauthorized account changes.
Given the relative ease with which our red team was able to exploit the model, you should consider it mission-critical to go beyond securing mere model prompts and secure the entire AI application.
1. Defend your AI pipeline from end to end
The generative AI pipeline introduces distinct points of vulnerability beyond the model itself. Security isn't a final layer; it should be integrated across all phases of the SDLC, from data ingestion and processing through to deployment. Relying solely on strong system prompts leaves applications exposed if the underlying infrastructure and data pipelines are vulnerable to manipulation.
2. Don't take front-end data at face value
Threat actors will actively try to intercept and modify client-side data structures, like JSON payloads and chat logs, to execute indirect prompt injections. Design your application with a Zero Trust mindset and never assume that conversation history coming from the user interface is authentic.
To prevent attackers from rewriting history or embedding malicious prompts, move from trusting user input to continuously verifying it. Use cryptographic signatures, such as HMAC, to confirm the integrity of the conversation context before it reaches the model.
3. Lock the door on system-level prompts
Verifying data integrity is only half the battle: You must also restrict the commands that the model is allowed to process from the user. It’s crucial to block privilege escalation at the application layer.
We recommend configuring your application logic to sanitize data and automatically drop or block privileged system messages that originate from the user interface. By establishing strict boundaries, you can prevent attackers from assigning themselves administrative roles to bypass your security policies or subvert the AI's core directives.
4. Stick to application security basics
AI applications rely heavily on third-party libraries, orchestrators, and components, and that comes with traditional application security and supply chain risks. As adversaries use AI to accelerate vulnerability discovery, organizations should strengthen and scale their vulnerability management capabilities.
It’s simply not enough to secure the model. Developers should apply application security testing and vulnerability scanning to their entire AI tech stack.
5. Build an early warning system
Integrate application and infrastructure logs with centralized security monitoring tools. This enables real-time detection and response to attacks targeting databases, infrastructure, and the AI itself.
Proactive monitoring is your early warning system, allowing you to quickly identify and mitigate threats like model poisoning, data leakage, adversarial attacks, and supply chain vulnerabilities specific to AI components.
Mandiant can help you take a proactive approach
A proactive approach, grounded in offensive security testing and comprehensive risk-based approaches like the GAIA Top 10, is essential. Mandiant is at the forefront of helping secure generative AI applications, providing the insights and strategies needed to build trust, empower responsible innovation, and protect against the next generation of threats.
To learn more about securing your AI applications, read the full report here.



