AI agent security: How to protect digital sidekicks (and your business)

Anton Chuvakin
Security Advisor, Office of the CISO
Get original CISO insights in your inbox
The latest on security from Google Cloud's Office of the CISO, twice a month.
SubscribeThe promise of AI agents is immense. Powerful software that can plan, make decisions, and accurately act on your behalf at work? Yes please.
But what if your digital assistant goes rogue? Without robust, flexible security in place, an AI agent could leak sensitive company data, cause a system outage, and even expose your business to security threats. That's why understanding agent security is so important.
The ABCs of AI agents
Agentic AI is the broader framework of AI systems that can act independently to achieve a goal. It's the "agency" from which an AI derives its ability to think, plan, and act with limited human oversight.
Threats to AI agents often exploit their unique ability to autonomously execute complex tasks and interact with external systems. Because these agents can act without direct human intervention, a compromised agent can lead to significantly increased threat risk.
AI agents are specific software programs, components in an agentic AI system. It's the individual "digital sidekick" that uses large language models (LLMs) to perceive its environment, make decisions, and perform tasks to reach a specific goal.
The difference between AI agents, chatbots, and LLMs is their capacity for autonomy and action. While LLMs are the brain, and chatbots are the mouthpiece, AI agents are the hands — they can actually "do" things.
This distinction is crucial for CISOs because the security risks scale with the level of autonomy. Now let’s take a look at the risks that come with AI agents.
Breaking down the threats
Threats to AI agents often exploit their unique ability to autonomously execute complex tasks and interact with external systems. Because these agents can act without direct human intervention, a compromised agent can lead to significantly increased threat risk.
AI agents can expand your attack surface because they combine the security risks of traditional software with vulnerabilities specific to AI. For example, tool misuse, such as when a threat actor uses an AI agent to advance a phishing campaign, is one way that AI agents introduce a broader attack surface.
Another risk is data loss, which can occur when an attacker uses a prompt injection attack to manipulate an agent into revealing sensitive information. Attackers can exploit an agent's resource consumption to launch a denial of service (DoS) attack, overwhelming the system and disrupting its operations.
The data leak
AI agents are most useful when they have access to your company's data, including client information, financial reports, and strategic documents, but that level of access carries significant risk.
Agents, like LLMs, can be vulnerable to "jailbreaks," which are clever prompts that are designed to bypass their safety features and expose confidential information. A malicious actor could trick an agent into sending them sensitive data by crafting a prompt that tricks the agent into ignoring its security protocols and data access limitations.
For example, an agent tasked with summarizing emails could be fed a jailbreak prompt that causes it to leak confidential information to an unauthorized recipient. This can lead to a significant breach of privacy and could violate regulations.
Sneak attacks and other rogue actions
AI agents are built from components including LLMs and software libraries. Each is a potential weakness.
A malicious actor could inject harmful code into a software library, "poisoning" the data the LLM was trained on. This attack is particularly insidious because the malicious code or data is integrated into the agent before it's even deployed, making it difficult to detect.
For example, an IT support agent might use a knowledge base that has been intentionally poisoned to recommend employees download malicious software to "fix" their problem, leading to a widespread security breach.
The denial-of-service (DoS) disaster
Agents are built to perform complex tasks that often involve many small actions. If an agent's logic is flawed, or it receives an unexpected input, it could get stuck in a loop.
Imagine an agent designed to monitor system health. If it malfunctions, perhaps due to a malicious input crafted to exploit a vulnerability, it could enter a state where it endlessly consumes all of a server's resources. This could involve making a continuous stream of API calls, endlessly writing to a log file, or recursively calling itself. Ultimately, this situation could bring down the entire system and could cost the company revenue.
Follow these guiding principles for securing AI agents
For AI agents to be safe and effective, they should be governed by these four core principles.
Agents should have well-defined human controllers. A human should always be in charge, deciding what the agent does and doesn't do. If an agent's tasks become too complex for a human to oversee, it's a sign that the job should be broken down into smaller, more manageable tasks for multiple agents.
Securing AI agents requires a continuous, layered approach that combines traditional security principles with new, AI-specific methods.
The agent's powers should have clear limitations. You need to define the agent's agency (the raw power it holds) and its scope (the field of applicability, or where that power can be used). Limiting these dimensions prevents the agent from operating in areas where it isn't needed.
The agent's actions and planning should be easily observable. Every action an agent takes should be logged and analyzed to help you to monitor its behavior, detect anomalies, and audit its decisions — just as you would with any other system.
Effective agent security should start with defense in depth. As with any technology, organizations should develop a layered approach that combines both traditional security methods and new, AI-specific ones.
Here’s how to get started.
How to secure your agents
Securing AI agents requires a continuous, layered approach that combines traditional security principles with new, AI-specific methods.
Go old school with traditional security
You can start by applying time-tested security principles that you probably already use for employees. Securing AI agents begins with a foundational approach of authentication, authorization, and auditability.
First, authentication ensures that every agent has a unique identity, allowing you to verify that it is what it claims to be. This crucial step can help prevent unauthorized agents from accessing your systems.
Next, authorization dictates the permissions an agent is granted. Be sure to follow the principle of least privilege, which means you should only give an agent the bare minimum access it needs to perform its job and nothing more, limiting its potential to cause harm.
Finally, auditability involves logging every action the agent takes. This creates a clear record, or audit trail, that you can use to investigate what happened and why it happened if something goes wrong. Agent application logs play a critical role here, whether humans or machines analyze them.
Beyond these approaches, a strong security strategy should also include secure software development. Since agents are software, they should be built with security in mind — from the start. This includes using safe, trusted libraries, consistently scanning the code for potential vulnerabilities, and following appropriate standards for software bills of materials (SBOM) in conjunction with Google’s Supply chain Levels for Software Artifacts (SLSA) framework.
Be modern with AI-specific methods
When securing AI agents, it's not enough to rely on traditional security measures alone. You also should embrace new, AI-specific methods designed for today’s ever-evolving challenges.
One such method to explore is to use guard models. Think of a guard model as a digital supervisor who monitors the agent's actions. Before an agent performs a high-impact task, like deleting a file, the guard model steps in to check if the action aligns with your security policies, adding an extra layer of intelligent oversight that can prevent malicious attacks and unintentional errors.
Another technique is to employ adversarial training, a fire drill for your AI agent. You intentionally expose it to unexpected and malicious inputs in a controlled environment to see how it might be exploited.
By understanding these vulnerabilities, you can make the agent more resilient and robust. If you're using public models, we strongly recommend that you confirm that the model provider has performed rigorous testing.
Unlock greater AI potential — securely
Securing agents is a continuous process, one that requires human leadership, continuous learning, and strategic upskilling. By empowering your teams with the knowledge and tools to navigate the evolving landscape, you can harness the incredible power of AI agents while staying on top of the risks — and unintended consequences.
To learn more, you can check out our recent newsletter on securing AI agents.