Security & Identity

Disconnected but resilient: Securing agentic AI at the extreme edge

March 16, 2026
Thiébaut Meyer

Director, Office of the CISO

Antoine Larmanjat

Distinguished engineer, Google Cloud

If AI agents lose internet connectivity, critical systems — from autonomous vehicles to industrial infrastructure — could fail. In these constrained environments, agents risk "inference loss," making real-world decisions without accurate context.

Deploying agents at the edge presents a paradox: the need for high-performance reasoning on hardware limited by power and compute. At Google Cloud, we address this through graceful degradation, ensuring agents remain secure and operational even under harsh constraints.

Balancing compute and connectivity

At a high level, we recommend that the agent use a frontier model (such as Google Gemini) in the cloud for complex, novel reasoning when bandwidth allows. The moment connectivity is severed, the system should degrade gracefully in ways that depend on the situation.

For high-power edge devices, including embedded robotics, drones, and vehicle systems, this means switching to a distilled model (like Gemma) that runs locally.

For extreme edge devices, including IoT sensors running on coin-cell batteries, we recommend the TinyML approach: heavily quantized micro-models that perform simple tasks, such as keyword spotting and anomaly detection, without waking the main processor. Distilling large models with a teacher-student approach is also a possibility.
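The tiered fallback described above can be sketched as a simple dispatcher. This is a minimal illustration, not a Google Cloud API: the tier names, the `select_tier` function, and the 50 mW power cutoff are all hypothetical placeholders for device-specific policy.

```python
from enum import Enum, auto

class Tier(Enum):
    CLOUD_FRONTIER = auto()   # e.g. a Gemini-class model reached over the network
    LOCAL_DISTILLED = auto()  # e.g. a Gemma-class distilled model on the device
    TINY_ML = auto()          # heavily quantized micro-model for simple tasks

def select_tier(connected: bool, power_budget_mw: float) -> Tier:
    """Pick the highest-capability tier current conditions allow.

    `power_budget_mw` is a hypothetical power ceiling for the device;
    the 50 mW cutoff is illustrative, not a real hardware spec.
    """
    if connected:
        return Tier.CLOUD_FRONTIER
    if power_budget_mw >= 50.0:
        return Tier.LOCAL_DISTILLED
    return Tier.TINY_ML
```

In a real deployment the decision would also weigh latency budgets and task criticality, but the core idea is the same: degradation is an explicit, deterministic policy rather than an error path.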

[Figure: How to revert to Gemma when Gemini becomes disconnected while keeping the full context.]

The power limitation also drives us to consider a broader scope of hardware. One project that could help is Coral NPU, a machine-learning accelerator core for energy-efficient AI at the edge, developed in collaboration between Google DeepMind and Google Research.

The network is also a crucial element: bandwidth can be severely constrained, and communication should not drain power. Standard HTTPS is often too heavy for latency-sensitive agents, which need lighter, faster protocols. Moving toward HTTP/3 (which runs QUIC over UDP) can reduce handshake overhead.

In addition, specialized protocols, such as agent communication protocols (ACP) that allow peer-to-peer discovery and collaboration, can reduce overhead and latency in local mesh networks.

Dynamic trust for agent identity

In an open and connected environment, we use tokens to assert identity. For example, the SPIFFE/SPIRE framework provides a uniform identity-control plane and encourages trust-building in controlled agentic AI systems. Additional critical characteristics can be transmitted and checked through verifiable credentials (VCs).

To prevent identity alteration and spoofing in embedded environments, agent identity can be anchored in a hardware root of trust. Trusted Platform Modules (TPMs) and secure elements can also be used to cryptographically validate the operating system and the agent container at boot, before the agent is allowed to run.

Since trust should not be treated as static, we envision a system where an agent's "trust score" is monitored and assessed in real time. For example, if a GDPR-certified agent attempts to export raw video data instead of anonymized insights, its credentials can be revoked instantly, preventing rogue behavior.
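The dynamic-trust idea can be sketched as a credential whose score decays on policy violations and is revoked below a threshold. This is an illustrative model only; the class name, the scoring scale, and the 0.5 revocation threshold are hypothetical, not part of SPIFFE/SPIRE or any verifiable-credential standard.

```python
class AgentCredential:
    """Minimal dynamic-trust sketch: trust decays on violations
    and the credential is revoked below a threshold (hypothetical policy)."""

    def __init__(self, agent_id: str, revoke_below: float = 0.5):
        self.agent_id = agent_id
        self.trust_score = 1.0       # fully trusted at issuance
        self.revoke_below = revoke_below
        self.revoked = False

    def record_violation(self, severity: float) -> None:
        # severity in [0, 1]; e.g. exporting raw video instead of
        # anonymized insights might be scored close to 1.0.
        self.trust_score = max(0.0, self.trust_score - severity)
        if self.trust_score < self.revoke_below:
            self.revoked = True      # latch: revocation is immediate and sticky

    def is_trusted(self) -> bool:
        return not self.revoked
```

A production system would feed the score from continuous telemetry and distribute revocations even across intermittent links, but the shape of the control loop is the same.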

From prompt to physical: Mitigating exploit risk

We want agent input to be multimodal, so agents can receive text, voice, pictures, videos, and more. This flexibility can also make them prone to cyberattacks, especially in resource-constrained systems that may lack the power for heavy, multi-layered security filters.

Prompt injection protection is a whole new field of cybersecurity for agents. The level of security should be commensurate with the degree of automation and with the impact a bad action or malfunction could have on the environment the agent operates in. Security in the agentic era isn't just about preventing data leakage; it's also about preventing physical damage. We call this a prompt-to-physical exploit.

However, rogue agents may not necessarily have malicious intent — they might just be over-optimized. Consider a drone rewarded for "the shortest flight time." Without safeguards, it might "reason" that flying straight through a restricted zone is the optimal path.

To prevent such behavior, we need semantic-aware security to govern the reasoning layer. This can be achieved, for instance, through deterministic safety interlocks that sit between the model and the tools.

While these controls do not necessarily lessen the likelihood of a rogue action being triggered, they drastically mitigate the impact. For example, if a coding agent attempts to delete a file system, or a robotic arm attempts a motion outside its safety envelope, the circuit breaker trips, forcing the agent to fall back to a safe state.
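A deterministic interlock of this kind can be sketched as a check that sits between the model's proposed tool call and actuation. The class, the joint-angle envelope, and the 90-degree limit are hypothetical illustrations; the essential property is that the check is plain deterministic code, not another model.

```python
class SafetyInterlock:
    """Deterministic circuit breaker between the model and the tools:
    a proposed motion outside the safety envelope trips the breaker,
    and the agent must fall back to a safe state."""

    def __init__(self, max_joint_angle_deg: float = 90.0):
        self.max_joint_angle_deg = max_joint_angle_deg  # illustrative envelope
        self.tripped = False

    def check_move(self, angle_deg: float) -> bool:
        """Return True if the motion is allowed; trip the breaker otherwise."""
        if self.tripped or abs(angle_deg) > self.max_joint_angle_deg:
            self.tripped = True   # latched: no further motions until reset
            return False
        return True
```

Because the interlock never consults the model, a prompt-injected or over-optimized plan cannot talk its way past it.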

Likewise, an agent should be capable of handling input starvation: if a sensor goes dark, the agent should recognize the data void instead of hallucinating new instructions.
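Input starvation can be handled with a simple watchdog: the agent tracks when each sensor last produced data and treats silence beyond a timeout as a data void rather than as license to improvise. The class and timeout below are illustrative sketches, not a real framework API.

```python
import time

class SensorWatchdog:
    """Detect a data void: if no fresh reading arrives within
    `timeout_s`, report starvation instead of acting on stale data."""

    def __init__(self, timeout_s: float):
        self.timeout_s = timeout_s
        self.last_reading_at = time.monotonic()

    def feed(self) -> None:
        """Call whenever a fresh sensor reading arrives."""
        self.last_reading_at = time.monotonic()

    def starved(self, now=None) -> bool:
        """True if the sensor has gone dark; caller should enter a safe mode."""
        now = time.monotonic() if now is None else now
        return (now - self.last_reading_at) > self.timeout_s
```

When `starved()` returns True, the agent should switch to a conservative policy (hold position, reduce speed, request help) instead of letting the model hallucinate substitute inputs.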

Keeping agents running in a disconnected world

True operational resilience requires an agent to actively manage the failure of any internal component, whether it be a software module, a hardware sensor, or a network link.

In most cases, when a system faces external perturbations or loses its internet connection, it simply stops. In a constrained environment, the agent needs to keep working, and specific degraded modes should be designed in to preserve operations and safety.

We advocate for graceful degradation workflows. For example, an agent might normally rely on a massive model in the cloud for complex reasoning. However, if connectivity is lost, the system should automatically switch to a smaller, locally embedded model to maintain basic functionality, while preserving the current context.

Beyond the model itself, this strategy can use existing solutions: local vector databases (SQLite-vec or ChromaDB, for instance) to maintain context without cloud synchronization, or, in the case of embodied agents (such as drones or robots), software frameworks such as ROS 2 (Robot Operating System) to handle Quality of Service at the edge.
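The local-context idea can be demonstrated with nothing but the standard library. The sketch below uses plain `sqlite3` as a stand-in for a store like SQLite-vec or ChromaDB (no vector search, just persistence): context rows live on-device and survive a connectivity loss with no cloud synchronization. The class and schema are hypothetical.

```python
import sqlite3

class LocalContextStore:
    """Stdlib-only stand-in for an on-device context store:
    rows persist locally, with no dependency on cloud sync."""

    def __init__(self, path: str = ":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS context (ts REAL, role TEXT, content TEXT)"
        )

    def append(self, ts: float, role: str, content: str) -> None:
        self.db.execute("INSERT INTO context VALUES (?, ?, ?)", (ts, role, content))
        self.db.commit()  # durable immediately; survives power loss on a real file

    def recent(self, n: int = 10):
        """Most recent context rows, newest first, to seed the local model."""
        cur = self.db.execute(
            "SELECT ts, role, content FROM context ORDER BY ts DESC LIMIT ?", (n,)
        )
        return cur.fetchall()
```

On fallback, the locally embedded model can be primed with `recent()` so the conversation or mission context carries over, which is exactly what "preserving the current context" requires.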

When thinking about operational resilience, we should also take into account the lifecycle management of the agent, as the sequence of different operational states it transitions through depends on the health of its components. This includes the ability to patch or halt rogue agents even in disconnected environments.
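Lifecycle management can be made explicit as a small state machine whose transitions are driven by component health. The states and the transition table below are hypothetical examples, not a published specification; the point is that even a disconnected device can deterministically move a rogue agent into a halted state and only recover through a degraded mode.

```python
from enum import Enum, auto

class AgentState(Enum):
    NOMINAL = auto()
    DEGRADED = auto()   # e.g. connectivity lost, local model only
    SAFE_HALT = auto()  # rogue or failed: stop actuation, await patch

# Hypothetical allowed lifecycle transitions.
TRANSITIONS = {
    AgentState.NOMINAL: {AgentState.DEGRADED, AgentState.SAFE_HALT},
    AgentState.DEGRADED: {AgentState.NOMINAL, AgentState.SAFE_HALT},
    AgentState.SAFE_HALT: {AgentState.DEGRADED},  # recover only via degraded mode
}

def transition(current: AgentState, target: AgentState) -> AgentState:
    """Apply a lifecycle transition, rejecting illegal ones."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.name} -> {target.name}")
    return target
```

Forbidding the direct `SAFE_HALT -> NOMINAL` edge forces a halted agent to re-prove its health in degraded mode before regaining full autonomy, including in environments where a cloud-side patch cannot yet reach it.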

From solo agents to squads

Agents rarely work alone. Whether it is a swarm of sensors or a manufacturing line, agents need to operate in teams (squads) and share data. New event-driven architectures are now emerging to enable this cooperation and orchestration.

From an architecture perspective, when connections risk being severed, agents can be too chatty, requiring a lot of network interaction. For resilient agent performance, it's preferable to keep all internal agent state in the same memory space on the same device.

Another challenge is managing data sharing across agents on a strict need-to-know basis — this is the least privilege principle. In addition, it is necessary to orchestrate a group purpose without flooding the limited network bandwidth.

Developers also should consider what happens when groups of agents need to share data in a central place using an event-driven architecture. While these architectures are well understood when the cloud, models, power, and connectivity are all at hand, careful engineering is required in constrained environments where agents still need to interact.

For example:

  • NATS/RabbitMQ/Redpanda as replacements for heavier messaging systems (such as Kafka);
  • Local protected and shared files instead of cloud storage;
  • An agent architecture that limits the verbosity of communications to only what needs to be exchanged between agents.
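The last two points — need-to-know data sharing and limited verbosity — can be combined in an in-process event bus that enforces a topic ACL at subscription time. This is a toy sketch standing in for a broker like NATS; the class, the ACL format, and the topic names are all hypothetical.

```python
from collections import defaultdict

class LocalBus:
    """In-process event bus sketch: agents may subscribe only to
    topics their ACL grants (least privilege), with no cloud broker."""

    def __init__(self, acl):
        self.acl = acl                  # agent_id -> set of allowed topics
        self.subs = defaultdict(list)   # topic -> list of callbacks

    def subscribe(self, agent_id: str, topic: str, callback) -> bool:
        if topic not in self.acl.get(agent_id, set()):
            return False                # need-to-know: deny out-of-scope topics
        self.subs[topic].append(callback)
        return True

    def publish(self, topic: str, payload) -> int:
        """Deliver to local subscribers only; returns delivery count."""
        for cb in self.subs[topic]:
            cb(payload)
        return len(self.subs[topic])
```

Because enforcement happens at subscription time, a compromised agent cannot quietly widen its own data access at runtime, and all traffic stays on the local mesh rather than crossing a constrained uplink.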

The Google Cloud edge

The shift from conversational chatbots to autonomous agents is more than a software upgrade — it’s a hardware-aware architectural revolution. At Google Cloud, we are building the toolchain to bridge the gap between the server rack and the edge. From Gemini and Gemma models that scale with your connectivity to TPU silicon and MuJoCo digital twins, we are dedicated to building systems that don’t just reason, but act safely in the real world.

If your agents panic when they lose a signal, they aren’t ready for the field. Resilient AI requires an architecture that respects the brutal physics of power, bandwidth, and safety. Start auditing your edge workloads today: Ensure your agents are as resilient in the field as they are intelligent in the cloud.

For more information on building resilience into AI agents, check out our CISO Insights hub.
