
When AI writes the code, who reviews it?

April 28, 2026
Lee Boonstra

Software Engineer, Office of the CTO

AI-driven coding has shifted the software bottleneck from writing to reviewing, requiring new guardrails to manage the human workload.

At Google's Office of the CTO, our team runs as a hybrid model where humans and AI agents collaborate on real operational work. Software development got fast. Incredibly fast. But that speed exposed something we didn't expect: the bottleneck for shipping software shifted from writing code to reviewing and integrating it.

What a massive pull request taught us

I had been working for days on a massive feature for our agent, including a dashboard rewrite. The pull request (PR) was enormous. The changes scrolled for screen after screen, and no single reviewer could grasp it all. When we tried to split it into manageable chunks, our existing workflow simply wasn't built for the volume of AI-generated commits.

Merge conflicts multiplied as developers landed in the same files. We created a dependency chain we couldn't untangle: PR #1 couldn't merge without PR #2, which needed PR #3, but PR #3 was blocked by a reviewer in a different timezone. PR #4, a simple one, was already approved. Some changes got approved while related ones sat waiting for review, spawning even more conflicts.

By the week's end, nothing was testable as a whole. The main branch was broken, staging was stuck on orange alert, and the team chat had turned into a mutual support group.

Day to day, this showed up in three ways:

  • Merge conflicts: Multiple developers landing on the same file within the hour.

  • Review gridlock: A massive PR becomes a Russian doll of sub-PRs.

  • Context fragmentation: While you grab coffee, a teammate renames a variable in a shared file; your agent, quoting yesterday's snapshot, cheerfully mints code that calls a function that no longer exists.

When development accelerates, the bug-to-code ratio stays constant. The error rate per line doesn't change; what changes is how fast the bugs arrive and who takes the blame for them. To survive this, we extracted three critical lessons:

  • Technical guardrails: We implemented syntax linters, rules, skills, and mandatory AI-generated test coverage to create a safety net for rapid development.

  • Reimagined ownership: We stopped nitpicking style on disposable, agent-written code and shifted focus to architectural blueprints. For cross-timezone teams, we instituted the "Conditional Looks Good To Me (LGTM)": PRs are approved contingent on passing tests, eliminating 12-hour review delays. (The first sketch after this list shows one way to encode that gate.)

  • AI reviewer guides: Every PR now includes an AI-generated snapshot of what changes, potential breakage points, and a risk assessment. This helps human reviewers focus on what truly matters rather than getting lost in the lines. (The second sketch below gives the guide's rough shape.)
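
To make the conditional LGTM concrete, here is a minimal sketch of such a merge gate. Everything in it is an assumption for illustration (the PullRequest fields, the approval labels, the MIN_DIFF_COVERAGE threshold), not our production tooling; the point is that an approval granted ahead of CI only becomes a merge once the tests actually pass.

```python
from dataclasses import dataclass

@dataclass
class PullRequest:
    number: int
    approval: str       # "lgtm", "conditional-lgtm", or "pending" (assumed labels)
    tests_green: bool   # did the latest CI run pass?
    coverage: float     # test coverage of the diff, 0.0 to 1.0

MIN_DIFF_COVERAGE = 0.80  # assumed threshold; tune per repository

def ready_to_merge(pr: PullRequest) -> bool:
    """A conditional LGTM counts as an approval only once CI is green
    and the mandatory AI-generated test coverage holds."""
    if pr.approval == "lgtm":
        return pr.tests_green
    if pr.approval == "conditional-lgtm":
        return pr.tests_green and pr.coverage >= MIN_DIFF_COVERAGE
    return False

# A reviewer in another timezone approves conditionally before going
# offline; two hours later CI finishes green and the PR can merge.
pr = PullRequest(number=3, approval="conditional-lgtm",
                 tests_green=True, coverage=0.85)
assert ready_to_merge(pr)
```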
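
And a rough shape for the reviewer guide. In our setup an agent drafts the snapshot from the diff; this sketch stands in simple heuristics for that step, and the names and patterns in it (ReviewerGuide, RISKY_PATTERNS, the risk tiers) are hypothetical:

```python
import re
from dataclasses import dataclass, field

@dataclass
class ReviewerGuide:
    summary: str
    breakage_points: list = field(default_factory=list)
    risk: str = "low"

# Assumed heuristics, not our real rules: diff patterns that tend to break things.
RISKY_PATTERNS = {
    r"^-\s*def\s+\w+": "deletes a function",
    r"DROP\s+TABLE|ALTER\s+TABLE": "changes a database schema",
    r"requirements\.txt|package\.json": "touches dependencies",
}

def build_guide(diff: str, files: list) -> ReviewerGuide:
    """Produce the per-PR snapshot a human reviewer reads first."""
    breakage = [why for pattern, why in RISKY_PATTERNS.items()
                if re.search(pattern, diff, re.MULTILINE)]
    risk = "high" if breakage else ("medium" if len(files) > 20 else "low")
    summary = f"{len(files)} file(s) changed; {len(breakage)} breakage point(s) flagged."
    return ReviewerGuide(summary, breakage, risk)

guide = build_guide("-    def send_mail(to):", ["mailer.py"])
print(guide.risk, guide.breakage_points)  # high ['deletes a function']
```

The value isn't in the heuristics; it's that every PR arrives with the same three questions already answered.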

Preventing burnout and "approval fatigue"

We're seeing a new phenomenon: approval fatigue. According to Quantum Workplace research reported by CNBC, frequent AI users are 45% more likely to experience high burnout than non-users. We see exactly why on our team. When faced with a constant stream of micro-approvals (improving a single line, adjusting a tool call) developers start clicking "Approve" reflexively. It's a form of low-grade exhaustion where the team stops checking the machine's work just to keep up with its pace.

To protect the team, we moved from constant oversight to structured boundaries. We set digital quiet hours so approval requests don't bleed into evenings and weekends. And we hold weekly agent insight sessions where developers share patterns their AI counterparts identified, turning isolated discoveries into shared organizational knowledge.

What happens when an agent acts without guardrails

So far, I've focused on the review and integration side. But there's another lesson we learned the hard way about what happens when an agent acts without sufficient guardrails.

During a routine code update, I discovered both the power and the limits of Antigravity's built-in UI browser. This feature allows the AI agent to interact with applications under development without requiring login credentials, making it invaluable for UX testing. However, in YOLO (auto-approve) mode, it can act faster than you can think.

My simple prompt to create a button triggered an unexpected chain reaction. The browser agent autonomously clicked the new button, which was wired to an email agent. With no URL specified, the agent filled the gap itself, connecting to a deprecated legacy agent with no email safeguards. The result? Fifty colleagues received bogus emails filled with hallucinated content.

This incident highlighted what I now call context hallucination risk: when AI lacks sufficient data, it sometimes fills gaps using whatever strings exist in its context, including sensitive information like hardcoded email addresses or URLs.

I see you laughing, and you're probably thinking, 'Who cares, it's just another email.' Fair enough. But consider what the agent was actually doing: fulfilling its directive with the data available to it, without any check on whether it should. That's the core risk with autonomous systems. Without a human-in-the-loop or a policy engine, the agent optimizes for its goal using whatever it can find. Guardrails aren't optional. They're what keep a useful tool from becoming an unpredictable one.

For me, the solution required a few immediate changes:

  • Implementing a zero-trust model: a policy engine now requires permission verification before any tool execution (sketched below).

  • Practicing rigorous context hygiene: all personally identifiable information (PII), like names and email addresses, is replaced with {{placeholders}} in templates.

After all, an agent can't misuse what it can't see. Oh, and we decided to finally remove those legacy agents.
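
A minimal sketch of both changes together. Everything here is illustrative (the policy table, the agent and tool names, the single email regex); the point is deny-by-default tool execution plus scrubbing PII before it ever enters the agent's context:

```python
import re

# Assumed allowlist: which tools each agent may call. Anything absent is denied.
POLICY = {
    "browser-agent": {"click", "navigate"},
    "email-agent": {"send_email"},
}

class PolicyViolation(Exception):
    pass

def guarded_call(agent: str, tool: str, payload: dict) -> dict:
    """Zero-trust gate: verify permission before any tool execution,
    instead of trusting the agent's intent."""
    if tool not in POLICY.get(agent, set()):
        raise PolicyViolation(f"{agent} is not allowed to call {tool}")
    return payload  # hand off to the real tool here

# Context hygiene: scrub PII so a gap-filling agent
# has only placeholders to work with.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def scrub(text: str) -> str:
    return EMAIL.sub("{{email}}", text)

print(scrub("Escalate to lee@example.com"))  # Escalate to {{email}}
try:
    guarded_call("browser-agent", "send_email", {})
except PolicyViolation as err:
    print(err)  # browser-agent is not allowed to call send_email
```

Deny-by-default is the design choice that matters: the legacy email incident happened precisely because nothing stood between the goal and the send.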

The bottleneck moved

In less than a year, our software development cycles got dramatically faster. Watching an agent produce a thousand lines of well-documented code by lunchtime, sometimes before my first Red Bull kicks in, is still a rush.

But that speed revealed something important. AI eliminated the code production bottleneck, and the constraint moved downstream to the humans who have to review, test, and integrate all that output. Better prompts and faster models won't fix that. What fixes it is evolving how teams work together, how we review code, and how we set boundaries with tools that never take a break.

The bottleneck didn't disappear. It moved from the code to the people reviewing it.
