At Google, the future is multiarch; AI and automation are helping us get there
Parthasarathy Ranganathan
VP, Engineering Fellow
Wolff Dobson
Developer Relations Engineer
Google Axion processors, our first custom Arm®-based CPUs, mark a major step in delivering both performance and energy efficiency for Google Cloud customers and our first-party services, providing up to 65% better price-performance and up to 60% better energy efficiency than comparable instances on Google Cloud.
We put Axion processors to the test: running Google production services. Now that our clusters contain both x86 and Axion Arm-based machines, Google's production services are able to run tasks simultaneously on multiple instruction-set architectures (ISAs). Today, this means that most binaries that previously compiled only for x86 now need to compile for both x86 and Arm at the same time, which is no small thing when you consider that the Google environment includes over 100,000 applications!
We recently published a preprint of a paper called "Instruction Set Migration at Warehouse Scale" about our migration process, in which we analyze 38,156 commits we made to Google's giant monorepo, Google3. To make a long story short, the paper describes the combination of hard work, automation, and AI we used to get to where we are today. We currently serve Google services such as YouTube, Gmail, and BigQuery in production on Arm and x86 simultaneously, and we have migrated more than 30,000 applications to Arm; our Arm hardware is fully subscribed, with more servers deployed each month.
Let's take a brief look at two steps on our journey to make Google multi-architecture, or ‘multiarch’: an analysis of migration patterns, and our use of AI to port the code. For more, be sure to read the entire paper.
Migrating all of Google's services to multiarch
Going into a migration from x86-only to Arm and x86, both the multiarch team and the application owners assumed that we would be spending time on architectural differences such as floating-point drift, concurrency, platform-specific intrinsics, and performance.
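To make the intrinsics point concrete, here is a minimal sketch, not taken from Google's codebase (the function name Sum4 is purely illustrative), of the kind of platform-specific code that needs an Arm counterpart or a portable fallback before it can build for both ISAs:

```cpp
// Illustrative only: a horizontal sum written with x86 SSE intrinsics needs an
// Arm NEON equivalent (or a portable fallback) before it can build for both ISAs.
#include <cstdio>

#if defined(__x86_64__)
#include <immintrin.h>
float Sum4(const float* v) {
  // x86-only path: SSE shuffles and adds.
  __m128 x = _mm_loadu_ps(v);
  __m128 shuf = _mm_shuffle_ps(x, x, _MM_SHUFFLE(2, 3, 0, 1));  // [v1, v0, v3, v2]
  __m128 sums = _mm_add_ps(x, shuf);                            // pairwise sums
  shuf = _mm_movehl_ps(shuf, sums);
  sums = _mm_add_ss(sums, shuf);
  return _mm_cvtss_f32(sums);
}
#elif defined(__aarch64__)
#include <arm_neon.h>
float Sum4(const float* v) {
  // Arm-only path: NEON across-vector add.
  return vaddvq_f32(vld1q_f32(v));
}
#else
float Sum4(const float* v) {
  // Portable fallback for any other architecture.
  return v[0] + v[1] + v[2] + v[3];
}
#endif

int main() {
  const float v[4] = {1.0f, 2.0f, 3.0f, 4.0f};
  std::printf("sum = %f\n", Sum4(v));  // 10.000000 on every ISA
}
```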
At first, we migrated some of our top jobs like F1, Spanner, and Bigtable using typical software practices, complete with weekly meetings and dedicated engineers. In this early period, we found evidence of the above issues, but not nearly as many as we expected. It turns out modern compilers and tools like sanitizers have shaken out most of the surprises. Instead, we spent the majority of our time working on issues like:
- fixing tests that broke because they overfit to our existing x86 servers (one such pattern is sketched after this list)
- updating intricate build and release systems, usually for our oldest and highest-traffic services
- resolving rollout issues in production configurations
- taking care to avoid destabilizing critical systems
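As an illustration of the first item above, here is a minimal, hypothetical example (again, not from our codebase) of a test that overfits to one architecture by expecting bit-exact floating-point results, which can legitimately differ when a compiler on Arm contracts a multiply and an add into a fused multiply-add:

```cpp
// Illustrative only: a test that demands bit-exact floating-point output can
// pass on x86 and fail on Arm (for example, when the compiler contracts
// a[i] * b[i] + sum into a fused multiply-add), even though both results are
// well within normal floating-point tolerance.
#include <cassert>
#include <cmath>
#include <cstdio>

double Dot(const double* a, const double* b, int n) {
  double sum = 0.0;
  for (int i = 0; i < n; ++i) sum += a[i] * b[i];
  return sum;
}

int main() {
  const double a[] = {0.1, 0.2, 0.3};
  const double b[] = {3.0, 2.0, 1.0};
  const double result = Dot(a, b, 3);

  // Brittle: comparing against a "golden" value captured from one x86 build.
  // assert(result == kGoldenValueRecordedOnX86);  // hypothetical golden constant

  // Portable: compare against the mathematically expected value with a tolerance.
  assert(std::fabs(result - 1.0) < 1e-12);
  std::printf("dot = %.17g\n", result);
}
```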
Moving a dozen applications to Arm this way absolutely worked, and we were proud to get things running on Borg, our cluster management system. As one engineer remarked, "Everyone fixated on the totally different toolchain, and [assumed] surely everything would break. The majority of the difficulty was configs and boring stuff."
And yet, it's not sufficient to migrate a few big jobs and be done. Although ~60% of our running compute is in our top 50 applications, the curve of usage across the remaining applications in Google's monorepo is relatively flat. The more jobs that can run on multiple architectures, the easier it is for Borg to fit them efficiently into cells. For good utilization of our Arm servers, then, we needed to address the long tail of the remaining 100,000+ applications.
The multiarch team could not effectively reach out to so many application owners; just setting up the meetings would have been cost-prohibitive! Instead, we have relied on automation to minimize involvement from the application teams themselves.
Automation tools
We had many sources of automation to help us, some of which we already used widely at Google before we started the multiarch migration. These include:
- Rosie, which lets us programmatically generate large numbers of commits and shepherd them through the code review process. For example, the commit could be a single line that enables Arm in a job's Blueprint: "arm_variant_mode = ::blueprint::VariantMode::VARIANT_MODE_RELEASE"
- Sanitizers and fuzzers, which catch common differences in execution between x86 and Arm (e.g., data races that are hidden by x86's TSO memory model; a minimal sketch of one follows this list). Catching these kinds of issues ahead of time avoids non-deterministic, hard-to-debug behavior when recompiling to a new ISA.
- Continuous Health Monitoring Platform (CHAMP), a new automated framework for rolling out and monitoring multiarch jobs. It automatically evicts jobs that cause issues on Arm, such as crash-looping or exhibiting very slow throughput, for later offline tuning and debugging.
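For the sanitizer bullet above, here is a minimal sketch (illustrative, not taken from a real service) of the kind of data race that x86's stronger TSO ordering tends to hide but Arm's weaker memory model can expose; ThreadSanitizer reports it on either architecture:

```cpp
// Illustrative only: an unsynchronized "publish" that often appears to work on
// x86 (stores are not reordered under TSO) but can misbehave on Arm, where the
// store to `ready` may become visible before the store to `data`.
// ThreadSanitizer (-fsanitize=thread) flags this race on both ISAs.
#include <cstdio>
#include <thread>

int data = 0;
bool ready = false;  // BUG: plain bool with no synchronization

void Producer() {
  data = 42;
  ready = true;
}

void Consumer() {
  while (!ready) {  // BUG: unsynchronized read in a spin loop (a data race)
  }
  std::printf("data = %d\n", data);  // may print 0 on a weakly ordered ISA
}

int main() {
  std::thread producer(Producer);
  std::thread consumer(Consumer);
  producer.join();
  consumer.join();
}
// Fix: make `ready` a std::atomic<bool> and use
// ready.store(true, std::memory_order_release) in Producer and
// ready.load(std::memory_order_acquire) in Consumer.
```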
We also began using an AI-based migration tool called CogniPort — more on that below.
Analysis
The 38,156 commits to our code monorepo constituted most of the commits across the entire ISA migration project, from huge jobs like Bigtable to myriad tiny ones. To analyze these commits, we passed the commit messages and code diffs, in groups of 100, into Gemini Flash's 1M-token context window, generating 16 categories of commits in four overarching groups.


Figure 1: Commits fall into four overarching groups.
Once we had a final list, we ran the commits through the model again and had it assign one of these 16 categories to each of them (as well as an additional "Uncategorized" category, which improved the stability of the categorization by catching outliers).


Figure 2: Code examples in the first two categories. More examples are available in the paper.
Altogether, this analysis covered about 700K changed lines of code. We plotted the timeline of our ISA migration as the number of lines of code changed per day or month, normalized, over the course of the project.


Figure 3: CLs by category by time, normalized.
As you can see, when we first stood up our multiarch toolchain, the largest set of commits was in tooling and test adaptation. Over time, a larger fraction of the commits involved code adaptation, aligned with the first few large applications that we migrated. During this phase, the focus was on updating code in shared dependencies and addressing common issues in code and tests as we prepared for scale. In the final phase of the process, almost all commits were to configuration files and supporting processes. We also saw that, in this later phase, the number of merged commits rapidly increased, capturing the scale-up of the migration to the whole repository.


Figure 4: CLs by category by time, in raw counts.
It's worth noting that, overall, most commits related to the migration are small. The largest commits are often edits to very large lists or configurations, rather than a sign of more inherent complexity or of intricate changes to single files.
Automating ISA migrations with AI
Modern generative AI techniques represent an opportunity to automate the remainder of the ISA migration process. We built an agent called CogniPort that aims to close this gap. CogniPort operates on build and test errors: if, at any point in the process, an Arm library, binary, or test fails to build, or a test fails with an error, the agent steps in and tries to fix the problem automatically. As a first step, we have already used CogniPort's Blueprint editing mode to generate migration commits that do not lend themselves to simple changes.
The agent consists of three nested agentic loops, shown below. Each loop invokes an LLM to produce one step of reasoning and a tool invocation; the tool is then executed and its output is appended to the agent's context.


Figure 5: CogniPort
The outermost agent loop is an orchestrator that repeatedly calls the two other agents, the build-fixer agent and the test-fixer agent. The build-fixer agent tries to build a particular target and makes modifications to files until the target builds successfully or the agent gives up. The test-fixer agent tries to run a particular test and makes modifications until the test succeeds or the agent gives up (and in the process, it may use the build-fixer agent to address build failures in the test).
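To make that structure concrete, here is a minimal, hypothetical sketch of the control flow; the names (LlmProposeStep, ExecuteTool, FixBuild, FixTest) and the stubbed-out infrastructure are illustrative assumptions, not CogniPort's actual interfaces:

```cpp
// Illustrative sketch of the nested-loop structure described above; every
// function here is a stand-in, not the real CogniPort API.
#include <iostream>
#include <string>
#include <vector>

struct Step {
  std::string reasoning;        // one step of LLM reasoning
  std::string tool_invocation;  // e.g., "build target", "run test", "edit file"
};

// Stubs standing in for the real LLM call, tool execution, and build/test infrastructure.
Step LlmProposeStep(const std::string& goal, const std::vector<std::string>& context) {
  return {"propose a fix for: " + goal, "edit_file"};
}
std::string ExecuteTool(const Step& step) { return "output of " + step.tool_invocation; }
bool BuildSucceeds(const std::string& target) { return true; }  // stub
bool TestPasses(const std::string& test) { return true; }       // stub

// Build-fixer agent: modify files until the target builds or the budget runs out.
bool FixBuild(const std::string& target, int budget) {
  std::vector<std::string> context;
  while (budget-- > 0 && !BuildSucceeds(target)) {
    Step step = LlmProposeStep("make " + target + " build", context);
    context.push_back(ExecuteTool(step));  // tool output feeds the next reasoning step
  }
  return BuildSucceeds(target);
}

// Test-fixer agent: modify files until the test passes; it may call the
// build-fixer when the test itself fails to build.
bool FixTest(const std::string& test, int budget) {
  std::vector<std::string> context;
  while (budget-- > 0 && !TestPasses(test)) {
    if (!FixBuild(test, budget)) return false;
    Step step = LlmProposeStep("make " + test + " pass", context);
    context.push_back(ExecuteTool(step));
  }
  return TestPasses(test);
}

// Orchestrator: the outermost loop, repeatedly calling the two inner agents.
int main() {
  const std::vector<std::string> targets = {"//example/service:server_test"};  // hypothetical target
  for (const std::string& target : targets) {
    const bool fixed = FixBuild(target, /*budget=*/8) && FixTest(target, /*budget=*/8);
    std::cout << target << (fixed ? ": fixed" : ": gave up") << "\n";
  }
  return 0;
}
```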
Testing CogniPort
While we only recently scaled CogniPort usage up to high levels, we had the opportunity to test its behavior more formally by taking historical commits from the dataset above that were created without AI assistance. Focusing on Code & Test Adaptation commits (categories 1-8) that we could cleanly roll back (not all of the other categories were suitable for this approach), we generated a benchmark set of 245 commits. We then rolled the commits back and evaluated whether the agent was able to fix them.


Figure 6: CogniPort results
Even with no special prompts or other optimizations, early tests were very encouraging: CogniPort successfully fixed failing tests 30% of the time. It was particularly effective for test fixes, platform-specific conditionals, and data representation fixes. We're confident that as we invest in further optimizations of this approach, we will be even more successful.
A multiarch future
From here, we still have tens of thousands more applications to address with automation. To cover future code growth, all new applications are designed to be multiarch by default. We will continue to use CogniPort to fix tests and configurations, and we will also work with application owners on trickier changes. (One lesson of this project is how well owners tend to know their code!)
Yet we're increasingly confident that we can reach our goal of driving Google's monorepo toward architecture neutrality for production services, for a variety of reasons:
- All of the code used for production services is still visible in one vast monorepo.
- Most of the structural changes we need to build, run, and debug multiarch applications are done.
- Existing automation like Rosie and the recently developed CHAMP allows us to keep expanding release and rollout targets without much intervention on our part.
- Last but not least, LLM-based automation will allow us to address much of the remaining long tail of applications for a multi-ISA Google fleet.
To read even more about what we learned, don't miss the paper itself. And to learn about our chip designs and how we’re operating a more sustainable cloud, you can read about Axion at g.co/cloud/axion.
This blog post and the associated paper represent the work of a very large team. The paper authors are Eric Christopher, Kevin Crossan, Wolff Dobson, Chris Kennelly, Drew Lewis, Kun Lin, Martin Maas, Parthasarathy Ranganathan, Emma Rapati, and Brian Yang, in collaboration with dozens of other Googlers working on our Arm porting efforts.