Infrastructure Modernization

Why Google keeps building custom silicon: The story behind Axion

July 31, 2024

Mark Lohmeyer

VP & GM, Compute and AI Infrastructure

Parthasarathy Ranganathan

VP, Engineering Fellow

There’s nothing traditional about these high-performance CPUs — except for Google’s longstanding tradition of developing chips to meet evolving business needs.

Editor’s note 10/31: The first Google Axion Processor is now GA: Check out C4A

Everything runs on CPUs.

Compute power, delivered by a range of chips that include CPUs, GPUs, and TPUs, underpins nearly every large-scale service in the cloud.

GPUs and our own AI-optimized TPUs have grabbed most of the attention lately, for their role in accelerating progress in the AI era. And yet it’s those general-purpose CPUs that still handle the lion's share of workloads: everything from computationally heavy data analytics and financial modeling applications to more straightforward web applications or little-used but important microservices.

The world seems to agree just how important CPUs are, given the response to our announcement of the Google Axion Processor at Google Cloud Next '24. The unveiling of Axion as our first custom Arm-based CPU quickly became one of our most widely reported and discussed releases this year, and we’ve seen considerable interest from customers since then.

As a company with its roots online, Google has always prioritized computing hardware, going back to the early days of engineers stringing up servers in garages and industrial spaces around the Valley. In fact, this warehouse-scale computing effectively laid the foundation for what would become Google Cloud. Our push into chip design came more than a decade ago, but the rationale has always been the same: the more we could do to shape our own hardware and software systems, the more we could do to shape our own destiny.

Axion is the latest leap in this journey — though far from the last, as we followed it up a month later with the introduction of the sixth generation of TPUs, named Trillium. To continue innovating in technology, we’ll continue innovating in silicon.

It was our experience of building such specialized chips, not only for AI but also for mobile and video streaming, that gave us the confidence to tackle the more generalized yet complex needs of CPUs. It sounds counterintuitive, but a general-purpose chip like Axion has to handle a wider range of applications, which calls for a more complex design. We now knew how to bring together teams of software engineers, hardware engineers, researchers, and partners under one roof to co-design custom silicon from the ground up.

When Thomas Kurian, Google Cloud's CEO, unveiled the Axion chip at Google Cloud Next '24, it became one of the most talked-about announcements at the event.

In many ways, the release of Axion brings our work on collaborative, capabilities-first, customer-focused hardware design full circle. What started with the dream of making the world's information more accessible — anytime and anywhere — from a few server racks in a California garage is now the basis for what draws customers to Google Cloud. Axion is the latest milestone of that decades-long focus.

Performance, flexibility, and efficiency with Arm

Axion was designed specifically to give our customers a CPU option that’s generally more performant and energy-efficient than anything we’ve built on before, helping to fulfill Google’s sustainability mission and those of our customers and partners.

Axion also gives us the ability to more readily build customer feedback into our chip designs. This flexibility is essential if we want to more quickly and broadly meet the infrastructure needs of our customers and partners as demands on compute continue to rapidly evolve, especially in the fast-moving age of AI. You can see the same considerations at play with the creation of our AI Hypercomputer: a focus on adaptive, responsive AI-optimized infrastructure, across hardware and software, that readily meets the needs of customers.

With Axion starting to work as the core of cores in our data centers, we’re already seeing the efficiency and performance gains that come from silicon that’s fit for purpose. The Axion processors are built on Arm Neoverse V2 compute cores and offer better performance than the fastest general-purpose Arm-based instances available in the cloud.

We also built Axion to help deliver greater sustainability to the cloud. These processors are up to 60% more energy-efficient than comparable current generation x86 instances, reducing consumption and helping us make strides in Google Cloud’s goal to be the most sustainable cloud on the market. Axion also boosts efficiency across the entire data center thanks to the CPU’s deep integration with our data management systems.

At our Axion announcement, Korwin Smith, the senior director of engineering and cloud infrastructure at Snap, expressed how the chips could provide a boost to the social network’s business: “We're constantly optimizing our infrastructure for performance and efficiency. Google's new Axion Arm-based CPU promises major leaps forward in both. The potential to serve our community with these gains while leading on our sustainability goals is incredibly exciting.”

Chips for us, and everyone

Eager as we’ve been to bring these capabilities to the world, Axion was a long-term project like all the chips that came before it.

Building custom silicon into a co-designed hardware and software service stack takes years of collaboration across development teams at Google and in the open-source community. While Axion development was still in its early stages, we started deploying Arm-based servers to get key Google services ready. Bigtable, Spanner, BigQuery, Blobstore, Pub/Sub, Google Earth Engine, and the YouTube Ads platform ran on these earlier Google Arm-based servers.

We learned lessons in building and operating multi-architecture code and how best to deploy workloads where they run best: some still belonged on x86 CPUs, others benefited from the new Arm-based general-purpose compute, while still others were best on storage-optimized instances, or with TPU and GPU acceleration for demanding AI training and inference applications.

As we built for our customers, we also made open-source contributions that benefit developer communities everywhere.

We open-sourced Android and TensorFlow on Arm, and optimized Go for Arm-based platforms. We contributed to the Arm SystemReady Virtual Environment (VE), a standard that ensures anyone already running a workload on an existing Arm-based server can seamlessly migrate to Axion instances. We recently launched support for migrating Arm-based instances in the Migrate to Virtual Machines service, along with support for Arm-compatible software and solutions on the Google Cloud Marketplace and in the broader ecosystem.

This commitment to openness and collaboration, from globe-spanning networks to the smallest components of each server, has been a feature of Google and Google Cloud since day one. Our multi-year journey with Axion is only just beginning. The ability to work with our customers and partners to continue to refine the system, delivering performance, efficiency, and services in new ways, couldn’t be more exciting.

As with our earlier infrastructure innovations, we can’t wait to see where it takes us, and we’re excited to build that future together.
