
Unlock next-gen VMs using GKE compute classes and Compute Flexible CUDs

September 29, 2025

Victor Szalvay

Product Manager

Olivia Melendez

GCE Product Manager

Organizations are constantly looking to gain an edge with the latest advancements in cloud computing. New fourth-generation (Gen4) machine series for Google Compute Engine and Google Kubernetes Engine (GKE), such as N4, C4, C4A, and C4D, offer significant improvements in performance, cost-efficiency, and capabilities. However, migrating to new hardware isn't always straightforward. Teams often face challenges with compatibility testing, regional capacity, and navigating financial commitments, all of which can slow down adoption.

The good news is that two powerful Google Cloud features, when used together, provide a strategic and cost-effective path to adopting a new machine series without the usual overhead. By combining the technical agility of GKE compute classes with the financial adaptability of Compute Flexible Committed Use Discounts (Flex CUDs), you can innovate faster, maintain resilience, and optimize costs, all at the same time. Even better, Compute Flex CUDs also extend discounts to GKE Autopilot and Cloud Run usage, making it easy to consume the right compute for your workload. Let’s dive in.

The challenge: Overcoming hardware adoption hurdles

While adopting the latest machine series unlocks new levels of performance and efficiency, organizations can face some challenges during the transition:

Compatibility testing: Before a full migration, teams need to validate that their applications perform as expected on a new machine series. This requires a strategy for safely introducing new hardware to gather performance data and ensure compatibility.
Navigating regional capacity: As new machine series expand to more regions, their availability can vary. This creates a need for a fallback option to ensure application availability isn't impacted by capacity limitations in a specific location.
Aligning financial commitments: Resource-based CUDs provide excellent value but are tied to specific machine families and are less flexible for teams who want to adopt newer, more cost-performant hardware while still under an existing commitment term.
Migration of workloads: The process of configuring, migrating, and managing workloads across multiple machine types can be operationally complex. This requires significant coordination from platform teams to execute smoothly.

The solution, part 1: GKE compute classes

GKE compute classes provide an elegant technical solution to the challenges of hardware adoption. Instead of tying your workloads to a single machine type, you can define a prioritized list of machine families that GKE can use for autoscaling. This gives you a flexible and resilient way to incrementally integrate cutting-edge technologies.

With compute classes, you can define a policy that tells GKE to prioritize a new, cost-performant machine family (like N4) but automatically fall back to an established machine family you’re already using (like N2 or N2D) if the first choice isn't available. Compute classes also let you roll out new hardware safely in waves, by incrementally subscribing workloads to the compute class. This helps to minimize operational risks and downtime.

How it works: An example

Let's say you want to take advantage of the superior price-performance of the new N4 machine series for a stateless web application, but you want to fall back to the previous-generation N2 series for large, unexpected spikes in traffic.

You can create a custom ComputeClass object with a prioritized list of machine families:

ComputeClass Manifest (n4-fallback-class.yaml):
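A minimal sketch of such a class, assuming the cloud.google.com/v1 ComputeClass schema, is shown here. Entries under spec.priorities are tried in order; the nodePoolAutoCreation block is an assumption included so that GKE can create node pools for these families on demand rather than relying on pre-created pools.

```yaml
# Sketch: a ComputeClass that prefers N4 and falls back to N2.
apiVersion: cloud.google.com/v1
kind: ComputeClass
metadata:
  name: n4-fallback-class
spec:
  # The autoscaler tries these entries from top to bottom when scaling up.
  priorities:
  - machineFamily: n4   # first choice: Gen4 general-purpose VMs
  - machineFamily: n2   # fallback: previous-generation N2 VMs
  # Assumption: allow GKE to auto-create node pools for the families above.
  nodePoolAutoCreation:
    enabled: true
```

You would apply it like any other Kubernetes object, for example with kubectl apply -f n4-fallback-class.yaml.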

This simple definition instructs the GKE cluster autoscaler to first attempt to provision nodes from the N4 family. If it can’t, it automatically tries the next option in the list, the N2 family.

Next, you reference this class in your workload's pod specification using a nodeSelector.

Workload Kubernetes manifest:
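A minimal sketch of such a workload follows. The Deployment name web-app, its labels, and the nginx:stable image are hypothetical placeholders; the only part that ties the workload to the compute class is the nodeSelector entry.

```yaml
# Sketch: a hypothetical stateless web Deployment that targets n4-fallback-class.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app              # hypothetical name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      # Schedule these Pods onto nodes provisioned for the compute class.
      nodeSelector:
        cloud.google.com/compute-class: n4-fallback-class
      containers:
      - name: web
        image: nginx:stable  # placeholder container image
        ports:
        - containerPort: 80
```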

You can gradually migrate workloads to N4 using this compute class configuration by simply adding the cloud.google.com/compute-class: n4-fallback-class nodeSelector label to the workloads in question and redeploying them.

Real-world success: Shopify safely adopts new hardware


This powerful combination of technical and financial flexibility isn't just theoretical. It's being used by leading companies today to drive real-world results. At Google Cloud Next '25, Justin Reid, a principal engineer at Shopify, shared how the company leverages GKE compute classes to power one of the world's largest GKE fleets.

GKE compute classes enabled Shopify to handle massive scale during Black Friday/Cyber Monday by implementing the exact strategy described above: they defined a compute class that prioritizes the new N4 machines and includes N2 machines as a seamless fallback option.

"Compute Classes played a critical role in helping Shopify scale during our most demanding events... It removed a ton of operation complexity for us..." - Justin Reid, principal engineer, Shopify

Watch the whole Next '25 session here.

Another example: High-performance workloads with the C-series family

For demanding workloads, C-series VMs are a popular choice, offering consistently high performance and access to enterprise features such as locally attached SSDs, advanced maintenance controls, larger VM shapes, and higher CPU frequencies. You can set up a compute class to prioritize new, performant options like C4 and C4D, which deliver compelling price-performance gains over prior-generation VMs, and also include a fallback to a VM series you’ve already used extensively.

Your ComputeClass can set C4 or C4D as the primary VM family, the other as the first fallback, and C2 VMs as the last resort. This lets you maximize your chances of obtaining the newest machine types while still drawing on capacity from previous-generation platforms, without sacrificing availability.

Your ComputeClass manifest might look something like this:

YAML:
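A sketch under the same ComputeClass schema assumptions is shown here. C4 is listed first purely for illustration; swap the first two entries if you would rather prefer C4D.

```yaml
# Sketch: a ComputeClass for demanding workloads on the C-series family.
apiVersion: cloud.google.com/v1
kind: ComputeClass
metadata:
  name: c4-c4d-fallback-class
spec:
  priorities:
  - machineFamily: c4    # primary choice (illustrative; c4d works equally well here)
  - machineFamily: c4d   # first fallback: the other Gen4 C-series family
  - machineFamily: c2    # last resort: previous-generation C2 VMs
  # Assumption: allow GKE to auto-create node pools for these families.
  nodePoolAutoCreation:
    enabled: true
```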

When you reference cloud.google.com/compute-class: c4-c4d-fallback-class in your workload's pod specification, your demanding applications land on the most performant and cost-efficient C-series VMs available, with a reliable fallback plan.

The solution, part 2: Compute Flexible CUDs

Technical agility is only half of the equation. Spend-based Compute Flexible CUDs provide the commercial flexibility to match. Unlike resource-based CUDs, which give you the maximum discount on one specific machine series, Flex CUDs apply to your total eligible compute spend across a wide range of machine families, covering both Gen4 series (for example, N4 and C4) and the fallback options you pair them with (for example, C2 and N2).

When you purchase a Compute Flexible CUD, you commit to a certain hourly spend on compute resources (vCPU, memory, and local SSD) for a one- or three-year term, receiving a significant discount in return (up to 46% off general-purpose VMs for a three-year term).

How it works: An example

Imagine you've purchased a three-year Compute Flex CUD. Your GKE cluster, using the n4-fallback-class from the previous example, initially runs your workload on N4 machines. Your Compute Flex CUD discount automatically applies to that usage.

Now, suppose a sudden demand spike in your region results in GKE's compute class policy provisioning N2 machines to temporarily handle the extra load. Critically, your Compute Flex CUD discount automatically follows your workload, so your discounts now apply to the N2 machines. Your savings follow your spend, giving you the confidence to adopt new hardware without losing your committed use discounts.
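A deliberately simplified worked example makes the point concrete. The hourly figures are hypothetical, the model assumes the commitment fully covers eligible usage at the maximum three-year general-purpose rate d = 0.46, and it ignores how the commitment amount itself is denominated and billed. Under those assumptions, the discounted cost depends only on total eligible spend, not on how that spend splits between N4 and N2 (for example, when part of the same load lands on N2 through the compute class fallback):

```latex
% Simplified model (assumption): d = Flex CUD discount rate,
% S_X = hypothetical on-demand hourly spend on machine family X.
\begin{aligned}
C &= (1 - d)\,(S_{\text{N4}} + S_{\text{N2}}) \\
\text{All on N4: } C &= (1 - 0.46)\,(\$100 + \$0) = \$54 \text{ per hour} \\
\text{Partly on N2: } C &= (1 - 0.46)\,(\$70 + \$30) = \$54 \text{ per hour}
\end{aligned}
```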

Real-world success: Verve Group


Verve Group SE is a leading digital media company that empowers advertisers and publishers with AI-driven ad-software solutions, connecting them to deliver impactful campaigns with a focus on first-party data and privacy-first technologies.

"Verve uses a variety of machine series including the new C4D as well as other VMs like C3D and N2D. We use custom compute classes to orchestrate fall backs, ranked for cost/performance across regions. The bulk of our spend is covered by Compute Flex CUDs, which plays a vital role in giving us discount flexibility across many of the machine series we consume." - Pablo Loschi, Principal Systems Engineer, Verve

A winning combination for modern infrastructure

By pairing the technical resilience of GKE compute classes with the discounting adaptability of Compute Flex CUDs, you can create a robust and economically sound strategy for adopting new hardware, such as the new generation of Compute Engine machine series. This integrated approach empowers you to:

Innovate safely: Gradually introduce and test new machine series with your critical workloads.
Optimize performance and cost: Leverage the latest and most cost-performant hardware Google Cloud has to offer.
Enhance resilience: Ensure high availability for your applications even as you integrate new hardware.
Simplify operations: Let GKE manage the complexities of node provisioning and scaling across different machine types.

Leverage these capabilities to stay at the forefront of innovation and confidently explore and harness the benefits of Google Cloud's rapidly evolving compute landscape — securely, efficiently, and cost-effectively.

To learn more, watch this video for a helpful overview of how custom compute classes can improve infrastructure autoscaling in GKE. Then, explore Compute Engine’s fourth-generation machine types, GKE compute classes, and Compute Flexible CUDs today!