
Showcasing dynamic resource management in E2 VMs

July 21, 2021
Shamel Jacobs

Product Manager

Alex Matute

Software Engineer

Spending wisely is a top priority for many companies—especially when it comes to their cloud compute infrastructure. Last year, we introduced Compute Engine’s E2 VM family, which delivers cost-optimized performance for a wide variety of workloads. Our E2 machines provide up to 31% lower Total Cost of Ownership (TCO) compared to our N1 machines, consistent performance across CPU platforms, and instances with up to 32 vCPUs and 128 GB of memory. This is thanks to Google’s dynamic resource management technology, which is enabled by large and efficient physical servers, intelligent VM placement, performance-aware live migration, and a specialized hypervisor CPU scheduler.
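
If you want to try E2 for yourself, here is a minimal sketch of creating an e2-standard-4 instance with the google-cloud-compute Python client. The project, zone, and boot image values are placeholders you would replace with your own.

```python
# Minimal sketch: create an e2-standard-4 instance with the
# google-cloud-compute Python client. Project, zone, and image values
# are placeholders; adjust them for your environment.
from google.cloud import compute_v1


def create_e2_instance(project_id: str, zone: str, instance_name: str) -> compute_v1.Instance:
    # Boot disk based on a public Debian image family (placeholder choice).
    boot_disk = compute_v1.AttachedDisk(
        boot=True,
        auto_delete=True,
        initialize_params=compute_v1.AttachedDiskInitializeParams(
            source_image="projects/debian-cloud/global/images/family/debian-11",
            disk_size_gb=10,
        ),
    )

    # Attach the instance to the default VPC network.
    network_interface = compute_v1.NetworkInterface(network="global/networks/default")

    instance = compute_v1.Instance(
        name=instance_name,
        machine_type=f"zones/{zone}/machineTypes/e2-standard-4",  # 4 vCPUs, 16 GB memory
        disks=[boot_disk],
        network_interfaces=[network_interface],
    )

    client = compute_v1.InstancesClient()
    operation = client.insert(project=project_id, zone=zone, instance_resource=instance)
    operation.result()  # Wait for the zonal insert operation to finish.
    return client.get(project=project_id, zone=zone, instance=instance_name)
```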

This combination of performance and cost-efficiency is driving significant growth in E2 adoption among our customers. They are increasingly choosing E2 as an essential building block for workloads such as web serving applications, small and medium databases, microservices, and development environments.

https://storage.googleapis.com/gweb-cloudblog-publish/images/forgerock_logo.max-2000x2000.jpg

For example, Google Cloud security partner ForgeRock runs several of its identity-based solutions on E2.

"As a global IAM software company, we are tasked with addressing the world’s greatest security challenges with speed and agility, at scale. With that in mind, we are constantly exploring ways to optimize our cloud infrastructure spend while at the same time delivering on performance and reliability. By moving compute workloads to E2 VMs we were able to satisfy all of our criteria. Across the board, E2 VMs have delivered greater infrastructure efficiency for our digital identity platform and we are able to invest in a more delightful customer experience with additional features for our enterprise customers." —Simon Harding, Senior Staff Site Reliability Engineer at ForgeRock

Hands-on experience with E2 VMs

Throughout this past year, we strengthened our investment in dynamic resource management, and improved the at-scale scheduling algorithms that govern E2 VM performance. Our telemetry shows that E2 VMs deliver steady performance across a variety of workloads, even for those that are CPU-intensive. As a result, Alphabet services such as Android and ChromeOS infrastructure are now successfully running on E2 VMs. Google Kubernetes Engine (GKE) control-plane nodes also work seamlessly with E2 VMs.

To provide an illustrative example, we measured a latency-sensitive web application running on a replicated set of e2-standard-4 VMs that actively served requests over a period of sixteen days. The application serves about 247 queries per second (QPS) of CPU-intensive work per replica, and all replicas’ median latencies fall within ±10% of one another.
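
As a rough illustration of that comparison, the sketch below computes each replica’s median latency and its deviation from the fleet-wide median; the latency samples are made up for the example.

```python
# Illustrative only: per-replica median latency and deviation from the
# fleet-wide median, mirroring the "within ±10% at the median" comparison
# above. The latency samples here are invented.
from statistics import median

replica_latencies_ms = {
    "replica-a": [41, 43, 44, 46, 47],
    "replica-b": [39, 42, 43, 45, 48],
    "replica-c": [40, 44, 45, 46, 49],
}

replica_medians = {name: median(samples) for name, samples in replica_latencies_ms.items()}
fleet_median = median(replica_medians.values())

for name, m in sorted(replica_medians.items()):
    deviation_pct = 100.0 * (m - fleet_median) / fleet_median
    print(f"{name}: median={m:.1f} ms, deviation from fleet median={deviation_pct:+.1f}%")
```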

In this particular example, E2's dynamic resource management relied on two Compute Engine technologies to provide sustained, consistent performance. The first, our VM placement technology, makes scheduling decisions informed by resource observations from a variety of workloads in order to predict performance across different target hosts. The second, our custom hypervisor CPU scheduler, minimizes noisy-neighbor effects from adjacent VMs by providing sub-microsecond average wake-up latencies and fast context switching.

During our sixteen-day observation window, the application underwent a maintenance event that triggered live migration in one of the replicas. Compute Engine relies on its battle-tested and performance-aware live migration technology to keep your VMs running during maintenance events, moving them seamlessly to another host in the same zone instead of requiring them to be rebooted.
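
If you want to observe this behavior on your own instances, Compute Engine lets you trigger a simulated host maintenance event. The sketch below assumes the Python client exposes the API's instances.simulateMaintenanceEvent method as simulate_maintenance_event; project, zone, and instance names are placeholders.

```python
# Hedged sketch: trigger a simulated host maintenance event so you can
# observe live migration on a test instance. Assumes the instance's
# scheduling policy is set to MIGRATE (the default for E2 VMs) and that
# the generated client exposes simulate_maintenance_event.
from google.cloud import compute_v1


def simulate_maintenance(project_id: str, zone: str, instance_name: str) -> None:
    client = compute_v1.InstancesClient()
    operation = client.simulate_maintenance_event(
        project=project_id, zone=zone, instance=instance_name
    )
    operation.result()  # Block until the simulated event has been issued.
```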

The following chart shows that the performance impact of live migration remained negligible while the replica was relocated to a different host. The VM’s overhead ranged from 0.02% to 0.1% of CPU time per second during the event.

https://storage.googleapis.com/gweb-cloudblog-publish/images/vm2.max-2000x2000.jpg
The timeseries above depicts vCPU availability (%) for a web application replica that underwent a maintenance event. vCPU throughput was 99.90% right before migration and stabilized at 99.98% afterward. The total VM wait time at the time of the migration was about 160 milliseconds.
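
To observe scheduler wait time on your own E2 VMs, you can query the dynamic resource management metrics Compute Engine exports to Cloud Monitoring. The sketch below assumes the metric name compute.googleapis.com/instance/cpu/scheduler_wait_time; verify the exact metric name and value type in your project before relying on it.

```python
# Hedged sketch: read per-VM scheduler wait time from Cloud Monitoring.
# The metric name and double value type below are assumptions; check the
# Compute Engine metrics list for your project.
import time

from google.cloud import monitoring_v3


def read_scheduler_wait_time(project_id: str, minutes: int = 60) -> None:
    client = monitoring_v3.MetricServiceClient()
    now = time.time()
    interval = monitoring_v3.TimeInterval(
        end_time={"seconds": int(now)},
        start_time={"seconds": int(now - minutes * 60)},
    )
    results = client.list_time_series(
        request={
            "name": f"projects/{project_id}",
            "filter": 'metric.type = "compute.googleapis.com/instance/cpu/scheduler_wait_time"',
            "interval": interval,
            "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
        }
    )
    for series in results:
        instance_id = series.resource.labels.get("instance_id", "unknown")
        for point in series.points:
            print(instance_id, point.value.double_value)
```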

The clients connected to the replica during the maintenance event did not observe any connectivity loss or degradation; in fact, they noticed a 1 millisecond improvement in latency.

https://storage.googleapis.com/gweb-cloudblog-publish/images/vm1.max-2000x2000.jpg
The timeseries above depicts query latency in milliseconds for a web application replica that underwent a maintenance event, which resulted in a live migration between two different hosts. The total VM wait time at the time of the migration was about 160 milliseconds.

Another E2 VM benefit enabled by dynamic resource management is access to the largest pool of compute resources available to any VM family in Compute Engine. Through dynamic resource management, E2 VMs are scheduled seamlessly across x86 platforms drawn from a combined pool of Intel- and AMD-based servers. In fact, our application’s replicas were scheduled on a mix of hosts powered by CPUs from both vendors and ran smoothly without any host errors, and without needing to be rebuilt for a specific CPU vendor. As designed, overall per-vendor performance remained comparable: total QPS served was within 0.1%, median latency within 10%, and CPU utilization held steady at 55% on Intel-based hosts and 60% on their AMD equivalents.
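
If you are curious which CPU vendor your own replicas landed on, the cpu_platform field on each instance reports the underlying platform. Here is a minimal sketch with the Python client; instance names and location values are placeholders.

```python
# Sketch: report which CPU platform (Intel or AMD) each replica landed on.
# Project, zone, and instance names are placeholders.
from google.cloud import compute_v1


def print_cpu_platforms(project_id: str, zone: str, instance_names: list[str]) -> None:
    client = compute_v1.InstancesClient()
    for name in instance_names:
        instance = client.get(project=project_id, zone=zone, instance=name)
        # cpu_platform reports the physical CPU family backing the VM,
        # for example "Intel Broadwell" or "AMD Rome".
        print(f"{name}: {instance.cpu_platform}")
```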

Putting it all together: E2 VMs, designed to run on a large multi-vendor x86 platform pool and powered by Google Cloud’s dynamic resource management, provide a consistently performant environment for your applications.

Get started

If you’re looking for cost-efficiency, E2 VMs are a great choice. Since E2’s initial launch, we’ve added several new features and capabilities to E2 VMs:

  • Support for 32 vCPU instances - To meet the processing power required by a diverse range of workloads, we now support sizes of up to 32 vCPUs with the addition of e2-standard-32 and e2-highcpu-32.

  • Custom memory for E2 shared-core machine types - For small workloads, we've extended custom machine types to support e2-micro, e2-small, and e2-medium. These VMs range from 0.25 vCPU to 1.0 vCPU with the ability to burst up to 2 vCPU, and now support a customized amount of memory ranging from 1 to 8 GB (machine type names for these sizes are sketched after this list).
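
For reference, here is a small sketch of the machine type URIs for these sizes. The custom shared-core format (e2-custom-medium-<memory in MB>) is an assumption based on the custom machine type naming convention, so double-check it against the documentation before using it.

```python
# Sketch of E2 machine type URIs for the sizes mentioned above.
# The zone and the custom shared-core naming format are assumptions.
ZONE = "us-central1-a"  # placeholder zone

machine_types = {
    "largest standard E2": f"zones/{ZONE}/machineTypes/e2-standard-32",  # 32 vCPUs, 128 GB
    "largest high-cpu E2": f"zones/{ZONE}/machineTypes/e2-highcpu-32",   # 32 vCPUs, 32 GB
    "custom shared-core":  f"zones/{ZONE}/machineTypes/e2-custom-medium-4096",  # assumed format: 1 vCPU burstable to 2, 4 GB
}

for label, uri in machine_types.items():
    print(f"{label}: {uri}")
```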

Stay tuned for updates to the Google Cloud Free Tier that will soon include one non-preemptible e2-micro instance for use each month for free. The e2-micro instance will provide you with two vCPUs, each for 12.5% of CPU uptime (0.25 vCPU), and 1 GB of memory.

Enhancements to E2 are part of a broader effort to meet your application needs with a diverse product portfolio that includes Tau VMs, our latest addition focused on industry-leading price/performance. To learn more about E2 VMs and the complete portfolio of Compute Engine VM families, visit the E2 and VM families documentation.
