Andromeda 2.1 reduces GCP’s intra-zone latency by 40%
Jake Adriaens
Staff Software Engineer
Google Cloud customers now enjoy significantly improved intra-zone network latency with the release of Andromeda 2.1, a software-defined network (SDN) stack that underpins all of Google Cloud. The latest version of Andromeda reduces network latency between Compute Engine VMs by 40% over Andromeda 2.0 and by nearly a factor of 8 since we first launched Andromeda in 2014.
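The two headline figures can be combined with a little arithmetic. The sketch below is a back-of-the-envelope calculation, not a published measurement: it normalizes the original 2014 stack's latency to 1.0, treats "nearly a factor of 8" as exactly 8, and derives what those two numbers would imply about Andromeda 2.0. The variable names are illustrative.

```python
# Assumption: latency of the original 2014 stack is normalized to 1.0,
# and "nearly a factor of 8" is rounded to exactly 8 for illustration.
reduction_vs_2_0 = 0.40        # Andromeda 2.1 is 40% lower than 2.0
speedup_vs_original = 8.0      # Andromeda 2.1 vs. the original stack

latency_2_1 = 1.0 / speedup_vs_original
# 2.1 = 2.0 * (1 - 0.40), so 2.0 = 2.1 / 0.60
latency_2_0 = latency_2_1 / (1.0 - reduction_vs_2_0)
implied_speedup_2_0 = 1.0 / latency_2_0

print(f"Andromeda 2.1 latency (normalized): {latency_2_1:.3f}")
print(f"Implied Andromeda 2.0 latency:      {latency_2_0:.3f}")
print(f"Implied Andromeda 2.0 speedup:      {implied_speedup_2_0:.1f}x")
```

Under these assumptions, Andromeda 2.0 would already have been roughly 4.8 times faster than the original stack, with 2.1 supplying the remaining step to about 8 times.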
This kind of network performance is especially important as more applications move into the cloud and are accessed via web browsers. While the headline metric is often bandwidth, network latency is frequently the more important determinant of application performance. For example, low latency is essential for financial transactions, ad-tech, video, gaming and retail, as well as workloads such as HPC applications, memcache and in-memory databases. Likewise, HTTP-based microservices will see significant improvement in responsiveness with reduced latency.
Andromeda 2.1 latency improvements come from a form of hypervisor bypass that builds on virtio, the Linux paravirtualization standard for device drivers. Andromeda 2.1 enhancements enable the Compute Engine guest VM and the Andromeda software switch to communicate directly via shared memory network queues, bypassing the hypervisor completely for performance-sensitive per-packet operations.
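To make the idea of a shared memory network queue more concrete, here is a minimal sketch of a single-producer, single-consumer ring in Python. It is not Andromeda's or virtio's actual data structure: the class name `SharedRing`, the `switch_poll` function, the capacity and the polling budget are all assumptions made for illustration, and a plain Python list stands in for real shared memory. What it shows is the general pattern: the producer (the guest) and the consumer (the software switch) each own one index, so packets can be handed over without a third party relaying them.

```python
class SharedRing:
    """Fixed-size single-producer/single-consumer ring.

    A Python list stands in for a shared-memory region (assumption).
    """

    def __init__(self, capacity):
        self.slots = [None] * capacity
        self.capacity = capacity
        self.head = 0  # next slot to read; written only by the consumer
        self.tail = 0  # next slot to write; written only by the producer

    def push(self, packet):
        """Producer side (the guest): enqueue a packet, or report a full ring."""
        if self.tail - self.head == self.capacity:
            return False
        self.slots[self.tail % self.capacity] = packet
        self.tail += 1
        return True

    def pop(self):
        """Consumer side (the switch): dequeue a packet, or None if empty."""
        if self.head == self.tail:
            return None
        packet = self.slots[self.head % self.capacity]
        self.head += 1
        return packet


def switch_poll(tx_ring, budget=64):
    """Drain up to `budget` packets, as a polling software switch might."""
    drained = []
    for _ in range(budget):
        packet = tx_ring.pop()
        if packet is None:
            break
        drained.append(packet)
    return drained


tx = SharedRing(capacity=4)
accepted = [tx.push(f"pkt-{i}".encode()) for i in range(5)]
drained = switch_poll(tx)
print(accepted)  # the fifth push is rejected because the ring is full
print(drained)
```

A real implementation would also need memory barriers and a notification mechanism for when the consumer is idle; those are omitted here to keep the sketch short.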
In our previous approach, a hypervisor thread served as a bridge between the guest VM and the Andromeda software switch. Packets flowed from the VM to the hypervisor thread, to the local host’s Andromeda software switch, then over the physical network to another Andromeda software switch, and back up through the hypervisor to the destination VM. Further, any time the thread wasn’t bridging packets, it was descheduled, increasing tail latency for new packet processing. In many cases, a single network round-trip required four costly hypervisor thread wakeups!
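One plausible way to arrive at the count of four is to tally the guest-to-switch boundary crossings in a request/response exchange. The sketch below is a simplified model under that assumption, not a description of the actual scheduler behavior; the hop labels and the function name `wakeups_per_round_trip` are hypothetical.

```python
# Hypothetical hop labels for one direction of travel (illustrative only).
ONE_WAY_HOPS = ["guest_to_switch", "switch_to_switch", "switch_to_guest"]


def wakeups_per_round_trip(hypervisor_bridged, thread_descheduled=True):
    """Count hypervisor thread wakeups for one request/response exchange.

    Assumes a wakeup is needed at every guest/switch crossing whenever the
    bridging thread exists and was descheduled before the packet arrived.
    """
    if not hypervisor_bridged or not thread_descheduled:
        return 0
    # Only hops that cross the guest/switch boundary involve the thread.
    bridged_hops = [hop for hop in ONE_WAY_HOPS if "guest" in hop]
    return 2 * len(bridged_hops)  # request plus response


legacy = wakeups_per_round_trip(hypervisor_bridged=True)
bypass = wakeups_per_round_trip(hypervisor_bridged=False)
print(f"bridged path: {legacy} wakeups, bypass path: {bypass} wakeups")
```

In this model the bridged path pays for a wakeup on the sending and receiving host in each direction, while the bypass path pays for none on the per-packet path.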
Andromeda 2.1 performance in action
The new Andromeda 2.1 stack delivers noteworthy reductions in VM-to-VM network latency. The figure below shows the factor by which latency has been reduced over time, relative to the median round-trip time of the original stack.
This reduction in network round-trip times translates into real-world performance boosts for latency-sensitive applications. Take Aerospike, a high-performance in-memory NoSQL database. The new Andromeda stack delivers both a reduction in request latency and improved request throughput for Aerospike, as shown below.
Because Andromeda is a foundational building block of Google Cloud, you should see similar improvements in intra-zone latency, regardless of what applications you're running.