Google Cloud

Enter the Andromeda zone: Google Cloud Platform's latest networking stack

April 2, 2014
Amin Vahdat

VP/GM ML, Systems, and Cloud AI, Google

We have recently made the latest networking technology that powers our internal services available to Google Cloud Platform users across the world. Andromeda, the codename for Google's network virtualization stack, now powers two Google Compute Engine zones: us-central1-b and europe-west1-a. Customers in these zones will automatically see major throughput gains over our already fast network connections. We will migrate all remaining zones to Andromeda in the coming months.

At the Open Network Summit, I presented Andromeda and described some of the networking challenges that virtualization introduces. Delivering the highest levels of performance, availability, and security requires orchestration across virtual machines, hypervisors, operating systems, network interface cards, top-of-rack switches, fabric switches, border routers, and even our network peering edge. We are uniquely positioned to draw on Google's control of, and expertise in, the entire hardware and software stack, LAN, and WAN to deliver a seamless experience for Cloud Platform customers.

At Google, we benefit from having programmable access to the entire network stack, from the lowest-level hardware to the highest-level software elements. Rather than settling for compromise solutions constrained by whatever insertion points happen to be available, we can design secure, high-performance, end-to-end solutions by coordinating across the stack.

Andromeda is a Software Defined Networking (SDN)-based substrate for our network virtualization efforts. It is the orchestration point for provisioning, configuring, and managing virtual networks and in-network packet processing. The figure below from my presentation shows Andromeda's high-level architecture:

https://storage.googleapis.com/gweb-cloudblog-publish/images/andromedajy2l.max-800x800.PNG
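As a rough illustration of what an "orchestration point" for virtual networks can mean, the sketch below models a central controller that owns VM placement and pushes a virtual-to-physical address mapping to a virtual switch on every host. It is a hypothetical toy, not Andromeda's actual design or API: the `Controller` and `HostSwitch` classes, the tenant names, and the addresses are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class HostSwitch:
    """Hypothetical per-host virtual switch holding controller-installed state."""
    host_ip: str
    # virtual network id -> {VM virtual IP -> physical IP of the host running it}
    routes: dict = field(default_factory=dict)

    def forward(self, vnet: str, dst_vip: str):
        """Return the physical host to encapsulate toward, or None to drop."""
        return self.routes.get(vnet, {}).get(dst_vip)


class Controller:
    """Hypothetical central controller: owns VM placement, programs every host."""

    def __init__(self):
        self.placement = {}  # (vnet, vip) -> physical host_ip
        self.switches = {}   # host_ip -> HostSwitch

    def add_host(self, host_ip: str) -> HostSwitch:
        sw = HostSwitch(host_ip)
        self.switches[host_ip] = sw
        self._push()  # a newly added host receives the current mapping
        return sw

    def place_vm(self, vnet: str, vip: str, host_ip: str) -> None:
        # Placing (or migrating) a VM just updates placement and re-pushes state.
        self.placement[(vnet, vip)] = host_ip
        self._push()

    def _push(self) -> None:
        # Recompute the full per-network mapping and install a copy on every host.
        table = {}
        for (vnet, vip), host_ip in self.placement.items():
            table.setdefault(vnet, {})[vip] = host_ip
        for sw in self.switches.values():
            sw.routes = {vnet: dict(m) for vnet, m in table.items()}
```

Because each mapping is keyed by virtual network, two tenants can reuse the same private address without seeing each other's traffic, and moving a VM only requires the controller to push an updated entry to the hosts.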

Andromeda's goal is to expose the raw performance of the underlying network while also exposing network function virtualization (NFV). We expose to end users the same in-network processing that enables our internal services to scale, while keeping it extensible and isolated. This functionality includes distributed denial of service (DDoS) protection, transparent service load balancing, access control lists, and firewalls. We do all this while improving performance, with more enhancements coming.
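To make "in-network processing" a little more concrete, here is a minimal sketch of an access-control stage that a virtual switch might apply to each packet. It assumes a simple first-match policy with a default deny; the `Rule` and `Packet` types and the `evaluate` helper are hypothetical, and this is not a description of how Cloud Platform firewall rules are actually evaluated.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    action: str                 # "allow" or "deny"
    src_cidr: str               # source range, e.g. "10.0.0.0/8"
    protocol: str               # "tcp", "udp", or "any"
    port: Optional[int] = None  # destination port; None matches any port


@dataclass(frozen=True)
class Packet:
    src_ip: str
    protocol: str
    dst_port: int


def rule_matches(rule: Rule, pkt: Packet) -> bool:
    """True if the packet's source, protocol, and port all satisfy the rule."""
    if ipaddress.ip_address(pkt.src_ip) not in ipaddress.ip_network(rule.src_cidr):
        return False
    if rule.protocol != "any" and rule.protocol != pkt.protocol:
        return False
    return rule.port is None or rule.port == pkt.dst_port


def evaluate(rules: list, pkt: Packet) -> str:
    """Apply rules in order; the first match wins, and unmatched packets are denied."""
    for rule in rules:
        if rule_matches(rule, pkt):
            return rule.action
    return "deny"
```

Putting a broad deny ahead of broader allows (for example, blocking a known-bad range before allowing SSH from anywhere) is the usual way to express exceptions under first-match semantics.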

Andromeda itself, then, is not a Cloud Platform networking product; rather, it is the foundation for delivering Cloud Platform networking services with high performance, availability, isolation, and security. For example, Cloud Platform firewalls, routing, and forwarding rules are all built on the internal Andromeda APIs and infrastructure. Our site presents the details of these and other advanced network capabilities.

My presentation also covered scenarios such as the one described in our earlier post on Google Compute Engine load balancing at 1M requests per second (RPS). I also discussed forthcoming TCP stream performance improvements within Google Compute Engine (GCE), most notably significant improvements in network-level latency, throughput, and CPU overhead. While these enhancements will deliver some of the best network performance available in the industry, we are most excited about the path ahead. Andromeda will enable Cloud Platform to expose more and more of Google's raw network infrastructure performance to all GCE virtual machines (VMs).

Some of the most valuable enhancements let VMs running on supporting Linux kernels exploit offload and multi-queue capabilities. I encourage interested customers to create new GCE VMs from the Debian backports image, which includes the latest drivers needed to achieve the best performance.
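Multi-queue support lets a network interface spread traffic across several queues, each typically serviced by a different CPU. The sketch below shows the common flow-steering idea behind it, using a hypothetical `select_queue` helper: hash a flow's 5-tuple so every packet of one TCP connection lands on the same queue (preserving in-order processing) while different flows spread across queues. Hardware receive-side scaling commonly uses a Toeplitz hash; `zlib.crc32` is swapped in here only to keep the example dependency-free, and nothing here describes the actual GCE driver.

```python
import zlib


def select_queue(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
                 protocol: str, num_queues: int) -> int:
    """Map a flow's 5-tuple to a queue index in [0, num_queues)."""
    if num_queues < 1:
        raise ValueError("num_queues must be >= 1")
    # Serialize the 5-tuple deterministically so the same flow always hashes alike.
    key = f"{protocol}|{src_ip}|{dst_ip}|{src_port}|{dst_port}".encode()
    # crc32 stands in for the Toeplitz hash used by many RSS-capable NICs.
    return zlib.crc32(key) % num_queues
```

With a single queue every flow collapses onto one CPU; with several, many parallel connections (for example, distinct client source ports) are spread out, which is what lets a VM use more than one core for packet processing.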

To show the magnitude of the improvements rolling out, the Cloud Platform team ran a number of performance experiments. One benchmark measured throughput between VMs in the same GCE zone using netperf TCP_STREAM. Comparing baseline performance (before Andromeda) against Andromeda highlights the benefits of the Andromeda architecture.

https://storage.googleapis.com/gweb-cloudblog-publish/images/Screen2BShot2B2016-11-082Bat2B43qml.max-700x700.PNG
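For readers who want to run a similar same-zone throughput test on their own VMs, the sketch below wraps a single netperf TCP_STREAM run. It assumes netperf is installed on the sending VM and `netserver` is running on the receiving VM; the helper names and the address in the example are hypothetical, and the exact settings behind the Cloud Platform team's results are not documented here.

```python
import shutil
import subprocess


def netperf_tcp_stream_cmd(remote_host: str, duration_s: int = 30) -> list:
    """Build a netperf TCP_STREAM command line against a netserver on remote_host."""
    # -H: host running netserver; -t: test name; -l: test length in seconds.
    return ["netperf", "-H", remote_host, "-t", "TCP_STREAM", "-l", str(duration_s)]


def run_tcp_stream(remote_host: str, duration_s: int = 30) -> str:
    """Run one TCP_STREAM test and return netperf's raw text report."""
    if shutil.which("netperf") is None:
        raise FileNotFoundError("netperf is not installed on this VM")
    result = subprocess.run(
        netperf_tcp_stream_cmd(remote_host, duration_s),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

For example, `print(run_tcp_stream("10.240.0.3", duration_s=60))`, where `10.240.0.3` stands in for the receiving VM's internal IP. Running the same command before and after a zone moves to Andromeda gives a like-for-like comparison.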

Additionally, we've started working on the next set of enhancements. In my talk, I highlighted some of the opportunities ahead: high-speed access to low-latency, durable storage; APIs for NFV; and VM migration to deliver transparent availability during system maintenance. Andromeda is a reworking of our underlying network virtualization architecture, and its SDN core lets us rapidly iterate and deliver new functionality. This ensures that Cloud Platform's network will continue to be an agent of disruption in cloud computing.
