Systems

Google opens Falcon, a reliable low-latency hardware transport, to the ecosystem

October 17, 2023

Dan Lenoski

VP of Engineering, Google Cloud

Nandita Dukkipati

Principal Software Engineer, Google Cloud

At Google, we have a long history of solving problems at scale using Ethernet, and of rethinking the transport layer to satisfy demanding workloads that require high burst bandwidth, high message rates, and low latency. Workloads such as storage have needed some of these attributes for a long time. However, with newer use cases such as massive-scale AI/ML training and high performance computing (HPC), the need has grown significantly. In the past, we’ve openly shared our learnings in traffic shaping, congestion control, load balancing, and more with the industry by contributing our ideas to the Association for Computing Machinery and the Internet Engineering Task Force. These ideas have been implemented in software, and a few in hardware, for several years. Going forward, we believe the industry at large will see more gains by implementing them as a set with dedicated and flexible hardware assist.

To achieve this goal, we developed Falcon to enable a step function in performance over software-only transports. Today at the OCP Global Summit, we are excited to open Falcon to the ecosystem through the Open Compute Project, the natural venue to empower the community with Google’s production learnings to help modernize Ethernet.

As a hardware-assisted transport layer, Falcon is designed to be reliable, high-performance, and low-latency, and it leverages production-proven technologies including Carousel, Snap, Swift, PLB, and CSIG.

https://storage.googleapis.com/gweb-cloudblog-publish/images/1_Falcon.max-1700x1700.jpg

The figure below illustrates Falcon’s layers and their associated functions. We show the RDMA and NVM Express™ upper-layer protocols (ULPs); however, Falcon is extensible to additional ULPs as needed by the ecosystem.

https://storage.googleapis.com/gweb-cloudblog-publish/images/2_Falcon.max-2000x2000.jpg

The lower layers of Falcon use three key insights to achieve low latency in high-bandwidth, yet lossy, Ethernet data center networks: fine-grained, hardware-assisted round-trip time (RTT) measurements; flexible, per-flow, hardware-enforced traffic shaping; and fast, accurate packet retransmissions. These are combined with multipath-capable and PSP-encrypted Falcon connections. On top of this foundation, Falcon has been designed from the ground up as a multi-protocol transport capable of supporting ULPs with widely varying performance requirements and application semantics. The ULP mapping layer not only provides out-of-the-box compatibility with InfiniBand Verbs RDMA and NVMe ULPs, but also includes additional innovations critical for warehouse-scale applications, such as flexible ordering semantics and graceful error handling. Last but not least, the hardware and software are co-designed to work together to help achieve the desired attributes of high message rate, low latency, and high bandwidth, while maintaining flexibility for programmability and continued innovation.
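To make the link between RTT measurements and traffic shaping more concrete, here is a minimal Python sketch of a delay-based congestion window, loosely in the spirit of delay-based schemes such as Swift. It is not Falcon's algorithm and is not taken from the Falcon specification. The class name `DelayBasedWindow`, the 50 µs target, and all constants are hypothetical values chosen for illustration, and the sketch omits details a real transport would need, such as limiting decreases to once per RTT and handling retransmissions.

```python
from dataclasses import dataclass


@dataclass
class DelayBasedWindow:
    """Toy delay-based AIMD congestion window (illustrative only)."""

    target_rtt_us: float = 50.0      # hypothetical target delay
    additive_increase: float = 1.0   # packets added per RTT when under target
    beta: float = 0.8                # scales the multiplicative decrease
    max_decrease: float = 0.5        # never cut more than half in one step
    min_cwnd: float = 0.01
    max_cwnd: float = 1024.0
    cwnd: float = 10.0               # congestion window, in packets

    def on_ack(self, rtt_us: float) -> float:
        """Update the window from one measured RTT sample and return it."""
        if rtt_us < self.target_rtt_us:
            # Under target: grow by additive_increase per RTT, spread over acks.
            self.cwnd += self.additive_increase / self.cwnd
        else:
            # At or over target: shrink in proportion to the excess delay.
            excess = (rtt_us - self.target_rtt_us) / rtt_us
            factor = max(1.0 - self.beta * excess, 1.0 - self.max_decrease)
            self.cwnd *= factor
        self.cwnd = min(max(self.cwnd, self.min_cwnd), self.max_cwnd)
        return self.cwnd

    def pacing_interval_us(self, rtt_us: float) -> float:
        """Gap between packets that spreads cwnd packets over one RTT."""
        return rtt_us / self.cwnd


if __name__ == "__main__":
    window = DelayBasedWindow()
    for sample in (25.0, 30.0, 100.0, 40.0):
        cwnd = window.on_ack(sample)
        gap = window.pacing_interval_us(sample)
        print(f"rtt={sample:6.1f}us cwnd={cwnd:7.3f} pacing_gap={gap:7.3f}us")
```

In this sketch the measured RTT drives the window, and the window in turn sets a per-flow pacing gap that a shaper could enforce. The more precise the RTT samples, the sooner the sender can react to queue build-up, which is one reason hardware timestamping helps.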

Falcon reflects the central role that Ethernet continues to play in our industry. Falcon is designed for predictable high performance at warehouse scale, as well as flexibility and extensibility. We look forward to working with the community and industry partners to modernize Ethernet to serve the networking requirements of our AI-driven future. We believe that Falcon will be a valuable addition to the other ongoing efforts in this space.

Industry perspectives

Our partners across the industry are enthusiastic about the promise that Falcon holds for developing the next generation of Ethernet.

“We welcome Google’s contribution of Falcon as it shares the Ultra Ethernet Consortium’s vision to drive Ethernet as the best data center fabric for AI and HPC, and look forward to continuing industry innovations in this important space.” - Dr. J Metz, Chair, Ultra Ethernet Consortium (led by AMD, Arista, Broadcom, Cisco, Eviden, Hewlett Packard Enterprise, Intel, Meta, Microsoft, and Oracle)

“Falcon is first available in the Intel IPU E2000 series of products. The value of these IPUs is further enhanced as the first instance of an Ethernet transport to add low tail latency and congestion handling at scale. Intel is a Steering Member of Ultra Ethernet Consortium, which is working to evolve Ethernet for high performance AI and HPC workloads. We plan to deploy the resulting standards-based enhancements in future IPU and Ethernet products.” - Sachin Katti, SVP & GM, Network and Edge Group, Intel

"We are pleased to see a high-performance transport protocol for critical workloads such as AI and HPC that works over standard Ethernet/IP networks and enables massive application bandwidth at scale." - Hugh Holbrook, Group VP, SW Eng., Arista Networks

“Cisco is pleased to see the contribution of Falcon to the OCP. Cisco has long supported open standards and believes in broad ecosystems. The rate and scale of modern data center networks and particularly AI/ML networks is unprecedented, presenting a challenge and opportunity to the industry. Falcon addresses many of the challenges of these networks, enabling efficient network utilization.” - Ofer Iny, Cisco Fellow, Cisco

“Juniper is a strong supporter of open ecosystems, and therefore we are pleased to see Falcon being opened to the OCP community. Falcon allows Ethernet to serve as the data center network-of-choice for demanding workloads, providing high-bandwidth, low tail latency and congestion mitigation. Falcon provides the industry with a proven solution today for demanding AI & ML workloads.” - Raj Yavatkar, Chief Technology Officer, Juniper

“Marvell strongly supports and is committed to the open Ethernet ecosystem as it evolves to support emerging, demanding workloads such as AI. We applaud the contribution of Falcon to OCP and welcome Google sharing practical experiences with the industry.” - Nick Kucharewski, SVP & GM Network Switching Group, Marvell

Learn more

Networking is a foundational component in building the sustainable, secure, scalable societal infrastructure that we need for this AI-driven future. To learn more about Falcon, join us for the OCP Summit presentation, “A Reliable and Low Latency Ethernet Hardware Transport” by Google’s Nandita Dukkipati at 11:45am at the Expo Hall. We’ll contribute the Falcon specification to OCP in the first quarter of 2024.

To learn more about Google’s contributions to the Open Compute Project and our presence at the OCP Global Summit, check out the blog “How we’ll build sustainable, scalable, secure infrastructure for an AI-driven future”.
