Accelerating MPI applications using Google Virtual NIC (gVNIC)
Jian Yang
Software Engineer
Pavan Kumar
Product Manager
At Google, we are constantly improving the performance of our network infrastructure. We recently introduced Google Virtual NIC (gVNIC), a virtual network interface designed specifically for Compute Engine. gVNIC is an alternative to the VirtIO-based Ethernet driver. It is tightly integrated with our high-performance, flexible Andromeda virtual network stack and is required to enable high network bandwidth configurations (50 to 100 Gbps).
Using gVNIC improves communication performance by delivering traffic among your VM instances more efficiently. This is especially valuable for high-performance computing (HPC) users, because MPI communication performance is critical to the scalability of workloads such as weather modeling, computational fluid dynamics, and computer-aided engineering.
To simplify using gVNIC for HPC workloads, our CentOS 7-based HPC VM image now supports gVNIC and includes the latest gve driver (gve-1.2.3) by default. Continue reading for more details on gVNIC performance, or skip ahead to our quickstart guide to get started today!
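To confirm that an instance is using the gve driver, a quick check along the following lines works once you're logged in; the interface name eth0 is an assumption and may differ on your VM:

    # Show the NIC driver in use; on a gVNIC instance this reports "driver: gve"
    ethtool -i eth0

    # Show the version of the gve kernel module bundled with the image
    modinfo gve | grep -i version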
Performance results using HPC benchmarks
We compared the performance of gVNIC and VirtIO-Net across the Intel MPI Benchmarks and several application benchmarks, including finite element analysis (ANSYS LS-DYNA), computational fluid dynamics (ANSYS Fluent) and weather modeling (WRF).
Intel MPI Benchmark (IMB) PingPong
IMB PingPong measures the average one-way time to send a fixed-size message between two MPI ranks running on a pair of VMs. The results below show that gVNIC provides lower latency for medium and large message sizes (2 KB to 4 MB). For these messages, gVNIC improves latency by 26%, on average, compared to VirtIO-Net.
Benchmark setup:
HPC VM image: hpc-centos-7-v20210925
Machine type: c2-standard-60
Using compact placement policy
Network bandwidth: Default Tier
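For reference, a run of this benchmark with Intel MPI looks roughly like the command below; the hostnames vm-1 and vm-2 are placeholders for the two instances:

    # One rank per VM across two VMs; PingPong reports latency for each message size
    mpirun -np 2 -ppn 1 -hosts vm-1,vm-2 IMB-MPI1 PingPong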
Results
OSU Micro Benchmark (OMB) Multiple Bandwidth
OMB Multiple Bandwidth measures the aggregate uni-directional bandwidth between multiple pairs of processes across VMs. For this benchmark, we used 30 processes per node (PPN=30) on each of 2 VMs. gVNIC is required for 100 Gbps networking, and this benchmark demonstrates that gVNIC with 100 Gbps networking unlocks 57% higher throughput, on average, compared to VirtIO-Net.
Benchmark setup:
HPC VM image: hpc-centos-7-v20210925
Machine type: c2-standard-60
Using compact placement policy
Network bandwidth: Tier 1 (100Gbps)
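For reference, a run of this benchmark looks roughly like the command below; the hostnames vm-1 and vm-2 are placeholders, and osu_mbw_mr is the OSU multiple bandwidth / message rate binary:

    # 30 processes per node on each of 2 VMs (60 ranks total); osu_mbw_mr reports
    # the aggregate uni-directional bandwidth between the process pairs
    mpirun -np 60 -ppn 30 -hosts vm-1,vm-2 osu_mbw_mr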
Results
HPC application benchmarks: WRF, ANSYS LS-DYNA, ANSYS Fluent
The latency and bandwidth gains from gVNIC translate into shorter runtimes for HPC application benchmarks. Using gVNIC with the HPC VM image yields a 51% performance improvement on the WRF v3 CONUS 12 km benchmark when running on 720 MPI ranks across 24 Intel Xeon processor-based C2 instances. With ANSYS Fluent and ANSYS LS-DYNA, we observed performance improvements of 13% and 11%, respectively, using gVNIC compared with VirtIO-Net.
Benchmark setup:
ANSYS LS-DYNA (“3-cars” model): 8 c2-standard-60 VMs with compact placement policy, using the LS-DYNA MPP binary compiled with AVX-2
ANSYS Fluent (“aircraft_wing_14m” model): 16 c2-standard-60 VMs with compact placement policy
WRF V3 Parallel Benchmark (12 KM CONUS): 24 c2-standard-60 VMs with compact placement policy
HPC VM image: hpc-centos-7-v20210925
Network bandwidth: Default Tier
Results
Get started today!
Starting today, you can use the latest HPC VM image with gVNIC support via Google Cloud Marketplace or the gcloud command-line tool. Check out our quickstart guide for details on creating instances using gVNIC and the HPC VM image.
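As a sketch, creating a C2 instance from the HPC VM image with gVNIC enabled looks something like the command below. The instance name and zone are placeholders, and we assume the public HPC image project cloud-hpc-image-public:

    # Create a C2 instance from the HPC VM image with a gVNIC network interface
    gcloud compute instances create my-hpc-node \
        --zone=us-central1-a \
        --machine-type=c2-standard-60 \
        --image=hpc-centos-7-v20210925 \
        --image-project=cloud-hpc-image-public \
        --network-interface=nic-type=GVNIC

    # For Tier 1 (100 Gbps) networking, a recent gcloud release additionally accepts
    # --network-performance-configs=total-egress-bandwidth-tier=TIER_1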
Special thanks to Jiuxing Liu, Tanner Love, Mansoor Alicherry and Pallavi Phene for their contributions.