Modern machine learning frameworks often use the NVIDIA Collective Communications Library (NCCL) for GPU-to-GPU communication primitives.
Google's enhanced version of NCCL is called NCCL/gIB and is available on Google Cloud's A3 Ultra, A4, and A4X VMs. On Google infrastructure, NCCL/gIB often outperforms upstream NCCL. Because NCCL performance can affect overall workload performance, we recommend that you use NCCL/gIB.
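Collectives such as AllReduce are easiest to reason about through the data movement they perform. The following minimal Python sketch simulates a sum AllReduce on CPU lists using the textbook ring algorithm, which runs a reduce-scatter phase followed by an all-gather phase. It illustrates the communication pattern only. It is not NCCL or NCCL/gIB code, ring is only one of the algorithms NCCL can select, and the function name `ring_all_reduce` is hypothetical.

```python
from typing import List, Sequence


def ring_all_reduce(inputs: Sequence[Sequence[float]]) -> List[List[float]]:
    """Simulate a sum AllReduce across len(inputs) ranks with the ring algorithm.

    inputs[r] is rank r's buffer; every rank must contribute the same length.
    Returns one fully reduced buffer per rank. The inputs are not modified.
    """
    n = len(inputs)
    if n == 0:
        return []
    length = len(inputs[0])
    if any(len(buf) != length for buf in inputs):
        raise ValueError("all ranks must contribute buffers of equal length")

    bufs = [list(buf) for buf in inputs]  # per-rank copies, like separate GPU buffers
    # Split each buffer into n contiguous chunks (some are empty if length < n).
    bounds = [(c * length // n, (c + 1) * length // n) for c in range(n)]

    # Phase 1, reduce-scatter: in step s, rank r sends chunk (r - s) % n to its
    # right neighbor, which adds the received values into its own copy.
    for s in range(n - 1):
        sends = [(r, (r - s) % n) for r in range(n)]
        # Snapshot all payloads first so every rank acts simultaneously in a step.
        payloads = [bufs[r][bounds[c][0]:bounds[c][1]] for r, c in sends]
        for (r, c), data in zip(sends, payloads):
            dst, lo = (r + 1) % n, bounds[c][0]
            for i, value in enumerate(data):
                bufs[dst][lo + i] += value
    # Rank r now holds the fully reduced chunk (r + 1) % n.

    # Phase 2, all-gather: in step s, rank r forwards chunk (r + 1 - s) % n and
    # its right neighbor overwrites its copy with the fully reduced values.
    for s in range(n - 1):
        sends = [(r, (r + 1 - s) % n) for r in range(n)]
        payloads = [bufs[r][bounds[c][0]:bounds[c][1]] for r, c in sends]
        for (r, c), data in zip(sends, payloads):
            dst, lo = (r + 1) % n, bounds[c][0]
            bufs[dst][lo:lo + len(data)] = data
    return bufs


if __name__ == "__main__":
    ranks = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
    print(ring_all_reduce(ranks))
    # [[111, 222, 333, 444], [111, 222, 333, 444], [111, 222, 333, 444]]
```

In this sketch each rank sends 2(n-1) chunks of roughly L/n elements, about 2(n-1)/n times its buffer size, regardless of how many ranks participate. That per-rank traffic is why ring-based AllReduce scales well for large messages while its step count (and therefore latency) grows with n.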
NCCL/gIB contains Google-specific features and optimizations such as the following:
- The gIB network plugin offers improved load balancing on Google's networks, leading to more consistent high throughput and low latency during collective operations.
- A custom tuner plugin selects the best NCCL tuning options for Google Cloud VMs.
- The CoMMA profiler plugin provides detailed performance metrics and diagnostic data for your workload.
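To make the tuner's role more concrete, the following toy sketch shows the kind of decision a tuner makes: for a given message size, pick the (algorithm, protocol) pair with the lowest estimated time under a simple latency-plus-bandwidth cost model. The names "Ring", "Tree", "LL", and "Simple" mirror values that upstream NCCL accepts in its `NCCL_ALGO` and `NCCL_PROTO` settings, but the `Cost` type, the cost numbers, and the `choose` function are hypothetical. This is not the NCCL/gIB tuner's logic, and real NCCL tuner plugins implement a C interface rather than Python.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Cost:
    latency_us: float      # fixed per-call overhead, in microseconds
    bandwidth_gbps: float  # sustained throughput, in GB/s (1e9 bytes/s)


# Hypothetical numbers for illustration only; a real tuner models or measures
# these for the actual topology, rank count, and collective.
COST_TABLE: Dict[Tuple[str, str], Cost] = {
    ("Ring", "LL"): Cost(latency_us=5.0, bandwidth_gbps=10.0),
    ("Ring", "Simple"): Cost(latency_us=20.0, bandwidth_gbps=40.0),
    ("Tree", "LL"): Cost(latency_us=3.0, bandwidth_gbps=8.0),
    ("Tree", "Simple"): Cost(latency_us=15.0, bandwidth_gbps=30.0),
}


def estimated_time_us(cost: Cost, message_bytes: int) -> float:
    # bytes / (GB/s * 1e9) gives seconds; multiplying by 1e6 gives microseconds,
    # which simplifies to bytes / (GB/s * 1e3).
    return cost.latency_us + message_bytes / (cost.bandwidth_gbps * 1e3)


def choose(message_bytes: int,
           table: Dict[Tuple[str, str], Cost] = COST_TABLE) -> Tuple[str, str]:
    """Return the (algorithm, protocol) pair with the lowest estimated time."""
    return min(table, key=lambda key: estimated_time_us(table[key], message_bytes))


if __name__ == "__main__":
    print(choose(1024))           # small message: the low-latency option wins
    print(choose(1_000_000_000))  # large message: the high-bandwidth option wins
```

The point of the cost model is the crossover: low-latency protocols win for small messages, high-bandwidth ones for large messages, and the crossover point depends on the platform, which is why a platform-specific tuner can choose better than generic defaults.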
NCCL/gIB architecture
NCCL/gIB interacts with your machine learning framework and the NVIDIA GPUs on your clusters to optimize performance and gather telemetry, as shown in this diagram:

Benefits of using NCCL/gIB
You can use upstream NCCL on Google Cloud VMs without stability problems. However, NCCL/gIB is better optimized for Google Cloud, and for certain communication patterns the performance difference can be significant, even with the same NCCL parameters.
For example, the following diagram compares NCCL/gIB with upstream NCCL on AllReduce performance. NCCL/gIB outperforms upstream NCCL by as much as 12x at certain message sizes.

32-node NCCL AllReduce performance using A3 Ultra (H200) with no background traffic.
Similarly, the following diagram compares NCCL/gIB with upstream NCCL on AllGather performance with background traffic. At larger message sizes, NCCL/gIB outperforms upstream NCCL by approximately 50%.

32-node NCCL AllGather performance using A3 Ultra (H200) on a shared fabric with a noisy background.
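Collective performance comparisons like these are often expressed as bus bandwidth, the metric reported by NVIDIA's open-source nccl-tests suite. The following sketch computes it with the nccl-tests conventions: algorithm bandwidth is bytes divided by time, and bus bandwidth multiplies that by an operation-specific factor (2(n-1)/n for AllReduce, (n-1)/n for AllGather and ReduceScatter) so that results can be compared against link speed. Here n is the number of ranks, typically one per GPU, so a 32-node A3 Ultra run with 8 GPUs per VM has n = 256. The function names and example numbers are hypothetical.

```python
from typing import Callable, Dict


def alg_bandwidth_gbps(size_bytes: int, time_us: float) -> float:
    # bytes / (time_us * 1e-6 s) / 1e9 simplifies to bytes / (time_us * 1e3) GB/s.
    return size_bytes / (time_us * 1e3)


# Correction factors that convert algorithm bandwidth into bus bandwidth.
BUS_FACTORS: Dict[str, Callable[[int], float]] = {
    "allreduce": lambda n: 2 * (n - 1) / n,
    "allgather": lambda n: (n - 1) / n,
    "reducescatter": lambda n: (n - 1) / n,
}


def bus_bandwidth_gbps(op: str, size_bytes: int, time_us: float, nranks: int) -> float:
    return alg_bandwidth_gbps(size_bytes, time_us) * BUS_FACTORS[op](nranks)


def speedup(baseline_time_us: float, candidate_time_us: float) -> float:
    # At a fixed message size and rank count, bus bandwidth is inversely
    # proportional to time, so the time ratio equals the bandwidth ratio.
    return baseline_time_us / candidate_time_us


if __name__ == "__main__":
    # Hypothetical measurement: 1e9 bytes in 10,000 us across 256 ranks.
    print(bus_bandwidth_gbps("allreduce", 1_000_000_000, 10_000, 256))  # 199.21875
    print(bus_bandwidth_gbps("allgather", 1_000_000_000, 10_000, 256))  # 99.609375
```

Read this way, a 12x gain at one message size means the collective finished in one twelfth of the time, and a 50% gain corresponds to a 1.5x bandwidth ratio.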
In addition, the CoMMA profiler plugin provides Google with improved custom telemetry, which helps us assist you if a workload-level issue arises.
Using NCCL/gIB
To run NCCL/gIB tests on your AI Hypercomputer cluster, choose the page that applies to your setup:
- Run NCCL tests on Compute Engine VMs
- Run NCCL on GKE clusters that use default configuration
- Run NCCL on custom GKE clusters that use A4X
- Run NCCL on custom GKE clusters that use A4 or A3 Ultra
- Run NCCL tests on Slurm clusters
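The pages listed above contain the supported commands for each environment. As a rough orientation, the following hypothetical helper assembles the argument list for an nccl-tests benchmark binary such as `all_reduce_perf` or `all_gather_perf`, using that suite's `-b` (minimum bytes), `-e` (maximum bytes), `-f` (size step factor), `-g` (GPUs per thread), `-n` (iterations), and `-w` (warmup iterations) options. It also sets upstream NCCL's `NCCL_DEBUG=INFO`, which makes NCCL log initialization details such as which plugins it loaded. The helper names, the default sizes, and the assumption that the binary is on `PATH` are illustrative only; the launcher (for example, Slurm or a Kubernetes Job) and any NCCL/gIB environment setup come from the page for your environment.

```python
import shlex
from typing import Dict, List, Mapping, Optional


def nccl_tests_argv(
    binary: str = "all_reduce_perf",  # assumed to be an nccl-tests binary on PATH
    min_bytes: str = "8",
    max_bytes: str = "8G",
    step_factor: int = 2,
    gpus_per_thread: int = 1,
    iters: Optional[int] = None,
    warmup_iters: Optional[int] = None,
) -> List[str]:
    """Build the argument list for one nccl-tests run (hypothetical helper)."""
    argv = [binary, "-b", min_bytes, "-e", max_bytes,
            "-f", str(step_factor), "-g", str(gpus_per_thread)]
    if iters is not None:
        argv += ["-n", str(iters)]
    if warmup_iters is not None:
        argv += ["-w", str(warmup_iters)]
    return argv


def with_nccl_debug(base_env: Mapping[str, str], level: str = "INFO") -> Dict[str, str]:
    """Return a copy of base_env with NCCL's log level set (for example, WARN or INFO)."""
    env = dict(base_env)
    env["NCCL_DEBUG"] = level
    return env


if __name__ == "__main__":
    print(shlex.join(nccl_tests_argv("all_gather_perf")))
    # all_gather_perf -b 8 -e 8G -f 2 -g 1
    # To launch, pass the list to your environment's launcher, for example:
    # subprocess.run(launcher_prefix + argv, env=with_nccl_debug(os.environ), check=True)
```

Keeping the argument list and NCCL environment identical between runs is what makes a comparison between NCCL/gIB and upstream NCCL meaningful.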
To learn how to address any issues with your cluster after you have run your tests, see Collect and understand NCCL/gIB logs for troubleshooting.