This page explains how to install NCCL/gIB from either a Debian software package (.deb) or a Red Hat Package Manager package (.rpm). After installation, you can run NCCL tests on A3 Ultra, A4, and A4X VMs. The examples on this page use two nodes.
If you use one of Google's first-party schedulers, such as GKE or Cluster Toolkit (with Slurm and GKE support), you don't need to follow the steps on this page. Instead, follow the instructions on the page that matches your scenario:
- Run NCCL on GKE clusters that use default configuration
- Run NCCL on custom GKE clusters that use A4X
- Run NCCL on custom GKE clusters that use A4 or A3 Ultra
- Run NCCL tests on Slurm clusters
Install nccl-gib
Depending on where you run your workloads, you install NCCL/gIB in either the guest VM or the container image.
The nccl-gib package is bundled with an unmodified NVIDIA NCCL library (libnccl2.so) and headers. All NCCL/gIB content is installed to the /usr/local/gib directory. Some dependencies are also fetched from the distribution's repository.
Debian 12+/Ubuntu 20.04+ (.deb package)
# If not using an image from Google, trust the GCP signing key
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo gpg --dearmor -o /etc/apt/trusted.gpg.d/cloud.google.gpg
# Add gpudirect-gib-apt repo
echo 'deb https://packages.cloud.google.com/apt gpudirect-gib-apt main' | sudo tee /etc/apt/sources.list.d/nccl-gib.list
sudo apt update
sudo apt install nccl-gib
RockyLinux/CentOS/RHEL 9+ (.rpm package)
# Add gpudirect-gib-rpm repo
sudo tee -a /etc/yum.repos.d/nccl-gib.repo << EOL
[gpudirect-gib-rpm]
name=NCCL/gIB
baseurl=https://packages.cloud.google.com/yum/repos/gpudirect-gib-rpm
enabled=1
repo_gpgcheck=0
gpgcheck=0
EOL
sudo dnf makecache
sudo dnf install nccl-gib
If you are using standard OS images, you must also install the latest NVIDIA DOCA-OFED driver. You don't need to install this driver if you are using Google's A* optimized images, such as Container OS or Guest Accelerator Ubuntu/RockyLinux OS Images.
To avoid VMs running different versions of the nccl-gib package, we recommend that you update nccl-gib before you run your NCCL workloads or disable unattended-upgrades.
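One way to confirm that every VM runs the same nccl-gib version is to collect the installed version from each host and compare the results. The following is a minimal sketch, not part of the NCCL/gIB package: the function names `check_same_version` and `remote_version` are hypothetical, and the SSH query assumes a Debian or Ubuntu guest with non-interactive SSH already set up.

```shell
# Report whether every VM has the same nccl-gib version installed.
# Input on stdin: one "<host> <version>" pair per line.
# Prints each distinct version followed by the hosts that have it,
# and returns 0 only when exactly one version was seen.
check_same_version() {
  awk '
    NF >= 2 { seen[$2] = seen[$2] " " $1 }
    END {
      n = 0
      for (v in seen) { n++; printf "%s:%s\n", v, seen[v] }
      exit (n == 1 ? 0 : 1)
    }'
}

# Query the installed version on one Debian/Ubuntu VM over SSH.
# On RPM-based images, a query such as
#   rpm -q --qf "%{VERSION}-%{RELEASE}" nccl-gib
# plays the same role.
remote_version() {
  ssh -o BatchMode=yes "$1" "dpkg-query -W -f='\${Version}' nccl-gib"
}

# Example (host names taken from the test command on this page):
#   for h in a4-vm-1 a4-vm-2; do
#     echo "$h $(remote_version "$h")"
#   done | check_same_version || echo "nccl-gib versions differ" >&2
```

If the versions differ, running `sudo apt install --only-upgrade nccl-gib` (or `sudo dnf upgrade nccl-gib`) on each VM brings them to the latest version available in the repository.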
Use NCCL/gIB
To enable NCCL/gIB in your workloads, ensure the following:
- /usr/local/gib/scripts/set_nccl_env.sh is sourced in your runtime environment. This script sets all the environment variables that NCCL/gIB needs, and Google expects to update them in future NCCL/gIB releases.
- The /usr/local/gib/lib64 directory is in your LD_LIBRARY_PATH.
To verify that NCCL/gIB is enabled, check that the following NCCL INFO-level log entries are present:
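The two requirements above can be sketched as a short snippet for a job script or shell profile. It assumes the default install prefix; the `GIB_HOME` variable is only a local convenience introduced here, not something NCCL/gIB reads.

```shell
# Assumes the default install prefix described on this page.
GIB_HOME=/usr/local/gib

# Load the NCCL/gIB environment variables if the package is installed.
if [ -r "$GIB_HOME/scripts/set_nccl_env.sh" ]; then
  . "$GIB_HOME/scripts/set_nccl_env.sh"
else
  echo "warning: $GIB_HOME/scripts/set_nccl_env.sh not found" >&2
fi

# Prepend the gIB library directory once, preserving any existing entries.
case ":${LD_LIBRARY_PATH:-}:" in
  *":$GIB_HOME/lib64:"*) ;;
  *) export LD_LIBRARY_PATH="$GIB_HOME/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
esac
```

Sourcing the script at job start, rather than copying its variables into your own configuration, means that later package updates to the variables are picked up automatically.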
# A sample log entry from NCCL core
vm-0:606:642 [6] NCCL INFO Using network gIB
# A sample log entry from the gIB network plugin
vm-0:606:642 [6] NCCL INFO NET/gIB : Initializing gIB v1.0.5
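If you capture the workload output to a file, a small helper can check for both entries. This is a sketch under assumptions: the function name `gib_enabled_in_log` and the log file name are hypothetical, and the search strings are copied from the sample entries above. INFO-level entries are typically emitted when the `NCCL_DEBUG=INFO` environment variable is set.

```shell
# Return 0 only if both expected entries appear in the given NCCL log file.
gib_enabled_in_log() {
  grep -q 'NCCL INFO Using network gIB' "$1" &&
    grep -q 'NCCL INFO NET/gIB : Initializing gIB' "$1"
}

# Example (nccl.log is a hypothetical file holding the workload output):
#   gib_enabled_in_log nccl.log && echo "NCCL/gIB is enabled"
```

The version suffix (v1.0.5 in the sample) is deliberately left out of the pattern so that the check keeps working after a package update.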
Run NCCL tests
To learn how to run NCCL tests in a scheduled environment, see the following:
- Run NCCL on GKE clusters that use default configuration
- Run NCCL on custom GKE clusters that use A4X
- Run NCCL on custom GKE clusters that use A4 or A3 Ultra
- Run NCCL tests on Slurm clusters
We also publish a diagnostic container image with everything included at us-docker.pkg.dev/gce-ai-infra/gpudirect-gib/nccl-plugin-gib-diagnostic:latest.
To run NCCL tests in a non-scheduled environment:
- Install cuda-12.8 (or newer) and openmpi
- Set up non-interactive ssh logins among the VMs
- Build nccl-tests with MPI enabled. When building nccl-tests, set NCCL_HOME=/usr/local/gib
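The build step above can be sketched as follows. The `make` variables (`MPI=1`, `MPI_HOME`, `CUDA_HOME`, `NCCL_HOME`) come from the upstream nccl-tests build instructions; everything else is an assumption: the function name `build_nccl_tests` is hypothetical, `/usr/local/cuda` is only a common default CUDA location, and you must set `MPI_HOME` to your own Open MPI install prefix. The default source directory is chosen so that the binaries land in /opt/nccl-tests/build/.

```shell
# Clone and build nccl-tests against the NCCL library shipped with nccl-gib.
# Set DRY_RUN=1 to print the commands instead of running them.
build_nccl_tests() {
  nt_src="${1:-/opt/nccl-tests}"              # source checkout directory
  nt_cuda="${CUDA_HOME:-/usr/local/cuda}"     # assumed default CUDA location
  if [ -z "${MPI_HOME:-}" ]; then
    echo "error: set MPI_HOME to your Open MPI install prefix" >&2
    return 1
  fi
  nt_run=""
  if [ "${DRY_RUN:-0}" = "1" ]; then nt_run="echo"; fi

  $nt_run git clone https://github.com/NVIDIA/nccl-tests.git "$nt_src" &&
    $nt_run make -C "$nt_src" MPI=1 MPI_HOME="$MPI_HOME" \
      CUDA_HOME="$nt_cuda" NCCL_HOME=/usr/local/gib
}

# Example (the MPI_HOME value is a placeholder, not a recommendation):
#   MPI_HOME=/path/to/openmpi DRY_RUN=1 build_nccl_tests
```

Running it once with `DRY_RUN=1` lets you review the exact commands before building. Writing to /opt usually requires elevated permissions or a directory you own.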
To run the script shipped with the NCCL/gIB package:
# The script assumes binaries at /opt/nccl-tests/build/
$ /usr/local/gib/scripts/run_nccl_tests.sh -d /opt/nccl-tests/build/ -p 22 -t all_gather -m 0x0 -b 4K -e 16G a4-vm-1 a4-vm-2
Example output on two A4 VMs:
NCCL version 2.25.1+cuda12.8
#
#                                                              out-of-place                       in-place
#       size         count      type   redop    root     time   algbw   busbw #wrong     time   algbw   busbw #wrong
#        (B)    (elements)                               (us)  (GB/s)  (GB/s)            (us)  (GB/s)  (GB/s)
        4096            64     float    none      -1    59.97    0.07    0.06      0    57.49    0.07    0.07      0
        8192           128     float    none      -1    58.17    0.14    0.13      0    58.36    0.14    0.13      0
       16384           256     float    none      -1    59.07    0.28    0.26      0    59.03    0.28    0.26      0
       32768           512     float    none      -1    60.93    0.54    0.50      0    60.79    0.54    0.51      0
       65536          1024     float    none      -1    61.93    1.06    0.99      0    62.17    1.05    0.99      0
      131072          2048     float    none      -1    64.62    2.03    1.90      0    64.48    2.03    1.91      0
      262144          4096     float    none      -1    66.50    3.94    3.70      0    67.05    3.91    3.67      0
      524288          8192     float    none      -1    69.37    7.56    7.09      0    67.83    7.73    7.25      0
     1048576         16384     float    none      -1    117.2    8.95    8.39      0    113.7    9.22    8.64      0
     2097152         32768     float    none      -1    118.8   17.65   16.55      0    118.1   17.75   16.64      0
     4194304         65536     float    none      -1    122.2   34.32   32.17      0    122.6   34.22   32.08      0
     8388608        131072     float    none      -1    132.2   63.44   59.48      0    130.7   64.20   60.18      0
    16777216        262144     float    none      -1    139.2  120.49  112.96      0    139.7  120.07  112.56      0
    33554432        524288     float    none      -1    152.0  220.81  207.01      0    152.1  220.59  206.81      0
    67108864       1048576     float    none      -1    227.6  294.87  276.44      0    225.9  297.08  278.51      0
   134217728       2097152     float    none      -1    431.7  310.87  291.44      0    438.0  306.41  287.26      0
   268435456       4194304     float    none      -1    728.6  368.44  345.41      0    735.9  364.79  341.99      0
   536870912       8388608     float    none      -1   1404.2  382.33  358.44      0   1418.4  378.51  354.85      0
  1073741824      16777216     float    none      -1   2795.8  384.06  360.05      0   2768.9  387.79  363.55      0
  2147483648      33554432     float    none      -1   5440.1  394.75  370.08      0   5418.7  396.31  371.54      0
  4294967296      67108864     float    none      -1    10754  399.40  374.43      0    10746  399.67  374.69      0
  8589934592     134217728     float    none      -1    21434  400.77  375.72      0    21421  401.01  375.95      0
 17179869184     268435456     float    none      -1    42679  402.53  377.38      0    42792  401.48  376.38      0
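To compare runs, it can help to reduce a table like the one above to a single number. The following sketch extracts the peak out-of-place bus bandwidth from saved test output. The function name `peak_busbw` and the file name in the example are hypothetical, and the parsing assumes the 13-column row layout shown above.

```shell
# Print the largest out-of-place bus bandwidth (GB/s) found in nccl-tests
# output read from stdin. Data rows have 13 whitespace-separated fields,
# and field 8 is the out-of-place busbw column.
peak_busbw() {
  awk '
    /^[[:space:]]*#/ { next }          # skip header and comment lines
    NF == 13 && $1 ~ /^[0-9]+$/ {      # keep only numeric data rows
      if ($8 + 0 > max) max = $8 + 0
      rows++
    }
    END { if (rows) printf "%.2f\n", max; else exit 1 }'
}

# Example (results.txt is a hypothetical file holding the saved output):
#   peak_busbw < results.txt
```

Applied to the sample output above, the function prints the busbw value of the largest message size, because bandwidth rises steadily with message size in that run.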
What's next
- See Collect and Understand NCCL Logs for Troubleshooting to understand the test outputs and troubleshoot issues.
- Monitor VMs and Slurm clusters.
- Learn about troubleshooting slow performance.