Run NCCL on Compute Engine VMs

This page describes how to install NCCL/gIB from either a Debian package (.deb) or an RPM package (.rpm). After installation, you can run NCCL tests on A3 Ultra, A4, and A4X VMs. The examples on this page are for two-node tests.

If you use one of Google's first-party (1P) schedulers, such as GKE or Cluster Toolkit (with Slurm and GKE support), you don't need to follow the steps on this page. Instead, follow the instructions on the page that is appropriate for your scenario:

Install nccl-gib

Depending on where you run your workloads, you install NCCL/gIB in either the guest VM or the container image.

The nccl-gib package is bundled with an unmodified NVIDIA NCCL library (libnccl2.so) and headers. All NCCL/gIB content is installed to the /usr/local/gib directory. Some dependencies are also fetched from the distribution's repository.

Debian 12+/Ubuntu 20.04+ (.deb package)

# If not using an image from Google, trust the GCP signing key
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo gpg --dearmor -o /etc/apt/trusted.gpg.d/cloud.google.gpg

# Add gpudirect-gib-apt repo
echo 'deb https://packages.cloud.google.com/apt gpudirect-gib-apt main' | sudo tee /etc/apt/sources.list.d/nccl-gib.list

sudo apt update
sudo apt install nccl-gib

RockyLinux/CentOS/RHEL 9+ (.rpm package)

# Add gpudirect-gib-rpm repo
sudo tee -a /etc/yum.repos.d/nccl-gib.repo << EOL
[gpudirect-gib-rpm]
name=NCCL/gIB
baseurl=https://packages.cloud.google.com/yum/repos/gpudirect-gib-rpm
enabled=1
repo_gpgcheck=0
gpgcheck=0
EOL

sudo dnf makecache
sudo dnf install nccl-gib

If you are using standard OS images, you must also install the latest NVIDIA DOCA-OFED driver. You don't need to install this driver if you are using Google's A* optimized images, such as Container OS or Guest Accelerator Ubuntu/RockyLinux OS Images.

To avoid VMs running different versions of the nccl-gib package, we recommend that you update nccl-gib before you run your NCCL workloads or disable unattended-upgrades.
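One way to catch a version mismatch before a run is to compare the installed package version on each VM. The following is a minimal sketch, assuming Debian or Ubuntu hosts and working non-interactive SSH; the function names are hypothetical and not part of the nccl-gib package. On RPM-based hosts, `rpm -q nccl-gib` would play the role of `dpkg-query`.

```shell
# Print the installed nccl-gib version on a remote Debian/Ubuntu host.
# Assumes non-interactive SSH to the host given as $1.
gib_version() {
  ssh "$1" "dpkg-query -W -f='\${Version}' nccl-gib"
}

# Succeed only if every version string passed as an argument is identical.
versions_match() {
  [ "$(printf '%s\n' "$@" | sort -u | wc -l)" -eq 1 ]
}

# Example with the two VM names used later on this page:
#   versions_match "$(gib_version a4-vm-1)" "$(gib_version a4-vm-2)" ||
#     echo "nccl-gib versions differ; update before running workloads" >&2
```

If the check fails, `sudo apt install --only-upgrade nccl-gib` on each VM brings them to the latest published version.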

Use NCCL/gIB

To enable NCCL/gIB in your workloads, ensure the following:

  • /usr/local/gib/scripts/set_nccl_env.sh is sourced in your runtime environment. This script sets all the environment variables that NCCL/gIB needs, and Google expects to update them in future NCCL/gIB releases.
  • The /usr/local/gib/lib64 directory is in your LD_LIBRARY_PATH.
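The two requirements above can be wrapped in a small helper that a job script calls before launching the workload. This is a sketch only: `enable_gib` and the `GIB_HOME` override are hypothetical names introduced for illustration, and the default path is the install directory named on this page.

```shell
# GIB_HOME is a hypothetical override for illustration; the package
# installs to /usr/local/gib.
GIB_HOME="${GIB_HOME:-/usr/local/gib}"

enable_gib() {
  env_script="${GIB_HOME}/scripts/set_nccl_env.sh"
  if [ ! -r "${env_script}" ]; then
    echo "missing ${env_script}; is nccl-gib installed?" >&2
    return 1
  fi
  # Source the script so its exports land in the current shell.
  . "${env_script}"
  # Prepend lib64 to LD_LIBRARY_PATH unless it is already listed.
  case ":${LD_LIBRARY_PATH:-}:" in
    *":${GIB_HOME}/lib64:"*) ;;
    *) export LD_LIBRARY_PATH="${GIB_HOME}/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}" ;;
  esac
}
```

Sourcing the script on every run, rather than copying its variables into your own configuration, means a package update that changes the variables takes effect without further edits.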

To verify that NCCL/gIB is enabled, check that the following NCCL INFO-level log entries are present:

# A sample log entry from NCCL core
vm-0:606:642 [6] NCCL INFO Using network gIB

# A sample log entry from the gIB network plugin
vm-0:606:642 [6] NCCL INFO NET/gIB : Initializing gIB v1.0.5
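These entries only appear when NCCL logs at INFO level, which is typically enabled by setting NCCL_DEBUG=INFO for the run. The check can then be automated against a captured log file. The function name below is hypothetical, and the patterns are taken from the sample entries above with the version number left out so that the check does not depend on a specific release.

```shell
# Return 0 only if both the NCCL core entry and the gIB plugin entry
# appear in the log file given as $1.
gib_enabled_in_log() {
  log_file="$1"
  grep -q 'NCCL INFO Using network gIB' "${log_file}" &&
    grep -q 'NCCL INFO NET/gIB : Initializing gIB' "${log_file}"
}

# Example:
#   gib_enabled_in_log run.log || echo "gIB was not selected" >&2
```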

Run NCCL tests

To learn how to run NCCL tests in a scheduled environment, see the following:

We also publish a diagnostic container image with everything included at http://us-docker.pkg.dev/gce-ai-infra/gpudirect-gib/nccl-plugin-gib-diagnostic:latest.

To run NCCL tests in a non-scheduled environment:

  1. Install CUDA 12.8 (or newer) and Open MPI.
  2. Set up non-interactive SSH logins among the VMs.
  3. Build nccl-tests with MPI enabled. When building nccl-tests, set NCCL_HOME=/usr/local/gib.
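Step 3 might look like the following sketch. It assumes the public NVIDIA nccl-tests repository on GitHub and its Makefile variables MPI, MPI_HOME, CUDA_HOME, and NCCL_HOME. The function name, the /usr/local/cuda default, and the /opt/nccl-tests destination (chosen to match the path that the run script below expects) are assumptions; the Open MPI prefix varies by distribution, so the caller must supply it.

```shell
# Clone and build nccl-tests with MPI support against NCCL/gIB.
# $1: destination directory (assumed default: /opt/nccl-tests)
# MPI_HOME must point at your Open MPI install prefix.
build_nccl_tests() {
  dest="${1:-/opt/nccl-tests}"
  if [ -z "${MPI_HOME:-}" ]; then
    echo "set MPI_HOME to your Open MPI install prefix" >&2
    return 1
  fi
  git clone https://github.com/NVIDIA/nccl-tests.git "${dest}" &&
    make -C "${dest}" MPI=1 MPI_HOME="${MPI_HOME}" \
      CUDA_HOME="${CUDA_HOME:-/usr/local/cuda}" \
      NCCL_HOME=/usr/local/gib
}

# Example (the MPI_HOME value is a placeholder):
#   MPI_HOME=/path/to/openmpi build_nccl_tests /opt/nccl-tests
```

Run the build on every VM, or build once and copy the resulting build directory to the same path on each VM, so that MPI can launch the same binary everywhere.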

To run the script shipped with the NCCL/gIB package:

# The script assumes binaries at /opt/nccl-tests/build/
/usr/local/gib/scripts/run_nccl_tests.sh -d /opt/nccl-tests/build/ -p 22 -t all_gather -m 0x0 -b 4K -e 16G a4-vm-1 a4-vm-2

Example output on two A4 VMs:

NCCL version 2.25.1+cuda12.8
#
#                                                              out-of-place                       in-place
#       size         count      type   redop    root     time   algbw   busbw #wrong     time   algbw   busbw #wrong
#        (B)    (elements)                               (us)  (GB/s)  (GB/s)            (us)  (GB/s)  (GB/s)
        4096            64     float    none      -1    59.97    0.07    0.06      0    57.49    0.07    0.07      0
        8192           128     float    none      -1    58.17    0.14    0.13      0    58.36    0.14    0.13      0
       16384           256     float    none      -1    59.07    0.28    0.26      0    59.03    0.28    0.26      0
       32768           512     float    none      -1    60.93    0.54    0.50      0    60.79    0.54    0.51      0
       65536          1024     float    none      -1    61.93    1.06    0.99      0    62.17    1.05    0.99      0
      131072          2048     float    none      -1    64.62    2.03    1.90      0    64.48    2.03    1.91      0
      262144          4096     float    none      -1    66.50    3.94    3.70      0    67.05    3.91    3.67      0
      524288          8192     float    none      -1    69.37    7.56    7.09      0    67.83    7.73    7.25      0
     1048576         16384     float    none      -1    117.2    8.95    8.39      0    113.7    9.22    8.64      0
     2097152         32768     float    none      -1    118.8   17.65   16.55      0    118.1   17.75   16.64      0
     4194304         65536     float    none      -1    122.2   34.32   32.17      0    122.6   34.22   32.08      0
     8388608        131072     float    none      -1    132.2   63.44   59.48      0    130.7   64.20   60.18      0
    16777216        262144     float    none      -1    139.2  120.49  112.96      0    139.7  120.07  112.56      0
    33554432        524288     float    none      -1    152.0  220.81  207.01      0    152.1  220.59  206.81      0
    67108864       1048576     float    none      -1    227.6  294.87  276.44      0    225.9  297.08  278.51      0
   134217728       2097152     float    none      -1    431.7  310.87  291.44      0    438.0  306.41  287.26      0
   268435456       4194304     float    none      -1    728.6  368.44  345.41      0    735.9  364.79  341.99      0
   536870912       8388608     float    none      -1   1404.2  382.33  358.44      0   1418.4  378.51  354.85      0
  1073741824      16777216     float    none      -1   2795.8  384.06  360.05      0   2768.9  387.79  363.55      0
  2147483648      33554432     float    none      -1   5440.1  394.75  370.08      0   5418.7  396.31  371.54      0
  4294967296      67108864     float    none      -1    10754  399.40  374.43      0    10746  399.67  374.69      0
  8589934592     134217728     float    none      -1    21434  400.77  375.72      0    21421  401.01  375.95      0
 17179869184     268435456     float    none      -1    42679  402.53  377.38      0    42792  401.48  376.38      0
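To read the bandwidth columns: nccl-tests reports algbw as message size divided by elapsed time, and for all_gather it derives busbw as algbw × (n − 1) / n, where n is the number of ranks. The sketch below reproduces that arithmetic for the last row, assuming 16 ranks (two VMs with 8 GPUs each). The function names are hypothetical, and results can differ from the table in the last digit because the table shows rounded times and bandwidths.

```shell
# algbw in GB/s from message size in bytes ($1) and time in microseconds ($2).
# bytes / microseconds = MB/s, so divide by 1000 to get GB/s.
algbw() {
  awk -v b="$1" -v us="$2" 'BEGIN { printf "%.2f\n", b / us / 1000 }'
}

# all_gather bus bandwidth from algbw in GB/s ($1) and rank count ($2).
busbw_all_gather() {
  awk -v a="$1" -v n="$2" 'BEGIN { printf "%.2f\n", a * (n - 1) / n }'
}

# Last row of the table: 17179869184 bytes in 42679 us, algbw 402.53 GB/s.
#   busbw_all_gather 402.53 16    # prints 377.37 (table: 377.38)
```

The (n − 1) / n factor reflects that each rank already holds its own share of the gathered data, so busbw approximates the per-GPU link rate actually sustained.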

What's next