Network bandwidths and GPUs


Higher network bandwidths can improve the performance of your distributed workloads running on Compute Engine virtual machine (VM) instances.

Overview

The maximum network bandwidth that is available for VMs with attached GPUs on Compute Engine is as follows:

  • For A3 accelerator-optimized VMs, you can get a maximum network bandwidth of up to 3,200 Gbps.
  • For A2 and G2 accelerator-optimized VMs, you can get a maximum network bandwidth of up to 100 Gbps, based on the machine type.
  • For N1 general-purpose VMs that have P100 and P4 GPUs attached, a maximum network bandwidth of 32 Gbps is available. This is similar to the maximum rate available to N1 VMs that don't have GPUs attached. For more information about network bandwidths, see maximum egress data rate.
  • For N1 general-purpose VMs that have T4 and V100 GPUs attached, you can get a maximum network bandwidth of up to 100 Gbps, based on the combination of GPU and vCPU count.

A3 VMs

A3 accelerator-optimized machine types have either NVIDIA H100 80GB or NVIDIA H200 141GB GPUs attached. Each A3 machine type has a fixed GPU count, vCPU count, and memory size.

A3 Ultra machine type

This machine type has H200 GPUs attached and provides the highest network performance in the A3 series.

For this machine type, a network interface card (NIC) arrangement of 8+2 is available. With this arrangement, 8 NICs share the same Peripheral Component Interconnect Express (PCIe) bus, and 2 NICs reside on a separate PCIe bus.

NICs that share the same PCIe bus have a non-uniform memory access (NUMA) alignment of one NIC per two NVIDIA H200 141GB GPUs. These NVIDIA ConnectX-7 NICs are ideal for dedicated, high-bandwidth GPU-to-GPU communication. For these NICs, you must use the RDMA network profile. The 2 physical NICs that reside on a separate PCIe bus are ideal for other networking needs. For these NICs, we recommend that you use Google Virtual NIC (gVNIC).

For more information about setting up the networks for A3 Ultra VMs, see Create VPC networks in the AI Hypercomputer documentation.

| Machine type | GPU count | GPU memory* (GB HBM3e) | vCPU count | VM memory (GB) | Attached Local SSD (GiB) | Physical NIC count | Maximum network bandwidth (Gbps) | Network protocol |
|---|---|---|---|---|---|---|---|---|
| a3-ultragpu-8g | 8 | 1,128 | 224 | 2,952 | 12,000 | 10 | 3,200 | RDMA over Converged Ethernet (RoCE) |

*GPU memory is the memory on a GPU device that can be used for temporary storage of data. It is separate from the VM's memory and is specifically designed to handle the higher bandwidth demands of your graphics-intensive workloads.
A vCPU is implemented as a single hardware hyper-thread on one of the available CPU platforms.
Maximum egress bandwidth cannot exceed the number given. Actual egress bandwidth depends on the destination IP address and other factors. See Network bandwidth.

A3 Mega, High, and Edge machine types

These machine types have NVIDIA H100 80GB GPUs attached. Each of these machine types has a fixed GPU count, vCPU count, and memory size.

  • Single NIC A3 VMs: For A3 VMs with 1 to 4 GPUs attached, only a single physical network interface card (NIC) is available.
  • Multi-NIC A3 VMs: For A3 VMs with 8 GPUs attached, multiple physical NICs are available. For these A3 machine types, the NICs are arranged as follows on a Peripheral Component Interconnect Express (PCIe) bus:
    • For the A3 Mega machine type: a NIC arrangement of 8+1 is available. With this arrangement, 8 NICs share the same PCIe bus, and 1 NIC resides on a separate PCIe bus.
    • For the A3 High machine type: a NIC arrangement of 4+1 is available. With this arrangement, 4 NICs share the same PCIe bus, and 1 NIC resides on a separate PCIe bus.
    • For the A3 Edge machine type: a NIC arrangement of 4+1 is available. With this arrangement, 4 NICs share the same PCIe bus, and 1 NIC resides on a separate PCIe bus. These 5 NICs provide a total network bandwidth of 400 Gbps for each VM.

    NICs that share the same PCIe bus have a non-uniform memory access (NUMA) alignment of one NIC per two NVIDIA H100 80GB GPUs. These NICs are ideal for dedicated, high-bandwidth GPU-to-GPU communication. The physical NIC that resides on a separate PCIe bus is ideal for other networking needs. For instructions on how to set up networking for A3 High and A3 Edge VMs, see Set up jumbo frame MTU networks.
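    For example, a VPC network with a jumbo frame MTU for the GPU-to-GPU NICs might be created as follows. This is a minimal sketch: the network and subnet names, region, IP range, and MTU value are placeholder assumptions, and the linked guide describes how many such networks to create and which MTU to use.

        # Create a custom-mode VPC network with a jumbo frame MTU (example value).
        gcloud compute networks create gpu-net-1 \
            --subnet-mode=custom \
            --mtu=8896

        # Add a subnet in the region where you plan to create the A3 VMs.
        gcloud compute networks subnets create gpu-subnet-1 \
            --network=gpu-net-1 \
            --region=us-central1 \
            --range=192.168.1.0/24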

A3 Mega

| Machine type | GPU count | GPU memory* (GB HBM3) | vCPU count | VM memory (GB) | Attached Local SSD (GiB) | Physical NIC count | Maximum network bandwidth (Gbps) | Network protocol |
|---|---|---|---|---|---|---|---|---|
| a3-megagpu-8g | 8 | 640 | 208 | 1,872 | 6,000 | 9 | 1,800 | GPUDirect-TCPXO |

A3 High

| Machine type | GPU count | GPU memory* (GB HBM3) | vCPU count | VM memory (GB) | Attached Local SSD (GiB) | Physical NIC count | Maximum network bandwidth (Gbps) | Network protocol |
|---|---|---|---|---|---|---|---|---|
| a3-highgpu-1g | 1 | 80 | 26 | 234 | 750 | 1 | 25 | GPUDirect-TCPX |
| a3-highgpu-2g | 2 | 160 | 52 | 468 | 1,500 | 1 | 50 | GPUDirect-TCPX |
| a3-highgpu-4g | 4 | 320 | 104 | 936 | 3,000 | 1 | 100 | GPUDirect-TCPX |
| a3-highgpu-8g | 8 | 640 | 208 | 1,872 | 6,000 | 5 | 1,000 | GPUDirect-TCPX |

A3 Edge

| Machine type | GPU count | GPU memory* (GB HBM3) | vCPU count | VM memory (GB) | Attached Local SSD (GiB) | Physical NIC count | Maximum network bandwidth (Gbps) | Network protocol |
|---|---|---|---|---|---|---|---|---|
| a3-edgegpu-8g | 8 | 640 | 208 | 1,872 | 6,000 | 5 | 800 (asia-south1 and northamerica-northeast2); 400 (all other A3 Edge regions) | GPUDirect-TCPX |

*GPU memory is the memory on a GPU device that can be used for temporary storage of data. It is separate from the VM's memory and is specifically designed to handle the higher bandwidth demands of your graphics-intensive workloads.
A vCPU is implemented as a single hardware hyper-thread on one of the available CPU platforms.
Maximum egress bandwidth cannot exceed the number given. Actual egress bandwidth depends on the destination IP address and other factors. See Network bandwidth.

A2 VMs

Each A2 machine type has a fixed number of NVIDIA A100 40GB or NVIDIA A100 80GB GPUs attached. Each machine type also has a fixed vCPU count and memory size.

The A2 machine series is available in two types:

  • A2 Ultra: these machine types have A100 80GB GPUs and Local SSD disks attached.
  • A2 Standard: these machine types have A100 40GB GPUs attached.

A2 Ultra

| Machine type | GPU count | GPU memory* (GB HBM2e) | vCPU count | VM memory (GB) | Attached Local SSD (GiB) | Maximum network bandwidth (Gbps) |
|---|---|---|---|---|---|---|
| a2-ultragpu-1g | 1 | 80 | 12 | 170 | 375 | 24 |
| a2-ultragpu-2g | 2 | 160 | 24 | 340 | 750 | 32 |
| a2-ultragpu-4g | 4 | 320 | 48 | 680 | 1,500 | 50 |
| a2-ultragpu-8g | 8 | 640 | 96 | 1,360 | 3,000 | 100 |

A2 Standard

| Machine type | GPU count | GPU memory* (GB HBM2) | vCPU count | VM memory (GB) | Local SSD supported | Maximum network bandwidth (Gbps) |
|---|---|---|---|---|---|---|
| a2-highgpu-1g | 1 | 40 | 12 | 85 | Yes | 24 |
| a2-highgpu-2g | 2 | 80 | 24 | 170 | Yes | 32 |
| a2-highgpu-4g | 4 | 160 | 48 | 340 | Yes | 50 |
| a2-highgpu-8g | 8 | 320 | 96 | 680 | Yes | 100 |
| a2-megagpu-16g | 16 | 640 | 96 | 1,360 | Yes | 100 |

*GPU memory is the memory on a GPU device that can be used for temporary storage of data. It is separate from the VM's memory and is specifically designed to handle the higher bandwidth demands of your graphics-intensive workloads.
A vCPU is implemented as a single hardware hyper-thread on one of the available CPU platforms.
Maximum egress bandwidth cannot exceed the number given. Actual egress bandwidth depends on the destination IP address and other factors. See Network bandwidth.

G2 VMs

Each G2 machine type has a fixed number of NVIDIA L4 GPUs and vCPUs attached. Each G2 machine type also has a default memory size and a custom memory range. The custom memory range defines the amount of memory that you can allocate to your VM for each machine type. You can specify your custom memory during VM creation.

To get higher network bandwidth rates (50 Gbps or higher) on most GPU VMs, we recommend that you use Google Virtual NIC (gVNIC). For more information about creating GPU VMs that use gVNIC, see Creating GPU VMs that use higher bandwidths.
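As a minimal sketch, the following command creates a G2 VM whose network interface uses gVNIC. The VM name, zone, image, and subnet are placeholder assumptions; the image must include gVNIC support, and you still need to install the GPU driver separately.

  # Create a G2 VM with a gVNIC network interface (example names and zone).
  gcloud compute instances create my-g2-vm \
      --zone=us-central1-a \
      --machine-type=g2-standard-8 \
      --maintenance-policy=TERMINATE \
      --image-family=IMAGE_FAMILY \
      --image-project=IMAGE_PROJECT \
      --network-interface=nic-type=GVNIC,subnet=SUBNET_NAME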

| Machine type | GPU count | GPU memory* (GB GDDR6) | vCPU count | Default VM memory (GB) | Custom VM memory range (GB) | Max Local SSD supported (GiB) | Maximum network bandwidth (Gbps) |
|---|---|---|---|---|---|---|---|
| g2-standard-4 | 1 | 24 | 4 | 16 | 16 to 32 | 375 | 10 |
| g2-standard-8 | 1 | 24 | 8 | 32 | 32 to 54 | 375 | 16 |
| g2-standard-12 | 1 | 24 | 12 | 48 | 48 to 54 | 375 | 16 |
| g2-standard-16 | 1 | 24 | 16 | 64 | 54 to 64 | 375 | 32 |
| g2-standard-24 | 2 | 48 | 24 | 96 | 96 to 108 | 750 | 32 |
| g2-standard-32 | 1 | 24 | 32 | 128 | 96 to 128 | 375 | 32 |
| g2-standard-48 | 4 | 96 | 48 | 192 | 192 to 216 | 1,500 | 50 |
| g2-standard-96 | 8 | 192 | 96 | 384 | 384 to 432 | 3,000 | 100 |

*GPU memory is the memory on a GPU device that can be used for temporary storage of data. It is separate from the VM's memory and is specifically designed to handle the higher bandwidth demands of your graphics-intensive workloads.
A vCPU is implemented as a single hardware hyper-thread on one of the available CPU platforms.
Maximum egress bandwidth cannot exceed the number given. Actual egress bandwidth depends on the destination IP address and other factors. See Network bandwidth.

N1 GPU VMs

For N1 general-purpose VMs that have T4 and V100 GPUs attached, you can get a maximum network bandwidth of up to 100 Gbps, based on the combination of GPU and vCPU count. For all other N1 GPU VMs, see Overview.

Review the following sections to calculate the maximum network bandwidth that is available for your T4 and V100 VMs based on the GPU model, vCPU count, and GPU count.

5 or fewer vCPUs

For T4 and V100 VMs that have 5 or fewer vCPUs, a maximum network bandwidth of 10 Gbps is available.

More than 5 vCPUs

For T4 and V100 VMs that have more than 5 vCPUs, maximum network bandwidth is calculated based on the number of vCPUs and GPUs for that VM.

To get higher network bandwidth rates (50 Gbps or higher) on most GPU VMs, we recommend that you use Google Virtual NIC (gVNIC). For more information about creating GPU VMs that use gVNIC, see Creating GPU VMs that use higher bandwidths.
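The table that follows gives the exact calculation for each configuration. As one concrete, hedged sketch, the following command creates an N1 VM with 4 T4 GPUs, 64 vCPUs, and a gVNIC interface, which reaches the 100 Gbps cap because min(64 * 2, 100) = 100. The VM name, zone, image, and subnet are placeholder assumptions.

  # N1 VM with 4 NVIDIA T4 GPUs and 64 vCPUs: min(64 * 2, 100) = 100 Gbps maximum bandwidth.
  gcloud compute instances create my-n1-gpu-vm \
      --zone=us-central1-a \
      --machine-type=n1-standard-64 \
      --accelerator=type=nvidia-tesla-t4,count=4 \
      --maintenance-policy=TERMINATE \
      --image-family=IMAGE_FAMILY \
      --image-project=IMAGE_PROJECT \
      --network-interface=nic-type=GVNIC,subnet=SUBNET_NAME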

| GPU model | Number of GPUs | Maximum network bandwidth calculation |
|---|---|---|
| NVIDIA V100 | 1 | min(vcpu_count * 2, 32) |
| NVIDIA V100 | 2 | min(vcpu_count * 2, 32) |
| NVIDIA V100 | 4 | min(vcpu_count * 2, 50) |
| NVIDIA V100 | 8 | min(vcpu_count * 2, 100) |
| NVIDIA T4 | 1 | min(vcpu_count * 2, 32) |
| NVIDIA T4 | 2 | min(vcpu_count * 2, 50) |
| NVIDIA T4 | 4 | min(vcpu_count * 2, 100) |
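To make the calculation concrete, the following shell snippet applies the formula to one example configuration. The vCPU count is an arbitrary example, and the cap comes from the table row for the chosen GPU model and count.

  # Example: N1 VM with 4 NVIDIA T4 GPUs and 32 vCPUs.
  vcpu_count=32
  cap=100   # cap for 4 x NVIDIA T4, from the table above
  echo "$(( vcpu_count * 2 < cap ? vcpu_count * 2 : cap )) Gbps"   # prints "64 Gbps"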

Create high bandwidth VMs

To create VMs that use higher network bandwidths, see Use higher network bandwidth.

To test or verify the bandwidth speed for any configuration, you can run a benchmarking test. For more information, see Checking network bandwidth.
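If you only need a quick ad hoc check, one common approach is to run iperf3 between two VMs in the same zone. This is a hedged sketch rather than the documented benchmarking procedure: the firewall rule name, network, and server IP are placeholders, and multiple parallel streams are usually needed to approach the rated bandwidth.

  # Allow iperf3 traffic (default port 5201) on the VPC network.
  gcloud compute firewall-rules create allow-iperf3 \
      --network=NETWORK_NAME \
      --allow=tcp:5201

  # On the server VM:
  iperf3 -s

  # On the client VM, run a 30-second test with 16 parallel streams.
  iperf3 -c SERVER_INTERNAL_IP -P 16 -t 30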

What's next?