RDMA network profile

This page provides an overview of the RDMA network profile in Google Cloud.

About the RDMA network profile

The RDMA network profile lets you create a Virtual Private Cloud (VPC) network in which you can run AI workloads on VM instances that have NVIDIA ConnectX-7 NICs. These NICs support remote direct memory access (RDMA) connectivity and have the NIC type MRDMA in Google Cloud.

A VPC network with the RDMA network profile supports low-latency, high-bandwidth RDMA communication between the GPUs of VMs that are created in the network. This communication uses RDMA over Converged Ethernet v2 (RoCE v2).

For more information about running AI workloads in Google Cloud, see the AI Hypercomputer documentation.

Specifications

VPC networks created with the RDMA network profile have the following specifications:

  • The network only accepts attachments from MRDMA NICs. A3 Ultra VMs are the only VM type that supports MRDMA NICs. Other NIC types, for example the GVNICs of an A3 Ultra VM, must be attached to a regular VPC network.
  • The set of features that are supported in the network is pre-configured by Google Cloud to support running AI workloads that require RDMA. VPC networks with the RDMA network profile have more constraints than regular VPC networks. For more information, see Supported and unsupported features.
  • The network is constrained to the zone of the network profile that you specify when you create the network. For example, any instances that you create in the network must be created in the zone of the network profile. Additionally, any subnets that you create in the network must be in the region that corresponds to the zone of the network profile.

    The RDMA network profile is not available in all zones. To view the zones in which the network profile is available, see Supported zones. You can also view the zone-specific instances of the network profile that are available by listing network profiles.

  • The resource name of the RDMA network profile that you specify when you create the network has the following format: ZONE-vpc-roce. For example, europe-west1-b-vpc-roce.

  • The default MTU in a VPC network created with the RDMA network profile is 8896. This default gives the RDMA driver in the VM's guest OS the flexibility to use an appropriate MTU. The default MTU in regular VPC networks (1460) might be too small for some RDMA workloads. For best performance, Google recommends that you don't change the default MTU.
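
To illustrate why a 1460-byte MTU can be limiting, the following sketch estimates the largest RDMA path MTU that fits inside a given network MTU. It is an illustration only, not a description of any particular driver. The candidate sizes are the standard InfiniBand path MTU values, and the 96-byte `header_overhead` default is an assumed allowance for RoCE v2 headers; the real overhead depends on the driver and the operation type.

```python
from typing import Optional

# Standard InfiniBand path MTU sizes, in bytes.
IB_PATH_MTUS = (256, 512, 1024, 2048, 4096)


def largest_path_mtu(network_mtu: int, header_overhead: int = 96) -> Optional[int]:
    """Return the largest path MTU that fits in network_mtu, or None.

    header_overhead is an assumption for illustration, not a documented value.
    """
    budget = network_mtu - header_overhead
    fitting = [size for size in IB_PATH_MTUS if size <= budget]
    return max(fitting) if fitting else None


print(largest_path_mtu(1460))  # regular VPC network default
print(largest_path_mtu(8896))  # RDMA network profile default
```

Under these assumptions, a 1460-byte MTU leaves room for a 1024-byte path MTU at most, while 8896 bytes accommodates the 4096-byte maximum.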

Supported zones

The RDMA network profile is available in the following zones:

  • europe-west1-b
  • us-east7-c
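
The zone list, the ZONE-vpc-roce naming format, and the region constraint on subnets can be combined into a small pre-flight check. The following is a minimal sketch; the function names are hypothetical helpers, not part of any Google Cloud library, and the zone set is copied from this page and might be out of date.

```python
# Zones copied from the list on this page; treat as a snapshot.
SUPPORTED_ZONES = {"europe-west1-b", "us-east7-c"}


def rdma_profile_name(zone: str) -> str:
    """Build the network profile resource name (ZONE-vpc-roce) for a zone."""
    if zone not in SUPPORTED_ZONES:
        raise ValueError(f"RDMA network profile is not listed for zone {zone!r}")
    return f"{zone}-vpc-roce"


def region_of(zone: str) -> str:
    """Derive the region from a zone name by dropping the final '-<letter>' part."""
    return zone.rsplit("-", 1)[0]


def subnet_region_ok(profile_zone: str, subnet_region: str) -> bool:
    """Subnets must be in the region that corresponds to the profile's zone."""
    return region_of(profile_zone) == subnet_region


print(rdma_profile_name("europe-west1-b"))
print(subnet_region_ok("europe-west1-b", "europe-west1"))
```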

Supported and unsupported features

This section describes the supported and unsupported features in VPC networks created with the RDMA network profile.

Features of regular VPC networks are supported unless they are configured to be disabled by the network profile, are dependent on a feature that is disabled by the network profile, or don't apply to traffic from RDMA NICs as described in this section.

Features configured by the network profile

This table lists the specific features that are configured by the network profile resource and describes whether they are supported or not supported in VPC networks created with the RDMA network profile. It includes the network profile property values set by Google Cloud.

Feature | Supported | Property name | Property value | Details
MRDMA NICs | Yes | interfaceTypes | MRDMA | The network supports only MRDMA NICs used by A3 Ultra VMs. The network doesn't support other NIC types, such as GVNIC or VIRTIO_NET.
Multi-NIC in the same network | Yes | allowMultiNicInSameNetwork | MULTI_NIC_IN_SAME_NETWORK_ALLOWED | The network supports multi-NIC VMs where different NICs of the same VM can attach to the same VPC network. However, the NICs must attach to different subnets in the network. See Performance considerations for multi-NIC in the same VPC network.
IPv4-only subnets | Yes | allowedSubnetStackTypes | SUBNET_STACK_TYPE_IPV4_ONLY | The network supports IPv4-only subnets, including the same valid IPv4 ranges as regular VPC networks. The network doesn't support dual-stack or IPv6-only subnets. For more information, see Types of subnets.
PRIVATE subnet purpose | Yes | allowedSubnetPurposes | SUBNET_PURPOSE_PRIVATE | The network supports regular subnets, which have a purpose of PRIVATE. The network doesn't support Private Service Connect subnets, proxy-only subnets, or Private NAT subnets. For more information, see Purposes of subnets.
GCE_ENDPOINT address purpose | Yes | addressPurposes | GCE_ENDPOINT | The network supports IP addresses with a purpose of GCE_ENDPOINT, which is used for internal IP addresses assigned to VM instances. The network doesn't support special purpose IP addresses, such as the SHARED_LOADBALANCER_VIP purpose used in Cloud Load Balancing. For more information, see the address resource reference.
External IP addresses for VMs | No | allowExternalIpAccess | EXTERNAL_IP_ACCESS_BLOCKED | The network doesn't support assigning external IP addresses to VMs. NICs connected to the network can't reach the public internet.
Alias IP ranges | No | allowAliasIpRanges | ALIAS_IP_RANGE_BLOCKED | The network doesn't support using alias IP ranges, including secondary IPv4 address ranges, which can only be used by alias IP ranges.
Auto mode | No | allowAutoModeSubnet | AUTO_MODE_SUBNET_BLOCKED | The subnet creation mode of the VPC network can't be set to auto mode.
VPC Network Peering | No | allowVpcPeering | VPC_PEERING_BLOCKED | The network doesn't support VPC Network Peering. Additionally, the network doesn't support private services access, which relies on VPC Network Peering.
Static routes | No | allowStaticRoutes | STATIC_ROUTES_BLOCKED | The network doesn't support static routes.
Packet Mirroring | No | allowPacketMirroring | PACKET_MIRRORING_BLOCKED | The network doesn't support Packet Mirroring.
Cloud NAT | No | allowCloudNat | CLOUD_NAT_BLOCKED | The network doesn't support Cloud NAT.
Cloud Router | No | allowCloudRouter | CLOUD_ROUTER_BLOCKED | The network doesn't support creating Cloud Routers.
Cloud Interconnect | No | allowInterconnect | INTERCONNECT_BLOCKED | The network doesn't support Cloud Interconnect.
Cloud VPN | No | allowVpn | VPN_BLOCKED | The network doesn't support Cloud VPN.
Cloud Load Balancing | No | allowLoadBalancing | LOAD_BALANCING_BLOCKED | The network doesn't support Cloud Load Balancing. You can't create load balancers in the network. Additionally, you can't use Google Cloud Armor in the network, because Google Cloud Armor security policies apply only to load balancers and VMs with external IP addresses.
Private Google Access | No | allowPrivateGoogleAccess | PRIVATE_GOOGLE_ACCESS_BLOCKED | The network doesn't support Private Google Access.
Private Service Connect | No | allowPsc | PSC_BLOCKED | The network doesn't support any Private Service Connect configurations.
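
As an illustration of how a few of these properties constrain a deployment, the following sketch checks a planned set of NICs and subnets against the values in the table above. The `Nic` class and both check functions are hypothetical names introduced for this example; they are not part of any Google Cloud API, and Google Cloud performs its own validation when you create resources.

```python
from dataclasses import dataclass
from typing import List

# Property values copied from the table above.
PROFILE = {
    "interfaceTypes": {"MRDMA"},
    "allowedSubnetStackTypes": {"SUBNET_STACK_TYPE_IPV4_ONLY"},
    "allowedSubnetPurposes": {"SUBNET_PURPOSE_PRIVATE"},
}


@dataclass
class Nic:
    """Hypothetical description of one planned NIC attachment."""
    nic_type: str
    subnet: str


def check_subnet(stack_type: str, purpose: str) -> List[str]:
    """Return a list of problems with a planned subnet (empty if none)."""
    problems = []
    if stack_type not in PROFILE["allowedSubnetStackTypes"]:
        problems.append(f"stack type {stack_type} is not supported")
    if purpose not in PROFILE["allowedSubnetPurposes"]:
        problems.append(f"subnet purpose {purpose} is not supported")
    return problems


def check_vm_nics(nics: List[Nic]) -> List[str]:
    """Return a list of problems with the NICs that one VM attaches to the network."""
    problems = []
    for nic in nics:
        if nic.nic_type not in PROFILE["interfaceTypes"]:
            problems.append(f"NIC type {nic.nic_type} is not supported")
    subnets = [nic.subnet for nic in nics]
    if len(set(subnets)) != len(subnets):
        problems.append("NICs of the same VM must attach to different subnets")
    return problems


print(check_vm_nics([Nic("MRDMA", "rdma-sub-0"), Nic("MRDMA", "rdma-sub-1")]))
print(check_vm_nics([Nic("MRDMA", "rdma-sub-0"), Nic("GVNIC", "rdma-sub-0")]))
```

The first call returns an empty list. The second reports both the unsupported NIC type and the shared subnet.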

Additional features that don't apply to traffic from RDMA NICs

Because VPC networks with the RDMA network profile are optimized for performance, some features of regular VPC networks that are available for traffic of other protocols don't apply to any traffic in a network with the RDMA network profile, such as the following:

While Google Cloud doesn't prevent you from configuring these features, they aren't effective in VPC networks with the RDMA network profile.

Performance considerations for multi-NIC in the same VPC network

To support workloads that benefit from cross-rail GPU-to-GPU communication, the RDMA network profile lets you create VMs that have multiple MRDMA NICs attached to the same network. However, cross-rail connectivity might affect network performance, for example by increasing latency. VMs that have MRDMA NICs use NCCL, which attempts to rail-align all network transfers, even for cross-rail communication. For example, NCCL can use PXN to copy data through NVLink to a rail-aligned GPU before transferring the data over the network.
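
The rail-alignment idea can be shown with a toy model. The sketch below assumes that each GPU's rail index equals its local GPU index on the VM, which is a simplification made for this example. It doesn't reproduce NCCL's actual scheduling, and `plan_transfer` is a hypothetical function.

```python
from typing import List


def plan_transfer(src_gpu: int, dst_gpu: int) -> List[str]:
    """Return the hops for sending data from src_gpu on one VM to dst_gpu on another.

    Toy assumption: rail index == local GPU index on each VM.
    """
    if src_gpu == dst_gpu:
        # Source and destination are already on the same rail.
        return [f"network: rail {src_gpu}"]
    # PXN-style path: move the data over NVLink to the local GPU that sits
    # on the destination's rail, then send it over that rail.
    return [
        f"NVLink: GPU {src_gpu} -> GPU {dst_gpu} (same VM)",
        f"network: rail {dst_gpu}",
    ]


print(plan_transfer(2, 2))
print(plan_transfer(0, 3))
```

In this model, the network hop always stays on a single rail; the cross-rail step happens over NVLink inside the sending VM.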

What's next