RDMA network profile
This page provides an overview of the RDMA network profile in Google Cloud.
About the RDMA network profile
The RDMA network profile lets you create a Virtual Private Cloud (VPC) network
in which you can run AI workloads on VM instances that have
NVIDIA ConnectX-7 NICs. These NICs support remote direct memory access (RDMA)
connectivity and have the NIC type MRDMA
in Google Cloud.
A VPC network with the RDMA network profile uses RDMA over Converged Ethernet v2 (RoCE v2) to support low-latency, high-bandwidth RDMA communication between the GPUs of VMs that are created in the network.
For more information about running AI workloads in Google Cloud, see the AI Hypercomputer documentation.
Specifications
VPC networks created with the RDMA network profile have the following specifications:
- The network only accepts attachments from MRDMA NICs. A3 Ultra VMs are the only VM type that supports MRDMA NICs. Other NIC types, for example the GVNICs of an A3 Ultra VM, must be attached to a regular VPC network.
- The set of features that are supported in the network is pre-configured by Google Cloud to support running AI workloads that require RDMA. VPC networks with the RDMA network profile have more constraints than regular VPC networks. For more information, see Supported and unsupported features.
- The network is constrained to the zone of the network profile that you specify when you create the network. For example, any instances that you create in the network must be created in the zone of the network profile. Additionally, any subnets that you create in the network must be in the region that corresponds to the zone of the network profile.
- The RDMA network profile is not available in all zones. To view the zones in which the network profile is available, see Supported zones. You can also view the zone-specific instances of the network profile that are available by listing network profiles.
- The resource name of the RDMA network profile that you specify when you create the network has the following format: ZONE-vpc-roce, for example europe-west1-b-vpc-roce.
- The default MTU in a VPC network created with the RDMA network profile is 8896. This default gives the RDMA driver in the VM's guest OS the flexibility to use an appropriate MTU. The default MTU in regular VPC networks (1460) might be too small for some RDMA workloads. For best performance, Google recommends that you don't change the default MTU.
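The naming format and the zone-to-region constraint described above can be illustrated with a minimal sketch. The helper names rdma_profile_name and region_of are hypothetical and not part of any Google Cloud library. The region derivation assumes that a zone name is the region name followed by a hyphen and a zone letter, as in the example europe-west1-b.

```python
def rdma_profile_name(zone: str) -> str:
    """Build the profile resource name in the documented ZONE-vpc-roce format."""
    return f"{zone}-vpc-roce"


def region_of(zone: str) -> str:
    """Derive the region by dropping the trailing '-<letter>' from the zone.

    Assumption: zone names have the form REGION-LETTER.
    """
    region, _, _letter = zone.rpartition("-")
    return region


zone = "europe-west1-b"
print(rdma_profile_name(zone))  # europe-west1-b-vpc-roce
print(region_of(zone))          # europe-west1
```

Subnets in a network that uses the europe-west1-b-vpc-roce profile would therefore need to be in europe-west1.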
Supported zones
The RDMA network profile is available in the following zones:
europe-west1-b
us-east7-c
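As a small illustration, a script could check a target zone against this list before attempting to create a network. This is only a sketch: the set below is copied from this page and might be out of date, and the function name is hypothetical. Listing network profiles remains the authoritative way to see availability.

```python
# Zones copied from the list above; treat this as a snapshot, not a source of truth.
SUPPORTED_ZONES = frozenset({"europe-west1-b", "us-east7-c"})


def is_rdma_profile_zone(zone: str) -> bool:
    """Return True if the zone appears in the snapshot of supported zones."""
    return zone in SUPPORTED_ZONES


print(is_rdma_profile_zone("us-east7-c"))
print(is_rdma_profile_zone("us-central1-a"))
```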
Supported and unsupported features
This section describes the supported and unsupported features in VPC networks created with the RDMA network profile.
Features of regular VPC networks are supported unless they are configured to be disabled by the network profile, are dependent on a feature that is disabled by the network profile, or don't apply to traffic from RDMA NICs as described in this section.
Features configured by the network profile
This table lists the specific features that are configured by the network profile resource and describes whether they are supported or not supported in VPC networks created with the RDMA network profile. It includes the network profile property values set by Google Cloud.
Feature | Supported | Property name | Property value | Details |
---|---|---|---|---|
MRDMA NICs | Yes | interfaceTypes | MRDMA | The network supports only MRDMA NICs. The network doesn't support other NIC types, such as GVNIC. |
Multi-NIC in the same network | Yes | allowMultiNicInSameNetwork | MULTI_NIC_IN_SAME_NETWORK_ALLOWED | The network supports multi-NIC VMs where different NICs of the same VM can attach to the same VPC network. However, the NICs must attach to different subnets in the network. See Performance considerations for multi-NIC in the same VPC network. |
IPv4-only subnets | Yes | allowedSubnetStackTypes | SUBNET_STACK_TYPE_IPV4_ONLY | The network supports IPv4-only subnets, including the same valid IPv4 ranges as regular VPC networks. The network doesn't support dual-stack or IPv6-only subnets. For more information, see Types of subnets. |
PRIVATE subnet purpose | Yes | allowedSubnetPurposes | SUBNET_PURPOSE_PRIVATE | The network supports regular subnets, which have a purpose of PRIVATE. The network doesn't support Private Service Connect subnets, proxy-only subnets, or Private NAT subnets. For more information, see Purposes of subnets. |
GCE_ENDPOINT address purpose | Yes | addressPurposes | GCE_ENDPOINT | The network supports IP addresses with a purpose of GCE_ENDPOINT. The network doesn't support special purpose IP addresses. |
External IP addresses for VMs | No | allowExternalIpAccess | EXTERNAL_IP_ACCESS_BLOCKED | The network doesn't support assigning external IP addresses to VMs. NICs connected to the network can't reach the public internet. |
Alias IP ranges | No | allowAliasIpRanges | ALIAS_IP_RANGE_BLOCKED | The network doesn't support using alias IP ranges, including secondary IPv4 address ranges, which can only be used by alias IP ranges. |
Auto mode | No | allowAutoModeSubnet | AUTO_MODE_SUBNET_BLOCKED | The subnet creation mode of the VPC network can't be set to auto mode. |
VPC Network Peering | No | allowVpcPeering | VPC_PEERING_BLOCKED | The network doesn't support VPC Network Peering. Additionally, the network doesn't support private services access, which relies on VPC Network Peering. |
Static routes | No | allowStaticRoutes | STATIC_ROUTES_BLOCKED | The network doesn't support static routes. |
Packet Mirroring | No | allowPacketMirroring | PACKET_MIRRORING_BLOCKED | The network doesn't support Packet Mirroring. |
Cloud NAT | No | allowCloudNat | CLOUD_NAT_BLOCKED | The network doesn't support Cloud NAT. |
Cloud Router | No | allowCloudRouter | CLOUD_ROUTER_BLOCKED | The network doesn't support creating Cloud Routers. |
Cloud Interconnect | No | allowInterconnect | INTERCONNECT_BLOCKED | The network doesn't support Cloud Interconnect. |
Cloud VPN | No | allowVpn | VPN_BLOCKED | The network doesn't support Cloud VPN. |
Cloud Load Balancing | No | allowLoadBalancing | LOAD_BALANCING_BLOCKED | The network doesn't support Cloud Load Balancing. You can't create load balancers in the network. Additionally, you can't use Google Cloud Armor in the network, because Google Cloud Armor security policies apply only to load balancers and VMs with external IP addresses. |
Private Google Access | No | allowPrivateGoogleAccess | PRIVATE_GOOGLE_ACCESS_BLOCKED | The network doesn't support Private Google Access. |
Private Service Connect | No | allowPsc | PSC_BLOCKED | The network doesn't support any Private Service Connect configurations. |
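For tooling that needs to reason about these constraints, the allow* properties from the table can be captured in a plain mapping. The following is a minimal sketch, not an official client: the dictionary is transcribed from the table, the function names are hypothetical, and the check relies on the observation that every blocked value in the table ends in _BLOCKED.

```python
# Property names and values transcribed from the table above (allow* rows only).
PROFILE_PROPERTIES = {
    "allowMultiNicInSameNetwork": "MULTI_NIC_IN_SAME_NETWORK_ALLOWED",
    "allowExternalIpAccess": "EXTERNAL_IP_ACCESS_BLOCKED",
    "allowAliasIpRanges": "ALIAS_IP_RANGE_BLOCKED",
    "allowAutoModeSubnet": "AUTO_MODE_SUBNET_BLOCKED",
    "allowVpcPeering": "VPC_PEERING_BLOCKED",
    "allowStaticRoutes": "STATIC_ROUTES_BLOCKED",
    "allowPacketMirroring": "PACKET_MIRRORING_BLOCKED",
    "allowCloudNat": "CLOUD_NAT_BLOCKED",
    "allowCloudRouter": "CLOUD_ROUTER_BLOCKED",
    "allowInterconnect": "INTERCONNECT_BLOCKED",
    "allowVpn": "VPN_BLOCKED",
    "allowLoadBalancing": "LOAD_BALANCING_BLOCKED",
    "allowPrivateGoogleAccess": "PRIVATE_GOOGLE_ACCESS_BLOCKED",
    "allowPsc": "PSC_BLOCKED",
}


def is_blocked(property_name: str) -> bool:
    """Return True if the profile sets this property to a *_BLOCKED value."""
    return PROFILE_PROPERTIES[property_name].endswith("_BLOCKED")


def blocked_properties() -> list[str]:
    """List the property names that the profile blocks, in sorted order."""
    return sorted(name for name in PROFILE_PROPERTIES if is_blocked(name))


print(is_blocked("allowVpcPeering"))
print(is_blocked("allowMultiNicInSameNetwork"))
```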
Additional features that don't apply to traffic from RDMA NICs
Because VPC networks with the RDMA network profile are optimized for performance, some features of regular VPC networks that are available for traffic of other protocols don't apply to any traffic in a network with the RDMA network profile, such as the following:
While Google Cloud doesn't prevent you from configuring these features, they aren't effective in VPC networks with the RDMA network profile.
Performance considerations for multi-NIC in the same VPC network
To support workloads that benefit from cross-rail GPU-to-GPU communication, the
RDMA network profile lets you create VMs that have multiple MRDMA NICs
attached to the same network. However, cross-rail connectivity might affect
network performance, such as through increased latency. VMs that have MRDMA
NICs use NCCL, which attempts to rail-align all network transfers even for
cross-rail communication, for example by using PXN to copy data through NVLink
to a rail-aligned GPU prior to transferring over the network.
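When multiple MRDMA NICs of one VM attach to the same network, the features table requires that each NIC attach to a different subnet. A planning script could check a proposed NIC layout with a sketch like the following. The function name and the subnet names are hypothetical, and the input is assumed to be one subnet name per NIC.

```python
from collections import Counter


def duplicate_subnets(nic_subnets: list[str]) -> list[str]:
    """Return the subnets that more than one NIC of the same VM would use.

    An empty result means every NIC attaches to a distinct subnet.
    """
    counts = Counter(nic_subnets)
    return sorted(subnet for subnet, n in counts.items() if n > 1)


# Hypothetical layout: one subnet name per MRDMA NIC of a single VM.
proposed = ["rdma-sub-0", "rdma-sub-1", "rdma-sub-1", "rdma-sub-3"]
print(duplicate_subnets(proposed))  # ['rdma-sub-1']
```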