Prerequisites for VMs and MIGs

This document describes the prerequisites required for creating A3 Ultra VMs and managed instance groups (MIGs) that are deployed on Hypercompute Cluster. For more information about Hypercompute Cluster, see Hypercompute Cluster.

These prerequisites are required if you want to create VMs using any of the following methods:

Creating VMs is ideal if you don't need a GKE or Slurm orchestrator or if you want to set up an environment that uses a custom orchestrator.

Before you begin

  • Select the tab for how you plan to use the samples on this page:

    gcloud

    In the Google Cloud console, activate Cloud Shell.

    At the bottom of the Google Cloud console, a Cloud Shell session starts and displays a command-line prompt. Cloud Shell is a shell environment with the Google Cloud CLI already installed and with values already set for your current project. It can take a few seconds for the session to initialize.

    REST

    To use the REST API samples on this page in a local development environment, you use the credentials you provide to the gcloud CLI.

      Install the Google Cloud CLI, then initialize it by running the following command:

      gcloud init

    For more information, see Authenticate for using REST in the Google Cloud authentication documentation.

Overview

To prepare for creating your VMs, complete the following tasks:

  1. Optional: Create a placement policy.
  2. Create an instance template. You can create either of the following:
       • A single NIC instance template
       • A multi-NIC instance template

Optional: Create a compact placement policy

When you requested capacity, you were provisioned multiple blocks of resources. However, to reduce latency, you can optionally use a placement policy to specify that your VMs use only a single block of resources. To consume resources from a single block, create a placement policy and set its maximum distance (the --max-distance flag in gcloud, or the maxDistance field in REST) to 2.

To create a compact placement policy, select one of the following options:

gcloud

To create a compact placement policy, use the gcloud beta compute resource-policies create group-placement command:

gcloud beta compute resource-policies create group-placement POLICY_NAME \
    --collocation=collocated \
    --max-distance=2 \
    --region=REGION

Replace the following:

  • POLICY_NAME: the name of the compact placement policy.
  • REGION: the region where you want to create the placement policy. For the preview, the only supported region is europe-west1.

REST

To create a compact placement policy, make a POST request to the beta resourcePolicies.insert method. In the request body, include the collocation field set to COLLOCATED, and the maxDistance field set to the maximum distance between the VMs.

POST https://compute.googleapis.com/compute/beta/projects/PROJECT_ID/regions/REGION/resourcePolicies
  {
    "name": "POLICY_NAME",
    "groupPlacementPolicy": {
      "collocation": "COLLOCATED",
      "maxDistance": 2
    }
  }

Replace the following:

  • PROJECT_ID: your project ID
  • POLICY_NAME: the name of the compact placement policy.
  • REGION: the region where you want to create the placement policy. For the preview, the only supported region is europe-west1.
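
As a rough sketch of how the request above might be sent from a shell, the following snippet builds the request body and wraps the call in a function. The curl invocation with a token from gcloud auth print-access-token is an assumption based on a common pattern, not something this page prescribes. The variable values and the function names policy_body and create_policy are hypothetical.

```shell
#!/bin/bash
# Hypothetical values; substitute your own.
PROJECT_ID="my-project"
REGION="europe-west1"
POLICY_NAME="my-compact-policy"

# Print the JSON request body for a compact placement policy.
policy_body() {
  cat <<EOF
{
  "name": "${POLICY_NAME}",
  "groupPlacementPolicy": {
    "collocation": "COLLOCATED",
    "maxDistance": 2
  }
}
EOF
}

# Send the request. Assumes curl and an initialized gcloud CLI are available.
# This function is only defined here, not called.
create_policy() {
  policy_body | curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json; charset=utf-8" \
    -d @- \
    "https://compute.googleapis.com/compute/beta/projects/${PROJECT_ID}/regions/${REGION}/resourcePolicies"
}

# Show the body that would be sent.
policy_body
```

To send the request, call create_policy after you review the printed body.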

Placement policy and VM limits

The number of VMs that a policy can apply to depends on the maximum distance specified. The following table shows the maximum number of VMs that can be provisioned for each maximum distance value:

Machine type   Maximum distance value   Description                               Maximum number of VMs supported
A3 Ultra       2                        Consume resources from any single block   224
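
Before you size a MIG, you might want to check the planned VM count against this limit. The following is a minimal sketch of such a check. The function name fits_policy and the sample counts are hypothetical; the limit of 224 comes from the preceding table.

```shell
#!/bin/bash
# Limit from the table for A3 Ultra with a maximum distance of 2.
MAX_VMS_DISTANCE_2=224

# Return success if the planned VM count fits under a single placement policy.
fits_policy() {
  local planned="$1"
  [ "$planned" -ge 1 ] && [ "$planned" -le "$MAX_VMS_DISTANCE_2" ]
}

# Hypothetical planned sizes.
for count in 64 224 225; do
  if fits_policy "$count"; then
    echo "$count VMs: within the limit"
  else
    echo "$count VMs: exceeds the limit of $MAX_VMS_DISTANCE_2"
  fi
done
```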

Create instance template

A3 Ultra VMs have ten NICs: two Google Virtual NICs (gVNIC) and eight MRDMA NICs. You can choose to create VMs that use all ten NICs or create VMs that only use a single gVNIC.

Things to bear in mind when creating instance templates that run on reservation blocks:

  • The provisioning model must be reservation-bound. For more information about this provisioning model, see reservation-bound.
  • The reservation affinity must be specific.
  • You can limit the run time of a VM as long as the run time ends before the reservation end date. If neither max-run-duration nor termination-time is specified, the VM's terminationTimestamp is set to the reservation end date.
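
To illustrate the run time constraint in the last item, the following sketch compares a planned run duration against a reservation end time, both expressed as epoch seconds. The function name ends_before_reservation and the sample durations are hypothetical, and the script doesn't query your actual reservation.

```shell
#!/bin/bash
# Return success if a VM that starts at start_epoch and is limited to
# duration_seconds would stop before the reservation ends.
ends_before_reservation() {
  local start_epoch="$1" duration_seconds="$2" reservation_end_epoch="$3"
  [ $((start_epoch + duration_seconds)) -lt "$reservation_end_epoch" ]
}

# Hypothetical example: start now, run for 3 days, reservation ends in 7 days.
now=$(date +%s)
reservation_end=$((now + 7 * 86400))
if ends_before_reservation "$now" $((3 * 86400)) "$reservation_end"; then
  echo "run duration fits inside the reservation"
else
  echo "run duration extends past the reservation end"
fi
```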

Create single NIC instance template

If your use case does not require the use of the multiple NICs that are available with the A3 Ultra machine type, you can create an instance template for VMs that use a single NIC by completing the following steps. However, if you need multi-NICs, see Create multi-NIC instance template.

gcloud

To create a regional instance template, use the gcloud beta compute instance-templates create command:

If you chose to use a compact placement policy, also add the following flag: --resource-policies=POLICY_NAME. Replace POLICY_NAME with the name of the compact placement policy.

gcloud beta compute instance-templates create INSTANCE_TEMPLATE_NAME  \
    --machine-type=MACHINE_TYPE \
    --image-family=IMAGE_FAMILY \
    --image-project=IMAGE_PROJECT \
    --reservation-affinity=specific \
    --reservation=RESERVATION \
    --provisioning-model=RESERVATION_BOUND \
    --instance-termination-action=DELETE \
    --instance-template-region=REGION \
    --boot-disk-type=hyperdisk-balanced \
    --boot-disk-size=DISK_SIZE \
    --scopes=cloud-platform \
    --network-interface=nic-type=GVNIC

Replace the following:

  • INSTANCE_TEMPLATE_NAME: the name of the instance template.
  • MACHINE_TYPE: the machine type to use for the instance template. For this preview, the only supported machine type is a3-ultragpu-8g.
  • IMAGE_FAMILY: the image family of the OS image that you want to use. For a list of supported operating systems, see Supported operating systems.
  • IMAGE_PROJECT: the project ID of the OS image.
  • RESERVATION: either the reservation name or a specific block within a reservation. To get the reservation name or the available blocks, see View capacity. Choose one of the following values:
    • RESERVATION_NAME (for example, exr-5010-01). Use this value in either of the following cases:
      • You are using a placement policy. The placement policy is applied to the reservation and the VMs are placed on a single block.
      • You aren't using a placement policy and are OK with VMs being placed anywhere in your reservation.
    • RESERVATION_NAME/reservationBlocks/RESERVATION_BLOCK_NAME (for example, exr-5010-01/reservationBlocks/exr-5010-01-block-1). Use this value if you aren't using a placement policy and want your VMs to be placed in a specific block.
  • REGION: the region where you want to create the instance template. For the preview, the only supported region is europe-west1.
  • DISK_SIZE: the size of the boot disk in GB.
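
The two forms of the RESERVATION value can be assembled with a small helper, sketched here. The function name reservation_value is hypothetical; the sample names are the examples used on this page.

```shell
#!/bin/bash
# Build the RESERVATION value.
#   $1: reservation name
#   $2: optional block name; when set, the value targets that specific block.
reservation_value() {
  local name="$1" block="${2:-}"
  if [ -n "$block" ]; then
    echo "${name}/reservationBlocks/${block}"
  else
    echo "${name}"
  fi
}

# Whole reservation (for example, when you use a placement policy).
reservation_value exr-5010-01
# Specific block (no placement policy).
reservation_value exr-5010-01 exr-5010-01-block-1
```

You could then pass the result to the --reservation flag, for example --reservation="$(reservation_value exr-5010-01)".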

REST

To create a regional instance template, make a POST request to the regionInstanceTemplates.insert method as follows:

If you chose to use a compact placement policy, also add the placement policy parameter to the request body.

  POST https://compute.googleapis.com/compute/beta/projects/PROJECT_ID/regions/REGION/instanceTemplates
  {
    "name":"INSTANCE_TEMPLATE_NAME",
    "properties":{
       "disks":[
          {
             "boot":true,
             "initializeParams":{
                "diskSizeGb":"DISK_SIZE",
                "diskType":"hyperdisk-balanced",
                "sourceImage":"projects/IMAGE_PROJECT/global/images/family/IMAGE_FAMILY"
             },
             "mode":"READ_WRITE",
             "type":"PERSISTENT"
          }
       ],
       "machineType":"MACHINE_TYPE",
       "networkInterfaces":[
          {
            "network":"NETWORK",
            "nicType":"GVNIC"
          }
       ],
       "reservationAffinity":{
          "consumeReservationType":"SPECIFIC_RESERVATION",
          "key":"compute.googleapis.com/reservation-name",
          "values":[
             "RESERVATION"
          ]
       },
       "scheduling":{
          "provisioningModel":"RESERVATION_BOUND",
          "instanceTerminationAction":"DELETE",
          "automaticRestart":true
       }
    }
  }

Replace the following:

  • INSTANCE_TEMPLATE_NAME: the name of the instance template.
  • MACHINE_TYPE: the machine type to use for the instance template. For this preview, the only supported machine type is a3-ultragpu-8g.
  • IMAGE_FAMILY: the image family of the OS image that you want to use. For a list of supported operating systems, see Supported operating systems.
  • IMAGE_PROJECT: the project ID of the OS image.
  • RESERVATION: either the reservation name or a specific block within a reservation. To get the reservation name or the available blocks, see View capacity. Choose one of the following values:
    • RESERVATION_NAME (for example, exr-5010-01). Use this value in either of the following cases:
      • You are using a placement policy. The placement policy is applied to the reservation and the VMs are placed on a single block.
      • You aren't using a placement policy and are OK with VMs being placed anywhere in your reservation.
    • RESERVATION_NAME/reservationBlocks/RESERVATION_BLOCK_NAME (for example, exr-5010-01/reservationBlocks/exr-5010-01-block-1). Use this value if you aren't using a placement policy and want your VMs to be placed in a specific block.
  • REGION: the region where you want to create the instance template. For the preview, the only supported region is europe-west1.
  • DISK_SIZE: the size of the boot disk in GB.

Compact placement policy

If you chose to use a compact placement policy, also add the following field to the properties object of the request body:

  "resourcePolicies": [
    "projects/PROJECT_ID/regions/REGION/resourcePolicies/POLICY_NAME"
  ],

Replace the following:

  • PROJECT_ID: the project ID of the compact placement policy.
  • REGION: the region of the compact placement policy.
  • POLICY_NAME: the name of the compact placement policy.

Create multi-NIC instance template

To create an instance template for VMs that uses the multiple NICs that are available with the A3 Ultra machine type, complete the following steps:

  1. Create Virtual Private Cloud (VPC) networks
  2. Create the instance template

Create VPC networks

A3 Ultra VMs have ten NICs: two for the host machine and eight for the GPUs. To use these multi-NICs, you need to create three Virtual Private Cloud networks as follows:

  • 2 gVNIC networks, each with a subnetwork: these are used for host-to-host communication. For more information about gVNIC, see Using Google Virtual NIC.
  • 1 RDMA network with 8 subnetworks: these are designed for GPU-to-GPU communication by using the NVIDIA ConnectX-7 NICs that are available with your A3 Ultra VMs. For more information about the RDMA network profile, see RDMA network profiles.

To set up the networks, you can either use the following instruction guides or use the provided script.

Instruction guides

To create the networks, you can use the following instructions:

Script

To create the networks, you can use the following script.

  #!/bin/bash

  # Create standard VPCs (network and subnets) for the GVNICs
  for N in $(seq 0 1); do
    gcloud beta compute networks create GVNIC_NAME_PREFIX-net-$N \
      --subnet-mode=custom

    gcloud beta compute networks subnets create GVNIC_NAME_PREFIX-sub-$N \
      --network=GVNIC_NAME_PREFIX-net-$N \
      --region=REGION \
      --range=10.$N.0.0/16

    gcloud beta compute firewall-rules create GVNIC_NAME_PREFIX-internal-$N \
      --network=GVNIC_NAME_PREFIX-net-$N \
      --action=ALLOW \
      --rules=tcp:0-65535,udp:0-65535,icmp \
      --source-ranges=10.0.0.0/8
  done

  # Create SSH firewall rules
  gcloud beta compute firewall-rules create GVNIC_NAME_PREFIX-ssh \
    --network=GVNIC_NAME_PREFIX-net-0 \
    --action=ALLOW \
    --rules=tcp:22 \
    --source-ranges=IP_RANGE

  # Assumes that an external IP is only created for vNIC 0
  gcloud beta compute firewall-rules create GVNIC_NAME_PREFIX-allow-ping-net-0 \
    --network=GVNIC_NAME_PREFIX-net-0 \
    --action=ALLOW \
    --rules=icmp \
    --source-ranges=IP_RANGE

  # List and make sure network profiles exist
  gcloud beta compute network-profiles list

  # Create network for CX-7
  gcloud beta compute networks create RDMA_NAME_PREFIX-mrdma \
    --network-profile=ZONE-vpc-roce \
    --subnet-mode custom

  # Create subnets.
  for N in $(seq 0 7); do
    gcloud beta compute networks subnets create RDMA_NAME_PREFIX-mrdma-sub-$N \
      --network=RDMA_NAME_PREFIX-mrdma \
      --region=REGION \
      --range=10.$((N+2)).0.0/16  # offset to avoid overlap with gvnics
  done
  

Replace the following:

  • GVNIC_NAME_PREFIX: the name prefix to use for the standard Virtual Private Cloud networks and subnets that use GVNIC NICs.
  • RDMA_NAME_PREFIX: the name prefix to use for the Virtual Private Cloud networks and subnets that use RDMA NICs.
  • ZONE: the zone where you want to create the networks. For the preview, the only supported zone is europe-west1-b.
  • REGION: the region where you want to create the networks. This must correspond to the zone specified. For example, if your zone is europe-west1-b, then your region is europe-west1.
  • IP_RANGE: the IP range to use for the SSH firewall rules.
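
Before you run the script, it can help to review the subnets and IP ranges that it would create. The following dry-run sketch mirrors the naming and range arithmetic of the script without calling gcloud. The prefix values and the function name subnet_plan are hypothetical.

```shell
#!/bin/bash
# Hypothetical prefixes; substitute your own.
GVNIC_NAME_PREFIX="my-gvnic"
RDMA_NAME_PREFIX="my-rdma"

# Print "subnet-name range" for every subnet that the script creates.
subnet_plan() {
  local n
  # Two gVNIC subnets: 10.0.0.0/16 and 10.1.0.0/16.
  for n in $(seq 0 1); do
    echo "${GVNIC_NAME_PREFIX}-sub-${n} 10.${n}.0.0/16"
  done
  # Eight RDMA subnets, offset by 2 to avoid overlap with the gVNIC ranges.
  for n in $(seq 0 7); do
    echo "${RDMA_NAME_PREFIX}-mrdma-sub-${n} 10.$((n + 2)).0.0/16"
  done
}

subnet_plan
```

The output lists ten subnets, one for each NIC on an A3 Ultra VM.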

Create the instance template

gcloud

To create a regional instance template, use the gcloud beta compute instance-templates create command:

If you chose to use a compact placement policy, also add the following flag: --resource-policies=POLICY_NAME. Replace POLICY_NAME with the name of the compact placement policy.

gcloud beta compute instance-templates create INSTANCE_TEMPLATE_NAME  \
    --machine-type=MACHINE_TYPE \
    --image-family=IMAGE_FAMILY \
    --image-project=IMAGE_PROJECT \
    --reservation-affinity=specific \
    --reservation=RESERVATION \
    --provisioning-model=RESERVATION_BOUND \
    --instance-termination-action=DELETE \
    --instance-template-region=REGION \
    --boot-disk-type=hyperdisk-balanced \
    --boot-disk-size=DISK_SIZE \
    --scopes=cloud-platform \
    --network-interface=nic-type=GVNIC,network=GVNIC_NAME_PREFIX-net-0,subnet=GVNIC_NAME_PREFIX-sub-0 \
    --network-interface=nic-type=GVNIC,network=GVNIC_NAME_PREFIX-net-1,subnet=GVNIC_NAME_PREFIX-sub-1,no-address \
    --network-interface=nic-type=MRDMA,network=RDMA_NAME_PREFIX-mrdma,subnet=RDMA_NAME_PREFIX-mrdma-sub-0,no-address \
    --network-interface=nic-type=MRDMA,network=RDMA_NAME_PREFIX-mrdma,subnet=RDMA_NAME_PREFIX-mrdma-sub-1,no-address \
    --network-interface=nic-type=MRDMA,network=RDMA_NAME_PREFIX-mrdma,subnet=RDMA_NAME_PREFIX-mrdma-sub-2,no-address \
    --network-interface=nic-type=MRDMA,network=RDMA_NAME_PREFIX-mrdma,subnet=RDMA_NAME_PREFIX-mrdma-sub-3,no-address \
    --network-interface=nic-type=MRDMA,network=RDMA_NAME_PREFIX-mrdma,subnet=RDMA_NAME_PREFIX-mrdma-sub-4,no-address \
    --network-interface=nic-type=MRDMA,network=RDMA_NAME_PREFIX-mrdma,subnet=RDMA_NAME_PREFIX-mrdma-sub-5,no-address \
    --network-interface=nic-type=MRDMA,network=RDMA_NAME_PREFIX-mrdma,subnet=RDMA_NAME_PREFIX-mrdma-sub-6,no-address \
    --network-interface=nic-type=MRDMA,network=RDMA_NAME_PREFIX-mrdma,subnet=RDMA_NAME_PREFIX-mrdma-sub-7,no-address

Replace the following:

  • INSTANCE_TEMPLATE_NAME: the name of the instance template.
  • MACHINE_TYPE: the machine type to use for the instance template. For this preview, the only supported machine type is a3-ultragpu-8g.
  • IMAGE_FAMILY: the image family of the OS image that you want to use. For a list of supported operating systems, see Supported operating systems.
  • IMAGE_PROJECT: the project ID of the OS image.
  • RESERVATION: either the reservation name or a specific block within a reservation. To get the reservation name or the available blocks, see View capacity. Choose one of the following values:
    • RESERVATION_NAME (for example, exr-5010-01). Use this value in either of the following cases:
      • You are using a placement policy. The placement policy is applied to the reservation and the VMs are placed on a single block.
      • You aren't using a placement policy and are OK with VMs being placed anywhere in your reservation.
    • RESERVATION_NAME/reservationBlocks/RESERVATION_BLOCK_NAME (for example, exr-5010-01/reservationBlocks/exr-5010-01-block-1). Use this value if you aren't using a placement policy and want your VMs to be placed in a specific block.
  • REGION: the region where you want to create the instance template. For the preview, the only supported region is europe-west1.
  • DISK_SIZE: the size of the boot disk in GB.
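
The eight MRDMA flags in the preceding command differ only in the subnet index, so they can be generated in a loop. The following is a minimal sketch that prints the flags without calling gcloud. The prefix value and the function name mrdma_flags are hypothetical.

```shell
#!/bin/bash
# Hypothetical prefix; substitute your own.
RDMA_NAME_PREFIX="my-rdma"

# Print one --network-interface flag per MRDMA NIC (subnets 0 through 7).
mrdma_flags() {
  local n
  for n in $(seq 0 7); do
    echo "--network-interface=nic-type=MRDMA,network=${RDMA_NAME_PREFIX}-mrdma,subnet=${RDMA_NAME_PREFIX}-mrdma-sub-${n},no-address"
  done
}

mrdma_flags
```

Because the flags contain no spaces, one possible way to use them is to append an unquoted $(mrdma_flags) to the end of the gcloud command in place of the eight MRDMA lines.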

REST

To create a regional instance template, make a POST request to the regionInstanceTemplates.insert method as follows:

If you chose to use a compact placement policy, also add the placement policy parameter to the request body.

  POST https://compute.googleapis.com/compute/beta/projects/PROJECT_ID/regions/REGION/instanceTemplates
  {
    "name":"INSTANCE_TEMPLATE_NAME",
    "properties":{
       "disks":[
          {
             "boot":true,
             "initializeParams":{
                "diskSizeGb":"DISK_SIZE",
                "diskType":"hyperdisk-balanced",
                "sourceImage":"projects/IMAGE_PROJECT/global/images/family/IMAGE_FAMILY"
             },
             "mode":"READ_WRITE",
             "type":"PERSISTENT"
          }
       ],
       "machineType":"MACHINE_TYPE",
       "networkInterfaces": [
          {
             "accessConfigs": [
                {
                   "name": "external-nat",
                   "type": "ONE_TO_ONE_NAT"
                }
             ],
             "network": "projects/PROJECT_ID/global/networks/GVNIC_NAME_PREFIX-net-0",
             "nicType": "GVNIC",
             "subnetwork": "projects/PROJECT_ID/regions/REGION/subnetworks/GVNIC_NAME_PREFIX-sub-0"
          },
          {
             "network": "projects/PROJECT_ID/global/networks/GVNIC_NAME_PREFIX-net-1",
             "nicType": "GVNIC",
             "subnetwork": "projects/PROJECT_ID/regions/REGION/subnetworks/GVNIC_NAME_PREFIX-sub-1"
          },
          {
             "network": "projects/PROJECT_ID/global/networks/RDMA_NAME_PREFIX-mrdma",
             "nicType": "MRDMA",
             "subnetwork": "projects/PROJECT_ID/regions/REGION/subnetworks/RDMA_NAME_PREFIX-mrdma-sub-0"
          },
          {
             "network": "projects/PROJECT_ID/global/networks/RDMA_NAME_PREFIX-mrdma",
             "nicType": "MRDMA",
             "subnetwork": "projects/PROJECT_ID/regions/REGION/subnetworks/RDMA_NAME_PREFIX-mrdma-sub-1"
          },
          {
             "network": "projects/PROJECT_ID/global/networks/RDMA_NAME_PREFIX-mrdma",
             "nicType": "MRDMA",
             "subnetwork": "projects/PROJECT_ID/regions/REGION/subnetworks/RDMA_NAME_PREFIX-mrdma-sub-2"
          },
          {
             "network": "projects/PROJECT_ID/global/networks/RDMA_NAME_PREFIX-mrdma",
             "nicType": "MRDMA",
             "subnetwork": "projects/PROJECT_ID/regions/REGION/subnetworks/RDMA_NAME_PREFIX-mrdma-sub-3"
          },
          {
             "network": "projects/PROJECT_ID/global/networks/RDMA_NAME_PREFIX-mrdma",
             "nicType": "MRDMA",
             "subnetwork": "projects/PROJECT_ID/regions/REGION/subnetworks/RDMA_NAME_PREFIX-mrdma-sub-4"
          },
          {
             "network": "projects/PROJECT_ID/global/networks/RDMA_NAME_PREFIX-mrdma",
             "nicType": "MRDMA",
             "subnetwork": "projects/PROJECT_ID/regions/REGION/subnetworks/RDMA_NAME_PREFIX-mrdma-sub-5"
          },
          {
             "network": "projects/PROJECT_ID/global/networks/RDMA_NAME_PREFIX-mrdma",
             "nicType": "MRDMA",
             "subnetwork": "projects/PROJECT_ID/regions/REGION/subnetworks/RDMA_NAME_PREFIX-mrdma-sub-6"
          },
          {
             "network": "projects/PROJECT_ID/global/networks/RDMA_NAME_PREFIX-mrdma",
             "nicType": "MRDMA",
             "subnetwork": "projects/PROJECT_ID/regions/REGION/subnetworks/RDMA_NAME_PREFIX-mrdma-sub-7"
          }
       ],
       "reservationAffinity":{
          "consumeReservationType":"SPECIFIC_RESERVATION",
          "key":"compute.googleapis.com/reservation-name",
          "values":[
             "RESERVATION"
          ]
       },
       "scheduling":{
          "provisioningModel":"RESERVATION_BOUND",
          "instanceTerminationAction":"DELETE",
          "automaticRestart":true
       }
    }
  }

Replace the following:

  • INSTANCE_TEMPLATE_NAME: the name of the instance template.
  • MACHINE_TYPE: the machine type to use for the instance template. For this preview, the only supported machine type is a3-ultragpu-8g.
  • IMAGE_FAMILY: the image family of the OS image that you want to use. For a list of supported operating systems, see Supported operating systems.
  • IMAGE_PROJECT: the project ID of the OS image.
  • RESERVATION: either the reservation name or a specific block within a reservation. To get the reservation name or the available blocks, see View capacity. Choose one of the following values:
    • RESERVATION_NAME (for example, exr-5010-01). Use this value in either of the following cases:
      • You are using a placement policy. The placement policy is applied to the reservation and the VMs are placed on a single block.
      • You aren't using a placement policy and are OK with VMs being placed anywhere in your reservation.
    • RESERVATION_NAME/reservationBlocks/RESERVATION_BLOCK_NAME (for example, exr-5010-01/reservationBlocks/exr-5010-01-block-1). Use this value if you aren't using a placement policy and want your VMs to be placed in a specific block.
  • REGION: the region where you want to create the instance template. For the preview, the only supported region is europe-west1.
  • DISK_SIZE: the size of the boot disk in GB.
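
The eight MRDMA entries of the networkInterfaces array in the request body follow a regular pattern, so you might generate them instead of writing them by hand. The following sketch prints the entries as a comma-separated JSON fragment. The variable values and the function name mrdma_interfaces are hypothetical. The subnetwork path uses the regions/REGION form of Compute Engine resource paths.

```shell
#!/bin/bash
# Hypothetical values; substitute your own.
PROJECT_ID="my-project"
REGION="europe-west1"
RDMA_NAME_PREFIX="my-rdma"

# Print the eight MRDMA entries of the networkInterfaces array,
# comma-separated, with no trailing comma after the last entry.
mrdma_interfaces() {
  local n sep
  for n in $(seq 0 7); do
    sep=","
    if [ "$n" -eq 7 ]; then
      sep=""
    fi
    cat <<EOF
{
  "network": "projects/${PROJECT_ID}/global/networks/${RDMA_NAME_PREFIX}-mrdma",
  "nicType": "MRDMA",
  "subnetwork": "projects/${PROJECT_ID}/regions/${REGION}/subnetworks/${RDMA_NAME_PREFIX}-mrdma-sub-${n}"
}${sep}
EOF
  done
}

mrdma_interfaces
```

You can paste the output after the two gVNIC entries inside the networkInterfaces array.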

Compact placement policy

If you chose to use a compact placement policy, also add the following field to the properties object of the request body:

  "resourcePolicies": [
    "projects/PROJECT_ID/regions/REGION/resourcePolicies/POLICY_NAME"
  ],

Replace the following:

  • PROJECT_ID: the project ID of the compact placement policy.
  • REGION: the region of the compact placement policy.
  • POLICY_NAME: the name of the compact placement policy.

What's next?