Create a MIG with H4D machine types and flex-start


This tutorial shows you how to create a managed instance group (MIG) that uses an H4D HPC-optimized machine type. The MIG uses the Dynamic Workload Scheduler flex-start consumption model to obtain compute resources for up to seven days.

Creating a MIG lets you manage multiple virtual machines (VMs) as a single entity. Each VM in a MIG is based on an instance template. By automatically managing the VMs in the group, MIGs offer high availability and scalability. To learn more about MIGs, see Managed instance groups.

To learn about HPC VM and HPC cluster creation options, see Overview of HPC cluster creation.

This tutorial is intended for HPC engineers, platform administrators and operators, and for data and MPI specialists who are interested in creating a group of interconnected HPC instances for short-duration workloads. The resulting instances don't use an orchestrator for instance management or job scheduling.

Objectives

  1. Optional: Request preemptible quota.
  2. Optional: Create Virtual Private Cloud networks.
  3. Create an instance template.
  4. Create a MIG and a resize request.
  5. Clean up.

Costs

This tutorial uses billable components of Google Cloud, including Compute Engine.

To generate a cost estimate based on your projected usage, use the Pricing Calculator.

Before you begin

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. Install the Google Cloud CLI.

  3. If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.

  4. To initialize the gcloud CLI, run the following command:

    gcloud init
  5. Create or select a Google Cloud project.

    Roles required to select or create a project

    • Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
    • Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.
    • Create a Google Cloud project:

      gcloud projects create PROJECT_ID

      Replace PROJECT_ID with a name for the Google Cloud project you are creating.

    • Select the Google Cloud project that you created:

      gcloud config set project PROJECT_ID

      Replace PROJECT_ID with your Google Cloud project name.

  6. Verify that billing is enabled for your Google Cloud project.

  7. Enable the required API:

    Roles required to enable APIs

    To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.

    gcloud services enable compute.googleapis.com
  8. Grant roles to your user account. Run the following command once for each of the following IAM roles: roles/compute.instanceAdmin.v1, roles/compute.networkAdmin

    gcloud projects add-iam-policy-binding PROJECT_ID --member="user:USER_IDENTIFIER" --role=ROLE

    Replace the following:

    • PROJECT_ID: your project ID.
    • USER_IDENTIFIER: the identifier for your user account—for example, myemail@example.com.
    • ROLE: the IAM role that you grant to your user account.
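The two role grants can be scripted as a loop. The following is a minimal sketch rather than a definitive procedure: the PROJECT_ID and USER_IDENTIFIER values are placeholders, and the run helper and DRY_RUN variable are hypothetical names introduced here so that the commands are only printed unless you opt in to running them.

```shell
#!/bin/bash
# Placeholder values (assumptions); replace them with your own.
PROJECT_ID="my-project"
USER_IDENTIFIER="myemail@example.com"

# Hypothetical helper: print the command unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Grant each role that this tutorial requires to the user account.
grant_roles() {
  local role
  for role in roles/compute.instanceAdmin.v1 roles/compute.networkAdmin; do
    run gcloud projects add-iam-policy-binding "$PROJECT_ID" \
      --member="user:${USER_IDENTIFIER}" \
      --role="$role"
  done
}

grant_roles
```

Review the printed commands first, and then rerun the script with DRY_RUN=0 to apply the bindings.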

Optional: Request preemptible quota

The VM instances added to the MIG consume regional quota. VM instance, instance group, CPU, and disk quotas can be consumed by any VM instance in the region, regardless of zone.

When you use Flex-start, the instances consume either standard quota or preemptible quota:

  • Standard quota: If your project does not have preemptible quota, and you have never requested preemptible quota, then the instance resources consume standard quota.
  • Preemptible quota: Requesting preemptible quotas can help you improve quota obtainability by providing separate quotas for temporary resources. However, after Compute Engine grants you preemptible quota in a region, all applicable resources consume only preemptible quota. If this quota is depleted, you must request additional preemptible quota for the VM resources.

You can request preemptible quota by following the steps documented in Request a quota adjustment.

Types of quota needed

To use instance groups, you must have available quota for all the resources that the group uses (for example, CPU quota) and available quota for the group resource itself. For H4D instances, the following quota types might be used, depending on the machine type used by the instances:

Resource      Standard quota                       Preemptible quota
CPUs          CPUS_PER_VM_FAMILY                   Preemptible CPUs
Local SSDs    Local SSD per machine family (GB)    Preemptible Local SSDs (GB)

To create the resources in this tutorial, the following additional regional quota might be required:

  • Zonal (single-zone) managed instance group: Instance group managers and Instance groups
  • Google Cloud Hyperdisk:

    • Hyperdisk Balanced Capacity (GB)
    • Hyperdisk Balanced Throughput (MB/s)
    • Hyperdisk Balanced IOPS

Optional: Create VPC networks

Unless you choose to disable it, each project has a default network, which can be used to provide network connectivity for your instances. When you create a VM, you can specify a VPC network and subnet. If you omit this configuration, the default network and subnet are used.

H4D instances can be configured to use Cloud RDMA. Cloud RDMA enables low-latency reliable messaging capabilities by using an IRDMA network driver that supports Remote Direct Memory Access (RDMA) between Compute Engine instances.

For this tutorial:

  • If you want to configure the H4D instances to use Cloud RDMA, complete the steps in this section.
  • If you don't want to use Cloud RDMA, then you can skip this section and use the default network instead.

RDMA-enabled instances require a minimum of two network interfaces (NICs):

  • NIC type GVNIC: uses the gve driver for TCP/IP and Internet traffic for normal VM-VM and VM-Internet communication.
  • NIC type IRDMA: uses IDPF/iRDMA drivers for Cloud RDMA networking between instances.

Instances that use Cloud RDMA can have only one IRDMA interface. You can add up to eight additional GVNIC network interfaces for a total of 10 NICs per instance.

To set up the Falcon VPC networks to use with your instances, you can either follow the documented instructions or use the provided script.

Instruction guides

To create the networks, you can use the following instructions:

Script

You can create up to nine GVNIC network interfaces and one IRDMA network interface per instance. Each network interface must attach to a separate network. To create the networks, you can use the following script, which creates two networks for GVNIC and one network for IRDMA.

  1. Optional: Before running the script, list the RDMA network profiles to verify there is one available.
      gcloud beta compute network-profiles list
      
  2. Copy the following code and run it in a Linux shell window.

      #!/bin/bash
      # Set the number of GVNIC interfaces to create. You can create up to 9.
      NUM_GVNIC=NUMBER_OF_GVNIC
    
      # Create standard VPC (networks and subnets) for the GVNIC interfaces
      for N in $(seq 0 $(($NUM_GVNIC - 1))); do
        gcloud compute networks create GVNIC_NAME_PREFIX-net-$N \
            --subnet-mode=custom

        gcloud compute networks subnets create GVNIC_NAME_PREFIX-sub-$N \
            --network=GVNIC_NAME_PREFIX-net-$N \
            --region=REGION \
            --range=10.$N.0.0/16

        gcloud compute firewall-rules create GVNIC_NAME_PREFIX-internal-$N \
            --network=GVNIC_NAME_PREFIX-net-$N \
            --action=ALLOW \
            --rules=tcp:0-65535,udp:0-65535,icmp \
            --source-ranges=10.0.0.0/8
      done
    
      # Create SSH firewall rules
      gcloud compute firewall-rules create GVNIC_NAME_PREFIX-ssh \
          --network=GVNIC_NAME_PREFIX-net-0 \
          --action=ALLOW \
          --rules=tcp:22 \
          --source-ranges=IP_RANGE
    
      # Optional: Create a firewall rule for the external IP address for the
      #  first GVNIC network interface
      gcloud compute firewall-rules create GVNIC_NAME_PREFIX-allow-ping-net-0 \
          --network=GVNIC_NAME_PREFIX-net-0 \
          --action=ALLOW \
          --rules=icmp \
          --source-ranges=IP_RANGE
    
      # Create a network for the RDMA over Falcon network interface
      gcloud beta compute networks create RDMA_NAME_PREFIX-irdma \
          --network-profile=ZONE-vpc-falcon \
          --subnet-mode custom
    
      # Create a subnet for the RDMA network
      gcloud beta compute networks subnets create RDMA_NAME_PREFIX-irdma-sub \
          --network=RDMA_NAME_PREFIX-irdma \
          --region=REGION \
          --range=10.$NUM_GVNIC.0.0/16  # first 10.x.0.0/16 range that no GVNIC subnet uses
      

    Replace the following:

    • NUMBER_OF_GVNIC: the number of GVNIC interfaces to create. Specify a number from 1 to 9.
    • GVNIC_NAME_PREFIX: the name prefix to use for the standard VPC network and subnet that uses a GVNIC NIC type.
    • REGION: the region where you want to create the networks. This must correspond to the zone specified for the --network-profile flag, when creating the RDMA network. For example, if you specify the zone as europe-west4-b, then your region is europe-west4.
    • IP_RANGE: the range of IP addresses outside of the VPC network to use for the SSH firewall rules. As a best practice, specify the specific IP address ranges that you need to allow access from, rather than all IPv4 or IPv6 sources. Don't use 0.0.0.0/0 or ::/0 as a source range because this allows traffic from all IPv4 or IPv6 sources, including sources outside of Google Cloud.
    • RDMA_NAME_PREFIX: the name prefix to use for the VPC network and subnet that uses the IRDMA NIC type.
    • ZONE: the zone where you want to create the networks and compute instances. Use either us-central1-a or europe-west4-b.
  3. Optional: To verify that the VPC network resources are created successfully, check the network settings in the Google Cloud console:

    1. In the Google Cloud console, go to the VPC networks page.

      Go to VPC networks

    2. Search the list for the networks that you created in the previous step.
    3. To view the subnets, firewall rules, and other network settings, click the name of the network.
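Before you run the network script, it can help to check the interface count and preview the subnet ranges that the loop assigns to the GVNIC networks. The following is a small sketch of that check; preview_ranges is a hypothetical helper name, and the range pattern mirrors the 10.N.0.0/16 scheme that the script uses.

```shell
#!/bin/bash
# Number of GVNIC interfaces you plan to create (assumed default of 2).
NUM_GVNIC="${NUM_GVNIC:-2}"

# Hypothetical helper: validate the count (1 to 9) and print one line per
# GVNIC subnet with the range that the network script would assign to it.
preview_ranges() {
  local count="$1" n
  case "$count" in
    ''|*[!0-9]*)
      echo "error: the count must be a whole number" >&2
      return 1
      ;;
  esac
  if [ "$count" -lt 1 ] || [ "$count" -gt 9 ]; then
    echo "error: the count must be between 1 and 9" >&2
    return 1
  fi
  for n in $(seq 0 $((count - 1))); do
    echo "sub-${n} 10.${n}.0.0/16"
  done
}

preview_ranges "$NUM_GVNIC"
```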

Create an instance template

To use the Flex-start consumption option, you create an empty MIG and then create a resize request for the MIG. When your requested capacity becomes available, Compute Engine provisions it and creates the instances in the MIG. You obtain resources for up to seven days.

To specify the instance and consumption properties for each instance in the MIG, create an instance template by using one of the following methods:

gcloud

To create a regional instance template, use the gcloud beta compute instance-templates create command.

gcloud beta compute instance-templates create INSTANCE_TEMPLATE_NAME \
    --machine-type=MACHINE_TYPE \
    --image-family=IMAGE_FAMILY \
    --image-project=IMAGE_PROJECT \
    --instance-template-region=REGION \
    --boot-disk-type=hyperdisk-balanced \
    --boot-disk-size=DISK_SIZE \
    --scopes=cloud-platform \
    --network-interface=nic-type=GVNIC,network=GVNIC_NAME_PREFIX-net-0,subnet=GVNIC_NAME_PREFIX-sub-0,stack-type=STACK_TYPE,address=EXTERNAL_IPV4_ADDRESS \
    --network-interface=nic-type=GVNIC,network=GVNIC_NAME_PREFIX-net-1,subnet=GVNIC_NAME_PREFIX-sub-1,no-address \
    --network-interface=nic-type=IRDMA,network=RDMA_NAME_PREFIX-irdma,subnet=RDMA_NAME_PREFIX-irdma-sub,stack-type=IPV4_ONLY,no-address \
    --reservation-affinity=none \
    --instance-termination-action=DELETE \
    --max-run-duration=RUN_DURATION \
    --maintenance-policy=TERMINATE \
    --provisioning-model=FLEX_START

Replace the following:

  • INSTANCE_TEMPLATE_NAME: the name of the instance template.
  • MACHINE_TYPE: the H4D machine type to use for the instance.
  • IMAGE_FAMILY: the image family of the OS image that you want to use. For a list of supported operating systems, see Supported operating systems.
  • IMAGE_PROJECT: the project ID of the OS image.
  • REGION: the region where you want to create the instance template. Specify a region in which the machine type that you want to use is available.
  • DISK_SIZE: the size of the boot disk in GiB.
  • GVNIC_NAME_PREFIX: the name prefix that you used when creating the standard VPC networks and subnets for the gVNIC interfaces.

    If you are using the default network, include only a single --network-interface field with the nic-type field set to GVNIC. Also, omit the network and subnetwork settings for this network interface.

  • STACK_TYPE: Optional: the stack type to use for the gVNIC interface. Specify either IPV4_ONLY or IPV4_IPV6. If you don't specify a value, IPV4_ONLY is used by default.
  • EXTERNAL_IPV4_ADDRESS: Optional: a static external IPv4 address to use with the gVNIC network interface. You must have previously reserved an external IPv4 address. Do one of the following:

    • Specify a valid IPv4 address from the subnet.
    • Use the flag no-address if you don't want the network interface to have an external IP address.
    • Specify address='' if you want the network interface to receive an ephemeral external IP address.

    To specify an external IPv6 address for the GVNIC network interface, use the flag --external-ipv6-address instead.

  • RDMA_NAME_PREFIX: the name prefix that you used when creating the VPC network and subnet for the IRDMA network interface.

    If you are not using Cloud RDMA with your H4D instances, omit the --network-interface field for the IRDMA interface.

  • RUN_DURATION: the duration you want the requested instances to run. You must format the value as the number of days, hours, minutes, or seconds followed by d, h, m, and s respectively. For example, specify 30m for 30 minutes or 1d2h3m4s for one day, two hours, three minutes, and four seconds. The value must be between 10 minutes and seven days.
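The duration format and its limits can be checked before you create the template. The following sketch converts a value such as 1d2h3m4s to seconds and verifies that it falls between 10 minutes (600 seconds) and seven days (604800 seconds). The function names are hypothetical, and the parser is deliberately simple: it accepts the units in any order, which might be more lenient than gcloud itself.

```shell
#!/bin/bash
# Hypothetical helper: convert a duration such as 1d2h3m4s to seconds.
duration_to_seconds() {
  local rest="$1" total=0 num unit
  [ -n "$rest" ] || return 1
  while [ -n "$rest" ]; do
    num="${rest%%[!0-9]*}"        # leading digits
    [ -n "$num" ] || return 1
    rest="${rest#"$num"}"
    unit="${rest%"${rest#?}"}"    # first character after the digits
    rest="${rest#?}"
    # Strip leading zeros so the shell doesn't read the number as octal.
    while [ "${#num}" -gt 1 ] && [ "${num#0}" != "$num" ]; do
      num="${num#0}"
    done
    case "$unit" in
      d) total=$((total + num * 86400)) ;;
      h) total=$((total + num * 3600)) ;;
      m) total=$((total + num * 60)) ;;
      s) total=$((total + num)) ;;
      *) return 1 ;;
    esac
  done
  echo "$total"
}

# Hypothetical helper: print the seconds if the duration is within the
# limits stated in this tutorial (10 minutes to seven days).
check_run_duration() {
  local seconds
  seconds="$(duration_to_seconds "$1")" || {
    echo "invalid duration format: $1" >&2
    return 1
  }
  if [ "$seconds" -lt 600 ] || [ "$seconds" -gt 604800 ]; then
    echo "out of range: $1 is ${seconds} seconds" >&2
    return 1
  fi
  echo "$seconds"
}

check_run_duration 1d2h3m4s
```

The printed number of seconds is also the form that the REST request expects for the run duration.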

REST

To create a regional instance template, make a POST request to the beta regionInstanceTemplates.insert method.

POST https://compute.googleapis.com/compute/beta/projects/PROJECT_ID/regions/REGION/instanceTemplates
{
  "name":"INSTANCE_TEMPLATE_NAME",
  "properties":{
    "disks":[
      {
        "boot":true,
        "initializeParams":{
          "diskSizeGb":"DISK_SIZE",
          "diskType":"hyperdisk-balanced",
          "sourceImage":"projects/IMAGE_PROJECT/global/images/family/IMAGE_FAMILY"
        },
        "mode":"READ_WRITE",
        "type":"PERSISTENT"
      }
    ],
    "machineType":"MACHINE_TYPE",
    "networkInterfaces":[
      {
        "network":"GVNIC_NAME_PREFIX-net-0",
        "subnetwork":"GVNIC_NAME_PREFIX-sub-0",
        "accessConfigs":[
          {
            "type":"ONE_TO_ONE_NAT",
            "name":"External IP",
            "natIP":"EXTERNAL_IPV4_ADDRESS"
          }
        ],
        "stackType":"IPV4_ONLY",
        "nicType":"GVNIC"
      },
      {
        "network":"GVNIC_NAME_PREFIX-net-1",
        "subnetwork":"GVNIC_NAME_PREFIX-sub-1",
        "stackType":"IPV4_ONLY",
        "nicType":"GVNIC"
      },
      {
        "network":"RDMA_NAME_PREFIX-irdma",
        "subnetwork":"RDMA_NAME_PREFIX-irdma-sub",
        "stackType":"IPV4_ONLY",
        "nicType":"IRDMA"
      }
    ],
    "reservationAffinity":{
      "consumeReservationType":"NO_RESERVATION"
    },
    "scheduling":{
      "instanceTerminationAction":"DELETE",
      "maxRunDuration":{
        "seconds":RUN_DURATION
      },
      "onHostMaintenance":"TERMINATE",
      "provisioningModel":"FLEX_START"
    }

  }
}

Replace the following:

  • INSTANCE_TEMPLATE_NAME: the name of the instance template.
  • MACHINE_TYPE: the machine type to use for the instance. Specify an H4D machine type. For more information, see H4D machine types.
  • IMAGE_FAMILY: the image family of the OS image that you want to use. For a list of supported operating systems, see Supported operating systems.
  • IMAGE_PROJECT: the project ID of the OS image.
  • REGION: the region where you want to create the instance template. Specify a region in which the machine type that you want to use is available. For information about regions, see Regions and zones.
  • DISK_SIZE: the size of the boot disk in GiB.
  • GVNIC_NAME_PREFIX: the name prefix that you used when creating the standard VPC networks and subnets for the gVNIC interfaces.

    If you are using the default network, include only a single object in the networkInterfaces array with the nicType field set to GVNIC. Also, omit the network and subnetwork fields for this network interface.

  • EXTERNAL_IPV4_ADDRESS: Optional: a static external IPv4 address to use with the gVNIC network interface. You must have previously reserved an external IPv4 address.

    To specify an external IPv6 address for the GVNIC network interface, set the externalIpv6 field in the ipv6AccessConfigs of the network interface instead.

  • RDMA_NAME_PREFIX: the name prefix that you used when creating the VPC network and subnet for the IRDMA network interface.

    If you are not using Cloud RDMA with your H4D instances, omit the networkInterfaces object for the IRDMA interface.

  • RUN_DURATION: the duration, in seconds, that you want the requested instances to run. The value must be between 600 (10 minutes) and 604800 (seven days).

After you create the instance template, you can view the template to see its ID and review its instance properties.

Create a MIG with a resize request

To create all the requested Flex-start instances at the same time, create a MIG and then create a resize request in the MIG as described in this section.

Create the MIG

To create the MIG, select one of the following options:

gcloud

Create a zonal or regional MIG as follows:

  • To create a zonal MIG, use the instance-groups managed create command as follows.

        gcloud compute instance-groups managed create MIG_NAME \
            --template=INSTANCE_TEMPLATE_URL \
            --size=0 \
            --default-action-on-vm-failure=do-nothing \
            --zone=ZONE
        
  • To create a regional MIG, use the instance-groups managed create command as follows.

        gcloud compute instance-groups managed create MIG_NAME \
            --template=INSTANCE_TEMPLATE_URL \
            --size=0 \
            --default-action-on-vm-failure=do-nothing \
            --zones=ZONE \
            --target-distribution-shape=any-single-zone \
            --instance-redistribution-type=none
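The INSTANCE_TEMPLATE_URL placeholder refers to the regional instance template that you created earlier. The following sketch assembles a partial URL for it and derives the region from the zone, as in the europe-west4-b example earlier in this tutorial. All of the values are placeholders, and the partial URL form shown is an assumption; a full resource URL is expected to work as well.

```shell
#!/bin/bash
# Placeholder values (assumptions); replace them with your own.
PROJECT_ID="my-project"
ZONE="europe-west4-b"
INSTANCE_TEMPLATE_NAME="h4d-flex-template"

# A region name is the zone name without its final "-<letter>" suffix.
REGION="${ZONE%-*}"

# Partial URL of a regional instance template.
INSTANCE_TEMPLATE_URL="projects/${PROJECT_ID}/regions/${REGION}/instanceTemplates/${INSTANCE_TEMPLATE_NAME}"

echo "$INSTANCE_TEMPLATE_URL"
```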
        

REST

Create a zonal or regional MIG as follows:

  • To create a zonal MIG, make a POST request to the instanceGroupManagers.insert method as follows.
          POST https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/zones/ZONE/instanceGroupManagers
          {
            "versions": [
            {
              "instanceTemplate": "INSTANCE_TEMPLATE_URL"
            }
            ],
            "name": "MIG_NAME",
            "targetSize": 0,
            "instanceLifecyclePolicy": {
                "defaultActionOnFailure": "DO_NOTHING"
            }
          }
         
  • To create a regional MIG, make a POST request to the regionInstanceGroupManagers.insert method as follows.
          POST https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/regions/REGION/instanceGroupManagers
          {
            "versions": [
              {
                "instanceTemplate": "INSTANCE_TEMPLATE_URL"
              }
            ],
            "name": "MIG_NAME",
            "targetSize": 0,
            "distributionPolicy": {
              "targetShape": "ANY_SINGLE_ZONE",
              "zones": [
                {
                "zone": "projects/PROJECT_ID/zones/ZONE"
                }
              ]
            },
            "updatePolicy": {
              "instanceRedistributionType": "NONE"
            },
            "instanceLifecyclePolicy": {
              "defaultActionOnFailure": "DO_NOTHING"
            }
          }
         

Create the resize request

To create the resize request in the MIG, select one of the following options:

gcloud

Create a resize request as follows:
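As a sketch of this step for a zonal MIG, a resize request can be created with the gcloud compute instance-groups managed resize-requests create command. The flag names below are taken from that command and are not confirmed by this tutorial, so verify them with the command's --help output. The resource names and instance count are placeholders, and the run helper and DRY_RUN variable are hypothetical names that make the script print the command instead of running it by default.

```shell
#!/bin/bash
# Placeholder values (assumptions); replace them with your own.
MIG_NAME="h4d-mig"
RESIZE_REQUEST_NAME="h4d-resize-1"
VM_COUNT=2
ZONE="europe-west4-b"

# Hypothetical helper: print the command unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Ask the MIG to add VM_COUNT instances, all at the same time.
create_resize_request() {
  run gcloud compute instance-groups managed resize-requests create "$MIG_NAME" \
    --resize-request="$RESIZE_REQUEST_NAME" \
    --resize-by="$VM_COUNT" \
    --zone="$ZONE"
}

create_resize_request
```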

REST

Create a resize request in a zonal or regional MIG as follows:

  • To create a resize request in a zonal MIG, make a POST request to the instanceGroupManagerResizeRequests.insert method as follows:
          POST https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/zones/ZONE/instanceGroupManagers/MIG_NAME/resizeRequests
          {
            "name": "RESIZE_REQUEST_NAME",
            POPULATION_METHOD
          }
          
  • To create a resize request in a regional MIG, make a POST request to the beta regionInstanceGroupManagerResizeRequests.insert method as follows:
          POST https://compute.googleapis.com/compute/beta/projects/PROJECT_ID/regions/REGION/instanceGroupManagers/MIG_NAME/resizeRequests
          {
            "name": "RESIZE_REQUEST_NAME",
            POPULATION_METHOD
          }
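The request bodies above leave POPULATION_METHOD undefined. The following sketch assumes that it stands for the resizeBy field of the resize request resource, which sets the number of instances to add; treat that mapping as an assumption and confirm it in the API reference. The resize_request_body function and the values passed to it are hypothetical.

```shell
#!/bin/bash
# Placeholder values (assumptions); replace them with your own.
RESIZE_REQUEST_NAME="h4d-resize-1"
VM_COUNT=2

# Hypothetical helper: print a JSON request body that asks for COUNT
# instances by using the resizeBy field (assumed population method).
resize_request_body() {
  printf '{\n  "name": "%s",\n  "resizeBy": %d\n}\n' "$1" "$2"
}

resize_request_body "$RESIZE_REQUEST_NAME" "$VM_COUNT"
```

You can pass the printed body as the payload of the POST request, for example with the -d option of curl.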
          

Clean up

To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, either delete the project that contains the resources, or keep the project and delete the individual resources.

Delete your project

Delete a Google Cloud project:

gcloud projects delete PROJECT_ID

Delete the resources

  1. Delete resize requests in a MIG.

  2. Delete the MIG and instances.

  3. If the auto-delete state for the disks was set to False in the instance template, then the disks are not automatically deleted when the VM instance is deleted. You can delete the disks using one of the following methods:

    Console

    1. In the Google Cloud console, go to the Disks page.

      Go to Disks

    2. Select the rows that contain the disks that you created in this tutorial. Make sure the In use by column is empty for each disk.

    3. Click Delete, and then click Delete to confirm.

    gcloud

    Use the gcloud compute disks delete command.

    gcloud compute disks delete DISK_NAME \
        --project PROJECT_ID --zone ZONE
    

    Replace the following:

    • DISK_NAME: the name of the disk to delete
    • PROJECT_ID: the ID of the project that contains the disk
    • ZONE: the zone of the disk

    REST

    Use the disks.delete method to delete the disks.

    DELETE https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/zones/ZONE/disks/DISK_NAME 
    

    Replace the following:

    • PROJECT_ID: the ID of the project that contains the disk
    • ZONE: the zone of the disk
    • DISK_NAME: the name of the disk to delete
  4. Delete the networks.

  5. Delete the instance template.
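The deletion steps for a zonal MIG and the networks from the network script can be sketched as one script. This is an illustration under assumptions, not a definitive cleanup procedure: the resource names are placeholders that follow the naming used by the network script, resize requests and disks aren't handled, and the run helper and DRY_RUN variable are hypothetical names that make the script print the commands instead of running them by default. The MIG is deleted first so that the networks and the template are no longer in use.

```shell
#!/bin/bash
# Placeholder values (assumptions); use the names you chose in this tutorial.
MIG_NAME="h4d-mig"
INSTANCE_TEMPLATE_NAME="h4d-flex-template"
GVNIC_NAME_PREFIX="h4d-gvnic"
RDMA_NAME_PREFIX="h4d-rdma"
NUM_GVNIC=2
ZONE="europe-west4-b"
REGION="${ZONE%-*}"

# Hypothetical helper: print the command unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

cleanup() {
  local n
  # Delete the MIG first; this also deletes the instances in the group.
  run gcloud compute instance-groups managed delete "$MIG_NAME" --zone="$ZONE"

  # Delete the firewall rules before the networks that they belong to.
  for n in $(seq 0 $((NUM_GVNIC - 1))); do
    run gcloud compute firewall-rules delete "${GVNIC_NAME_PREFIX}-internal-${n}"
  done
  run gcloud compute firewall-rules delete "${GVNIC_NAME_PREFIX}-ssh"
  run gcloud compute firewall-rules delete "${GVNIC_NAME_PREFIX}-allow-ping-net-0"

  # Delete each subnet, and then its network.
  for n in $(seq 0 $((NUM_GVNIC - 1))); do
    run gcloud compute networks subnets delete "${GVNIC_NAME_PREFIX}-sub-${n}" --region="$REGION"
    run gcloud compute networks delete "${GVNIC_NAME_PREFIX}-net-${n}"
  done
  run gcloud compute networks subnets delete "${RDMA_NAME_PREFIX}-irdma-sub" --region="$REGION"
  run gcloud compute networks delete "${RDMA_NAME_PREFIX}-irdma"

  # Delete the regional instance template after the MIG no longer uses it.
  run gcloud compute instance-templates delete "$INSTANCE_TEMPLATE_NAME" --region="$REGION"
}

cleanup
```

When run with DRY_RUN=0, each gcloud delete command asks for confirmation before it removes the resource.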

What's next