Ensure resource availability using VM reservations

This document explains how to create jobs that run on reserved resources and how to block jobs from consuming reservations.

Reservations are a feature of Compute Engine. A reservation provides a very high level of assurance in obtaining capacity for one or more VMs with the specified hardware configuration. A reservation for a VM incurs the costs of that VM from when you create the reservation until you delete it. However, while a VM is consuming the reservation, the total cost is equivalent to that of a VM without a reservation.

Generally, reservations are useful when capacity availability is critically important or when you want to prevent errors in obtaining resources. For Batch specifically, consider using dedicated reservations to help minimize job scheduling time, or configure jobs to use existing reservations while they're otherwise idle. If you have underused reservations (such as reservations required for committed-use discounts), you can configure jobs to attempt to consume them while they're not being used, which can help optimize your incurred costs. Alternatively, if you want to prioritize resource availability for other workloads in your project, you can explicitly block a job from consuming reservations.

To learn more about reservations, see the Compute Engine documentation for reservations.

Before you begin

If you haven't used Batch before, review Get started with Batch and enable Batch by completing the prerequisites for projects and users.
Make sure you have the permissions to create a reservation or view an existing reservation that you want a job's VMs to consume as needed.
To get the permissions that you need to create a job, ask your administrator to grant you the following IAM roles:
- Batch Job Editor (roles/batch.jobsEditor) on the project
- Service Account User (roles/iam.serviceAccountUser) on the job's service account, which by default is the default Compute Engine service account
For more information about granting roles, see Manage access to projects, folders, and organizations.

You might also be able to get the required permissions through custom roles or other predefined roles.

Restrictions

In addition to the general restrictions for reservations, Batch also has the following restrictions:

A job's VMs can't consume shared reservations.
A job's VMs can't consume reservations if either the job or the reservation specifies a compact placement policy.
If you create a job using the Google Cloud console, its VMs automatically consume matching reservations. To consume a specific reservation or block VMs from consuming reservations, you must define the reservation field when you create a job by using the gcloud CLI or Batch API instead.

Requirements

This section summarizes the requirements for a job's VMs to consume a reservation. For more information about all the requirements, see the general requirements for reservations in the Compute Engine documentation and the procedure for planning your configuration later in this document.

For a job's VMs to be generally capable of consuming a reservation, all the following conditions must be met:
- The job and reservation must specify VM properties that exactly match.
- You must comply with all the restrictions in this document and all the other general requirements for reservations.
For each of a job's VMs to successfully consume a reservation, the reservation must have unused capacity available during the VM's run time.

A reservation's unused capacity is the difference between its VM count and the number of VMs currently consuming it. VMs attempt to consume reservations whenever you have unused reservation capacity. So, a VM can start consuming a reservation when the VM is created or later in its run time. A VM doesn't stop consuming a reservation until the VM stops running or the reservation is deleted.

Depending on your total unused reservation capacity, none, some, or all of a job's VMs might consume reservations, and the number of reserved VMs might vary throughout the job's run time.
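
The capacity arithmetic described above can be sketched in Python. This is a simplified model of a single point in time, not how Compute Engine tracks reservations internally; the function names and parameters are hypothetical and exist only for illustration.

```python
from __future__ import annotations


def unused_capacity(vm_count: int, vms_consuming: int) -> int:
    """Return a reservation's unused capacity.

    Unused capacity is the reservation's VM count minus the number of
    VMs currently consuming it.
    """
    if vm_count < 0 or vms_consuming < 0:
        raise ValueError("counts must be non-negative")
    if vms_consuming > vm_count:
        raise ValueError("more VMs are consuming the reservation than it holds")
    return vm_count - vms_consuming


def reserved_vm_split(job_vms: int, unused: int) -> tuple[int, int]:
    """Return (VMs that can consume the reservation, VMs that can't)."""
    reserved = min(job_vms, unused)
    return reserved, job_vms - reserved


# A reservation for 10 VMs with 7 already in use has 3 unused slots,
# so a 5-VM job could run 3 VMs on reserved capacity and 2 without it.
print(reserved_vm_split(5, unused_capacity(10, 7)))  # prints (3, 2)
```

Because unused capacity changes as other VMs start and stop, the split for a real job can differ from moment to moment.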

Create and run a job that can consume reserved VMs

Plan your configuration. To make sure your job and reservation are compatible, complete the following steps.

If you want to consume a reservation that already exists, you must create a job with a corresponding configuration. Otherwise, if you plan to create a new reservation, select the configuration options that you prefer.
1. Determine the reservation properties. Due to the restrictions, the share type must be single-project, which is the default option for a reservation. Determine the values that you want to use for the following reservation properties:
  - Consumption type^*
  - VM count^†
  ^*The reservation's consumption type (specifically targeted or automatically consumed) determines which VMs can consume the reservation.
  
  ^†The VM count represents the total capacity of a reservation. When deciding this value, consider the number of VMs that the job needs.
2. Determine the VM properties for the job and reservation. Due to the restrictions, neither the job nor the reservation can specify a compact placement policy; omitting a compact placement policy is the default for both reservations and jobs. Determine the values that you want to use for the following VM properties, which must match exactly for the reservation and the job:
  - Project
  - Zone^*
  - Machine type^†
  - Minimum CPU platform^† (if any^‡)
  - GPU type and count^† (if any^‡)
  - Local SSD type and count^† (if any^‡)
  - Reservation affinity^†^#
  ^*The job's VMs must be located in the same zone as the reserved VMs. You must either include this zone in the job's allowedLocations[] field or, if omitting the allowedLocations[] field, set the job's location to the region that contains this zone.
  
  ^†The job must define all these properties using either the policy subfields or a VM instance template. A job can't specify a combination of policy subfields and a template.
  
  ^‡ An optional field can't be defined for one resource and omitted from the other. Define or omit the optional field for both the reservation and the job. If the job specifies a VM instance template, this also applies to the fields of the specified template.
  
  ^#The reservation's consumption type determines the reservation affinity required for the job's VMs, which you must specify in the job as follows:
  - If the job is using a VM instance template, the template needs to configure the reservation affinity as explained in the reservations documentation.
  - If the job isn't using a template and the reservation is specifically targeted, specify the name of the reservation in the job's reservation field.
  - Otherwise, if the job isn't using a template and the reservation is automatically consumed, omit the job's reservation field.
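
The exact-match rule in step 2 can be sketched as a small Python check. This is only an illustration of the rule as stated above, not a Batch or Compute Engine API: the dictionary keys (such as machine_type and min_cpu_platform) and the function name are hypothetical, and each dictionary is assumed to describe the planned properties of one resource.

```python
from __future__ import annotations

# Hypothetical property names that stand in for the VM properties listed
# in step 2. Optional properties are simply left out of a dictionary
# when a resource doesn't define them.
MATCHED_PROPERTIES = (
    "project",
    "zone",
    "machine_type",
    "min_cpu_platform",
    "gpus",
    "local_ssds",
)


def mismatched_properties(job: dict, reservation: dict) -> list[str]:
    """Return the names of properties that don't match exactly.

    A property that is defined for one resource and omitted from the
    other counts as a mismatch, because dict.get() returns None for the
    side that omits it.
    """
    return [
        name
        for name in MATCHED_PROPERTIES
        if job.get(name) != reservation.get(name)
    ]


reservation = {
    "project": "example-project",
    "zone": "us-central1-a",
    "machine_type": "n2-standard-32",
}
job = dict(reservation, min_cpu_platform="Intel Ice Lake")

# The job defines an optional property that the reservation omits.
print(mismatched_properties(job, reservation))  # prints ['min_cpu_platform']
```

An empty result means the planned properties agree; it doesn't guarantee that the job's VMs consume the reservation, which also depends on the restrictions and on unused capacity.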
Prepare the reservation. If you haven't already, create the reservation that you want the job's VMs to consume. Make sure the reservation has the properties you planned.
Create and run the job. You can create and run a job that consumes VMs from the prepared reservation by using the gcloud CLI or Batch API:

Important: Due to a known issue, jobs that consume reserved VMs might be incorrectly delayed or prevented from running. To work around this issue, add a label with the name goog-batch-skip-quota-check and value true to the job-level labels field. For more information, see Known issues.
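
As a rough sketch of this workaround, the following Python helper adds the label to a job configuration that is held as a dictionary before it is written to a JSON file. The helper name is hypothetical, and it assumes that the job-level labels field is the top-level "labels" key of the job configuration.

```python
import copy
import json


def with_skip_quota_check_label(job_config: dict) -> dict:
    """Return a copy of the job configuration with the workaround label.

    Assumption: the job-level labels field is the top-level "labels" key.
    Existing labels are preserved, and the input is left unmodified.
    """
    updated = copy.deepcopy(job_config)
    updated.setdefault("labels", {})["goog-batch-skip-quota-check"] = "true"
    return updated


job_config = {"logsPolicy": {"destination": "CLOUD_LOGGING"}}
print(json.dumps(with_skip_quota_check_label(job_config), indent=2))
```
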
gcloud
1. Create a JSON file that specifies the job's configuration details and sets the VM instance resource (instances[]) subfields to exactly match the VM properties of a reservation.
  
  For example, to create a basic script job that consumes VMs from a reservation, create a JSON file with the following contents:
  {
    "taskGroups": [
      {
        "taskSpec": {
          "runnables": [
            {
              "script": {
                "text": "echo Hello world from task ${BATCH_TASK_INDEX}"
              }
            }
          ]
        },
        "taskCount": 3
      }
    ],
    "allocationPolicy": {
      "instances": [
        {
          VM_RESOURCES
        }
      ]
    },
    "logsPolicy": {
      "destination": "CLOUD_LOGGING"
    }
  }
  Replace VM_RESOURCES with the VM resources that match the reservation that you want the job to consume by specifying the instances[] subfields that you planned in the previous steps.
  
  For example, start from the following value for VM_RESOURCES:
  "installGpuDrivers": INSTALL_GPU_DRIVERS,
  "policy": {
    "machineType": "MACHINE_TYPE",
    "minCpuPlatform": "MIN_CPU_PLATFORM",
    "accelerators": [
      {
        "type": "GPU_TYPE",
        "count": GPU_COUNT
      }
    ],
    "disks": [
      {
        "newDisk": {
          "sizeGb": LOCAL_SSD_SIZE,
          "type": "local-ssd"
        },
        "deviceName": "LOCAL_SSD_NAME"
      }
    ],
    "reservation": "SPECIFIC_RESERVATION_NAME"
  }
  To use this value, make all the following changes:
  1. Do you want to use an instance template?
    
    Yes: Replace the policy field with the instanceTemplate field and specify an existing VM instance template that matches the reservation. For example, see the code sample for using a VM instance template. If the reservation uses GPUs or local SSDs, you also need to configure the job's installGpuDrivers field and volumes[] field respectively. Otherwise, skip the remaining changes.
    
    No: Replace MACHINE_TYPE with the same machine type as the reservation.
  2. Does the reservation include a minimum CPU platform?
    
    Yes: Replace MIN_CPU_PLATFORM with the same minimum CPU platform.
    
    No: Remove the minCpuPlatform field.
  3. Does the reservation include GPUs?
    
    Yes: Replace INSTALL_GPU_DRIVERS, GPU_TYPE, and GPU_COUNT to match the reservation. For example, see the code sample for using GPUs.
    
    No: Remove the installGpuDrivers field and the accelerators[] field.
  4. Does the reservation include local SSDs?
    
    Yes: Replace LOCAL_SSD_SIZE and LOCAL_SSD_NAME to match the reservation, and mount the local SSDs by adding the volumes[] field to the job. For example, see the code sample for using local SSDs.
    
    No: Remove the disks[] field.
  5. Does the reservation use the specifically targeted consumption type?
    
    Yes: Replace SPECIFIC_RESERVATION_NAME with the name of the reservation.
    
    No: Remove the reservation field.
  For example, suppose you're using an automatically consumed reservation for n2-standard-32 VMs that doesn't specify any minimum CPU platform, GPUs, or local SSDs. Additionally, you don't want to specify a VM instance template. In that case, you must replace VM_RESOURCES with the following value:
  "policy": { "machineType": "n2-standard-32" }
2. To create and run the job, use the gcloud batch jobs submit command:
  gcloud batch jobs submit JOB_NAME \
    --location LOCATION \
    --config JSON_CONFIGURATION_FILE
  Replace the following:
  - JOB_NAME: the name of the job.
  - LOCATION: the location of the job. Unless the job specifies the allowedLocations[] field, this must be the region that contains the reservation's zone.
  - JSON_CONFIGURATION_FILE: the path for a JSON file with the job's configuration details.
API
Make a POST request to the jobs.create method that sets the VM instance resource (instances[]) subfields to exactly match the VM properties of a reservation.

For example, to create a basic script job that consumes VMs from a reservation, make the following request:
```
POST https://batch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/jobs?job_id=JOB_NAME
{
  "taskGroups": [
    {
      "taskSpec": {
        "runnables": [
          {
            "script": {
              "text": "echo Hello world from task ${BATCH_TASK_INDEX}"
            }
          }
        ]
      },
      "taskCount": 3
    }
  ],
  "allocationPolicy": {
    "instances": [
      {
        VM_RESOURCES
      }
    ]
  },
  "logsPolicy": {
    "destination": "CLOUD_LOGGING"
  }
}
```
Replace the following:
- PROJECT_ID: the project ID of your project.
- LOCATION: the location of the job. Unless the job specifies the allowedLocations[] field, this must be the region that contains the reservation's zone.
- JOB_NAME: the name of the job.
- VM_RESOURCES: the VM resources that match the reservation that you want the job to consume by specifying the instances[] subfields that you planned in the previous steps.
  
  For example, start from the following value for VM_RESOURCES:
  "installGpuDrivers": INSTALL_GPU_DRIVERS,
  "policy": {
    "machineType": "MACHINE_TYPE",
    "minCpuPlatform": "MIN_CPU_PLATFORM",
    "accelerators": [
      {
        "type": "GPU_TYPE",
        "count": GPU_COUNT
      }
    ],
    "disks": [
      {
        "newDisk": {
          "sizeGb": LOCAL_SSD_SIZE,
          "type": "local-ssd"
        },
        "deviceName": "LOCAL_SSD_NAME"
      }
    ],
    "reservation": "SPECIFIC_RESERVATION_NAME"
  }
  To use this value, make all the following changes:
  1. Do you want to use an instance template?
    
    Yes: Replace the policy field with the instanceTemplate field and specify an existing VM instance template that matches the reservation. For example, see the code sample for using a VM instance template. If the reservation uses GPUs or local SSDs, you also need to configure the job's installGpuDrivers field and volumes[] field respectively. Otherwise, skip the remaining changes.
    
    No: Replace MACHINE_TYPE with the same machine type as the reservation.
  2. Does the reservation include a minimum CPU platform?
    
    Yes: Replace MIN_CPU_PLATFORM with the same minimum CPU platform.
    
    No: Remove the minCpuPlatform field.
  3. Does the reservation include GPUs?
    
    Yes: Replace INSTALL_GPU_DRIVERS, GPU_TYPE, and GPU_COUNT to match the reservation. For example, see the code sample for using GPUs.
    
    No: Remove the installGpuDrivers field and the accelerators[] field.
  4. Does the reservation include local SSDs?
    
    Yes: Replace LOCAL_SSD_SIZE and LOCAL_SSD_NAME to match the reservation, and mount the local SSDs by adding the volumes[] field to the job. For example, see the code sample for using local SSDs.
    
    No: Remove the disks[] field.
  5. Does the reservation use the specifically targeted consumption type?
    
    Yes: Replace SPECIFIC_RESERVATION_NAME with the name of the reservation.
    
    No: Remove the reservation field.
  For example, suppose you're using an automatically consumed reservation for n2-standard-32 VMs that doesn't specify any minimum CPU platform, GPUs, or local SSDs. Additionally, you don't want to specify a VM instance template. In that case, you must replace VM_RESOURCES with the following value:
  "policy": { "machineType": "n2-standard-32" }

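The five questions for filling in VM_RESOURCES can also be expressed as a short Python sketch for the case where the job doesn't use a VM instance template. The function and its parameters are hypothetical; only the JSON field names come from the examples in this document. It doesn't add the volumes[] field that mounts local SSDs, which you would still need to add to the job separately.

```python
from __future__ import annotations

import json


def build_vm_resources(
    machine_type: str,
    min_cpu_platform: str | None = None,
    gpu: tuple[str, int] | None = None,
    install_gpu_drivers: bool = False,
    local_ssd: tuple[int, str] | None = None,
    specific_reservation: str | None = None,
) -> dict:
    """Build the instances[] entry for a job that doesn't use a template.

    Optional fields are included only when the reservation defines them,
    so that the job and the reservation define or omit the same fields.
    """
    resources: dict = {}
    policy: dict = {"machineType": machine_type}

    if min_cpu_platform is not None:
        policy["minCpuPlatform"] = min_cpu_platform

    if gpu is not None:
        gpu_type, gpu_count = gpu
        resources["installGpuDrivers"] = install_gpu_drivers
        policy["accelerators"] = [{"type": gpu_type, "count": gpu_count}]

    if local_ssd is not None:
        size_gb, device_name = local_ssd
        policy["disks"] = [
            {
                "newDisk": {"sizeGb": size_gb, "type": "local-ssd"},
                "deviceName": device_name,
            }
        ]

    # Set the reservation field only for a specifically targeted
    # reservation; omit it for an automatically consumed reservation.
    if specific_reservation is not None:
        policy["reservation"] = specific_reservation

    resources["policy"] = policy
    return resources


# The automatically consumed n2-standard-32 example from this section.
print(json.dumps(build_vm_resources("n2-standard-32")))
# prints {"policy": {"machineType": "n2-standard-32"}}
```
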
Create and run a job that can't consume reserved VMs

To block a job from consuming any reservations, set the reservation field to NO_RESERVATION. For more information about preventing reservation consumption, see Prevent compute instances from consuming reservations in the Compute Engine documentation.

You can create and run a job that can't consume any reserved VMs by using the gcloud CLI or Batch API.

gcloud

Create a JSON file that specifies the job's configuration details and sets the reservation field to NO_RESERVATION.

For example, to create a basic script job that can't consume reservations, create a JSON file with the following contents:

{
  "taskGroups": [
    {
      "taskSpec": {
        "runnables": [
          {
            "script": {
              "text": "echo Hello world from task ${BATCH_TASK_INDEX}"
            }
          }
        ]
      },
      "taskCount": 3
    }
  ],
  "allocationPolicy": {
    "instances": [
      {
        "policy": {
          "reservation": "NO_RESERVATION"
        }
      }
    ]
  },
  "logsPolicy": {
    "destination": "CLOUD_LOGGING"
  }
}

To create and run the job, use the gcloud batch jobs submit command:
```
gcloud batch jobs submit JOB_NAME \
  --location LOCATION \
  --config JSON_CONFIGURATION_FILE
```
Replace the following:
- JOB_NAME: the name of the job.
- LOCATION: the location of the job.
- JSON_CONFIGURATION_FILE: the path for a JSON file with the job's configuration details.

API

Make a POST request to the jobs.create method that sets the reservation field to NO_RESERVATION.

For example, to create a basic script job that can't consume reservations, make the following request:

POST https://batch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/jobs?job_id=JOB_NAME

{
  "taskGroups": [
    {
      "taskSpec": {
        "runnables": [
          {
            "script": {
              "text": "echo Hello world from task ${BATCH_TASK_INDEX}"
            }
          }
        ]
      },
      "taskCount": 3
    }
  ],
  "allocationPolicy": {
    "instances": [
      {
        "policy": {
          "reservation": "NO_RESERVATION"
        }
      }
    ]
  },
  "logsPolicy": {
    "destination": "CLOUD_LOGGING"
  }
}

Replace the following:

- PROJECT_ID: the project ID of your project.
- LOCATION: the location of the job.
- JOB_NAME: the name of the job.

What's next

If you have issues creating or running a job, see Troubleshooting.
View jobs and tasks.
Learn about more job creation options.