Create and run a job that uses GPUs

This document explains how to create and run a job that uses a graphics processing unit (GPU). To learn more about the features and restrictions for GPUs, see About GPUs in the Compute Engine documentation.

When you create a Batch job, you can optionally use GPUs to accelerate specific workloads. Common use cases for jobs that use GPUs include intensive data processing and artificial intelligence (AI) workloads such as machine learning (ML).

Before you begin

If you haven't used Batch before, review Get started with Batch and enable Batch by completing the prerequisites for projects and users.
To get the permissions that you need to create a job, ask your administrator to grant you the following IAM roles:
- Batch Job Editor (roles/batch.jobsEditor) on the project
- Service Account User (roles/iam.serviceAccountUser) on the job's service account, which by default is the default Compute Engine service account
For more information about granting roles, see Manage access to projects, folders, and organizations.

You might also be able to get the required permissions through custom roles or other predefined roles.

Create a job that uses GPUs

To create a job that uses GPUs, do the following:

Plan the requirements for a job that uses GPUs.
Create a job with the requirements and methods that you identified. For examples of how to create a job using the recommended options, see Create an example job that uses GPUs in this document.

Plan the requirements for a job that uses GPUs

Before creating a job that uses GPUs, plan the job's requirements as explained in the following sections:

Select the GPU machine type and provisioning method
Install the GPU drivers
Define compatible VM resources

Step 1: Select the GPU machine type and provisioning method

A job's requirements vary based on your preferred GPU machine type and provisioning method, and the options for each might be interdependent. Based on your requirements and priorities, you can either select the GPU machine type first or select the provisioning method first. Generally, the GPU machine type primarily affects performance and base pricing, and the provisioning method primarily affects resource availability and additional costs or discounts.

Select the GPU machine type

A GPU machine type is a valid combination of GPU type, number of GPUs, and machine type (which determines the vCPUs and memory). The available GPU machine types and their use cases are listed on the GPU machine types page in the Compute Engine documentation.

The fields required for a job to specify a GPU machine type vary based on the categories in the following table:

GPU machine types and their job requirements

GPUs for accelerator-optimized VMs: VMs with a machine type from the accelerator-optimized machine family have a specific type and number of these GPUs automatically attached.

To use GPUs for accelerator-optimized VMs, we recommend that you specify the machine type. Each accelerator-optimized machine type supports only a specific type and number of GPUs, so it's functionally equivalent whether or not you specify those values in addition to the accelerator-optimized machine type.

Batch also supports specifying only the type and number of GPUs for accelerator-optimized VMs, but the resulting vCPU and memory options are often very limited. As a result, we recommend that you verify that the available vCPU and memory options are compatible with the job's task requirements.

GPUs for N1 VMs: These GPUs require you to specify the type and number to attach to each VM and must be attached to VMs with a machine type from the N1 machine series.

To use GPUs for N1 VMs, we recommend that you specify at least the type of GPUs and number of GPUs. Make sure that the combination of values matches one of the valid GPU options for the N1 machine types. The vCPU and memory options for N1 VMs that use any specific type and number of GPUs are quite flexible. Unless you create the job using the Google Cloud console, you can let Batch automatically select a machine type that meets the job's task requirements.

Note: Batch doesn't use GPUs for a job that specifies an N1 machine type but specifies neither a GPU type nor number of GPUs.
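As a rough illustration of the N1 guidance above, the following Python sketch builds one `allocationPolicy.instances[]` entry that specifies only the GPU type and count and leaves the machine type for Batch to select. The `accelerators[]` field with `type` and `count` is assumed from the Batch API reference and isn't shown elsewhere in this document; the helper name and the `nvidia-tesla-t4` value are illustrative only.

```python
import json


def n1_gpu_instance(gpu_type, gpu_count, machine_type=None):
    """Build one allocationPolicy.instances[] entry that attaches GPUs to N1 VMs.

    Assumption: the policy accepts an accelerators[] list whose items have
    "type" and "count" keys. Verify the field names against the API reference.
    """
    policy = {"accelerators": [{"type": gpu_type, "count": gpu_count}]}
    # The machine type is optional here; when omitted, Batch can pick one
    # (not possible when creating the job in the Google Cloud console).
    if machine_type is not None:
        policy["machineType"] = machine_type
    return {"installGpuDrivers": True, "policy": policy}


instance = n1_gpu_instance("nvidia-tesla-t4", 1)
print(json.dumps(instance, indent=2))
```

Passing `machine_type="n1-standard-8"`, for example, would pin the machine type instead of letting Batch choose it.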

Select the provisioning method

Batch uses different methods to provision the VM resources for jobs that use GPUs based on the type of resources that your job requests. The available provisioning methods, their use cases, and their requirements are explained below, listed from highest to lowest resource availability.

In summary, we recommend that most users do the following:

When you intend to use A3 GPU machine types without a reservation, use Dynamic Workload Scheduler for Batch (Preview).

Note: If you want to use Dynamic Workload Scheduler for Batch with other GPU machine types, contact Google Cloud sales or your account team.
For all other GPU machine types, use the default provisioning method. The default provisioning method is usually on-demand; an exception is if your project has unused reservations that the job can automatically consume.

Provisioning methods and their job requirements

Reservations

Use case: We recommend reservations for jobs if you want a very high level of assurance of resource availability or if you already have existing reservations that might be unused.
Details: A reservation incurs the costs of the specified VM(s) at the same price as running the VM(s) until you delete the reservation. VMs that are consuming a reservation don't incur separate costs, but reservations incur costs regardless of consumption.

Batch uses reservations for jobs that can consume unused reservations. For more information about reservations and their requirements, see the Ensure resource availability using VM reservations page.

Dynamic Workload Scheduler for Batch (Preview)

Use case: We recommend Dynamic Workload Scheduler if you want to use GPUs for VMs with a machine type from the A3 machine series without consuming a reservation.
Details: Dynamic Workload Scheduler can make it easier for you to simultaneously access many resources that accelerate AI and ML workloads. For example, Dynamic Workload Scheduler can be helpful for job scheduling by mitigating delays or issues that are caused by resource unavailability.

Important: Unlike other jobs, Batch jobs that use GPUs through Dynamic Workload Scheduler use resize requests for Compute Engine managed instance groups (MIGs), which have slightly different behaviors. Specifically, jobs that use GPUs through Dynamic Workload Scheduler might require preemptible allocation quota, which is a recommended option to ease quota friction with Dynamic Workload Scheduler GPUs. For more information, see GPU VMs and preemptible allocation quotas.

Batch uses Dynamic Workload Scheduler for jobs that do all of the following:

Specify an A3 GPU machine type.
Block reservations. Specifically, the job must set the reservation field to NO_RESERVATION. For more information, see Create and run a job that can't consume reserved VMs.
Don't use Spot VMs. Specifically, the job can either omit the provisioningModel field or set the provisioningModel field to STANDARD.

Tip: Although you can run the job in any of the locations that offer A3 VMs, we recommend using the location us-central1 because it has dedicated capacity for Dynamic Workload Scheduler.

On-demand

Use case: We recommend on-demand for all other jobs.
Details: On-demand is usually the default way to access Compute Engine VMs. On-demand lets you request and (if available) immediately access resources one VM at a time.

Batch uses on-demand for all other jobs.

Spot VMs

Use case: We recommend trying to use Spot VMs to reduce costs for fault-tolerant workloads.

Caution: Spot VMs might not always be available. You might be able to increase resource availability by following the best practices for Spot VMs. However, if issues persist, you might need to use a different provisioning method instead.
Details: Spot VMs provide significant discounts, but might not always be available and can be preempted at any time. For more information, see Spot VMs in the Compute Engine documentation.

Batch uses Spot VMs for jobs that set the provisioningModel field to SPOT.
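The selection rules described in this section can be summarized in a short Python sketch. It only restates the conditions above and isn't how Batch is implemented; the function name is hypothetical, and the check for an A3 machine type assumes that A3 machine type names start with `a3-`.

```python
def provisioning_method(instance_policy):
    """Return the provisioning method that the rules above imply for a policy dict."""
    model = instance_policy.get("provisioningModel", "STANDARD")
    if model == "SPOT":
        return "Spot VMs"

    machine_type = instance_policy.get("machineType", "")
    blocks_reservations = instance_policy.get("reservation") == "NO_RESERVATION"

    # Dynamic Workload Scheduler: A3 machine type, reservations blocked, no Spot VMs.
    # Assumption: A3 machine type names begin with "a3-".
    if machine_type.startswith("a3-") and blocks_reservations and model == "STANDARD":
        return "Dynamic Workload Scheduler"

    if blocks_reservations:
        return "On-demand"
    return "Reservation if one can be consumed, otherwise on-demand"


print(provisioning_method({"machineType": "a3-highgpu-8g", "reservation": "NO_RESERVATION"}))
print(provisioning_method({"machineType": "g2-standard-4", "provisioningModel": "SPOT"}))
print(provisioning_method({"machineType": "g2-standard-4"}))
```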

Step 2: Install the GPU drivers

To use GPUs for a job, you must install the GPU drivers. To install GPU drivers, select one of the following methods:

Install GPU drivers automatically (recommended if possible): As shown in the examples, to let Batch fetch the required GPU drivers from a third-party location and install them on your behalf, set the installGpuDrivers field for the job to true. This method is recommended if your job does not require you to install drivers manually.

Optionally, if you need to specify which version of the GPU driver Batch installs, also set the driverVersion field.

Install GPU drivers manually: This method is required if any of the following are true:
- A job uses both script and container runnables and does not have internet access. For more information about the access a job has, see Batch networking overview.
- A job uses a custom VM image. To learn more about VM OS images and which VM OS images you can use, see VM OS environment overview.

Important: Due to a known issue, you might also need to install drivers manually for jobs that specify some Compute Engine images. For more information, see Jobs with GPUs and VM OS images with outdated kernels fail only when automatically installing drivers.

To manually install the required GPU drivers, the following method is recommended:
1. Create a custom VM image that includes the GPU drivers.
  1. To install GPU drivers, run an installation script based on the OS that you want to use:
    - GPU drivers for Container-Optimized OS
    - GPU drivers for other OSes
  2. If your job has any container runnables and does not use Container-Optimized OS, you must also install the NVIDIA Container Toolkit.
2. When you create and submit a job that uses GPUs, specify the custom VM image that includes the GPU drivers, and set the installGpuDrivers field for the job to false (default).
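To make the two driver options more concrete, the following Python sketch shows what one `allocationPolicy.instances[]` entry might look like in each case. It assumes that the manual path supplies the custom image through a VM instance template referenced by the `instanceTemplate` field; the template name `example-gpu-template` and the machine type are placeholders rather than values from this document.

```python
import json

# Automatic installation: Batch fetches and installs the drivers.
automatic_install = {
    "installGpuDrivers": True,
    "policy": {"machineType": "g2-standard-4"},
}

# Manual installation: the drivers are already baked into a custom image.
# Assumption: the image is provided through a hypothetical instance template.
manual_install = {
    "installGpuDrivers": False,
    "instanceTemplate": "example-gpu-template",
}

for name, instance in (("automatic", automatic_install), ("manual", manual_install)):
    print(name, json.dumps(instance))
```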

Step 3: Define compatible VM resources

To learn about the requirements and options for defining the VM resources for a job, see Job resources.

In summary, you must do all of the following when defining the VM resources for a job that uses GPUs:

Make sure that the GPU machine type is available in the location of your job's VMs.

To learn where GPU machine types are available, see GPU availability by regions and zones in the Compute Engine documentation.
If you specify the job's machine type, make sure that machine type has enough vCPUs and memory for the job's task requirements. Specifying the job's machine type is required whenever you create a job using the Google Cloud console and is recommended whenever you are creating a job that uses GPUs for accelerator-optimized VMs.
Make sure you define the VM resources for a job using a valid method:
- Define VM resources directly by using the instances[].policy field (recommended if possible). This method is shown in the examples.
- Define VM resources through a template by using the instances[].instanceTemplate field. This method is required to manually install GPU drivers through a custom image. For more information, see Define job resources using a VM instance template.
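One way to sanity-check that a machine type has enough vCPUs and memory is to estimate how many tasks fit on a single VM. The Python sketch below is a simplified calculation, not how Batch schedules tasks: it ignores any overhead on the VM, and the figures used for `g2-standard-4` (4 vCPUs, 16 GB of memory) are stated here as assumptions to verify against the Compute Engine documentation.

```python
def tasks_per_vm(vm_vcpus, vm_memory_gb, task_vcpus, task_memory_gb):
    """Estimate how many tasks fit on one VM, limited by vCPUs or memory."""
    by_cpu = int(vm_vcpus // task_vcpus)
    by_memory = int(vm_memory_gb // task_memory_gb)
    return min(by_cpu, by_memory)


# Assumed VM shape: 4 vCPUs and 16 GB of memory; each task needs 1 vCPU and 0.5 GB.
fit = tasks_per_vm(vm_vcpus=4, vm_memory_gb=16, task_vcpus=1, task_memory_gb=0.5)
print(fit)  # 4 (vCPUs are the limiting resource)
```

A result of 0 would indicate that a single task doesn't fit on the chosen machine type.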

Create an example job that uses GPUs

The following sections explain how to create an example job for each GPU machine type using the recommended options. Specifically, the example jobs all install GPU drivers automatically, all directly define VM resources, and either specify the provisioning method or use the default provisioning method.

Use GPUs for A3 VMs through Dynamic Workload Scheduler for Batch (Preview)
Use GPUs for accelerator-optimized VMs
Use GPUs for N1 VMs

Use GPUs for A3 VMs through Dynamic Workload Scheduler for Batch (Preview)

You can create a job that uses GPUs for A3 VMs through Dynamic Workload Scheduler using the gcloud CLI or the Batch API.

gcloud

Create a JSON file that installs GPU drivers, specifies a machine type from the A3 machine series, blocks reservations, and runs in a location that has the GPU machine type.

For example, to create a basic script job that uses GPUs for A3 VMs through Dynamic Workload Scheduler, create a JSON file with the following contents:

{
    "taskGroups": [
        {
            "taskSpec": {
                "runnables": [
                    {
                        "script": {
                            "text": "echo Hello world from task ${BATCH_TASK_INDEX}."
                        }
                    }
                ]
            },
            "taskCount": 3,
            "parallelism": 1
        }
    ],
    "allocationPolicy": {
        "instances": [
            {
                "installGpuDrivers": INSTALL_GPU_DRIVERS,
                "policy": {
                    "machineType": "MACHINE_TYPE",
                    "reservation": "NO_RESERVATION"
                }
            }
        ],
        "location": {
            "allowedLocations": [
                "ALLOWED_LOCATIONS"
            ]
        }
    },
    "logsPolicy": {
        "destination": "CLOUD_LOGGING"
    }
}

Replace the following:

INSTALL_GPU_DRIVERS: When set to true, Batch fetches the drivers required for the GPU type that you specify in the policy field from a third-party location, and Batch installs them on your behalf. If you set this field to false (default), you need to install GPU drivers manually to use any GPUs for this job.
MACHINE_TYPE: a machine type from the A3 machine series.
ALLOWED_LOCATIONS: You can optionally use the allowedLocations[] field to specify a region or specific zone(s) in a region where the VMs for your job are allowed to run—for example, regions/us-central1 allows all zones in the region us-central1. Make sure that you specify locations that offer the GPU machine type that you want for this job. Otherwise, if you omit this field, make sure the job's location offers the GPU machine type.

To create and run the job, use the gcloud batch jobs submit command:
```
gcloud batch jobs submit JOB_NAME \
    --location LOCATION \
    --config JSON_CONFIGURATION_FILE
```
Replace the following:
- JOB_NAME: the name of the job.
- LOCATION: the location of the job.
- JSON_CONFIGURATION_FILE: the path for a JSON file with the job's configuration details.
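If you prefer to generate the configuration file rather than edit it by hand, the steps above could be scripted roughly as follows. This Python sketch fills in the placeholders with example values (`a3-highgpu-8g`, `regions/us-central1`, and the job name `example-a3-job` are illustrative assumptions), writes the JSON file to a temporary directory, and prints the matching `gcloud` command without running it.

```python
import json
import shlex
import tempfile
from pathlib import Path


def a3_dws_config(machine_type, allowed_locations):
    """Return the example job configuration with the placeholders filled in."""
    return {
        "taskGroups": [{
            "taskSpec": {"runnables": [{"script": {
                "text": "echo Hello world from task ${BATCH_TASK_INDEX}."}}]},
            "taskCount": 3,
            "parallelism": 1,
        }],
        "allocationPolicy": {
            "instances": [{
                "installGpuDrivers": True,
                "policy": {"machineType": machine_type,
                           "reservation": "NO_RESERVATION"},
            }],
            "location": {"allowedLocations": allowed_locations},
        },
        "logsPolicy": {"destination": "CLOUD_LOGGING"},
    }


config = a3_dws_config("a3-highgpu-8g", ["regions/us-central1"])
config_path = Path(tempfile.mkdtemp()) / "a3-dws-job.json"
config_path.write_text(json.dumps(config, indent=4))

# Compose the submit command shown above; this sketch prints it instead of running it.
command = ["gcloud", "batch", "jobs", "submit", "example-a3-job",
           "--location", "us-central1", "--config", str(config_path)]
print(shlex.join(command))
```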

API

Make a POST request to the jobs.create method that installs GPU drivers, specifies a machine type from the A3 machine series, blocks reservations, and runs in a location that has the GPU machine type.

For example, to create a basic script job that uses GPUs for A3 VMs through Dynamic Workload Scheduler, make the following request:

POST https://batch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/jobs?job_id=JOB_NAME

{
    "taskGroups": [
        {
            "taskSpec": {
                "runnables": [
                    {
                        "script": {
                            "text": "echo Hello world from task ${BATCH_TASK_INDEX}."
                        }
                    }
                ]
            },
            "taskCount": 3,
            "parallelism": 1
        }
    ],
    "allocationPolicy": {
        "instances": [
            {
                "installGpuDrivers": INSTALL_GPU_DRIVERS,
                "policy": {
                    "machineType": "MACHINE_TYPE",
                    "reservation": "NO_RESERVATION"
                }
            }
        ],
        "location": {
            "allowedLocations": [
                "ALLOWED_LOCATIONS"
            ]
        }
    },
    "logsPolicy": {
        "destination": "CLOUD_LOGGING"
    }
}

Replace the following:

PROJECT_ID: the project ID of your project.
LOCATION: the location of the job.
JOB_NAME: the name of the job.
INSTALL_GPU_DRIVERS: When set to true, Batch fetches the drivers required for the GPU type that you specify in the policy field from a third-party location, and Batch installs them on your behalf. If you set this field to false (default), you need to install GPU drivers manually to use any GPUs for this job.
MACHINE_TYPE: a machine type from the A3 machine series.
ALLOWED_LOCATIONS: You can optionally use the allowedLocations[] field to specify a region or specific zone(s) in a region where the VMs for your job are allowed to run—for example, regions/us-central1 allows all zones in the region us-central1. Make sure that you specify locations that offer the GPU machine type that you want for this job. Otherwise, if you omit this field, make sure the job's location offers the GPU machine type.
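For reference, the request above could be assembled with Python's standard library roughly as follows. This is a sketch under stated assumptions: the project, location, and job names are examples, the body is abbreviated, and the access token is a placeholder that you would obtain separately (for example, with `gcloud auth print-access-token`). The request is only constructed here, not sent.

```python
import json
import urllib.parse
import urllib.request


def build_create_job_request(project_id, location, job_name, job_body, access_token):
    """Build (but don't send) the POST request for the jobs.create method."""
    query = urllib.parse.urlencode({"job_id": job_name})
    url = (f"https://batch.googleapis.com/v1/projects/{project_id}"
           f"/locations/{location}/jobs?{query}")
    return urllib.request.Request(
        url,
        data=json.dumps(job_body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Abbreviated body; a real request needs the full job configuration shown above.
job_body = {"taskGroups": [], "logsPolicy": {"destination": "CLOUD_LOGGING"}}
request = build_create_job_request(
    "example-project", "us-central1", "example-a3-job", job_body, "EXAMPLE_TOKEN")
print(request.get_method(), request.full_url)
```

To send the request, you could pass it to `urllib.request.urlopen(request)` once it carries a valid token and a complete job body.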

Use GPUs for accelerator-optimized VMs

You can create a job that uses GPUs for accelerator-optimized VMs using the Google Cloud console, gcloud CLI, Batch API, Java, Node.js, or Python.

Console

To create a job that uses GPUs by using the Google Cloud console, do the following:

In the Google Cloud console, go to the Job list page.

Click Create. The Create batch job page opens. In the left pane, the Job details page is selected.
Configure the Job details page:
1. Optional: In the Job name field, customize the job name.
  
  For example, enter example-gpu-job.
2. Configure the Task details section:
  1. In the New runnable window, add at least one script or container for this job to run.
    
    For example, to create a basic script job, do the following:
    1. Select the Script checkbox. A field appears.
    2. In the field, enter the following script:
      echo Hello world from task ${BATCH_TASK_INDEX}.
    3. Click Done.
  2. In the Task count field, enter the number of tasks for this job.
    
    For example, enter 3.
  3. Optional: In the Parallelism field, enter the number of tasks to run concurrently.
    
    For example, enter 1 (default).
Configure the Resource specifications page:
1. In the left pane, click Resource specifications. The Resource specifications page opens.
2. Optional: In the VM provisioning model section, select one of the following options for the provisioning model for this job's VMs:
  - If your job can withstand preemption and you want discounted VMs, select Spot.
  - Otherwise, select Standard (default).
3. Select the location for this job.
  1. In the Region field, select a region.
  2. In the Zone field, do one of the following:
    - If you want to restrict this job to run in a specific zone only, select a zone.
    - Otherwise, select any (default).
  Important: Make sure that you specify only locations that offer the GPU machine type that you want for this job.
4. Select the GPU machine type for this job's VMs:
  1. In the machine family options, click GPUs.
  2. In the GPU type field, select the type of GPUs. Then, in the Number of GPUs field, select the number of GPUs for each VM.
    
    If you selected one of the GPU types for accelerator-optimized VMs, then the Machine type field only allows one option for the machine type based on the type and number of GPUs that you selected.
  3. To automatically install GPU drivers, select GPU driver installation (default).
5. Configure the amount of VM resources required for each task:
  
  Important: Make sure that the GPU machine type has enough VM resources for the job's task requirements.
  1. In the Cores field, enter the number of vCPUs per task.
    
    For example, enter 1 (default).
  2. In the Memory field, enter the amount of RAM in GB per task.
    
    For example, enter 0.5 (default).
6. Click Done.
Optional: Configure the other fields for this job.
Optional: To review the job configuration, in the left pane, click Preview.
Click Create.

The Job details page displays the job that you created.

gcloud

Create a JSON file that installs GPU drivers, specifies a machine type from the accelerator-optimized machine family, and runs in a location that has the GPU machine type.

For example, to create a basic script job that uses GPUs for accelerator-optimized VMs, create a JSON file with the following contents:

{
    "taskGroups": [
        {
            "taskSpec": {
                "runnables": [
                    {
                        "script": {
                            "text": "echo Hello world from task ${BATCH_TASK_INDEX}."
                        }
                    }
                ]
            },
            "taskCount": 3,
            "parallelism": 1
        }
    ],
    "allocationPolicy": {
        "instances": [
            {
                "installGpuDrivers": INSTALL_GPU_DRIVERS,
                "policy": {
                    "machineType": "MACHINE_TYPE"
                }
            }
        ],
        "location": {
            "allowedLocations": [
                "ALLOWED_LOCATIONS"
            ]
        }
    },
    "logsPolicy": {
        "destination": "CLOUD_LOGGING"
    }
}

Replace the following:

INSTALL_GPU_DRIVERS: When set to true, Batch fetches the drivers required for the GPU type that you specify in the policy field from a third-party location, and Batch installs them on your behalf. If you set this field to false (default), you need to install GPU drivers manually to use any GPUs for this job.
MACHINE_TYPE: a machine type from the accelerator-optimized machine family.
ALLOWED_LOCATIONS: You can optionally use the allowedLocations[] field to specify a region or specific zone(s) in a region where the VMs for your job are allowed to run—for example, regions/us-central1 allows all zones in the region us-central1. Make sure that you specify locations that offer the GPU machine type that you want for this job. Otherwise, if you omit this field, make sure the job's location offers the GPU machine type.

To create and run the job, use the gcloud batch jobs submit command:
```
gcloud batch jobs submit JOB_NAME \
    --location LOCATION \
    --config JSON_CONFIGURATION_FILE
```
Replace the following:
- JOB_NAME: the name of the job.
- LOCATION: the location of the job.
- JSON_CONFIGURATION_FILE: the path for a JSON file with the job's configuration details.

API

Make a POST request to the jobs.create method that installs GPU drivers, specifies a machine type from the accelerator-optimized machine family, and runs in a location that has the GPU machine type.

For example, to create a basic script job that uses GPUs for accelerator-optimized VMs, make the following request:

POST https://batch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/jobs?job_id=JOB_NAME

{
    "taskGroups": [
        {
            "taskSpec": {
                "runnables": [
                    {
                        "script": {
                            "text": "echo Hello world from task ${BATCH_TASK_INDEX}."
                        }
                    }
                ]
            },
            "taskCount": 3,
            "parallelism": 1
        }
    ],
    "allocationPolicy": {
        "instances": [
            {
                "installGpuDrivers": INSTALL_GPU_DRIVERS,
                "policy": {
                    "machineType": "MACHINE_TYPE"
                }
            }
        ],
        "location": {
            "allowedLocations": [
                "ALLOWED_LOCATIONS"
            ]
        }
    },
    "logsPolicy": {
        "destination": "CLOUD_LOGGING"
    }
}

Replace the following:

PROJECT_ID: the project ID of your project.
LOCATION: the location of the job.
JOB_NAME: the name of the job.
INSTALL_GPU_DRIVERS: When set to true, Batch fetches the drivers required for the GPU type that you specify in the policy field from a third-party location, and Batch installs them on your behalf. If you set this field to false (default), you need to install GPU drivers manually to use any GPUs for this job.
MACHINE_TYPE: a machine type from the accelerator-optimized machine family.
ALLOWED_LOCATIONS: You can optionally use the allowedLocations[] field to specify a region or specific zone(s) in a region where the VMs for your job are allowed to run—for example, regions/us-central1 allows all zones in the region us-central1. Make sure that you specify locations that offer the GPU machine type that you want for this job. Otherwise, if you omit this field, make sure the job's location offers the GPU machine type.

Java


import com.google.cloud.batch.v1.AllocationPolicy;
import com.google.cloud.batch.v1.AllocationPolicy.Accelerator;
import com.google.cloud.batch.v1.AllocationPolicy.InstancePolicy;
import com.google.cloud.batch.v1.AllocationPolicy.InstancePolicyOrTemplate;
import com.google.cloud.batch.v1.BatchServiceClient;
import com.google.cloud.batch.v1.CreateJobRequest;
import com.google.cloud.batch.v1.Job;
import com.google.cloud.batch.v1.LogsPolicy;
import com.google.cloud.batch.v1.Runnable;
import com.google.cloud.batch.v1.Runnable.Script;
import com.google.cloud.batch.v1.TaskGroup;
import com.google.cloud.batch.v1.TaskSpec;
import com.google.protobuf.Duration;
import java.io.IOException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class CreateGpuJob {

  public static void main(String[] args)
      throws IOException, ExecutionException, InterruptedException, TimeoutException {
    // TODO(developer): Replace these variables before running the sample.
    // Project ID or project number of the Google Cloud project you want to use.
    String projectId = "YOUR_PROJECT_ID";
    // Name of the region you want to use to run the job. Regions that are
    // available for Batch are listed on: https://cloud.google.com/batch/docs/get-started#locations
    String region = "europe-central2";
    // The name of the job that will be created.
    // It needs to be unique for each project and region pair.
    String jobName = "JOB_NAME";
    // Optional. When set to true, Batch fetches the drivers required for the GPU type
    // that you specify in the policy field from a third-party location,
    // and Batch installs them on your behalf. If you set this field to false (default),
    // you need to install GPU drivers manually to use any GPUs for this job.
    boolean installGpuDrivers = true;
    // Accelerator-optimized machine types are available to Batch jobs. See the list
    // of available types on: https://cloud.google.com/compute/docs/accelerator-optimized-machines
    String machineType = "g2-standard-4";

    createGpuJob(projectId, region, jobName, installGpuDrivers, machineType);
  }

  // Create a job that uses GPUs
  public static Job createGpuJob(String projectId, String region, String jobName,
                                  boolean installGpuDrivers, String machineType)
      throws IOException, ExecutionException, InterruptedException, TimeoutException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests.
    try (BatchServiceClient batchServiceClient = BatchServiceClient.create()) {
      // Define what will be done as part of the job.
      Runnable runnable =
          Runnable.newBuilder()
              .setScript(
                  Script.newBuilder()
                      .setText(
                          "echo Hello world! This is task ${BATCH_TASK_INDEX}. "
                                  + "This job has a total of ${BATCH_TASK_COUNT} tasks.")
                      // You can also run a script from a file. Just remember, that needs to be a
                      // script that's already on the VM that will be running the job.
                      // Using setText() and setPath() is mutually exclusive.
                      // .setPath("/tmp/test.sh")
                      .build())
              .build();

      TaskSpec task = TaskSpec.newBuilder()
                  // Jobs can be divided into tasks. Each task in this job runs
                  // the single runnable defined above.
                  .addRunnables(runnable)
                  .setMaxRetryCount(2)
                  .setMaxRunDuration(Duration.newBuilder().setSeconds(3600).build())
                  .build();

      // Tasks are grouped inside a job using TaskGroups.
      // Currently, it's possible to have only one task group.
      TaskGroup taskGroup = TaskGroup.newBuilder()
          .setTaskCount(3)
          .setParallelism(1)
          .setTaskSpec(task)
          .build();

      // The instance policy defines the kind of virtual machines that the tasks run on.
      // Read more about machine types here: https://cloud.google.com/compute/docs/machine-types
      InstancePolicy instancePolicy =
          InstancePolicy.newBuilder().setMachineType(machineType).build();

      // The allocation policy wraps the instance policy and sets whether Batch
      // installs GPU drivers on the VMs.
      AllocationPolicy allocationPolicy =
          AllocationPolicy.newBuilder()
              .addInstances(
                  InstancePolicyOrTemplate.newBuilder()
                      .setInstallGpuDrivers(installGpuDrivers)
                      .setPolicy(instancePolicy)
                      .build())
              .build();

      Job job =
          Job.newBuilder()
              .addTaskGroups(taskGroup)
              .setAllocationPolicy(allocationPolicy)
              .putLabels("env", "testing")
              .putLabels("type", "script")
              // We use Cloud Logging as it's an out of the box available option.
              .setLogsPolicy(
                  LogsPolicy.newBuilder().setDestination(LogsPolicy.Destination.CLOUD_LOGGING))
              .build();

      CreateJobRequest createJobRequest =
          CreateJobRequest.newBuilder()
              // The job's parent is the region in which the job will run.
              .setParent(String.format("projects/%s/locations/%s", projectId, region))
              .setJob(job)
              .setJobId(jobName)
              .build();

      Job result =
          batchServiceClient
              .createJobCallable()
              .futureCall(createJobRequest)
              .get(5, TimeUnit.MINUTES);

      System.out.printf("Successfully created the job: %s", result.getName());

      return result;
    }
  }
}

Node.js

// Imports the Batch library
const batchLib = require('@google-cloud/batch');
const batch = batchLib.protos.google.cloud.batch.v1;

// Instantiates a client
const batchClient = new batchLib.v1.BatchServiceClient();

/**
 * TODO(developer): Update these variables before running the sample.
 */
// Project ID or project number of the Google Cloud project you want to use.
const projectId = await batchClient.getProjectId();
// Name of the region you want to use to run the job. Regions that are
// available for Batch are listed on: https://cloud.google.com/batch/docs/get-started#locations
const region = 'europe-central2';
// The name of the job that will be created.
// It needs to be unique for each project and region pair.
const jobName = 'batch-gpu-job';
// The GPU type. You can view a list of the available GPU types
// by using the `gcloud compute accelerator-types list` command.
const gpuType = 'nvidia-l4';
// The number of GPUs of the specified type.
const gpuCount = 1;
// Optional. When set to true, Batch fetches the drivers required for the GPU type
// that you specify in the policy field from a third-party location,
// and Batch installs them on your behalf. If you set this field to false (default),
// you need to install GPU drivers manually to use any GPUs for this job.
const installGpuDrivers = false;
// Accelerator-optimized machine types are available to Batch jobs. See the list
// of available types on: https://cloud.google.com/compute/docs/accelerator-optimized-machines
const machineType = 'g2-standard-4';

// Define what will be done as part of the job.
const runnable = new batch.Runnable({
  script: new batch.Runnable.Script({
    // Script runnables take `text` (or `path`); `commands` applies to containers.
    text: 'echo Hello world! This is task ${BATCH_TASK_INDEX}.',
  }),
});

const task = new batch.TaskSpec({
  runnables: [runnable],
  maxRetryCount: 2,
  maxRunDuration: {seconds: 3600},
});

// Tasks are grouped inside a job using TaskGroups.
const group = new batch.TaskGroup({
  taskCount: 3,
  taskSpec: task,
});

// Policies are used to define on what kind of virtual machines the tasks will run on.
// In this case, we tell the system to use "g2-standard-4" machine type.
// Read more about machine types here: https://cloud.google.com/compute/docs/machine-types
const instancePolicy = new batch.AllocationPolicy.InstancePolicy({
  machineType,
  // Accelerator describes Compute Engine accelerators to be attached to the VM
  accelerators: [
    new batch.AllocationPolicy.Accelerator({
      type: gpuType,
      count: gpuCount,
      installGpuDrivers,
    }),
  ],
});

const allocationPolicy = new batch.AllocationPolicy({
  instances: [{installGpuDrivers, policy: instancePolicy}],
});

const job = new batch.Job({
  name: jobName,
  taskGroups: [group],
  labels: {env: 'testing', type: 'script'},
  allocationPolicy,
  // We use Cloud Logging as it's an option available out of the box
  logsPolicy: new batch.LogsPolicy({
    destination: batch.LogsPolicy.Destination.CLOUD_LOGGING,
  }),
});
// The job's parent is the project and region in which the job will run
const parent = `projects/${projectId}/locations/${region}`;

async function callCreateBatchGPUJob() {
  // Construct request
  const request = {
    parent,
    jobId: jobName,
    job,
  };

  // Run request
  const [response] = await batchClient.createJob(request);
  console.log(JSON.stringify(response));
}

await callCreateBatchGPUJob();

Python

from google.cloud import batch_v1


def create_gpu_job(project_id: str, region: str, job_name: str) -> batch_v1.Job:
    """
    This method shows how to create a sample Batch Job that will run
    a simple command on Compute Engine VMs that have GPUs.

    Args:
        project_id: project ID or project number of the Cloud project you want to use.
        region: name of the region you want to use to run the job. Regions that are
            available for Batch are listed on: https://cloud.google.com/batch/docs/get-started#locations
        job_name: the name of the job that will be created.
            It needs to be unique for each project and region pair.

    Returns:
        A job object representing the job created.
    """
    client = batch_v1.BatchServiceClient()

    # Define what will be done as part of the job.
    task = batch_v1.TaskSpec()
    runnable = batch_v1.Runnable()
    runnable.script = batch_v1.Runnable.Script()
    runnable.script.text = "echo Hello world! This is task ${BATCH_TASK_INDEX}. This job has a total of ${BATCH_TASK_COUNT} tasks."
    # You can also run a script from a file. Just remember, that needs to be a script that's
    # already on the VM that will be running the job. Using runnable.script.text and runnable.script.path is mutually
    # exclusive.
    # runnable.script.path = '/tmp/test.sh'
    task.runnables = [runnable]

    # We can specify what resources are requested by each task.
    resources = batch_v1.ComputeResource()
    resources.cpu_milli = 2000  # in milliCPU units; 2000 means the task requires 2 whole CPUs.
    resources.memory_mib = 16  # in MiB
    task.compute_resource = resources

    task.max_retry_count = 2
    task.max_run_duration = "3600s"

    # Tasks are grouped inside a job using TaskGroups.
    # Currently, it's possible to have only one task group.
    group = batch_v1.TaskGroup()
    group.task_count = 4
    group.task_spec = task

    # Policies are used to define on what kind of virtual machines the tasks will run on.
    # In this case, we tell the system to use "g2-standard-4" machine type.
    # Read more about machine types here: https://cloud.google.com/compute/docs/machine-types
    policy = batch_v1.AllocationPolicy.InstancePolicy()
    policy.machine_type = "g2-standard-4"

    instances = batch_v1.AllocationPolicy.InstancePolicyOrTemplate()
    instances.policy = policy
    instances.install_gpu_drivers = True
    allocation_policy = batch_v1.AllocationPolicy()
    allocation_policy.instances = [instances]

    job = batch_v1.Job()
    job.task_groups = [group]
    job.allocation_policy = allocation_policy
    job.labels = {"env": "testing", "type": "script"}
    # We use Cloud Logging as it's an out of the box available option
    job.logs_policy = batch_v1.LogsPolicy()
    job.logs_policy.destination = batch_v1.LogsPolicy.Destination.CLOUD_LOGGING

    create_request = batch_v1.CreateJobRequest()
    create_request.job = job
    create_request.job_id = job_name
    # The job's parent is the region in which the job will run
    create_request.parent = f"projects/{project_id}/locations/{region}"

    return client.create_job(create_request)
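
As a rough sanity check on the `cpu_milli` and `memory_mib` values in the sample above, the following sketch estimates how many tasks fit on one VM. The helper name `tasks_per_vm` is hypothetical, and the machine specs passed in (4 vCPUs and 16 GiB of memory for g2-standard-4) are assumptions to verify against the Compute Engine machine type documentation. This is plain arithmetic, not Batch's actual scheduling logic.

```python
def tasks_per_vm(vm_vcpus: int, vm_memory_mib: int, cpu_milli: int, memory_mib: int) -> int:
    """Estimate how many tasks fit on one VM (hypothetical helper).

    cpu_milli is in milliCPU units, so 1000 equals one vCPU.
    """
    if cpu_milli <= 0 or memory_mib <= 0:
        raise ValueError("cpu_milli and memory_mib must be positive")
    by_cpu = (vm_vcpus * 1000) // cpu_milli
    by_memory = vm_memory_mib // memory_mib
    return min(by_cpu, by_memory)


# Assumed specs for g2-standard-4: 4 vCPUs and 16 GiB (16 * 1024 MiB) of memory.
fit = tasks_per_vm(vm_vcpus=4, vm_memory_mib=16 * 1024, cpu_milli=2000, memory_mib=16)
print(fit)  # 2: the CPU request is the limiting factor
```

If the result is 0, the per-task request is larger than the VM, so either the request or the machine type needs to change.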

Use GPUs for N1 VMs

You can create a job that uses GPUs for N1 VMs using the Google Cloud console, gcloud CLI, Batch API, Java, Node.js, or Python.

Console

To create a job that uses GPUs by using the Google Cloud console, do the following:

In the Google Cloud console, go to the Job list page.

Go to Job list
Click Create. The Create batch job page opens. In the left pane, the Job details page is selected.
Configure the Job details page:
1. Optional: In the Job name field, customize the job name.
  
  For example, enter example-gpu-job.
2. Configure the Task details section:
  1. In the New runnable window, add at least one script or container for this job to run.
    
    For example, to create a basic script job, do the following:
    1. Select the Script checkbox. A field appears.
    2. In the field, enter the following script:
      echo Hello world from task ${BATCH_TASK_INDEX}.
    3. Click Done.
  2. In the Task count field, enter the number of tasks for this job.
    
    For example, enter 3.
  3. Optional: In the Parallelism field, enter the number of tasks to run concurrently.
    
    For example, enter 1 (default).
Configure the Resource specifications page:
1. In the left pane, click Resource specifications. The Resource specifications page opens.
2. Optional: In the VM provisioning model section, select one of the following options for the provisioning model for this job's VMs:
  - If your job can withstand preemption and you want discounted VMs, select Spot.
  - Otherwise, select Standard (default).
3. Select the location for this job.
  1. In the Region field, select a region.
  2. In the Zone field, do one of the following:
    - If you want to restrict this job to run in a specific zone only, select a zone.
    - Otherwise, select any (default).
  Important: Make sure that you specify only locations that offer the GPU machine type that you want for this job.
4. Select the GPU machine type for this job's VMs:
  1. In the machine family options, click GPUs.
  2. In the GPU type field, select the type of GPUs.
    
    If you selected one of the GPU types for N1 VMs, then the Series field is set to N1.
  3. In the Number of GPUs field, select the number of GPUs for each VM.
  4. In the Machine type field, select the machine type.
  5. To automatically install GPU drivers, select GPU driver installation (default).
5. Configure the amount of VM resources required for each task:
  
  Important: Make sure that the GPU machine type has enough VM resources for the job's task requirements.
  1. In the Cores field, enter the number of vCPUs per task.
    
    For example, enter 1 (default).
  2. In the Memory field, enter the amount of RAM in GB per task.
    
    For example, enter 0.5 (default).
6. Click Done.
Optional: Configure the other fields for this job.
Optional: To review the job configuration, in the left pane, click Preview.
Click Create.

The Job details page displays the job that you created.

gcloud

Create a JSON file that installs GPU drivers, defines the type and count subfields of the accelerators[] field, and runs in a location that has the GPU machine type.

For example, to create a basic script job that uses GPUs for N1 VMs and lets Batch select the exact N1 machine type, create a JSON file with the following contents:

{
    "taskGroups": [
        {
            "taskSpec": {
                "runnables": [
                    {
                        "script": {
                            "text": "echo Hello world from task ${BATCH_TASK_INDEX}."
                        }
                    }
                ]
            },
            "taskCount": 3,
            "parallelism": 1
        }
    ],
    "allocationPolicy": {
        "instances": [
            {
                "installGpuDrivers": INSTALL_GPU_DRIVERS,
                "policy": {
                    "accelerators": [
                        {
                            "type": "GPU_TYPE",
                            "count": GPU_COUNT
                        }
                    ]
                }
            }
        ],
        "location": {
            "allowedLocations": [
                "ALLOWED_LOCATIONS"
            ]
        }
    },
    "logsPolicy": {
        "destination": "CLOUD_LOGGING"
    }
}

Replace the following:

INSTALL_GPU_DRIVERS: When set to true, Batch fetches the drivers required for the GPU type that you specify in the policy field from a third-party location, and Batch installs them on your behalf. If you set this field to false (default), you need to install GPU drivers manually to use any GPUs for this job.
GPU_TYPE: the GPU type. You can view a list of the available GPU types by using the gcloud compute accelerator-types list command. Only use this field for GPUs for N1 VMs.
GPU_COUNT: the number of GPUs of the specified type. For more information about the valid options, see the GPU machine types for the N1 machine series. Only use this field for GPUs for N1 VMs.
ALLOWED_LOCATIONS: You can optionally use the allowedLocations[] field to specify a region or specific zone(s) in a region where the VMs for your job are allowed to run—for example, regions/us-central1 allows all zones in the region us-central1. Make sure that you specify locations that offer the GPU machine type that you want for this job. Otherwise, if you omit this field, make sure the job's location offers the GPU machine type.
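
If you prefer to generate the configuration file instead of editing it by hand, the placeholders above can be filled in with a short script. This is a minimal sketch that uses only the Python standard library. The function name `build_n1_gpu_job_config` and the example values (`nvidia-tesla-t4`, `regions/us-central1`) are illustrative assumptions, not part of Batch.

```python
import json


def build_n1_gpu_job_config(gpu_type, gpu_count, allowed_locations, install_gpu_drivers=True):
    """Return the job configuration from this section as a dict (hypothetical helper)."""
    if gpu_count < 1:
        raise ValueError("gpu_count must be at least 1")
    return {
        "taskGroups": [
            {
                "taskSpec": {
                    "runnables": [
                        {"script": {"text": "echo Hello world from task ${BATCH_TASK_INDEX}."}}
                    ]
                },
                "taskCount": 3,
                "parallelism": 1,
            }
        ],
        "allocationPolicy": {
            "instances": [
                {
                    "installGpuDrivers": install_gpu_drivers,
                    "policy": {"accelerators": [{"type": gpu_type, "count": gpu_count}]},
                }
            ],
            "location": {"allowedLocations": list(allowed_locations)},
        },
        "logsPolicy": {"destination": "CLOUD_LOGGING"},
    }


# Example values only; choose a GPU type and location that are available to your project.
config = build_n1_gpu_job_config("nvidia-tesla-t4", 1, ["regions/us-central1"])
config_json = json.dumps(config, indent=4)
print(config_json)
```

Redirecting the printed output to a file gives you a JSON configuration file with the same structure as the example above.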

To create and run the job, use the gcloud batch jobs submit command:
```
gcloud batch jobs submit JOB_NAME \
    --location LOCATION \
    --config JSON_CONFIGURATION_FILE
```
Replace the following:
- JOB_NAME: the name of the job.
- LOCATION: the location of the job.
- JSON_CONFIGURATION_FILE: the path for a JSON file with the job's configuration details.
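
To submit the job from a script rather than a terminal, the same command can be assembled as an argument list. This is a minimal sketch that assumes the gcloud CLI is installed and authenticated. `build_submit_command` is a hypothetical helper, and the job name, location, and file path are placeholder values.

```python
import subprocess


def build_submit_command(job_name, location, config_path):
    """Return the argument list for `gcloud batch jobs submit` (hypothetical helper)."""
    return [
        "gcloud", "batch", "jobs", "submit", job_name,
        "--location", location,
        "--config", config_path,
    ]


command = build_submit_command("example-gpu-job", "us-central1", "gpu-job.json")
print(" ".join(command))

# Uncomment to actually submit the job; check=True raises CalledProcessError on failure.
# subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting issues if the job name or path contains unusual characters.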

API

Make a POST request to the jobs.create method that installs GPU drivers, defines the type and count subfields of the accelerators[] field, and uses a location that has the GPU machine type.

For example, to create a basic script job that uses GPUs for N1 VMs and lets Batch select the exact N1 machine type, make the following request:

POST https://batch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/jobs?job_id=JOB_NAME

{
    "taskGroups": [
        {
            "taskSpec": {
                "runnables": [
                    {
                        "script": {
                            "text": "echo Hello world from task ${BATCH_TASK_INDEX}."
                        }
                    }
                ]
            },
            "taskCount": 3,
            "parallelism": 1
        }
    ],
    "allocationPolicy": {
        "instances": [
            {
                "installGpuDrivers": INSTALL_GPU_DRIVERS,
                "policy": {
                    "accelerators": [
                        {
                            "type": "GPU_TYPE",
                            "count": GPU_COUNT
                        }
                    ]
                }
            }
        ],
        "location": {
            "allowedLocations": [
                "ALLOWED_LOCATIONS"
            ]
        }
    },
    "logsPolicy": {
        "destination": "CLOUD_LOGGING"
    }
}

Replace the following:

PROJECT_ID: the project ID of your project.
LOCATION: the location of the job.
JOB_NAME: the name of the job.
INSTALL_GPU_DRIVERS: When set to true, Batch fetches the drivers required for the GPU type that you specify in the policy field from a third-party location, and Batch installs them on your behalf. If you set this field to false (default), you need to install GPU drivers manually to use any GPUs for this job.
GPU_TYPE: the GPU type. You can view a list of the available GPU types by using the gcloud compute accelerator-types list command. Only use this field for GPUs for N1 VMs.
GPU_COUNT: the number of GPUs of the specified type. For more information about the valid options, see the GPU machine types for the N1 machine series. Only use this field for GPUs for N1 VMs.
ALLOWED_LOCATIONS: You can optionally use the allowedLocations[] field to specify a region or specific zone(s) in a region where the VMs for your job are allowed to run—for example, regions/us-central1 allows all zones in the region us-central1. Make sure that you specify locations that offer the GPU machine type that you want for this job. Otherwise, if you omit this field, make sure the job's location offers the GPU machine type.
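
The request above can be assembled with any HTTP client. The following sketch builds, but does not send, the request with Python's standard library. `build_create_job_request` is a hypothetical helper, the project, location, and job name are placeholder values, and how you obtain the OAuth 2.0 access token (for example, with `gcloud auth print-access-token`) is left as an assumption.

```python
import json
import urllib.parse
import urllib.request


def build_create_job_request(project_id, location, job_name, job_body, access_token):
    """Build the jobs.create POST request without sending it (hypothetical helper)."""
    query = urllib.parse.urlencode({"job_id": job_name})
    url = (
        "https://batch.googleapis.com/v1/"
        f"projects/{project_id}/locations/{location}/jobs?{query}"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(job_body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_create_job_request(
    "my-project",
    "us-central1",
    "example-gpu-job",
    {"logsPolicy": {"destination": "CLOUD_LOGGING"}},  # abbreviated body for illustration
    "ACCESS_TOKEN",
)
print(request.get_method(), request.full_url)

# To send it, pass the request to urllib.request.urlopen(); the response body
# contains the created job as JSON.
```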

Java


import com.google.cloud.batch.v1.AllocationPolicy;
import com.google.cloud.batch.v1.AllocationPolicy.Accelerator;
import com.google.cloud.batch.v1.AllocationPolicy.InstancePolicy;
import com.google.cloud.batch.v1.AllocationPolicy.InstancePolicyOrTemplate;
import com.google.cloud.batch.v1.BatchServiceClient;
import com.google.cloud.batch.v1.CreateJobRequest;
import com.google.cloud.batch.v1.Job;
import com.google.cloud.batch.v1.LogsPolicy;
import com.google.cloud.batch.v1.Runnable;
import com.google.cloud.batch.v1.Runnable.Script;
import com.google.cloud.batch.v1.TaskGroup;
import com.google.cloud.batch.v1.TaskSpec;
import com.google.protobuf.Duration;
import java.io.IOException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class CreateGpuJobN1 {

  public static void main(String[] args)
      throws IOException, ExecutionException, InterruptedException, TimeoutException {
    // TODO(developer): Replace these variables before running the sample.
    // Project ID or project number of the Google Cloud project you want to use.
    String projectId = "YOUR_PROJECT_ID";
    // Name of the region you want to use to run the job. Regions that are
    // available for Batch are listed on: https://cloud.google.com/batch/docs/get-started#locations
    String region = "europe-central2";
    // The name of the job that will be created.
    // It needs to be unique for each project and region pair.
    String jobName = "JOB_NAME";
    // Optional. When set to true, Batch fetches the drivers required for the GPU type
    // that you specify in the policy field from a third-party location,
    // and Batch installs them on your behalf. If you set this field to false (default),
    // you need to install GPU drivers manually to use any GPUs for this job.
    boolean installGpuDrivers = false;
    // The GPU type. You can view a list of the available GPU types
    // by using the `gcloud compute accelerator-types list` command.
    String gpuType = "nvidia-tesla-t4";
    // The number of GPUs of the specified type.
    int gpuCount = 2;

    createGpuJob(projectId, region, jobName, installGpuDrivers, gpuType, gpuCount);
  }

  // Create a job that uses GPUs
  public static Job createGpuJob(String projectId, String region, String jobName,
                                  boolean installGpuDrivers, String gpuType, int gpuCount)
      throws IOException, ExecutionException, InterruptedException, TimeoutException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests.
    try (BatchServiceClient batchServiceClient = BatchServiceClient.create()) {
      // Define what will be done as part of the job.
      Runnable runnable =
          Runnable.newBuilder()
              .setScript(
                  Script.newBuilder()
                      .setText(
                          "echo Hello world! This is task ${BATCH_TASK_INDEX}. "
                                  + "This job has a total of ${BATCH_TASK_COUNT} tasks.")
                      // You can also run a script from a file. Just remember, that needs to be a
                      // script that's already on the VM that will be running the job.
                      // Using setText() and setPath() is mutually exclusive.
                      // .setPath("/tmp/test.sh")
                      .build())
              .build();

      TaskSpec task = TaskSpec.newBuilder()
                  // Jobs can be divided into tasks. Each task in this job runs
                  // the single runnable defined above.
                  .addRunnables(runnable)
                  .setMaxRetryCount(2)
                  .setMaxRunDuration(Duration.newBuilder().setSeconds(3600).build())
                  .build();

      // Tasks are grouped inside a job using TaskGroups.
      // Currently, it's possible to have only one task group.
      TaskGroup taskGroup = TaskGroup.newBuilder()
          .setTaskCount(3)
          .setParallelism(1)
          .setTaskSpec(task)
          .build();

      // Accelerator describes Compute Engine accelerators to be attached to the VM.
      Accelerator accelerator = Accelerator.newBuilder()
          .setType(gpuType)
          .setCount(gpuCount)
          .build();

      // Policies are used to define on what kind of virtual machines the tasks will run on.
      AllocationPolicy allocationPolicy =
          AllocationPolicy.newBuilder()
              .addInstances(
                  InstancePolicyOrTemplate.newBuilder()
                      .setInstallGpuDrivers(installGpuDrivers)
                      .setPolicy(InstancePolicy.newBuilder().addAccelerators(accelerator))
                      .build())
              .build();

      Job job =
          Job.newBuilder()
              .addTaskGroups(taskGroup)
              .setAllocationPolicy(allocationPolicy)
              .putLabels("env", "testing")
              .putLabels("type", "script")
              // We use Cloud Logging as it's an out of the box available option.
              .setLogsPolicy(
                  LogsPolicy.newBuilder().setDestination(LogsPolicy.Destination.CLOUD_LOGGING))
              .build();

      CreateJobRequest createJobRequest =
          CreateJobRequest.newBuilder()
              // The job's parent is the region in which the job will run.
              .setParent(String.format("projects/%s/locations/%s", projectId, region))
              .setJob(job)
              .setJobId(jobName)
              .build();

      Job result =
          batchServiceClient
              .createJobCallable()
              .futureCall(createJobRequest)
              .get(5, TimeUnit.MINUTES);

      System.out.printf("Successfully created the job: %s", result.getName());

      return result;
    }
  }
}

Node.js

// Imports the Batch library
const batchLib = require('@google-cloud/batch');
const batch = batchLib.protos.google.cloud.batch.v1;

// Instantiates a client
const batchClient = new batchLib.v1.BatchServiceClient();

/**
 * TODO(developer): Update these variables before running the sample.
 */
// Project ID or project number of the Google Cloud project you want to use.
const projectId = await batchClient.getProjectId();
// Name of the region you want to use to run the job. Regions that are
// available for Batch are listed on: https://cloud.google.com/batch/docs/get-started#locations
const region = 'europe-central2';
// The name of the job that will be created.
// It needs to be unique for each project and region pair.
const jobName = 'batch-gpu-job-n1';
// The GPU type. You can view a list of the available GPU types
// by using the `gcloud compute accelerator-types list` command.
const gpuType = 'nvidia-tesla-t4';
// The number of GPUs of the specified type.
const gpuCount = 1;
// Optional. When set to true, Batch fetches the drivers required for the GPU type
// that you specify in the policy field from a third-party location,
// and Batch installs them on your behalf. If you set this field to false (default),
// you need to install GPU drivers manually to use any GPUs for this job.
const installGpuDrivers = false;
// The N1 machine type for the job's VMs. GPUs for N1 VMs are attached to
// general-purpose N1 machine types. Read more about machine types here:
// https://cloud.google.com/compute/docs/machine-types
const machineType = 'n1-standard-16';

// Define what will be done as part of the job.
const runnable = new batch.Runnable({
  script: new batch.Runnable.Script({
    // Script runnables take `text` (or `path`); `commands` applies to containers.
    text: 'echo Hello world! This is task ${BATCH_TASK_INDEX}.',
  }),
});

const task = new batch.TaskSpec({
  runnables: [runnable],
  maxRetryCount: 2,
  maxRunDuration: {seconds: 3600},
});

// Tasks are grouped inside a job using TaskGroups.
const group = new batch.TaskGroup({
  taskCount: 3,
  taskSpec: task,
});

// Policies are used to define on what kind of virtual machines the tasks will run on.
// In this case, we tell the system to use "n1-standard-16" machine type.
// Read more about machine types here: https://cloud.google.com/compute/docs/machine-types
const instancePolicy = new batch.AllocationPolicy.InstancePolicy({
  machineType,
  // Accelerator describes Compute Engine accelerators to be attached to the VM
  accelerators: [
    new batch.AllocationPolicy.Accelerator({
      type: gpuType,
      count: gpuCount,
      installGpuDrivers,
    }),
  ],
});

const allocationPolicy = new batch.AllocationPolicy({
  instances: [{installGpuDrivers, policy: instancePolicy}],
});

const job = new batch.Job({
  name: jobName,
  taskGroups: [group],
  labels: {env: 'testing', type: 'script'},
  allocationPolicy,
  // We use Cloud Logging as it's an option available out of the box
  logsPolicy: new batch.LogsPolicy({
    destination: batch.LogsPolicy.Destination.CLOUD_LOGGING,
  }),
});
// The job's parent is the project and region in which the job will run
const parent = `projects/${projectId}/locations/${region}`;

async function callCreateBatchGPUJobN1() {
  // Construct request
  const request = {
    parent,
    jobId: jobName,
    job,
  };

  // Run request
  const [response] = await batchClient.createJob(request);
  console.log(JSON.stringify(response));
}

await callCreateBatchGPUJobN1();

Python

from google.cloud import batch_v1


def create_gpu_job(
    project_id: str, region: str, zone: str, job_name: str
) -> batch_v1.Job:
    """
    This method shows how to create a sample Batch Job that will run
    a simple command on Compute Engine VMs that have GPUs.

    Args:
        project_id: project ID or project number of the Cloud project you want to use.
        region: name of the region you want to use to run the job. Regions that are
            available for Batch are listed on: https://cloud.google.com/batch/docs/get-started#locations
        zone: name of the zone you want to use to run the job. The zone matters because GPU availability
            varies by zone. See: https://cloud.google.com/compute/docs/gpus/gpu-regions-zones
        job_name: the name of the job that will be created.
            It needs to be unique for each project and region pair.

    Returns:
        A job object representing the job created.
    """
    client = batch_v1.BatchServiceClient()

    # Define what will be done as part of the job.
    task = batch_v1.TaskSpec()
    runnable = batch_v1.Runnable()
    runnable.script = batch_v1.Runnable.Script()
    runnable.script.text = "echo Hello world! This is task ${BATCH_TASK_INDEX}. This job has a total of ${BATCH_TASK_COUNT} tasks."
    # You can also run a script from a file. Just remember, that needs to be a script that's
    # already on the VM that will be running the job. Using runnable.script.text and runnable.script.path is mutually
    # exclusive.
    # runnable.script.path = '/tmp/test.sh'
    task.runnables = [runnable]

    # We can specify what resources are requested by each task.
    resources = batch_v1.ComputeResource()
    resources.cpu_milli = 2000  # in milliCPU units; 2000 means the task requires 2 whole CPUs.
    resources.memory_mib = 16  # in MiB
    task.compute_resource = resources

    task.max_retry_count = 2
    task.max_run_duration = "3600s"

    # Tasks are grouped inside a job using TaskGroups.
    # Currently, it's possible to have only one task group.
    group = batch_v1.TaskGroup()
    group.task_count = 4
    group.task_spec = task

    # Policies are used to define on what kind of virtual machines the tasks will run on.
    # Read more about machine types here: https://cloud.google.com/compute/docs/machine-types
    policy = batch_v1.AllocationPolicy.InstancePolicy()
    policy.machine_type = "n1-standard-16"

    accelerator = batch_v1.AllocationPolicy.Accelerator()
    # Note: not every accelerator is compatible with instance type
    # Read more here: https://cloud.google.com/compute/docs/gpus#t4-gpus
    accelerator.type_ = "nvidia-tesla-t4"
    accelerator.count = 1

    policy.accelerators = [accelerator]
    instances = batch_v1.AllocationPolicy.InstancePolicyOrTemplate()
    instances.policy = policy
    instances.install_gpu_drivers = True
    allocation_policy = batch_v1.AllocationPolicy()
    allocation_policy.instances = [instances]

    location = batch_v1.AllocationPolicy.LocationPolicy()
    # Restrict the job's VMs to the requested zone, which must offer the GPU type.
    location.allowed_locations = [f"zones/{zone}"]
    allocation_policy.location = location

    job = batch_v1.Job()
    job.task_groups = [group]
    job.allocation_policy = allocation_policy
    job.labels = {"env": "testing", "type": "script"}
    # We use Cloud Logging as it's an out of the box available option
    job.logs_policy = batch_v1.LogsPolicy()
    job.logs_policy.destination = batch_v1.LogsPolicy.Destination.CLOUD_LOGGING

    create_request = batch_v1.CreateJobRequest()
    create_request.job = job
    create_request.job_id = job_name
    # The job's parent is the region in which the job will run
    create_request.parent = f"projects/{project_id}/locations/{region}"

    return client.create_job(create_request)
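
The sample above takes both a `region` and a `zone`. The sketch below shows one way to build `allowed_locations` entries in the `regions/...` and `zones/...` formats used in this document, and to catch a zone name that doesn't match the job's region before the request is sent. The helper names are hypothetical, and deriving the region by dropping the zone's final `-<letter>` suffix relies on the Compute Engine zone naming convention, which is an assumption here.

```python
from typing import List, Optional


def region_of_zone(zone: str) -> str:
    """Derive the region from a zone name such as 'us-central1-b' (assumed convention)."""
    region, sep, suffix = zone.rpartition("-")
    if not sep or not region or not suffix:
        raise ValueError(f"unexpected zone name: {zone!r}")
    return region


def allowed_locations(region: str, zone: Optional[str] = None) -> List[str]:
    """Return allowedLocations entries for a whole region or for a single zone."""
    if zone is None:
        return [f"regions/{region}"]
    if region_of_zone(zone) != region:
        raise ValueError(f"zone {zone!r} is not in region {region!r}")
    return [f"zones/{zone}"]


print(allowed_locations("us-central1"))                   # ['regions/us-central1']
print(allowed_locations("us-central1", "us-central1-b"))  # ['zones/us-central1-b']
```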

What's next

If you have issues creating or running a job, see Troubleshooting.
View jobs and tasks.
Learn about more job creation options.
