This document explains how to create and run a job that uses a graphics processing unit (GPU). To learn more about the features and restrictions for GPUs, see About GPUs in the Compute Engine documentation.
When you create a Batch job, you can optionally use GPUs to accelerate specific workloads. Common use cases for jobs that use GPUs include intensive data processing and artificial intelligence (AI) workloads such as machine learning (ML).
Before you begin
- If you haven't used Batch before, review Get started with Batch and enable Batch by completing the prerequisites for projects and users.
- To get the permissions that you need to create a job, ask your administrator to grant you the following IAM roles:
  - Batch Job Editor (roles/batch.jobsEditor) on the project
  - Service Account User (roles/iam.serviceAccountUser) on the job's service account, which by default is the default Compute Engine service account

  For more information about granting roles, see Manage access to projects, folders, and organizations.

  You might also be able to get the required permissions through custom roles or other predefined roles.
Create a job that uses GPUs
To create a job that uses GPUs, do the following:
- Plan the requirements for a job that uses GPUs.
- Create a job with the requirements and methods that you identified. For examples of how to create a job using the recommended methods, see Create an example job that uses GPUs in this document.
Plan the requirements for a job that uses GPUs
Before creating a job that uses GPUs, plan the job's requirements as explained in the following sections:
Step 1: Select the GPU machine type
The available GPU machine types, which are the valid combinations of GPU type, number of GPUs, and machine type (vCPUs and memory), are listed with their use cases on the GPU machine types page in the Compute Engine documentation.
The fields required for a job to specify a GPU machine type vary based on the categories in the following table:
GPU machine types | Job requirements |
---|---|
GPUs for accelerator-optimized VMs: VMs with a machine type from the accelerator-optimized machine family have a specific type and number of these GPUs automatically attached. | To use GPUs for accelerator-optimized VMs, we recommend that you specify the machine type. Each accelerator-optimized machine type supports only a specific type and number of GPUs, so specifying those values in addition to the machine type has no additional effect. Batch also supports specifying only the type and number of GPUs for accelerator-optimized VMs, but the resulting vCPU and memory options are often very limited. In that case, we recommend that you verify that the available vCPU and memory options are compatible with the job's task requirements. |
GPUs for N1 VMs: These GPUs require you to specify the type and number to attach to each VM and must be attached to VMs with a machine type from the N1 machine series. | To use GPUs for N1 VMs, we recommend that you specify at least the type of GPUs and the number of GPUs. Make sure that the combination of values matches one of the valid GPU options for the N1 machine types. The vCPU and memory options for N1 VMs that use any specific type and number of GPUs are quite flexible, so, if preferred, you can let Batch select a machine type that meets the job's task requirements. |
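The two rows of the table can be sketched as a small helper that builds the policy subfield of an instances[] entry. This is a minimal illustration, not part of any Batch client library: the function name build_instance_policy is hypothetical, and the example values a2-highgpu-1g and nvidia-tesla-t4 are placeholders that you should check against the GPU machine types page.

```python
def build_instance_policy(machine_type=None, gpu_type=None, gpu_count=None):
    """Return a dict for the `policy` field of one `instances[]` entry.

    Hypothetical helper for illustration only.
    - Accelerator-optimized VMs: pass only machine_type (recommended).
    - N1 VMs: pass gpu_type and gpu_count; machine_type is optional.
    """
    # The GPU type and GPU count are only meaningful together.
    if (gpu_type is None) != (gpu_count is None):
        raise ValueError("specify gpu_type and gpu_count together")
    if machine_type is None and gpu_type is None:
        raise ValueError("specify a machine type, GPUs, or both")

    policy = {}
    if machine_type is not None:
        policy["machineType"] = machine_type
    if gpu_type is not None:
        policy["accelerators"] = [{"type": gpu_type, "count": gpu_count}]
    return policy


if __name__ == "__main__":
    # Example values are assumptions; verify them for your project and location.
    print(build_instance_policy(machine_type="a2-highgpu-1g"))
    print(build_instance_policy(gpu_type="nvidia-tesla-t4", gpu_count=1))
```

The helper rejects a GPU type without a count because the guidance for N1 VMs is to specify at least both values.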
Step 2: Install the GPU drivers
To install the required GPU drivers, select one of the following methods:
- Install drivers automatically (recommended if possible): As shown in the examples, to let Batch fetch the required GPU drivers from a third-party location and install them on your behalf, set the installGpuDrivers field for the job to true. This method is recommended if your job does not require you to install drivers manually.

  Optionally, if you need to specify which version of the GPU driver Batch installs, also set the driverVersion field.

- Install drivers manually: This method is required if any of the following are true:
  - A job uses both script and container runnables and does not have internet access. For more information about the access a job has, see Batch networking overview.
  - A job uses a custom VM image. To learn more about VM OS images and which VM OS images you can use, see VM OS environment overview.

  To manually install the required GPU drivers, the following method is recommended:
  - Create a custom VM image that includes the GPU drivers.
    - To install GPU drivers, run an installation script based on the OS that you want to use.
    - If your job has any container runnables and does not use Container-Optimized OS, you must also install the NVIDIA Container Toolkit.
  - Create and submit a job with the custom VM image by using a Compute Engine instance template. Set the installGpuDrivers field for the job to false (default).
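The choice between the two methods can be expressed as a short decision function. The following is only a sketch of the conditions listed above; the function and parameter names are hypothetical, and the function does not inspect a real job.

```python
def requires_manual_driver_install(
    uses_custom_image,
    has_script_runnables,
    has_container_runnables,
    has_internet_access,
):
    """Return True if the conditions above call for manual driver installation."""
    # Condition 1: the job mixes script and container runnables and is offline.
    mixed_runnables_offline = (
        has_script_runnables
        and has_container_runnables
        and not has_internet_access
    )
    # Condition 2: the job uses a custom VM image.
    return uses_custom_image or mixed_runnables_offline


def install_gpu_drivers_value(**job_traits):
    """Value to use for installGpuDrivers: true unless manual install is needed."""
    return not requires_manual_driver_install(**job_traits)


if __name__ == "__main__":
    traits = dict(
        uses_custom_image=False,
        has_script_runnables=True,
        has_container_runnables=False,
        has_internet_access=True,
    )
    print(install_gpu_drivers_value(**traits))
```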
Step 3: Define compatible VM resources
To learn about the requirements and options for defining the VM resources for a job, see Job resources.
In summary, you must do all of the following when defining the VM resources for a job that uses GPUs:
Make sure that the GPU machine type is available in the location of your job's VMs.
To learn where GPU machine types are available, see GPU availability by regions and zones in the Compute Engine documentation.
If you specify the job's machine type, make sure that machine type has enough vCPUs and memory for the job's task requirements. Specifying the job's machine type is recommended when using GPUs for accelerator-optimized VMs and optional when using GPUs for N1 VMs.
Make sure you define the VM resources for a job using a valid method:
- Define VM resources directly by using the instances[].policy field (recommended if possible). This method is shown in the examples.
- Define VM resources through a template by using the instances[].instanceTemplate field. This method is required to manually install GPU drivers through a custom image. For more information, see Define job resources using a VM instance template.
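As a rough pre-submission check, the two methods above can be verified on a job configuration that has been loaded as a Python dict. This sketch assumes that each instances[] entry uses exactly one of the two methods; the function name check_instances is hypothetical, and the template name in the usage example is a placeholder.

```python
def check_instances(allocation_policy):
    """Return a list of problems found in allocationPolicy.instances[].

    Assumption: each entry defines VM resources with either `policy`
    or `instanceTemplate`, but not both and not neither.
    """
    problems = []
    for index, entry in enumerate(allocation_policy.get("instances", [])):
        has_policy = "policy" in entry
        has_template = "instanceTemplate" in entry
        if has_policy and has_template:
            problems.append(f"instances[{index}]: both policy and instanceTemplate are set")
        elif not has_policy and not has_template:
            problems.append(f"instances[{index}]: neither policy nor instanceTemplate is set")
    return problems


if __name__ == "__main__":
    direct = {"instances": [{"installGpuDrivers": True, "policy": {"machineType": "MACHINE_TYPE"}}]}
    templated = {"instances": [{"installGpuDrivers": False, "instanceTemplate": "TEMPLATE_NAME"}]}
    print(check_instances(direct))
    print(check_instances(templated))
```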
Create an example job that uses GPUs
The following sections explain how to create an example job for different GPU machine types. The example jobs all install GPU drivers automatically and directly define VM resources.
Use GPUs for accelerator-optimized VMs
You can create a job that uses GPUs for accelerator-optimized VMs using the gcloud CLI, Batch API, Java, or Python.
gcloud
Create a JSON file that installs GPU drivers, defines the machineType field with a machine type from the accelerator-optimized machine family, and uses a location that has the specified type of GPUs.

For example, to create a basic script job that uses GPUs for accelerator-optimized VMs, create a JSON file with the following contents:

{
  "taskGroups": [
    {
      "taskSpec": {
        "runnables": [
          {
            "script": {
              "text": "echo Hello world from task ${BATCH_TASK_INDEX}."
            }
          }
        ]
      },
      "taskCount": 3,
      "parallelism": 1
    }
  ],
  "allocationPolicy": {
    "instances": [
      {
        "installGpuDrivers": INSTALL_GPU_DRIVERS,
        "policy": {
          "machineType": "MACHINE_TYPE"
        }
      }
    ],
    "location": {
      "allowedLocations": [
        "ALLOWED_LOCATIONS"
      ]
    }
  }
}
Replace the following:
- INSTALL_GPU_DRIVERS: When set to true, Batch fetches the drivers required for the GPU type that you specify in the policy field from a third-party location, and Batch installs them on your behalf. If you set this field to false (default), you need to install GPU drivers manually to use any GPUs for this job.
- MACHINE_TYPE: a machine type from the accelerator-optimized machine family.
- ALLOWED_LOCATIONS: The allowedLocations[] field defines a region, and optionally one or more zones, where the VM instances for your job are allowed to run. For example, regions/us-central1, zones/us-central1-a allows the zone us-central1-a. Make sure that you specify locations that offer the GPU machine type that you want for this job. Otherwise, if you omit this field, make sure the job's location offers the GPU machine type.
To create and run the job, use the gcloud batch jobs submit command:

gcloud batch jobs submit JOB_NAME \
  --location LOCATION \
  --config JSON_CONFIGURATION_FILE
Replace the following:
- JOB_NAME: the name of the job.
- LOCATION: the location of the job.
- JSON_CONFIGURATION_FILE: the path for a JSON file with the job's configuration details.
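If you generate job configurations from a script, the two gcloud steps above can be sketched in Python as follows. This is an illustration under assumptions: the helper names, the file name gpu-job.json, and the example values (a2-highgpu-1g, us-central1, example-gpu-job) are hypothetical, and the script only prints the gcloud command instead of running it.

```python
import json
import shlex
import tempfile
from pathlib import Path


def build_job_config(machine_type, allowed_locations, install_gpu_drivers=True):
    """Build the example job configuration for accelerator-optimized VMs."""
    return {
        "taskGroups": [
            {
                "taskSpec": {
                    "runnables": [
                        {"script": {"text": "echo Hello world from task ${BATCH_TASK_INDEX}."}}
                    ]
                },
                "taskCount": 3,
                "parallelism": 1,
            }
        ],
        "allocationPolicy": {
            "instances": [
                {
                    "installGpuDrivers": install_gpu_drivers,
                    "policy": {"machineType": machine_type},
                }
            ],
            "location": {"allowedLocations": list(allowed_locations)},
        },
    }


def write_config(config, directory):
    """Write the configuration as JSON and return the file path."""
    path = Path(directory) / "gpu-job.json"
    path.write_text(json.dumps(config, indent=2))
    return path


def submit_command(job_name, location, config_path):
    """Return the gcloud argument list for submitting the job."""
    return [
        "gcloud", "batch", "jobs", "submit", job_name,
        "--location", location,
        "--config", str(config_path),
    ]


if __name__ == "__main__":
    # Example values are assumptions; replace them with your own.
    config = build_job_config("a2-highgpu-1g", ["regions/us-central1"])
    with tempfile.TemporaryDirectory() as tmp:
        path = write_config(config, tmp)
        print(shlex.join(submit_command("example-gpu-job", "us-central1", path)))
```

Returning the command as an argument list keeps quoting explicit if you later pass it to subprocess.run.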
API
Make a POST request to the jobs.create method that installs GPU drivers, defines the machineType field with a machine type from the accelerator-optimized machine family, and uses a location that has the specified type of GPUs.
For example, to create a basic script job that uses GPUs for accelerator-optimized VMs, make the following request:
POST https://batch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/jobs?job_id=JOB_NAME
{
  "taskGroups": [
    {
      "taskSpec": {
        "runnables": [
          {
            "script": {
              "text": "echo Hello world from task ${BATCH_TASK_INDEX}."
            }
          }
        ]
      },
      "taskCount": 3,
      "parallelism": 1
    }
  ],
  "allocationPolicy": {
    "instances": [
      {
        "installGpuDrivers": INSTALL_GPU_DRIVERS,
        "policy": {
          "machineType": "MACHINE_TYPE"
        }
      }
    ],
    "location": {
      "allowedLocations": [
        "ALLOWED_LOCATIONS"
      ]
    }
  }
}
Replace the following:
- PROJECT_ID: the project ID of your project.
- LOCATION: the location of the job.
- JOB_NAME: the name of the job.
- INSTALL_GPU_DRIVERS: When set to true, Batch fetches the drivers required for the GPU type that you specify in the policy field from a third-party location, and Batch installs them on your behalf. If you set this field to false (default), you need to install GPU drivers manually to use any GPUs for this job.
- MACHINE_TYPE: a machine type from the accelerator-optimized machine family.
- ALLOWED_LOCATIONS: The allowedLocations[] field defines a region, and optionally one or more zones, where the VM instances for your job are allowed to run. For example, regions/us-central1, zones/us-central1-a allows the zone us-central1-a. Make sure that you specify locations that offer the GPU machine type that you want for this job. Otherwise, if you omit this field, make sure the job's location offers the GPU machine type.
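The request above can also be assembled with the Python standard library. The following sketch only builds the request object and does not send it. It assumes that you authenticate with an OAuth 2.0 access token in a Bearer header (for example, one obtained with gcloud auth print-access-token); the function name and the example values are hypothetical.

```python
import json
import urllib.parse
import urllib.request

BATCH_ENDPOINT = "https://batch.googleapis.com/v1"


def build_create_job_request(project_id, location, job_name, job_body, access_token):
    """Build (but do not send) a POST request for the jobs.create method."""
    query = urllib.parse.urlencode({"job_id": job_name})
    url = f"{BATCH_ENDPOINT}/projects/{project_id}/locations/{location}/jobs?{query}"
    return urllib.request.Request(
        url,
        data=json.dumps(job_body).encode("utf-8"),
        headers={
            # Assumption: a Bearer access token is used for authentication.
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder body; use the full job configuration shown in the request above.
    body = {
        "allocationPolicy": {
            "instances": [
                {"installGpuDrivers": True, "policy": {"machineType": "MACHINE_TYPE"}}
            ]
        }
    }
    request = build_create_job_request("PROJECT_ID", "LOCATION", "JOB_NAME", body, "ACCESS_TOKEN")
    print(request.get_method(), request.full_url)
    # To send the request, pass it to urllib.request.urlopen(request).
```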
Java
Python
Use GPUs for N1 VMs
You can create a job that uses GPUs for N1 VMs using the gcloud CLI, Batch API, Java, Node.js or Python.
gcloud
Create a JSON file that installs GPU drivers, defines the type and count subfields of the accelerators[] field, and uses a location that has the specified type of GPUs.

For example, to create a basic script job that uses GPUs for N1 VMs and lets Batch select the exact N1 machine type, create a JSON file with the following contents:

{
  "taskGroups": [
    {
      "taskSpec": {
        "runnables": [
          {
            "script": {
              "text": "echo Hello world from task ${BATCH_TASK_INDEX}."
            }
          }
        ]
      },
      "taskCount": 3,
      "parallelism": 1
    }
  ],
  "allocationPolicy": {
    "instances": [
      {
        "installGpuDrivers": INSTALL_GPU_DRIVERS,
        "policy": {
          "accelerators": [
            {
              "type": "GPU_TYPE",
              "count": GPU_COUNT
            }
          ]
        }
      }
    ],
    "location": {
      "allowedLocations": [
        "ALLOWED_LOCATIONS"
      ]
    }
  }
}
Replace the following:
- INSTALL_GPU_DRIVERS: When set to true, Batch fetches the drivers required for the GPU type that you specify in the policy field from a third-party location, and Batch installs them on your behalf. If you set this field to false (default), you need to install GPU drivers manually to use any GPUs for this job.
- GPU_TYPE: the GPU type. You can view a list of the available GPU types by using the gcloud compute accelerator-types list command. Only use this field for GPUs for N1 VMs.
- GPU_COUNT: the number of GPUs of the specified type. For more information about the valid options, see the GPU machine types for the N1 machine series. Only use this field for GPUs for N1 VMs.
- ALLOWED_LOCATIONS: The allowedLocations[] field defines a region, and optionally one or more zones, where the VM instances for your job are allowed to run. For example, regions/us-central1, zones/us-central1-a allows the zone us-central1-a. Make sure that you specify locations that offer the GPU machine type that you want for this job. Otherwise, if you omit this field, make sure the job's location offers the GPU machine type.
To create and run the job, use the gcloud batch jobs submit command:

gcloud batch jobs submit JOB_NAME \
  --location LOCATION \
  --config JSON_CONFIGURATION_FILE
Replace the following:
- JOB_NAME: the name of the job.
- LOCATION: the location of the job.
- JSON_CONFIGURATION_FILE: the path for a JSON file with the job's configuration details.
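The ALLOWED_LOCATIONS value described earlier in this procedure is a list of regions/... and zones/... entries. A minimal sketch of a helper that builds that list is shown below. The function name is hypothetical, and the check that a zone name starts with its region name reflects the usual Compute Engine zone naming (for example, us-central1-a in us-central1).

```python
def build_allowed_locations(region, zones=()):
    """Return an allowedLocations[] list for one region and optional zones."""
    entries = [f"regions/{region}"]
    for zone in zones:
        # A zone such as us-central1-a belongs to the region us-central1.
        if not zone.startswith(f"{region}-"):
            raise ValueError(f"zone {zone!r} is not in region {region!r}")
        entries.append(f"zones/{zone}")
    return entries


if __name__ == "__main__":
    # Example values are assumptions; choose locations that offer your GPU type.
    print(build_allowed_locations("us-central1", ["us-central1-a"]))
```

This helper does not verify GPU availability, so you still need to confirm that the chosen locations offer the GPU type that the job requests.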
API
Make a POST request to the jobs.create method that installs GPU drivers, defines the type and count subfields of the accelerators[] field, and uses a location that has the specified type of GPUs.
For example, to create a basic script job that uses GPUs for N1 VMs and lets Batch select the exact N1 machine type, make the following request:
POST https://batch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/jobs?job_id=JOB_NAME
{
  "taskGroups": [
    {
      "taskSpec": {
        "runnables": [
          {
            "script": {
              "text": "echo Hello world from task ${BATCH_TASK_INDEX}."
            }
          }
        ]
      },
      "taskCount": 3,
      "parallelism": 1
    }
  ],
  "allocationPolicy": {
    "instances": [
      {
        "installGpuDrivers": INSTALL_GPU_DRIVERS,
        "policy": {
          "accelerators": [
            {
              "type": "GPU_TYPE",
              "count": GPU_COUNT
            }
          ]
        }
      }
    ],
    "location": {
      "allowedLocations": [
        "ALLOWED_LOCATIONS"
      ]
    }
  }
}
Replace the following:
- PROJECT_ID: the project ID of your project.
- LOCATION: the location of the job.
- JOB_NAME: the name of the job.
- INSTALL_GPU_DRIVERS: When set to true, Batch fetches the drivers required for the GPU type that you specify in the policy field from a third-party location, and Batch installs them on your behalf. If you set this field to false (default), you need to install GPU drivers manually to use any GPUs for this job.
- GPU_TYPE: the GPU type. You can view a list of the available GPU types by using the gcloud compute accelerator-types list command. Only use this field for GPUs for N1 VMs.
- GPU_COUNT: the number of GPUs of the specified type. For more information about the valid options, see the GPU machine types for the N1 machine series. Only use this field for GPUs for N1 VMs.
- ALLOWED_LOCATIONS: The allowedLocations[] field defines a region, and optionally one or more zones, where the VM instances for your job are allowed to run. For example, regions/us-central1, zones/us-central1-a allows the zone us-central1-a. Make sure that you specify locations that offer the GPU machine type that you want for this job. Otherwise, if you omit this field, make sure the job's location offers the GPU machine type.
Java
Node.js
To create a job with GPUs using Node.js, select one of the following options based on the machine type for your GPU model:
Create a job that uses GPUs with accelerator-optimized VMs
To use GPUs with accelerator-optimized VMs, just specify the machine type that you want for the job's VMs:
Create a job that uses GPUs with N1 VMs
To use GPUs with N1 VMs, you need to specify the number and type of GPUs that you want for each of the job's VMs:
Python
What's next
- If you have issues creating or running a job, see Troubleshooting.
- View jobs and tasks.
- Learn about more job creation options.