This document explains how to create and run a Batch job that uses one or more external storage volumes. Storage options include new or existing persistent disks, new local SSDs, existing Cloud Storage buckets, and existing network file systems (NFS), such as a Filestore file share.
Before you begin
- If you haven't used Batch before, review Get started with Batch and enable Batch by completing the prerequisites for projects and users.
- To get the permissions that you need to create a job, ask your administrator to grant you the following IAM roles:
  - Batch Job Editor (roles/batch.jobsEditor) on the project
  - Service Account User (roles/iam.serviceAccountUser) on the job's service account, which by default is the default Compute Engine service account
  - Create a job that uses a Cloud Storage bucket: Storage Object Viewer (roles/storage.objectViewer) on the bucket

  For more information about granting roles, see Manage access. You might also be able to get the required permissions through custom roles or other predefined roles.
Create a job that uses storage volumes
By default, each Compute Engine VM for a job has a single boot persistent disk that contains the operating system. Optionally, you can create a job that uses additional storage volumes. Specifically, a job's VMs can use one or more of each of the following types of storage volumes. For more information about all of the types of storage volumes and the differences and restrictions for each, see the documentation for Compute Engine VM storage options.
- persistent disk: zonal or regional, persistent block storage
- local SSD: high-performance, transient block storage
- Cloud Storage bucket: affordable object storage
- network file system (NFS): a distributed file system that follows the Network File System protocol, such as a Filestore file share, which is a high-performance NFS hosted on Google Cloud
You can allow a job to use each storage volume by including it in your job's definition and specifying its mount path (mountPath) in your runnables. To learn how to create a job that uses storage volumes, see one or more of the following sections:
Use a persistent disk
A job that uses persistent disks has the following restrictions:
- All persistent disks: Review the restrictions for all persistent disks.
- Instance templates: If you want to use a VM instance template while creating this job, you must attach any persistent disk(s) for this job in the instance template. Otherwise, if you don't want to use an instance template, you must attach any persistent disk(s) directly in the job definition.
- New versus existing persistent disks: Each persistent disk in a job can be either new (defined in and created with the job) or existing (already created in your project and specified in the job). The supported mount options for how Batch mounts the persistent disks to the job's VMs, as well as the supported location options for your job and its persistent disks, vary between new and existing persistent disks as described in the following table:

| | New persistent disks | Existing persistent disks |
|---|---|---|
| Mount options | All options are supported. | All options except writing are supported. This is due to restrictions of multi-writer mode. |
| Location options | You can only create zonal persistent disks. You can select any location for your job. The persistent disks are created in the zone where your job runs. | You can select zonal and regional persistent disks. You must set the job's location (or, if specified, just the job's allowed locations) to only locations that contain all of the job's persistent disks. For example, for a zonal persistent disk, the job's location must be the disk's zone; for a regional persistent disk, the job's location must be either the disk's region or, if specifying zones, one or both of the specific zones where the regional persistent disk is located (see the example following this table). |
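For example, suppose a job uses a hypothetical existing regional persistent disk in the us-central1 region with replicas in the us-central1-a and us-central1-b zones. The job's allowed locations could then name the region:

"location": {
  "allowedLocations": [
    "regions/us-central1"
  ]
}

or, equivalently for this disk, one or both of its zones:

"location": {
  "allowedLocations": [
    "zones/us-central1-a",
    "zones/us-central1-b"
  ]
}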
You can create a job that uses a persistent disk using the gcloud CLI or Batch API. The following example describes how to create a job that attaches and mounts an existing persistent disk and a new persistent disk. The job also has 3 tasks that each run a script to create a file in the new persistent disk named output_task_TASK_INDEX.txt, where TASK_INDEX is the index of each task: 0, 1, and 2.
gcloud
To create a job that uses persistent disks using the gcloud CLI, use the gcloud batch jobs submit command. In the job's JSON configuration file, specify the persistent disks in the instances field and mount the persistent disks in the volumes field.
Create a JSON file.
If you are not using an instance template for this job, create a JSON file with the following contents:
{ "allocationPolicy": { "instances": [ { "policy": { "disks": [ { "deviceName": "EXISTING_PERSISTENT_DISK_NAME", "existingDisk": "projects/PROJECT_ID/EXISTING_PERSISTENT_DISK_LOCATION/disks/EXISTING_PERSISTENT_DISK_NAME" }, { "newDisk": { "sizeGb":NEW_PERSISTENT_DISK_SIZE, "type": "NEW_PERSISTENT_DISK_TYPE" }, "deviceName": "NEW_PERSISTENT_DISK_NAME" } ] } } ], "location": { "allowedLocations": [ "EXISTING_PERSISTENT_DISK_LOCATION" ] } }, "taskGroups":[ { "taskSpec":{ "runnables": [ { "script": { "text": "echo Hello world from task ${BATCH_TASK_INDEX}. >> /mnt/disks/NEW_PERSISTENT_DISK_NAME/output_task_${BATCH_TASK_INDEX}.txt" } } ], "volumes": [ { "deviceName": "NEW_PERSISTENT_DISK_NAME", "mountPath": "/mnt/disks/NEW_PERSISTENT_DISK_NAME", "mountOptions": "rw,async" }, { "deviceName": "EXISTING_PERSISTENT_DISK_NAME", "mountPath": "/mnt/disks/EXISTING_PERSISTENT_DISK_NAME" } ] }, "taskCount":3 } ], "logsPolicy": { "destination": "CLOUD_LOGGING" } }
Replace the following:
- PROJECT_ID: the project ID of your project.
- EXISTING_PERSISTENT_DISK_NAME: the name of an existing persistent disk.
- EXISTING_PERSISTENT_DISK_LOCATION: the location of an existing persistent disk. For each existing zonal persistent disk, the job's location must be the disk's zone; for each existing regional persistent disk, the job's location must be either the disk's region or, if specifying zones, one or both of the specific zones where the regional persistent disk is located. If you are not specifying any existing persistent disks, you can select any location. Learn more about the allowedLocations field.
- NEW_PERSISTENT_DISK_SIZE: the size of the new persistent disk in GB. The allowed sizes depend on the type of persistent disk, but the minimum is often 10 GB (10) and the maximum is often 64 TB (64000).
- NEW_PERSISTENT_DISK_TYPE: the disk type of the new persistent disk, either pd-standard, pd-balanced, pd-ssd, or pd-extreme.
- NEW_PERSISTENT_DISK_NAME: the name of the new persistent disk.
If you are using a VM instance template for this job, create a JSON file as shown previously, except replace the instances field with the following:

"instances": [
  {
    "instanceTemplate": "INSTANCE_TEMPLATE_NAME"
  }
],

where INSTANCE_TEMPLATE_NAME is the name of the instance template for this job. For a job that uses persistent disks, this instance template must define and attach the persistent disks that you want the job to use. For this example, the template must define and attach a new persistent disk named NEW_PERSISTENT_DISK_NAME and attach an existing persistent disk named EXISTING_PERSISTENT_DISK_NAME.
Run the following command:
gcloud batch jobs submit JOB_NAME \
  --location LOCATION \
  --config JSON_CONFIGURATION_FILE
Replace the following:
- JOB_NAME: the name of the job.
- LOCATION: the location of the job.
- JSON_CONFIGURATION_FILE: the path for a JSON file with the job's configuration details.
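For example, a submission with hypothetical values might look like the following, assuming you saved the configuration above as pd-job.json and want to run the job in us-central1 (the job name and file name are illustrative only):

# Submit the job; "example-pd-job" and "pd-job.json" are hypothetical names.
gcloud batch jobs submit example-pd-job \
  --location us-central1 \
  --config pd-job.json

You can then track the job's progress with gcloud batch jobs describe example-pd-job --location us-central1.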
API
To create a job that uses persistent disks using the Batch API, use the jobs.create method. In the request, specify the persistent disks in the instances field and mount the persistent disks in the volumes field.
If you are not using an instance template for this job, make the following request:
POST https://batch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/jobs?job_id=JOB_NAME

{
  "allocationPolicy": {
    "instances": [
      {
        "policy": {
          "disks": [
            {
              "deviceName": "EXISTING_PERSISTENT_DISK_NAME",
              "existingDisk": "projects/PROJECT_ID/EXISTING_PERSISTENT_DISK_LOCATION/disks/EXISTING_PERSISTENT_DISK_NAME"
            },
            {
              "newDisk": {
                "sizeGb": NEW_PERSISTENT_DISK_SIZE,
                "type": "NEW_PERSISTENT_DISK_TYPE"
              },
              "deviceName": "NEW_PERSISTENT_DISK_NAME"
            }
          ]
        }
      }
    ],
    "location": {
      "allowedLocations": [
        "EXISTING_PERSISTENT_DISK_LOCATION"
      ]
    }
  },
  "taskGroups": [
    {
      "taskSpec": {
        "runnables": [
          {
            "script": {
              "text": "echo Hello world from task ${BATCH_TASK_INDEX}. >> /mnt/disks/NEW_PERSISTENT_DISK_NAME/output_task_${BATCH_TASK_INDEX}.txt"
            }
          }
        ],
        "volumes": [
          {
            "deviceName": "NEW_PERSISTENT_DISK_NAME",
            "mountPath": "/mnt/disks/NEW_PERSISTENT_DISK_NAME",
            "mountOptions": "rw,async"
          },
          {
            "deviceName": "EXISTING_PERSISTENT_DISK_NAME",
            "mountPath": "/mnt/disks/EXISTING_PERSISTENT_DISK_NAME"
          }
        ]
      },
      "taskCount": 3
    }
  ],
  "logsPolicy": {
    "destination": "CLOUD_LOGGING"
  }
}
Replace the following:
- PROJECT_ID: the project ID of your project.
- LOCATION: the location of the job.
- JOB_NAME: the name of the job.
- EXISTING_PERSISTENT_DISK_NAME: the name of an existing persistent disk.
- EXISTING_PERSISTENT_DISK_LOCATION: the location of an existing persistent disk. For each existing zonal persistent disk, the job's location must be the disk's zone; for each existing regional persistent disk, the job's location must be either the disk's region or, if specifying zones, one or both of the specific zones where the regional persistent disk is located. If you are not specifying any existing persistent disks, you can select any location. Learn more about the allowedLocations field.
- NEW_PERSISTENT_DISK_SIZE: the size of the new persistent disk in GB. The allowed sizes depend on the type of persistent disk, but the minimum is often 10 GB (10) and the maximum is often 64 TB (64000).
- NEW_PERSISTENT_DISK_TYPE: the disk type of the new persistent disk, either pd-standard, pd-balanced, pd-ssd, or pd-extreme.
- NEW_PERSISTENT_DISK_NAME: the name of the new persistent disk.
If you are using a VM instance template for this job, make the request as shown previously, except replace the instances field with the following:

"instances": [
  {
    "instanceTemplate": "INSTANCE_TEMPLATE_NAME"
  }
],

where INSTANCE_TEMPLATE_NAME is the name of the instance template for this job. For a job that uses persistent disks, this instance template must define and attach the persistent disks that you want the job to use. For this example, the template must define and attach a new persistent disk named NEW_PERSISTENT_DISK_NAME and attach an existing persistent disk named EXISTING_PERSISTENT_DISK_NAME.
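If you call the REST API directly rather than through a client library, one common approach is to save the request body to a file and send it with curl, authenticating with an access token from the gcloud CLI. The following is a minimal sketch; the file name pd-job.json is hypothetical, and the placeholders are the same as above:

# Send the job creation request; "pd-job.json" is a hypothetical file
# containing the JSON request body shown above.
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  -d @pd-job.json \
  "https://batch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/jobs?job_id=JOB_NAME"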
Use a local SSD
A job that uses local SSDs has the following restrictions:
- All local SSDs: Review the restrictions for all local SSDs.
- Instance templates: If you want to specify a VM instance template while creating this job, you must attach any local SSDs for this job in the instance template. Otherwise, if you don't want to use an instance template, you must attach the local SSDs directly in the job definition.
You can create a job that uses a local SSD using the gcloud CLI or Batch API. The following example describes how to create a job that creates, attaches, and mounts a local SSD. The job also has 3 tasks that each run a script to create a file in the local SSD named output_task_TASK_INDEX.txt, where TASK_INDEX is the index of each task: 0, 1, and 2.
gcloud
To create a job that uses local SSDs using the gcloud CLI, use the gcloud batch jobs submit command. In the job's JSON configuration file, create and attach the local SSDs in the instances field and mount the local SSDs in the volumes field.
Create a JSON file.
If you are not using an instance template for this job, create a JSON file with the following contents:
{ "allocationPolicy": { "instances": [ { "policy": { "machineType": MACHINE_TYPE, "disks": [ { "newDisk": { "sizeGb":LOCAL_SSD_SIZE, "type": "local-ssd" }, "deviceName": "LOCAL_SSD_NAME" } ] } } ] }, "taskGroups":[ { "taskSpec":{ "runnables": [ { "script": { "text": "echo Hello world from task ${BATCH_TASK_INDEX}. >> /mnt/disks/LOCAL_SSD_NAME/output_task_${BATCH_TASK_INDEX}.txt" } } ], "volumes": [ { "deviceName": "LOCAL_SSD_NAME", "mountPath": "/mnt/disks/LOCAL_SSD_NAME", "mountOptions": "rw,async" } ] }, "taskCount":3 } ], "logsPolicy": { "destination": "CLOUD_LOGGING" } }
Replace the following:
- MACHINE_TYPE: the machine type, which can be predefined or custom, of the job's VMs. The allowed number of local SSDs depends on the machine type for your job's VMs.
- LOCAL_SSD_NAME: the name of a local SSD created for this job.
- LOCAL_SSD_SIZE: the size of all the local SSDs in GB. Each local SSD is 375 GB, so this value must be a multiple of 375 GB. For example, for 2 local SSDs, set this value to 750 GB.
If you are using a VM instance template for this job, create a JSON file as shown previously, except replace the instances field with the following:

"instances": [
  {
    "instanceTemplate": "INSTANCE_TEMPLATE_NAME"
  }
],

where INSTANCE_TEMPLATE_NAME is the name of the instance template for this job. For a job that uses local SSDs, this instance template must define and attach the local SSDs that you want the job to use. For this example, the template must define and attach a local SSD named LOCAL_SSD_NAME.
Run the following command:
gcloud batch jobs submit JOB_NAME \
  --location LOCATION \
  --config JSON_CONFIGURATION_FILE
Replace the following:
- JOB_NAME: the name of the job.
- LOCATION: the location of the job.
- JSON_CONFIGURATION_FILE: the path for a JSON file with the job's configuration details.
API
To create a job that uses local SSDs using the Batch API, use the jobs.create method. In the request, create and attach the local SSDs in the instances field and mount the local SSDs in the volumes field.
If you are not using an instance template for this job, make the following request:
POST https://batch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/jobs?job_id=JOB_NAME

{
  "allocationPolicy": {
    "instances": [
      {
        "policy": {
          "machineType": "MACHINE_TYPE",
          "disks": [
            {
              "newDisk": {
                "sizeGb": LOCAL_SSD_SIZE,
                "type": "local-ssd"
              },
              "deviceName": "LOCAL_SSD_NAME"
            }
          ]
        }
      }
    ]
  },
  "taskGroups": [
    {
      "taskSpec": {
        "runnables": [
          {
            "script": {
              "text": "echo Hello world from task ${BATCH_TASK_INDEX}. >> /mnt/disks/LOCAL_SSD_NAME/output_task_${BATCH_TASK_INDEX}.txt"
            }
          }
        ],
        "volumes": [
          {
            "deviceName": "LOCAL_SSD_NAME",
            "mountPath": "/mnt/disks/LOCAL_SSD_NAME",
            "mountOptions": "rw,async"
          }
        ]
      },
      "taskCount": 3
    }
  ],
  "logsPolicy": {
    "destination": "CLOUD_LOGGING"
  }
}
Replace the following:
- PROJECT_ID: the project ID of your project.
- LOCATION: the location of the job.
- JOB_NAME: the name of the job.
- MACHINE_TYPE: the machine type, which can be predefined or custom, of the job's VMs. The allowed number of local SSDs depends on the machine type for your job's VMs.
- LOCAL_SSD_NAME: the name of a local SSD created for this job.
- LOCAL_SSD_SIZE: the size of all the local SSDs in GB. Each local SSD is 375 GB, so this value must be a multiple of 375 GB. For example, for 2 local SSDs, set this value to 750 GB.
If you are using a VM instance template for this job, make the request as shown previously, except replace the instances field with the following:

"instances": [
  {
    "instanceTemplate": "INSTANCE_TEMPLATE_NAME"
  }
],

where INSTANCE_TEMPLATE_NAME is the name of the instance template for this job. For a job that uses local SSDs, this instance template must define and attach the local SSDs that you want the job to use. For this example, the template must define and attach a local SSD named LOCAL_SSD_NAME.
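Because local SSDs are transient, it can be useful for a task to verify its volume before writing to it. The following lines are an optional sketch of what a runnable's script text could include, reusing the mount path placeholder from the examples above:

# Show the filesystem backing the mount path, then confirm that it is writable.
df -h /mnt/disks/LOCAL_SSD_NAME
touch /mnt/disks/LOCAL_SSD_NAME/.write_test && echo "Local SSD mount is writable."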
Use a Cloud Storage bucket
To create a job that uses an existing Cloud Storage bucket, select one of the following methods:
- Recommended: Mount a bucket directly to your job's VMs by specifying the bucket in the job's definition, as shown in this section. When the job runs, the bucket is automatically mounted to the VMs for your job using Cloud Storage FUSE.
- Create a job with tasks that directly access a Cloud Storage bucket by using the gsutil command-line tool or the client libraries for the Cloud Storage API. To learn how to access a Cloud Storage bucket directly from a VM, see the Compute Engine documentation for Writing and reading data from Cloud Storage buckets.
Before you create a job that uses a bucket, create a bucket or identify an existing bucket. For more information, see Create buckets and List buckets.
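For example, if you need a new bucket, you can create one with the gcloud CLI. The following command is a sketch: the bucket name my-example-bucket is hypothetical, and bucket names must be globally unique:

# Create a bucket; "my-example-bucket" is a hypothetical, globally unique name.
gcloud storage buckets create gs://my-example-bucket \
  --location us-central1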
You can create a job that uses a Cloud Storage bucket using the gcloud CLI, Batch API, Go, Java, Node.js, or Python.
The following example describes how to create a job that mounts a Cloud Storage bucket. The job also has 3 tasks that each run a script to create a file in the bucket named output_task_TASK_INDEX.txt, where TASK_INDEX is the index of each task: 0, 1, and 2.
gcloud
To create a job that uses a Cloud Storage bucket using the gcloud CLI, use the gcloud batch jobs submit command. In the job's JSON configuration file, mount the bucket in the volumes field.
For example, to create a job that outputs files to a Cloud Storage bucket:
- Create a JSON file in the current directory named hello-world-bucket.json with the following contents:

  {
    "taskGroups": [
      {
        "taskSpec": {
          "runnables": [
            {
              "script": {
                "text": "echo Hello world from task ${BATCH_TASK_INDEX}. >> MOUNT_PATH/output_task_${BATCH_TASK_INDEX}.txt"
              }
            }
          ],
          "volumes": [
            {
              "gcs": {
                "remotePath": "BUCKET_PATH"
              },
              "mountPath": "MOUNT_PATH"
            }
          ]
        },
        "taskCount": 3
      }
    ],
    "logsPolicy": {
      "destination": "CLOUD_LOGGING"
    }
  }
Replace the following:
- BUCKET_PATH: the path of the bucket directory that you want this job to access, which must start with the name of the bucket. For example, for a bucket named BUCKET_NAME, the path BUCKET_NAME represents the root directory of the bucket and the path BUCKET_NAME/subdirectory represents the subdirectory subdirectory.
- MOUNT_PATH: the mount path that the job's runnables use to access this bucket. The path must start with /mnt/disks/ followed by a directory or path that you choose. For example, if you want to represent this bucket with a directory named my-bucket, set the mount path to /mnt/disks/my-bucket.
Run the following command:
gcloud batch jobs submit example-bucket-job \
  --location us-central1 \
  --config hello-world-bucket.json
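After the job finishes, the files written by its tasks appear in the bucket. For example, you can list and read them with the gcloud CLI; these commands assume that BUCKET_PATH is the bucket's root directory:

# List the job's output files, then print the file written by task 0.
gcloud storage ls gs://BUCKET_PATH/
gcloud storage cat gs://BUCKET_PATH/output_task_0.txt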
API
To create a job that uses a Cloud Storage bucket using the Batch API, use the jobs.create method and mount the bucket in the volumes field.
POST https://batch.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/jobs?job_id=example-bucket-job

{
  "taskGroups": [
    {
      "taskSpec": {
        "runnables": [
          {
            "script": {
              "text": "echo Hello world from task ${BATCH_TASK_INDEX}. >> MOUNT_PATH/output_task_${BATCH_TASK_INDEX}.txt"
            }
          }
        ],
        "volumes": [
          {
            "gcs": {
              "remotePath": "BUCKET_PATH"
            },
            "mountPath": "MOUNT_PATH"
          }
        ]
      },
      "taskCount": 3
    }
  ],
  "logsPolicy": {
    "destination": "CLOUD_LOGGING"
  }
}
Replace the following:
- PROJECT_ID: the project ID of your project.
- BUCKET_PATH: the path of the bucket directory that you want this job to access, which must start with the name of the bucket. For example, for a bucket named BUCKET_NAME, the path BUCKET_NAME represents the root directory of the bucket and the path BUCKET_NAME/subdirectory represents the subdirectory subdirectory.
- MOUNT_PATH: the mount path that the job's runnables use to access this bucket. The path must start with /mnt/disks/ followed by a directory or path that you choose. For example, if you want to represent this bucket with a directory named my-bucket, set the mount path to /mnt/disks/my-bucket.
Go
For more information, see the Batch Go API reference documentation.
Java
For more information, see the Batch Java API reference documentation.
Node.js
For more information, see the Batch Node.js API reference documentation.
Python
For more information, see the Batch Python API reference documentation.
Use a network file system
You can create a job that uses an existing network file system (NFS), such as a Filestore file share, using the gcloud CLI or Batch API.
Before creating a job that uses an NFS, make sure that your network's firewall is properly configured to allow traffic between your job's VMs and the NFS. For more information, see Configuring firewall rules for Filestore.
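As a sketch of what such configuration might look like with the gcloud CLI, the following hypothetical rule allows common NFS traffic (TCP ports 111 and 2049) from an internal source range to VMs in the default network. The rule name, network, and source range are placeholders; see the Filestore firewall documentation for the exact ports and ranges that your setup requires:

# Hypothetical firewall rule; adjust the network, ports, and source range
# to match your environment and the Filestore documentation.
gcloud compute firewall-rules create allow-nfs-for-batch \
  --network default \
  --allow tcp:111,tcp:2049 \
  --source-ranges 10.0.0.0/8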
The following example describes how to create a job that specifies and mounts an NFS. The job also has 3 tasks that each run a script to create a file in the NFS named output_task_TASK_INDEX.txt, where TASK_INDEX is the index of each task: 0, 1, and 2.
gcloud
To create a job that uses an NFS using the gcloud CLI, use the gcloud batch jobs submit command. In the job's JSON configuration file, mount the NFS in the volumes field.
Create a JSON file with the following contents:
{ "taskGroups": [ { "taskSpec": { "runnables": [ { "script": { "text": "echo Hello world from task ${BATCH_TASK_INDEX}. >> MOUNT_PATH/output_task_${BATCH_TASK_INDEX}.txt" } } ], "volumes": [ { "nfs": { "server": "NFS_IP_ADDRESS", "remotePath": "NFS_PATH" }, "mountPath": "MOUNT_PATH" } ] }, "taskCount": 3 } ], "logsPolicy": { "destination": "CLOUD_LOGGING" } }
Replace the following:
- NFS_IP_ADDRESS: the IP address of the NFS. For example, if your NFS is a Filestore file share, then specify the IP address of the Filestore instance, which you can get by describing the Filestore instance (see the example command after this list).
- NFS_PATH: the path of the NFS directory that you want this job to access, which must start with a / followed by the root directory of the NFS. For example, for a Filestore file share named FILE_SHARE_NAME, the path /FILE_SHARE_NAME represents the root directory of the file share and the path /FILE_SHARE_NAME/subdirectory represents the subdirectory subdirectory.
- MOUNT_PATH: the mount path that the job's runnables use to access this NFS. The path must start with /mnt/disks/ followed by a directory or path that you choose. For example, if you want to represent this NFS with a directory named my-nfs, set the mount path to /mnt/disks/my-nfs.
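For example, you can look up a Filestore file share's IP address by describing the Filestore instance with the gcloud CLI; the instance name and zone below are placeholders, and the IP address appears under networks.ipAddresses in the output:

# Describe the Filestore instance to find its IP address.
gcloud filestore instances describe INSTANCE_ID \
  --zone ZONE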
Run the following command:
gcloud batch jobs submit JOB_NAME \
  --location LOCATION \
  --config JSON_CONFIGURATION_FILE
Replace the following:
- JOB_NAME: the name of the job.
- LOCATION: the location of the job.
- JSON_CONFIGURATION_FILE: the path for a JSON file with the job's configuration details.
API
To create a job that uses an NFS using the Batch API, use the jobs.create method and mount the NFS in the volumes field.
POST https://batch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/jobs?job_id=JOB_NAME

{
  "taskGroups": [
    {
      "taskSpec": {
        "runnables": [
          {
            "script": {
              "text": "echo Hello world from task ${BATCH_TASK_INDEX}. >> MOUNT_PATH/output_task_${BATCH_TASK_INDEX}.txt"
            }
          }
        ],
        "volumes": [
          {
            "nfs": {
              "server": "NFS_IP_ADDRESS",
              "remotePath": "NFS_PATH"
            },
            "mountPath": "MOUNT_PATH"
          }
        ]
      },
      "taskCount": 3
    }
  ],
  "logsPolicy": {
    "destination": "CLOUD_LOGGING"
  }
}
Replace the following:
- PROJECT_ID: the project ID of your project.
- LOCATION: the location of the job.
- JOB_NAME: the name of the job.
- NFS_IP_ADDRESS: the IP address of the NFS. For example, if your NFS is a Filestore file share, then specify the IP address of the Filestore instance, which you can get by describing the Filestore instance.
- NFS_PATH: the path of the NFS directory that you want this job to access, which must start with a / followed by the root directory of the NFS. For example, for a Filestore file share named FILE_SHARE_NAME, the path /FILE_SHARE_NAME represents the root directory of the file share and the path /FILE_SHARE_NAME/subdirectory represents the subdirectory subdirectory.
- MOUNT_PATH: the mount path that the job's runnables use to access this NFS. The path must start with /mnt/disks/ followed by a directory or path that you choose. For example, if you want to represent this NFS with a directory named my-nfs, set the mount path to /mnt/disks/my-nfs.
What's next
- If you have issues creating or running a job, see Troubleshooting.
- View jobs and tasks.
- Learn about more job creation options.