Create and run a job that uses storage volumes

This document explains how to create and run a Batch job that uses one or more external storage volumes. Storage options include new or existing persistent disks, new local SSDs, existing Cloud Storage buckets, and an existing network file system (NFS) such as a Filestore file share.

Before you begin

Create a job that uses storage volumes

By default, each Compute Engine VM for a job has a single boot persistent disk that contains the operating system. Optionally, you can create a job that uses additional storage volumes. Specifically, a job's VMs can use one or more of each of the following types of storage volumes: persistent disks, local SSDs, Cloud Storage buckets, and network file systems (NFS). For more information about all of the types of storage volumes and the differences and restrictions for each, see the documentation for Compute Engine VM storage options.

You can allow a job to use each storage volume by including it in your job's definition and specifying its mount path (mountPath) in your runnables. To learn how to create a job that uses storage volumes, see one or more of the following sections:

Use a persistent disk

A job that uses persistent disks has the following restrictions:

  • All persistent disks: Review the restrictions for all persistent disks.
  • Instance templates: If you want to use a VM instance template while creating this job, you must attach any persistent disk(s) for this job in the instance template. Otherwise, if you don't want to use an instance template, you must attach any persistent disk(s) directly in the job definition.
  • New versus existing persistent disks: Each persistent disk in a job can be either new (defined in and created with the job) or existing (already created in your project and specified in the job). The supported mount options for how Batch mounts the persistent disks to the job's VMs as well as the supported location options for your job and its persistent disks vary between new and existing persistent disks as described in the following table:

    Mount options
      • New persistent disks: All options are supported.
      • Existing persistent disks: All options except writing are supported. This is due to restrictions of multi-writer mode.

    Location options
      • New persistent disks: You can only create zonal persistent disks. You can select any location for your job. The persistent disks get created in the zone your project runs in.
      • Existing persistent disks: You can select zonal and regional persistent disks. You must set the job's location (or, if specified, just the job's allowed locations) to only locations that contain all of the job's persistent disks. For example, for a zonal persistent disk, the job's location must be the disk's zone; for a regional persistent disk, the job's location must be either the disk's region or, if specifying zones, one or both of the specific zones where the regional persistent disk is located.
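The location rule for existing persistent disks can be written as a small check. The following Python sketch is illustrative only: the function name is hypothetical, the `zones/ZONE` and `regions/REGION` string format is an assumption about how locations are written, and the replica zones of a regional disk must be supplied by the caller.

```python
def job_locations_allowed(disk_location, job_locations, replica_zones=()):
    """Return True if every job location is valid for the existing disk.

    disk_location: "zones/ZONE" or "regions/REGION" (assumed format).
    job_locations: the job's allowed locations, in the same format.
    replica_zones: for a regional disk, the zone names that hold its replicas.
    """
    if not job_locations:
        return False
    kind = disk_location.partition("/")[0]
    if kind == "zones":
        # Zonal disk: the job may only run in the disk's own zone.
        allowed = {disk_location}
    elif kind == "regions":
        # Regional disk: the disk's region, or any of its replica zones.
        allowed = {disk_location} | {f"zones/{zone}" for zone in replica_zones}
    else:
        raise ValueError(f"unrecognized disk location: {disk_location!r}")
    return all(location in allowed for location in job_locations)
```

A job that uses several existing disks would need this check to pass for each disk.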

You can create a job that uses a persistent disk using the gcloud CLI or Batch API. The following example describes how to create a job that attaches and mounts an existing persistent disk and a new persistent disk. The job also has 3 tasks that each run a script to create a file in the new persistent disk named output_task_TASK_INDEX.txt where TASK_INDEX is the index of each task: 0, 1, and 2.

gcloud

To create a job that uses persistent disks using the gcloud CLI, use the gcloud batch jobs submit command. In the job's JSON configuration file, specify the persistent disks in the instances field and mount the persistent disk in the volumes field.

  1. Create a JSON file.

    • If you are not using an instance template for this job, create a JSON file with the following contents:

      {
          "allocationPolicy": {
              "instances": [
                  {
                      "policy": {
                          "disks": [
                              {
                                  "deviceName": "EXISTING_PERSISTENT_DISK_NAME",
                                  "existingDisk": "projects/PROJECT_ID/EXISTING_PERSISTENT_DISK_LOCATION/disks/EXISTING_PERSISTENT_DISK_NAME"
                              },
                              {
                                  "newDisk": {
                                      "sizeGb":NEW_PERSISTENT_DISK_SIZE,
                                      "type": "NEW_PERSISTENT_DISK_TYPE"
                                  },
                                  "deviceName": "NEW_PERSISTENT_DISK_NAME"
                              }
                          ]
                      }
                  }
              ],
              "location": {
                  "allowedLocations": [
                      "EXISTING_PERSISTENT_DISK_LOCATION"
                  ]
              }
          },
          "taskGroups":[
              {
                  "taskSpec":{
                      "runnables": [
                          {
                              "script": {
                                  "text": "echo Hello world from task ${BATCH_TASK_INDEX}. >> /mnt/disks/NEW_PERSISTENT_DISK_NAME/output_task_${BATCH_TASK_INDEX}.txt"
                              }
                          }
                      ],
                      "volumes": [
                          {
                              "deviceName": "NEW_PERSISTENT_DISK_NAME",
                              "mountPath": "/mnt/disks/NEW_PERSISTENT_DISK_NAME",
                              "mountOptions": "rw,async"
                          },
                          {
                              "deviceName": "EXISTING_PERSISTENT_DISK_NAME",
                              "mountPath": "/mnt/disks/EXISTING_PERSISTENT_DISK_NAME"
                          }
                      ]
                  },
                  "taskCount":3
              }
          ],
          "logsPolicy": {
              "destination": "CLOUD_LOGGING"
          }
      }
      

      Replace the following:

      • PROJECT_ID: the project ID of your project.
      • EXISTING_PERSISTENT_DISK_NAME: the name of an existing persistent disk.
      • EXISTING_PERSISTENT_DISK_LOCATION: the location of an existing persistent disk. For each existing zonal persistent disk, the job's location must be the disk's zone; for each existing regional persistent disk, the job's location must be either the disk's region or, if specifying zones, one or both of the specific zones where the regional persistent disk is located. If you are not specifying any existing persistent disks, you can select any location. Learn more about the allowedLocations field.
      • NEW_PERSISTENT_DISK_SIZE: the size of the new persistent disk in GB. The allowed sizes depend on the type of persistent disk, but the minimum is often 10 GB (10) and the maximum is often 64 TB (64000).
      • NEW_PERSISTENT_DISK_TYPE: the disk type of the new persistent disk, either pd-standard, pd-balanced, pd-ssd, or pd-extreme.
      • NEW_PERSISTENT_DISK_NAME: the name of the new persistent disk.
    • If you are using a VM instance template for this job, create a JSON file as shown previously, except replace the instances field with the following:

      "instances": [
          {
              "instanceTemplate": "INSTANCE_TEMPLATE_NAME"
          }
      ],
      

      where INSTANCE_TEMPLATE_NAME is the name of the instance template for this job. For a job that uses persistent disks, this instance template must define and attach the persistent disks that you want the job to use. For this example, the template must define and attach a new persistent disk named NEW_PERSISTENT_DISK_NAME and attach an existing persistent disk named EXISTING_PERSISTENT_DISK_NAME.

  2. Run the following command:

    gcloud batch jobs submit JOB_NAME \
      --location LOCATION \
      --config JSON_CONFIGURATION_FILE
    

    Replace the following:

    • JOB_NAME: the name of the job.
    • LOCATION: the location of the job.
    • JSON_CONFIGURATION_FILE: the path for a JSON file with the job's configuration details.
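If you prefer to generate the configuration file rather than edit placeholders by hand, the JSON above can be built from a few parameters. This is a minimal sketch, not part of the gcloud CLI: the function name and all sample values (project, disk names, `zones/us-central1-a`, `pd-balanced`, 10 GB) are hypothetical, and the location format is an assumption.

```python
import json


def persistent_disk_job_config(project_id, existing_disk, existing_location,
                               new_disk, new_size_gb, new_type, task_count=3):
    """Return the persistent disk job configuration shown above as a dict."""
    new_mount = f"/mnt/disks/{new_disk}"
    # ${BATCH_TASK_INDEX} is left untouched so the task's shell expands it;
    # only the mount path is filled in by Python.
    script = (
        "echo Hello world from task ${BATCH_TASK_INDEX}. >> "
        + new_mount
        + "/output_task_${BATCH_TASK_INDEX}.txt"
    )
    return {
        "allocationPolicy": {
            "instances": [{
                "policy": {
                    "disks": [
                        {
                            "deviceName": existing_disk,
                            "existingDisk": (
                                f"projects/{project_id}/{existing_location}"
                                f"/disks/{existing_disk}"
                            ),
                        },
                        {
                            "newDisk": {"sizeGb": new_size_gb, "type": new_type},
                            "deviceName": new_disk,
                        },
                    ]
                }
            }],
            "location": {"allowedLocations": [existing_location]},
        },
        "taskGroups": [{
            "taskSpec": {
                "runnables": [{"script": {"text": script}}],
                "volumes": [
                    {
                        "deviceName": new_disk,
                        "mountPath": new_mount,
                        "mountOptions": "rw,async",
                    },
                    {
                        "deviceName": existing_disk,
                        "mountPath": f"/mnt/disks/{existing_disk}",
                    },
                ],
            },
            "taskCount": task_count,
        }],
        "logsPolicy": {"destination": "CLOUD_LOGGING"},
    }


# Hypothetical values for illustration only.
config = persistent_disk_job_config(
    project_id="my-project",
    existing_disk="existing-disk",
    existing_location="zones/us-central1-a",
    new_disk="new-disk",
    new_size_gb=10,
    new_type="pd-balanced",
)
print(json.dumps(config, indent=4))
```

Redirect the printed JSON to a file and pass that file to the --config flag of the command above.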

API

To create a job that uses persistent disks using the Batch API, use the jobs.create method. In the request, specify the persistent disks in the instances field and mount the persistent disk in the volumes field.

  • If you are not using an instance template for this job, make the following request:

    POST https://batch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/jobs?job_id=JOB_NAME
    
    {
        "allocationPolicy": {
            "instances": [
                {
                    "policy": {
                        "disks": [
                            {
                                "deviceName": "EXISTING_PERSISTENT_DISK_NAME",
                                "existingDisk": "projects/PROJECT_ID/EXISTING_PERSISTENT_DISK_LOCATION/disks/EXISTING_PERSISTENT_DISK_NAME"
                            },
                            {
                                "newDisk": {
                                    "sizeGb":NEW_PERSISTENT_DISK_SIZE,
                                    "type": "NEW_PERSISTENT_DISK_TYPE"
                                },
                                "deviceName": "NEW_PERSISTENT_DISK_NAME"
                            }
                        ]
                    }
                }
            ],
            "location": {
                "allowedLocations": [
                    "EXISTING_PERSISTENT_DISK_LOCATION"
                ]
            }
        },
        "taskGroups":[
            {
                "taskSpec":{
                    "runnables": [
                        {
                            "script": {
                                "text": "echo Hello world from task ${BATCH_TASK_INDEX}. >> /mnt/disks/NEW_PERSISTENT_DISK_NAME/output_task_${BATCH_TASK_INDEX}.txt"
                            }
                        }
                    ],
                    "volumes": [
                        {
                            "deviceName": "NEW_PERSISTENT_DISK_NAME",
                            "mountPath": "/mnt/disks/NEW_PERSISTENT_DISK_NAME",
                            "mountOptions": "rw,async"
                        },
                        {
                            "deviceName": "EXISTING_PERSISTENT_DISK_NAME",
                            "mountPath": "/mnt/disks/EXISTING_PERSISTENT_DISK_NAME"
                        }
                    ]
                },
                "taskCount":3
            }
        ],
        "logsPolicy": {
            "destination": "CLOUD_LOGGING"
        }
    }
    

    Replace the following:

    • PROJECT_ID: the project ID of your project.
    • LOCATION: the location of the job.
    • JOB_NAME: the name of the job.
    • EXISTING_PERSISTENT_DISK_NAME: the name of an existing persistent disk.
    • EXISTING_PERSISTENT_DISK_LOCATION: the location of an existing persistent disk. For each existing zonal persistent disk, the job's location must be the disk's zone; for each existing regional persistent disk, the job's location must be either the disk's region or, if specifying zones, one or both of the specific zones where the regional persistent disk is located. If you are not specifying any existing persistent disks, you can select any location. Learn more about the allowedLocations field.
    • NEW_PERSISTENT_DISK_SIZE: the size of the new persistent disk in GB. The allowed sizes depend on the type of persistent disk, but the minimum is often 10 GB (10) and the maximum is often 64 TB (64000).
    • NEW_PERSISTENT_DISK_TYPE: the disk type of the new persistent disk, either pd-standard, pd-balanced, pd-ssd, or pd-extreme.
    • NEW_PERSISTENT_DISK_NAME: the name of the new persistent disk.
  • If you are using a VM instance template for this job, make the request as shown previously, except replace the instances field with the following:

    "instances": [
        {
            "instanceTemplate": "INSTANCE_TEMPLATE_NAME"
        }
    ],
    

    where INSTANCE_TEMPLATE_NAME is the name of the instance template for this job. For a job that uses persistent disks, this instance template must define and attach the persistent disks that you want the job to use. For this example, the template must define and attach a new persistent disk named NEW_PERSISTENT_DISK_NAME and attach an existing persistent disk named EXISTING_PERSISTENT_DISK_NAME.
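In the examples above, each entry in the volumes field refers to a disk by the same deviceName that the instances field uses to define it. A mismatch between the two is an easy mistake to make, so a local check before submitting can help. The following Python sketch uses a hypothetical helper name and applies only to jobs that define their disks directly in the job definition; when an instance template is used, the disks are defined in the template and this check cannot see them.

```python
def undefined_volume_devices(job_config):
    """Return deviceName values that are mounted but never defined as disks."""
    defined = set()
    for instance in job_config.get("allocationPolicy", {}).get("instances", []):
        for disk in instance.get("policy", {}).get("disks", []):
            defined.add(disk.get("deviceName"))

    missing = []
    for group in job_config.get("taskGroups", []):
        for volume in group.get("taskSpec", {}).get("volumes", []):
            name = volume.get("deviceName")
            # Volumes without a deviceName (for example, "gcs" volumes) are skipped.
            if name is not None and name not in defined:
                missing.append(name)
    return missing
```

An empty list means that every mounted device name matches a disk defined in the job.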

Use a local SSD

A job that uses local SSDs has the following restrictions:

You can create a job that uses a local SSD using the gcloud CLI or Batch API. The following example describes how to create a job that creates, attaches, and mounts a local SSD. The job also has 3 tasks that each run a script to create a file in the local SSD named output_task_TASK_INDEX.txt where TASK_INDEX is the index of each task: 0, 1, and 2.

gcloud

To create a job that uses local SSDs using the gcloud CLI, use the gcloud batch jobs submit command. In the job's JSON configuration file, create and attach the local SSDs in the instances field and mount the local SSDs in the volumes field.

  1. Create a JSON file.

    • If you are not using an instance template for this job, create a JSON file with the following contents:

      {
          "allocationPolicy": {
              "instances": [
                  {
                      "policy": {
                          "machineType": "MACHINE_TYPE",
                          "disks": [
                              {
                                  "newDisk": {
                                      "sizeGb":LOCAL_SSD_SIZE,
                                      "type": "local-ssd"
                                  },
                                  "deviceName": "LOCAL_SSD_NAME"
                              }
                          ]
                      }
                  }
              ]
          },
          "taskGroups":[
              {
                  "taskSpec":{
                      "runnables": [
                          {
                              "script": {
                                  "text": "echo Hello world from task ${BATCH_TASK_INDEX}. >> /mnt/disks/LOCAL_SSD_NAME/output_task_${BATCH_TASK_INDEX}.txt"
                              }
                          }
                      ],
                      "volumes": [
                          {
                              "deviceName": "LOCAL_SSD_NAME",
                              "mountPath": "/mnt/disks/LOCAL_SSD_NAME",
                              "mountOptions": "rw,async"
                          }
                      ]
                  },
                  "taskCount":3
              }
          ],
          "logsPolicy": {
              "destination": "CLOUD_LOGGING"
          }
      }
      

      Replace the following:

      • MACHINE_TYPE: the machine type, which can be predefined or custom, of the job's VMs. The allowed number of local SSDs depends on the machine type for your job's VMs.
      • LOCAL_SSD_NAME: the name of a local SSD created for this job.
      • LOCAL_SSD_SIZE: the size of all the local SSDs in GB. Each local SSD is 375 GB, so this value must be a multiple of 375 GB. For example, for 2 local SSDs, set this value to 750 GB.
    • If you are using a VM instance template for this job, create a JSON file as shown previously, except replace the instances field with the following:

      "instances": [
          {
              "instanceTemplate": "INSTANCE_TEMPLATE_NAME"
          }
      ],
      

      where INSTANCE_TEMPLATE_NAME is the name of the instance template for this job. For a job that uses local SSDs, this instance template must define and attach the local SSDs that you want the job to use. For this example, the template must define and attach a local SSD named LOCAL_SSD_NAME.

  2. Run the following command:

    gcloud batch jobs submit JOB_NAME \
      --location LOCATION \
      --config JSON_CONFIGURATION_FILE
    

    Replace the following:

    • JOB_NAME: the name of the job.
    • LOCATION: the location of the job.
    • JSON_CONFIGURATION_FILE: the path for a JSON file with the job's configuration details.
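Because LOCAL_SSD_SIZE is the combined size of all local SSDs and each one is 375 GB, the value can be derived from the number of disks you want. The helpers below are a minimal sketch with hypothetical names; they do not check whether the chosen machine type allows that number of local SSDs.

```python
LOCAL_SSD_GB = 375  # size of a single local SSD, as stated above


def local_ssd_size_gb(count):
    """Return the sizeGb value for the given number of local SSDs."""
    if count < 1:
        raise ValueError("at least one local SSD is required")
    return count * LOCAL_SSD_GB


def local_ssd_count(size_gb):
    """Return how many local SSDs a sizeGb value represents."""
    if size_gb <= 0 or size_gb % LOCAL_SSD_GB != 0:
        raise ValueError(f"size must be a positive multiple of {LOCAL_SSD_GB} GB")
    return size_gb // LOCAL_SSD_GB
```

For example, local_ssd_size_gb(2) gives the 750 used in the description of LOCAL_SSD_SIZE.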

API

To create a job that uses local SSDs using the Batch API, use the jobs.create method. In the request, create and attach the local SSDs in the instances field and mount the local SSDs in the volumes field.

  • If you are not using an instance template for this job, make the following request:

    POST https://batch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/jobs?job_id=JOB_NAME
    
    {
        "allocationPolicy": {
            "instances": [
                {
                    "policy": {
                        "machineType": "MACHINE_TYPE",
                        "disks": [
                            {
                                "newDisk": {
                                    "sizeGb":LOCAL_SSD_SIZE,
                                    "type": "local-ssd"
                                },
                                "deviceName": "LOCAL_SSD_NAME"
                            }
                        ]
                    }
                }
            ]
        },
        "taskGroups":[
            {
                "taskSpec":{
                    "runnables": [
                        {
                            "script": {
                                "text": "echo Hello world from task ${BATCH_TASK_INDEX}. >> /mnt/disks/LOCAL_SSD_NAME/output_task_${BATCH_TASK_INDEX}.txt"
                            }
                        }
                    ],
                    "volumes": [
                        {
                            "deviceName": "LOCAL_SSD_NAME",
                            "mountPath": "/mnt/disks/LOCAL_SSD_NAME",
                            "mountOptions": "rw,async"
                        }
                    ]
                },
                "taskCount":3
            }
        ],
        "logsPolicy": {
            "destination": "CLOUD_LOGGING"
        }
    }
    

    Replace the following:

    • PROJECT_ID: the project ID of your project.
    • LOCATION: the location of the job.
    • JOB_NAME: the name of the job.
    • MACHINE_TYPE: the machine type, which can be predefined or custom, of the job's VMs. The allowed number of local SSDs depends on the machine type for your job's VMs.
    • LOCAL_SSD_NAME: the name of a local SSD created for this job.
    • LOCAL_SSD_SIZE: the size of all the local SSDs in GB. Each local SSD is 375 GB, so this value must be a multiple of 375 GB. For example, for 2 local SSDs, set this value to 750 GB.
  • If you are using a VM instance template for this job, make the request as shown previously, except replace the instances field with the following:

    "instances": [
        {
            "instanceTemplate": "INSTANCE_TEMPLATE_NAME"
        }
    ],
    

    where INSTANCE_TEMPLATE_NAME is the name of the instance template for this job. For a job that uses local SSDs, this instance template must define and attach the local SSDs that you want the job to use. For this example, the template must define and attach a local SSD named LOCAL_SSD_NAME.
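The jobs.create requests in this document all target the same URL pattern, with the project, location, and job name filled in. The following Python sketch only assembles that URL from its parts; the function name and sample values are hypothetical, and sending the request (including authentication) is intentionally not shown.

```python
from urllib.parse import quote, urlencode


def create_job_url(project_id, location, job_name):
    """Build the jobs.create URL used in the POST requests above."""
    parent = (
        f"projects/{quote(project_id, safe='')}"
        f"/locations/{quote(location, safe='')}"
    )
    query = urlencode({"job_id": job_name})
    return f"https://batch.googleapis.com/v1/{parent}/jobs?{query}"


# Hypothetical values for illustration only.
print(create_job_url("my-project", "us-central1", "example-job"))
```

The job definition itself goes in the request body as JSON, exactly as shown in the examples above.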

Use a Cloud Storage bucket

To create a job that uses an existing Cloud Storage bucket, select one of the following methods:

  • Recommended: Mount a bucket directly to your job's VMs by specifying the bucket in the job's definition, as shown in this section. When the job runs, the bucket is automatically mounted to the VMs for your job using Cloud Storage FUSE.
  • Create a job with tasks that directly access a Cloud Storage bucket by using the gsutil command-line tool or client libraries for the Cloud Storage API. To learn how to access a Cloud Storage bucket directly from a VM, see the Compute Engine documentation for Writing and reading data from Cloud Storage buckets.

Before you create a job that uses a bucket, create a bucket or identify an existing bucket. For more information, see Create buckets and List buckets.

You can create a job that uses a Cloud Storage bucket using the gcloud CLI, Batch API, Go, Java, Node.js, or Python.

The following example describes how to create a job that mounts a Cloud Storage bucket. The job also has 3 tasks that each run a script to create a file in the bucket named output_task_TASK_INDEX.txt where TASK_INDEX is the index of each task: 0, 1, and 2.

gcloud

To create a job that uses a Cloud Storage bucket using the gcloud CLI, use the gcloud batch jobs submit command. In the job's JSON configuration file, mount the bucket in the volumes field.

For example, to create a job that outputs files to a Cloud Storage bucket:

  1. Create a JSON file in the current directory named hello-world-bucket.json with the following contents:

    {
        "taskGroups": [
            {
                "taskSpec": {
                    "runnables": [
                        {
                            "script": {
                                "text": "echo Hello world from task ${BATCH_TASK_INDEX}. >> MOUNT_PATH/output_task_${BATCH_TASK_INDEX}.txt"
                            }
                        }
                    ],
                    "volumes": [
                        {
                            "gcs": {
                                "remotePath": "BUCKET_PATH"
                            },
                            "mountPath": "MOUNT_PATH"
                        }
                    ]
                },
                "taskCount": 3
            }
        ],
        "logsPolicy": {
            "destination": "CLOUD_LOGGING"
        }
    }

    Replace the following:

    • BUCKET_PATH: the path of the bucket directory that you want this job to access, which must start with the name of the bucket. For example, for a bucket named BUCKET_NAME, the path BUCKET_NAME represents the root directory of the bucket and the path BUCKET_NAME/subdirectory represents the subdirectory subdirectory.
    • MOUNT_PATH: the mount path that the job's runnables use to access this bucket. The path must start with /mnt/disks/ followed by a directory or path that you choose. For example, if you want to represent this bucket with a directory named my-bucket, set the mount path to /mnt/disks/my-bucket.

  2. Run the following command:

    gcloud batch jobs submit example-bucket-job \
      --location us-central1 \
      --config hello-world-bucket.json
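Each task expands ${BATCH_TASK_INDEX} to its own index, so a job with 3 tasks writes to three separate files under the mount path. The following Python sketch lists the file paths you would expect the tasks to write; the function name and the sample mount path are hypothetical.

```python
def expected_output_files(task_count, mount_path):
    """List the files that the example script writes, one per task index."""
    return [
        f"{mount_path}/output_task_{index}.txt"
        for index in range(task_count)
    ]


# Task indexes start at 0, so 3 tasks produce indexes 0, 1, and 2.
for path in expected_output_files(3, "/mnt/disks/my-bucket"):
    print(path)
```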
    

API

To create a job that uses a Cloud Storage bucket using the Batch API, use the jobs.create method and mount the bucket in the volumes field.

POST https://batch.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/jobs?job_id=example-bucket-job

{
    "taskGroups": [
        {
            "taskSpec": {
                "runnables": [
                    {
                        "script": {
                            "text": "echo Hello world from task ${BATCH_TASK_INDEX}. >> MOUNT_PATH/output_task_${BATCH_TASK_INDEX}.txt"
                        }
                    }
                ],
                "volumes": [
                    {
                        "gcs": {
                            "remotePath": "BUCKET_PATH"
                        },
                        "mountPath": "MOUNT_PATH"
                    }
                ]
            },
            "taskCount": 3
        }
    ],
    "logsPolicy": {
        "destination": "CLOUD_LOGGING"
    }
}

Replace the following:

  • PROJECT_ID: the project ID of your project.
  • BUCKET_PATH: the path of the bucket directory that you want this job to access, which must start with the name of the bucket. For example, for a bucket named BUCKET_NAME, the path BUCKET_NAME represents the root directory of the bucket and the path BUCKET_NAME/subdirectory represents the subdirectory subdirectory.
  • MOUNT_PATH: the mount path that the job's runnables use to access this bucket. The path must start with /mnt/disks/ followed by a directory or path that you choose. For example, if you want to represent this bucket with a directory named my-bucket, set the mount path to /mnt/disks/my-bucket.
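The two placeholder rules above (BUCKET_PATH starts with the bucket name, and MOUNT_PATH starts with /mnt/disks/) can be checked locally before you submit the job. This is a minimal sketch with hypothetical helper names; it checks only the format described here and does not verify that the bucket exists.

```python
MOUNT_PREFIX = "/mnt/disks/"


def split_bucket_path(bucket_path):
    """Split BUCKET_PATH into (bucket name, subdirectory or empty string)."""
    bucket, _, subdirectory = bucket_path.partition("/")
    if not bucket:
        raise ValueError("BUCKET_PATH must start with the name of the bucket")
    return bucket, subdirectory


def check_mount_path(mount_path):
    """Return mount_path if it follows the /mnt/disks/ rule, else raise."""
    if not mount_path.startswith(MOUNT_PREFIX) or mount_path == MOUNT_PREFIX:
        raise ValueError(f"MOUNT_PATH must be {MOUNT_PREFIX} followed by a directory")
    return mount_path
```

For example, split_bucket_path("BUCKET_NAME/subdirectory") separates the bucket name from the subdirectory, and check_mount_path("/mnt/disks/my-bucket") accepts the mount path from the description above.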

Go

For more information, see the Batch Go API reference documentation.

import (
	"context"
	"fmt"
	"io"

	batch "cloud.google.com/go/batch/apiv1"
	batchpb "cloud.google.com/go/batch/apiv1/batchpb"
	durationpb "google.golang.org/protobuf/types/known/durationpb"
)

// Creates and runs a job that executes the specified script
func createScriptJobWithBucket(w io.Writer, projectID, region, jobName, bucketName string) error {
	// projectID := "your_project_id"
	// region := "us-central1"
	// jobName := "some-job"
	// bucketName := "some-bucket"

	ctx := context.Background()
	batchClient, err := batch.NewClient(ctx)
	if err != nil {
		return fmt.Errorf("NewClient: %v", err)
	}
	defer batchClient.Close()

	// Define what will be done as part of the job.
	command := &batchpb.Runnable_Script_Text{
		Text: "echo Hello world from task ${BATCH_TASK_INDEX}. >> /mnt/share/output_task_${BATCH_TASK_INDEX}.txt",
	}

	// Specify the Google Cloud Storage bucket to mount
	volume := &batchpb.Volume{
		Source: &batchpb.Volume_Gcs{
			Gcs: &batchpb.GCS{
				RemotePath: bucketName,
			},
		},
		MountPath:    "/mnt/share",
		MountOptions: []string{},
	}

	// We can specify what resources are requested by each task.
	resources := &batchpb.ComputeResource{
		// CpuMilli is milliseconds per cpu-second. This means the task requires 50% of a single CPU.
		CpuMilli:  500,
		MemoryMib: 16,
	}

	taskSpec := &batchpb.TaskSpec{
		Runnables: []*batchpb.Runnable{{
			Executable: &batchpb.Runnable_Script_{
				Script: &batchpb.Runnable_Script{Command: command},
			},
		}},
		ComputeResource: resources,
		MaxRunDuration: &durationpb.Duration{
			Seconds: 3600,
		},
		MaxRetryCount: 2,
		Volumes:       []*batchpb.Volume{volume},
	}

	// Tasks are grouped inside a job using TaskGroups.
	taskGroups := []*batchpb.TaskGroup{
		{
			TaskCount: 4,
			TaskSpec:  taskSpec,
		},
	}

	// Policies are used to define on what kind of virtual machines the tasks will run on.
	// In this case, we tell the system to use "e2-standard-4" machine type.
	// Read more about machine types here: https://cloud.google.com/compute/docs/machine-types
	allocationPolicy := &batchpb.AllocationPolicy{
		Instances: []*batchpb.AllocationPolicy_InstancePolicyOrTemplate{{
			PolicyTemplate: &batchpb.AllocationPolicy_InstancePolicyOrTemplate_Policy{
				Policy: &batchpb.AllocationPolicy_InstancePolicy{
					MachineType: "e2-standard-4",
				},
			},
		}},
	}

	// We use Cloud Logging as it's an out of the box available option
	logsPolicy := &batchpb.LogsPolicy{
		Destination: batchpb.LogsPolicy_CLOUD_LOGGING,
	}

	jobLabels := map[string]string{"env": "testing", "type": "script"}

	// The job's parent is the region in which the job will run
	parent := fmt.Sprintf("projects/%s/locations/%s", projectID, region)

	job := batchpb.Job{
		TaskGroups:       taskGroups,
		AllocationPolicy: allocationPolicy,
		Labels:           jobLabels,
		LogsPolicy:       logsPolicy,
	}

	req := &batchpb.CreateJobRequest{
		Parent: parent,
		JobId:  jobName,
		Job:    &job,
	}

	createdJob, err := batchClient.CreateJob(ctx, req)
	if err != nil {
		return fmt.Errorf("unable to create job: %v", err)
	}

	fmt.Fprintf(w, "Job created: %v\n", createdJob)

	return nil
}

Java

For more information, see the Batch Java API reference documentation.


import com.google.cloud.batch.v1.AllocationPolicy;
import com.google.cloud.batch.v1.AllocationPolicy.InstancePolicy;
import com.google.cloud.batch.v1.AllocationPolicy.InstancePolicyOrTemplate;
import com.google.cloud.batch.v1.BatchServiceClient;
import com.google.cloud.batch.v1.ComputeResource;
import com.google.cloud.batch.v1.CreateJobRequest;
import com.google.cloud.batch.v1.GCS;
import com.google.cloud.batch.v1.Job;
import com.google.cloud.batch.v1.LogsPolicy;
import com.google.cloud.batch.v1.LogsPolicy.Destination;
import com.google.cloud.batch.v1.Runnable;
import com.google.cloud.batch.v1.Runnable.Script;
import com.google.cloud.batch.v1.TaskGroup;
import com.google.cloud.batch.v1.TaskSpec;
import com.google.cloud.batch.v1.Volume;
import com.google.protobuf.Duration;
import java.io.IOException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class CreateWithMountedBucket {

  public static void main(String[] args)
      throws IOException, ExecutionException, InterruptedException, TimeoutException {
    // TODO(developer): Replace these variables before running the sample.
    // Project ID or project number of the Cloud project you want to use.
    String projectId = "YOUR_PROJECT_ID";

    // Name of the region you want to use to run the job. Regions that are
    // available for Batch are listed on: https://cloud.google.com/batch/docs/get-started#locations
    String region = "europe-central2";

    // The name of the job that will be created.
    // It needs to be unique for each project and region pair.
    String jobName = "JOB_NAME";

    // Name of the bucket to be mounted for your Job.
    String bucketName = "BUCKET_NAME";

    createScriptJobWithBucket(projectId, region, jobName, bucketName);
  }

  // This method shows how to create a sample Batch Job that will run
  // a simple command on Cloud Compute instances.
  public static void createScriptJobWithBucket(String projectId, String region, String jobName,
      String bucketName)
      throws IOException, ExecutionException, InterruptedException, TimeoutException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the `batchServiceClient.close()` method on the client to safely
    // clean up any remaining background resources.
    try (BatchServiceClient batchServiceClient = BatchServiceClient.create()) {

      // Define what will be done as part of the job.
      Runnable runnable =
          Runnable.newBuilder()
              .setScript(
                  Script.newBuilder()
                      .setText(
                          "echo Hello world from task ${BATCH_TASK_INDEX}. >> "
                              + "/mnt/share/output_task_${BATCH_TASK_INDEX}.txt")
                      // You can also run a script from a file. Just remember, that needs to be a
                      // script that's already on the VM that will be running the job.
                      // Using setText() and setPath() is mutually exclusive.
                      // .setPath("/tmp/test.sh")
                      .build())
              .build();

      Volume volume = Volume.newBuilder()
          .setGcs(GCS.newBuilder()
              .setRemotePath(bucketName)
              .build())
          .setMountPath("/mnt/share")
          .build();

      // We can specify what resources are requested by each task.
      ComputeResource computeResource =
          ComputeResource.newBuilder()
              // In milliseconds per cpu-second. This means the task requires 50% of a single CPU.
              .setCpuMilli(500)
              // In MiB.
              .setMemoryMib(16)
              .build();

      TaskSpec task =
          TaskSpec.newBuilder()
              // Jobs can be divided into tasks. In this case, every task uses this one task specification.
              .addRunnables(runnable)
              .addVolumes(volume)
              .setComputeResource(computeResource)
              .setMaxRetryCount(2)
              .setMaxRunDuration(Duration.newBuilder().setSeconds(3600).build())
              .build();

      // Tasks are grouped inside a job using TaskGroups.
      // Currently, it's possible to have only one task group.
      TaskGroup taskGroup = TaskGroup.newBuilder().setTaskCount(4).setTaskSpec(task).build();

      // Policies are used to define on what kind of virtual machines the tasks will run on.
      // In this case, we tell the system to use "e2-standard-4" machine type.
      // Read more about machine types here: https://cloud.google.com/compute/docs/machine-types
      InstancePolicy instancePolicy =
          InstancePolicy.newBuilder().setMachineType("e2-standard-4").build();

      AllocationPolicy allocationPolicy =
          AllocationPolicy.newBuilder()
              .addInstances(InstancePolicyOrTemplate.newBuilder().setPolicy(instancePolicy).build())
              .build();

      Job job =
          Job.newBuilder()
              .addTaskGroups(taskGroup)
              .setAllocationPolicy(allocationPolicy)
              .putLabels("env", "testing")
              .putLabels("type", "script")
              .putLabels("mount", "bucket")
              // We use Cloud Logging because it's available out of the box.
              .setLogsPolicy(
                  LogsPolicy.newBuilder().setDestination(Destination.CLOUD_LOGGING).build())
              .build();

      CreateJobRequest createJobRequest =
          CreateJobRequest.newBuilder()
              // The job's parent is the region in which the job will run.
              .setParent(String.format("projects/%s/locations/%s", projectId, region))
              .setJob(job)
              .setJobId(jobName)
              .build();

      Job result =
          batchServiceClient
              .createJobCallable()
              .futureCall(createJobRequest)
              .get(5, TimeUnit.MINUTES);

      System.out.printf("Successfully created the job: %s", result.getName());
    }
  }
}

Node.js

For more information, see the Batch Node.js API reference documentation.

/**
 * TODO(developer): Uncomment and replace these variables before running the sample.
 */
// const projectId = 'YOUR_PROJECT_ID';
/**
 * The region you want the job to run in. The regions that support Batch are listed here:
 * https://cloud.google.com/batch/docs/get-started#locations
 */
// const region = 'us-central1';
/**
 * The name of the job that will be created.
 * It needs to be unique for each project and region pair.
 */
// const jobName = 'YOUR_JOB_NAME';
/**
 * The name of the bucket to be mounted.
 */
// const bucketName = 'YOUR_BUCKET_NAME';

// Imports the Batch library
const batchLib = require('@google-cloud/batch');
const batch = batchLib.protos.google.cloud.batch.v1;

// Instantiates a client
const batchClient = new batchLib.v1.BatchServiceClient();

// Define what will be done as part of the job.
const task = new batch.TaskSpec();
const runnable = new batch.Runnable();
runnable.script = new batch.Runnable.Script();
runnable.script.text =
  'echo Hello world from task ${BATCH_TASK_INDEX}. >> /mnt/share/output_task_${BATCH_TASK_INDEX}.txt';
// You can also run a script from a file. The file must already exist on the VM
// that runs the job. runnable.script.text and runnable.script.path are mutually
// exclusive.
// runnable.script.path = '/tmp/test.sh'
task.runnables = [runnable];

const gcsBucket = new batch.GCS();
gcsBucket.remotePath = bucketName;
const gcsVolume = new batch.Volume();
gcsVolume.gcs = gcsBucket;
gcsVolume.mountPath = '/mnt/share';
task.volumes = [gcsVolume];

// We can specify what resources are requested by each task.
const resources = new batch.ComputeResource();
resources.cpuMilli = 2000; // in milliseconds per cpu-second. This means the task requires 2 whole CPUs.
resources.memoryMib = 16;
task.computeResource = resources;

task.maxRetryCount = 2;
task.maxRunDuration = {seconds: 3600};

// Tasks are grouped inside a job using TaskGroups.
const group = new batch.TaskGroup();
group.taskCount = 4;
group.taskSpec = task;

// Policies define the kind of virtual machines that the tasks run on.
// In this case, we tell the system to use "e2-standard-4" machine type.
// Read more about machine types here: https://cloud.google.com/compute/docs/machine-types
const allocationPolicy = new batch.AllocationPolicy();
const policy = new batch.AllocationPolicy.InstancePolicy();
policy.machineType = 'e2-standard-4';
const instances = new batch.AllocationPolicy.InstancePolicyOrTemplate();
instances.policy = policy;
allocationPolicy.instances = [instances];

const job = new batch.Job();
job.name = jobName;
job.taskGroups = [group];
job.allocationPolicy = allocationPolicy;
job.labels = {env: 'testing', type: 'script'};
// We use Cloud Logging as it's an option available out of the box
job.logsPolicy = new batch.LogsPolicy();
job.logsPolicy.destination = batch.LogsPolicy.Destination.CLOUD_LOGGING;

// The job's parent is the project and region in which the job will run
const parent = `projects/${projectId}/locations/${region}`;

async function callCreateJob() {
  // Construct request
  const request = {
    parent,
    jobId: jobName,
    job,
  };

  // Run request
  const response = await batchClient.createJob(request);
  console.log(response);
}

callCreateJob();

Python

For more information, see the Batch Python API reference documentation.

from google.cloud import batch_v1


def create_script_job_with_bucket(project_id: str, region: str, job_name: str, bucket_name: str) -> batch_v1.Job:
    """
    This method shows how to create a sample Batch Job that runs a simple
    script on Compute Engine VMs and writes its output to a mounted Cloud Storage bucket.

    Args:
        project_id: project ID or project number of the Cloud project you want to use.
        region: name of the region you want to use to run the job. Regions that are
            available for Batch are listed on: https://cloud.google.com/batch/docs/get-started#locations
        job_name: the name of the job that will be created.
            It needs to be unique for each project and region pair.
        bucket_name: name of the bucket to be mounted for your Job.

    Returns:
        A job object representing the job created.
    """
    client = batch_v1.BatchServiceClient()

    # Define what will be done as part of the job.
    task = batch_v1.TaskSpec()
    runnable = batch_v1.Runnable()
    runnable.script = batch_v1.Runnable.Script()
    runnable.script.text = "echo Hello world from task ${BATCH_TASK_INDEX}. >> /mnt/share/output_task_${BATCH_TASK_INDEX}.txt"
    task.runnables = [runnable]

    gcs_bucket = batch_v1.GCS()
    gcs_bucket.remote_path = bucket_name
    gcs_volume = batch_v1.Volume()
    gcs_volume.gcs = gcs_bucket
    gcs_volume.mount_path = '/mnt/share'
    task.volumes = [gcs_volume]

    # We can specify what resources are requested by each task.
    resources = batch_v1.ComputeResource()
    resources.cpu_milli = 500  # in milliseconds per cpu-second. This means the task requires 50% of a single CPU.
    resources.memory_mib = 16
    task.compute_resource = resources

    task.max_retry_count = 2
    task.max_run_duration = "3600s"

    # Tasks are grouped inside a job using TaskGroups.
    # Currently, it's possible to have only one task group.
    group = batch_v1.TaskGroup()
    group.task_count = 4
    group.task_spec = task

    # Policies define the kind of virtual machines that the tasks run on.
    # In this case, we tell the system to use "e2-standard-4" machine type.
    # Read more about machine types here: https://cloud.google.com/compute/docs/machine-types
    allocation_policy = batch_v1.AllocationPolicy()
    policy = batch_v1.AllocationPolicy.InstancePolicy()
    policy.machine_type = "e2-standard-4"
    instances = batch_v1.AllocationPolicy.InstancePolicyOrTemplate()
    instances.policy = policy
    allocation_policy.instances = [instances]

    job = batch_v1.Job()
    job.task_groups = [group]
    job.allocation_policy = allocation_policy
    job.labels = {"env": "testing", "type": "script", "mount": "bucket"}
    # We use Cloud Logging because it's available out of the box
    job.logs_policy = batch_v1.LogsPolicy()
    job.logs_policy.destination = batch_v1.LogsPolicy.Destination.CLOUD_LOGGING

    create_request = batch_v1.CreateJobRequest()
    create_request.job = job
    create_request.job_id = job_name
    # The job's parent is the region in which the job will run
    create_request.parent = f"projects/{project_id}/locations/{region}"

    return client.create_job(create_request)

Use a network file system

You can create a job that uses an existing network file system (NFS), such as a Filestore file share, using the gcloud CLI or Batch API.

Before creating a job that uses an NFS, make sure that your network's firewall is properly configured to allow traffic between your job's VMs and the NFS. For more information, see Configuring firewall rules for Filestore.

The following example describes how to create a job that specifies and mounts an NFS. The job also has 3 tasks that each run a script to create a file in the NFS named output_task_TASK_INDEX.txt, where TASK_INDEX is the index of each task: 0, 1, and 2.
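To make the expected result concrete, the following Python sketch lists the files that the three tasks should create and reports any that are missing. It assumes that you run it on a machine that has the same NFS mounted. The helper names expected_output_files and missing_output_files and the example mount path /mnt/disks/my-nfs are hypothetical and are not part of Batch.

```python
from pathlib import Path
from typing import List


def expected_output_files(mount_path: str, task_count: int) -> List[Path]:
    """Return the file paths that the job's tasks are expected to create."""
    # Task indexes run from 0 to task_count - 1, matching output_task_TASK_INDEX.txt.
    return [Path(mount_path) / f"output_task_{i}.txt" for i in range(task_count)]


def missing_output_files(mount_path: str, task_count: int) -> List[Path]:
    """Return the expected files that do not exist under mount_path."""
    return [p for p in expected_output_files(mount_path, task_count) if not p.is_file()]


if __name__ == "__main__":
    # Hypothetical mount path; replace it with the path where the NFS is mounted.
    for path in expected_output_files("/mnt/disks/my-nfs", task_count=3):
        print(path)
```

After the job finishes, an empty list from missing_output_files suggests that every task wrote its file.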

gcloud

To create a job that uses an NFS using the gcloud CLI, use the gcloud batch jobs submit command. In the job's JSON configuration file, mount the NFS in the volumes field.

  1. Create a JSON file with the following contents:

    {
        "taskGroups": [
            {
                "taskSpec": {
                    "runnables": [
                        {
                            "script": {
                                "text": "echo Hello world from task ${BATCH_TASK_INDEX}. >> MOUNT_PATH/output_task_${BATCH_TASK_INDEX}.txt"
                            }
                        }
                    ],
                    "volumes": [
                        {
                            "nfs": {
                                "server": "NFS_IP_ADDRESS",
                                "remotePath": "NFS_PATH"
                            },
                            "mountPath": "MOUNT_PATH"
                        }
                    ]
                },
                "taskCount": 3
            }
        ],
        "logsPolicy": {
            "destination": "CLOUD_LOGGING"
        }
    }
    

    Replace the following:

    • NFS_IP_ADDRESS: the IP address of the NFS. For example, if your NFS is a Filestore file share, then specify the IP address of the VM hosting the Filestore file share, which you can get by describing the Filestore VM.
    • NFS_PATH: the path of the NFS directory that you want this job to access, which must start with a / followed by the root directory of the NFS. For example, for a Filestore file share named FILE_SHARE_NAME, the path /FILE_SHARE_NAME represents the root directory of the file share and the path /FILE_SHARE_NAME/subdirectory represents a subdirectory.
    • MOUNT_PATH: the mount path that the job's runnables use to access this NFS. The path must start with /mnt/disks/ followed by a directory or path that you choose. For example, if you want to represent this NFS with a directory named my-nfs, set the mount path to /mnt/disks/my-nfs.
  2. Run the following command:

    gcloud batch jobs submit JOB_NAME \
      --location LOCATION \
      --config JSON_CONFIGURATION_FILE
    

    Replace the following:

    • JOB_NAME: the name of the job.
    • LOCATION: the location of the job.
    • JSON_CONFIGURATION_FILE: the path for a JSON file with the job's configuration details.
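If you generate the configuration file programmatically, a small script can fill in the placeholders and catch path mistakes before you submit the job. The following Python sketch builds the JSON shown in step 1 and checks the two path rules described above. The function name build_nfs_job_config and the sample IP address, file share name, and mount path are hypothetical values chosen for illustration.

```python
import json


def build_nfs_job_config(nfs_ip_address: str, nfs_path: str, mount_path: str,
                         task_count: int = 3) -> dict:
    """Build the job configuration from step 1 with the placeholders filled in."""
    # NFS_PATH must start with a / followed by the root directory of the NFS.
    if not nfs_path.startswith("/") or nfs_path == "/":
        raise ValueError("nfs_path must start with / followed by a directory")
    # MOUNT_PATH must start with /mnt/disks/ followed by a directory or path.
    prefix = "/mnt/disks/"
    if not mount_path.startswith(prefix) or mount_path == prefix:
        raise ValueError("mount_path must start with /mnt/disks/ followed by a directory")

    script = ("echo Hello world from task ${BATCH_TASK_INDEX}. >> "
              + mount_path + "/output_task_${BATCH_TASK_INDEX}.txt")
    return {
        "taskGroups": [
            {
                "taskSpec": {
                    "runnables": [{"script": {"text": script}}],
                    "volumes": [
                        {
                            "nfs": {"server": nfs_ip_address, "remotePath": nfs_path},
                            "mountPath": mount_path,
                        }
                    ],
                },
                "taskCount": task_count,
            }
        ],
        "logsPolicy": {"destination": "CLOUD_LOGGING"},
    }


if __name__ == "__main__":
    # Hypothetical values; replace them with the details of your own NFS.
    config = build_nfs_job_config("10.0.0.2", "/vol1", "/mnt/disks/my-nfs")
    print(json.dumps(config, indent=4))
```

You can redirect the printed JSON to a file and pass that file to the --config flag of the gcloud batch jobs submit command.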

API

To create a job that uses an NFS using the Batch API, use the jobs.create method and mount the NFS in the volumes field.

POST https://batch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/jobs?job_id=JOB_NAME

{
    "taskGroups": [
        {
            "taskSpec": {
                "runnables": [
                    {
                        "script": {
                            "text": "echo Hello world from task ${BATCH_TASK_INDEX}. >> MOUNT_PATH/output_task_${BATCH_TASK_INDEX}.txt"
                        }
                    }
                ],
                "volumes": [
                    {
                        "nfs": {
                            "server": "NFS_IP_ADDRESS",
                            "remotePath": "NFS_PATH"
                        },
                        "mountPath": "MOUNT_PATH"
                    }
                ]
            },
            "taskCount": 3
        }
    ],
    "logsPolicy": {
        "destination": "CLOUD_LOGGING"
    }
}

Replace the following:

  • PROJECT_ID: the project ID of your project.
  • LOCATION: the location of the job.
  • JOB_NAME: the name of the job.
  • NFS_IP_ADDRESS: the IP address of the Network File System. For example, if your NFS is a Filestore file share, then specify the IP address of the VM hosting the Filestore file share, which you can get by describing the Filestore VM.
  • NFS_PATH: the path of the NFS directory that you want this job to access, which must start with a / followed by the root directory of the NFS. For example, for a Filestore file share named FILE_SHARE_NAME, the path /FILE_SHARE_NAME represents the root directory of the file share and the path /FILE_SHARE_NAME/subdirectory represents a subdirectory.
  • MOUNT_PATH: the mount path that the job's runnables use to access this NFS. The path must start with /mnt/disks/ followed by a directory or path that you choose. For example, if you want to represent this NFS with a directory named my-nfs, set the mount path to /mnt/disks/my-nfs.
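As a rough illustration of how the request above fits together, the following Python sketch assembles the POST request with the standard library but does not send it. It assumes that you already have an OAuth 2.0 access token (for example, from gcloud auth print-access-token). The function name build_create_job_request and the sample project, location, job name, and NFS values are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote, urlencode


def build_create_job_request(project_id: str, location: str, job_name: str,
                             job_body: dict, access_token: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for the jobs.create method."""
    parent = "projects/{}/locations/{}".format(
        quote(project_id, safe=""), quote(location, safe=""))
    # The job name is passed in the job_id query parameter, as in the URL above.
    url = "https://batch.googleapis.com/v1/{}/jobs?{}".format(
        parent, urlencode({"job_id": job_name}))
    return urllib.request.Request(
        url,
        data=json.dumps(job_body).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + access_token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical values; replace them with your own project and NFS details.
    mount_path = "/mnt/disks/my-nfs"
    body = {
        "taskGroups": [{
            "taskSpec": {
                "runnables": [{"script": {"text":
                    "echo Hello world from task ${BATCH_TASK_INDEX}. >> "
                    + mount_path + "/output_task_${BATCH_TASK_INDEX}.txt"}}],
                "volumes": [{
                    "nfs": {"server": "10.0.0.2", "remotePath": "/vol1"},
                    "mountPath": mount_path,
                }],
            },
            "taskCount": 3,
        }],
        "logsPolicy": {"destination": "CLOUD_LOGGING"},
    }
    request = build_create_job_request(
        "my-project", "us-central1", "my-nfs-job", body, "ACCESS_TOKEN")
    print(request.get_method(), request.full_url)
```

Passing the returned object to urllib.request.urlopen would send the request. The sketch stops short of that so you can inspect the URL and body first.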

What's next