CustomJobSpec

Represents the spec of a CustomJob.

JSON representation
{
  "workerPoolSpecs": [
    {
      object (WorkerPoolSpec)
    }
  ],
  "scheduling": {
    object (Scheduling)
  },
  "serviceAccount": string,
  "network": string,
  "baseOutputDirectory": {
    object (GcsDestination)
  },
  "tensorboard": string
}
Fields
workerPoolSpecs[]

object (WorkerPoolSpec)

Required. The spec of the worker pools including machine type and Docker image. All worker pools except the first one are optional and can be skipped by providing an empty value.

scheduling

object (Scheduling)

Scheduling options for a CustomJob.

serviceAccount

string

The service account that the workload runs as. Users submitting jobs must have act-as permission on this service account. If unspecified, the AI Platform Custom Code Service Agent for the CustomJob's project is used.

network

string

The full name of the Compute Engine network to which the job should be peered, in the form projects/{project}/global/networks/{network}, where {project} is a project number and {network} is a network name. For example, projects/12345/global/networks/myVPC.

Private services access must already be configured for the network. If left unspecified, the job is not peered with any network.
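
As a minimal sketch of the format described above, the following Python helper checks a network name before it is placed in a request. The helper name parse_network and the regular expression are illustrative assumptions derived from the documented format; they are not part of the API.

```python
import re
from typing import Tuple

# Assumed pattern, derived from the documented form
# projects/{project}/global/networks/{network}, where {project} is a project number.
_NETWORK_RE = re.compile(r"projects/(\d+)/global/networks/([^/]+)")


def parse_network(name: str) -> Tuple[str, str]:
    """Return (project_number, network_name), or raise ValueError."""
    match = _NETWORK_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"not a full network name: {name!r}")
    return match.group(1), match.group(2)
```

A project ID in place of the project number (for example projects/my-project/...) is rejected by this sketch, because the documented format calls for a project number.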

baseOutputDirectory

object (GcsDestination)

The Cloud Storage location to store the output of this CustomJob or HyperparameterTuningJob. For a HyperparameterTuningJob, the baseOutputDirectory of each child CustomJob backing a Trial is set to a subdirectory named after the Trial's id, under its parent HyperparameterTuningJob's baseOutputDirectory.

The following Vertex AI environment variables are passed to containers or Python modules when this field is set:

For CustomJob:

  • AIP_MODEL_DIR = <baseOutputDirectory>/model/
  • AIP_CHECKPOINT_DIR = <baseOutputDirectory>/checkpoints/
  • AIP_TENSORBOARD_LOG_DIR = <baseOutputDirectory>/logs/

For CustomJob backing a Trial of HyperparameterTuningJob:

  • AIP_MODEL_DIR = <baseOutputDirectory>/<trial_id>/model/
  • AIP_CHECKPOINT_DIR = <baseOutputDirectory>/<trial_id>/checkpoints/
  • AIP_TENSORBOARD_LOG_DIR = <baseOutputDirectory>/<trial_id>/logs/
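
The layout above can be sketched in Python as follows. Vertex AI sets these variables itself; the helper aip_output_env is hypothetical and only mirrors the documented paths, for example to predict where outputs will land. The bucket name in the usage note is a placeholder.

```python
from typing import Dict, Optional


def aip_output_env(base_output_dir: str, trial_id: Optional[str] = None) -> Dict[str, str]:
    """Mirror the documented AIP_* paths for a given base output directory."""
    root = base_output_dir.rstrip("/")
    if trial_id is not None:
        # CustomJob backing a Trial: outputs go under a per-trial subdirectory.
        root = f"{root}/{trial_id}"
    return {
        "AIP_MODEL_DIR": f"{root}/model/",
        "AIP_CHECKPOINT_DIR": f"{root}/checkpoints/",
        "AIP_TENSORBOARD_LOG_DIR": f"{root}/logs/",
    }
```

For instance, aip_output_env("gs://my-bucket/out", trial_id="3") maps AIP_MODEL_DIR to gs://my-bucket/out/3/model/.
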
tensorboard

string

Optional. The name of a Vertex AI Tensorboard resource to which this CustomJob will upload Tensorboard logs. Format: projects/{project}/locations/{location}/tensorboards/{tensorboard}
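
To show how the fields above fit together, here is a sketch that assembles a CustomJobSpec as a Python dictionary and serializes it to JSON. All resource names, the image URI, and the service account are placeholders. The nested field names machineType (in MachineSpec) and outputUriPrefix (in GcsDestination) are assumptions, because those objects are not described in this section.

```python
import json
from typing import Any, Dict


def build_custom_job_spec() -> Dict[str, Any]:
    """Assemble an illustrative CustomJobSpec; every value is a placeholder."""
    return {
        "workerPoolSpecs": [
            {
                # Assumed MachineSpec field name; MachineSpec is documented elsewhere.
                "machineSpec": {"machineType": "n1-standard-4"},
                "replicaCount": "1",  # int64 values are encoded as JSON strings
                "containerSpec": {"imageUri": "gcr.io/my-project/trainer:latest"},
            }
        ],
        "scheduling": {"timeout": "3600s", "restartJobOnWorkerRestart": False},
        "serviceAccount": "trainer@my-project.iam.gserviceaccount.com",
        "network": "projects/12345/global/networks/myVPC",
        # Assumed GcsDestination field name; GcsDestination is documented elsewhere.
        "baseOutputDirectory": {"outputUriPrefix": "gs://my-bucket/out/"},
        "tensorboard": "projects/12345/locations/us-central1/tensorboards/67890",
    }


if __name__ == "__main__":
    print(json.dumps(build_custom_job_spec(), indent=2))
```

Only workerPoolSpecs is marked Required above; the other top-level fields can be omitted.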

WorkerPoolSpec

Represents the spec of a worker pool in a job.

JSON representation
{
  "machineSpec": {
    object (MachineSpec)
  },
  "replicaCount": string,
  "diskSpec": {
    object (DiskSpec)
  },

  // Union field task can be only one of the following:
  "containerSpec": {
    object (ContainerSpec)
  },
  "pythonPackageSpec": {
    object (PythonPackageSpec)
  }
  // End of list of possible types for union field task.
}
Fields
machineSpec

object (MachineSpec)

Optional. Immutable. The specification of a single machine.

replicaCount

string (int64 format)

Optional. The number of worker replicas to use for this worker pool.

diskSpec

object (DiskSpec)

The disk options for this worker pool.

Union field task. The custom task to be executed in this worker pool. task can be only one of the following:
containerSpec

object (ContainerSpec)

The custom container task.

pythonPackageSpec

object (PythonPackageSpec)

The Python packaged task.
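
The two constraints above (replicaCount is an int64 carried as a JSON string, and the union field task accepts only one of its members) can be sketched as a small client-side check. The helper make_worker_pool_spec is hypothetical, and the non-negative range check is an assumption of this sketch rather than something stated in this section.

```python
from typing import Any, Dict

_TASK_FIELDS = ("containerSpec", "pythonPackageSpec")
_INT64_MAX = 2**63 - 1


def make_worker_pool_spec(replica_count: int, **fields: Any) -> Dict[str, Any]:
    """Build a WorkerPoolSpec dict, rejecting more than one task field."""
    set_tasks = [name for name in _TASK_FIELDS if name in fields]
    if len(set_tasks) > 1:
        raise ValueError("union field task accepts only one of: " + ", ".join(_TASK_FIELDS))
    # Assumption: a replica count is non-negative and must fit in an int64.
    if not 0 <= replica_count <= _INT64_MAX:
        raise ValueError("replicaCount must be a non-negative int64")
    spec = dict(fields)
    spec["replicaCount"] = str(replica_count)  # int64 format is a JSON string
    return spec
```

For example, make_worker_pool_spec(2, containerSpec={"imageUri": "..."}) yields a spec whose replicaCount is the string "2", while passing both containerSpec and pythonPackageSpec raises ValueError.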

ContainerSpec

The spec of a Container.

JSON representation
{
  "imageUri": string,
  "command": [
    string
  ],
  "args": [
    string
  ],
  "env": [
    {
      object (EnvVar)
    }
  ]
}
Fields
imageUri

string

Required. The URI of a container image in the Container Registry that is to be run on each worker replica.

command[]

string

The command to be invoked when the container is started. It overrides the entrypoint instruction in Dockerfile when provided.

args[]

string

The arguments to be passed when starting the container.

env[]

object (EnvVar)

Environment variables to be passed to the container.
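
A ContainerSpec can be sketched as follows. The helper make_container_spec is hypothetical, the image URI in the usage note is a placeholder, and the name/value shape used for each EnvVar is an assumption, because EnvVar is not described in this section.

```python
from typing import Any, Dict, Mapping, Optional, Sequence


def make_container_spec(
    image_uri: str,
    command: Optional[Sequence[str]] = None,
    args: Optional[Sequence[str]] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, Any]:
    """Build a ContainerSpec dict; only imageUri is required."""
    if not image_uri:
        raise ValueError("imageUri is required")
    spec: Dict[str, Any] = {"imageUri": image_uri}
    if command:
        # When provided, command overrides the image's entrypoint.
        spec["command"] = list(command)
    if args:
        spec["args"] = list(args)
    if env:
        # Assumed EnvVar shape: {"name": ..., "value": ...}.
        spec["env"] = [{"name": k, "value": v} for k, v in env.items()]
    return spec
```

For example, make_container_spec("gcr.io/my-project/trainer:latest", command=["python", "-m", "trainer.task"], args=["--epochs=5"]) runs the given command in place of the image's entrypoint and passes it one argument.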

PythonPackageSpec

The spec of Python packaged code.

JSON representation
{
  "executorImageUri": string,
  "packageUris": [
    string
  ],
  "pythonModule": string,
  "args": [
    string
  ],
  "env": [
    {
      object (EnvVar)
    }
  ]
}
Fields
executorImageUri

string

Required. The URI of a container image in Artifact Registry that will run the provided Python package. Vertex AI provides a wide range of executor images with pre-installed packages to meet users' various use cases. See the list of pre-built containers for training. You must use an image from this list.

packageUris[]

string

Required. The Cloud Storage locations of the Python package files that make up the training program and its dependent packages. The maximum number of package URIs is 100.

pythonModule

string

Required. The Python module name to run after installing the packages.

args[]

string

Command line arguments to be passed to the Python task.

env[]

object (EnvVar)

Environment variables to be passed to the Python module.
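
The required fields and the 100-URI limit above can be checked on the client before a request is sent. This is a sketch only: the helper make_python_package_spec is hypothetical, the gs:// prefix check is an assumption based on the Cloud Storage requirement, and the sketch does not verify that the executor image comes from the pre-built container list.

```python
from typing import Any, Dict, Iterable, Sequence

MAX_PACKAGE_URIS = 100  # documented maximum number of package URIs


def make_python_package_spec(
    executor_image_uri: str,
    package_uris: Iterable[str],
    python_module: str,
    args: Sequence[str] = (),
) -> Dict[str, Any]:
    """Build a PythonPackageSpec dict after basic client-side checks."""
    uris = list(package_uris)
    if not executor_image_uri or not python_module or not uris:
        raise ValueError("executorImageUri, packageUris and pythonModule are required")
    if len(uris) > MAX_PACKAGE_URIS:
        raise ValueError(f"at most {MAX_PACKAGE_URIS} package URIs are allowed")
    for uri in uris:
        # Assumption: Cloud Storage locations are written as gs:// URIs.
        if not uri.startswith("gs://"):
            raise ValueError(f"not a Cloud Storage URI: {uri!r}")
    spec: Dict[str, Any] = {
        "executorImageUri": executor_image_uri,
        "packageUris": uris,
        "pythonModule": python_module,
    }
    if args:
        spec["args"] = list(args)
    return spec
```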

DiskSpec

Represents the spec of disk options.

JSON representation
{
  "bootDiskType": string,
  "bootDiskSizeGb": integer
}
Fields
bootDiskType

string

Type of the boot disk (default is "pd-ssd"). Valid values: "pd-ssd" (Persistent Disk Solid State Drive) or "pd-standard" (Persistent Disk Hard Disk Drive).

bootDiskSizeGb

integer

Size in GB of the boot disk (default is 100 GB).
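
The documented defaults and valid values can be captured in a short sketch. The helper make_disk_spec is hypothetical, and the positive-size check is an assumption of this sketch; this section does not state size limits.

```python
from typing import Any, Dict

VALID_BOOT_DISK_TYPES = ("pd-ssd", "pd-standard")


def make_disk_spec(boot_disk_type: str = "pd-ssd", boot_disk_size_gb: int = 100) -> Dict[str, Any]:
    """Build a DiskSpec dict using the documented defaults."""
    if boot_disk_type not in VALID_BOOT_DISK_TYPES:
        raise ValueError(f"bootDiskType must be one of {VALID_BOOT_DISK_TYPES}")
    # Assumption: the size must be a positive whole number of GB.
    if boot_disk_size_gb <= 0:
        raise ValueError("bootDiskSizeGb must be positive")
    return {"bootDiskType": boot_disk_type, "bootDiskSizeGb": boot_disk_size_gb}
```

Calling make_disk_spec() with no arguments reproduces the defaults: a 100 GB pd-ssd boot disk.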

Scheduling

All parameters related to queuing and scheduling of custom jobs.

JSON representation
{
  "timeout": string,
  "restartJobOnWorkerRestart": boolean
}
Fields
timeout

string (Duration format)

The maximum job running time. The default is 7 days.

A duration in seconds with up to nine fractional digits, terminated by 's'. Example: "3.5s".
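
As a sketch of the Duration format described above, the following Python helper converts a datetime.timedelta into the string form. The helper name to_duration_string is hypothetical. A timedelta carries microsecond precision, so this sketch emits at most six fractional digits, which stays within the nine-digit limit. Rejecting negative values is an assumption made here because a negative timeout is not meaningful.

```python
from datetime import timedelta


def to_duration_string(delta: timedelta) -> str:
    """Format a timedelta as seconds terminated by 's', e.g. "3.5s"."""
    if delta < timedelta(0):
        raise ValueError("timeout must not be negative")
    total_micros = delta // timedelta(microseconds=1)
    seconds, micros = divmod(total_micros, 1_000_000)
    if micros == 0:
        return f"{seconds}s"
    # Drop trailing zeros so 3.500000 is written as 3.5.
    fraction = f"{micros:06d}".rstrip("0")
    return f"{seconds}.{fraction}s"
```

With this helper, timedelta(seconds=3.5) becomes "3.5s", and the 7-day default corresponds to "604800s".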

restartJobOnWorkerRestart

boolean

Restarts the entire CustomJob if a worker gets restarted. This feature can be used by distributed training jobs that are not resilient to workers leaving and joining a job.