Represents the spec of a CustomJob.
JSON representation |
---|
{ "persistentResourceId": string, "workerPoolSpecs": [ { object ( |
Fields | |
---|---|
persistentResourceId |
Optional. The ID of the PersistentResource in the same Project and Location on which to run the job. If this is specified, the job runs on existing machines held by the PersistentResource instead of on-demand, short-lived machines. The network and CMEK configs on the job must be consistent with those on the PersistentResource; otherwise, the job is rejected. |
workerPoolSpecs[] |
Required. The spec of the worker pools including machine type and Docker image. All worker pools except the first one are optional and can be skipped by providing an empty value. |
scheduling |
Scheduling options for a CustomJob. |
serviceAccount |
Specifies the service account to use as the workload's run-as account. Users submitting jobs must have act-as permission on this run-as account. If unspecified, the Vertex AI Custom Code Service Agent for the CustomJob's project is used. |
network |
Optional. The full name of the Compute Engine network to which the Job should be peered. For example, To specify this field, you must have already configured VPC Network Peering for Vertex AI. If this field is left unspecified, the job is not peered with any network. |
reservedIpRanges[] |
Optional. A list of names for the reserved IP ranges under the VPC network that can be used for this job. If set, the job is deployed within the provided IP ranges. Otherwise, the job is deployed to any IP ranges under the provided VPC network. Example: ['vertex-ai-ip-range']. |
baseOutputDirectory |
The Cloud Storage location to store the output of this CustomJob or HyperparameterTuningJob. For HyperparameterTuningJob, the baseOutputDirectory of each child CustomJob backing a Trial is set to a subdirectory of name The following Vertex AI environment variables will be passed to containers or python modules when this field is set: For CustomJob:
For CustomJob backing a Trial of HyperparameterTuningJob:
|
protectedArtifactLocationId |
The ID of the location in which to store protected artifacts, for example us-central1. Populate this field only when the location differs from the CustomJob location. List of supported locations: https://cloud.google.com/vertex-ai/docs/general/locations |
tensorboard |
Optional. The name of a Vertex AI |
enableWebAccess |
Optional. Whether you want Vertex AI to enable interactive shell access to training containers. If set to |
enableDashboardAccess |
Optional. Whether you want Vertex AI to enable access to the customized dashboard in the training chief container. If set to |
experiment |
Optional. The Experiment associated with this job. Format: |
experimentRun |
Optional. The Experiment Run associated with this job. Format: |
models[] |
Optional. The names of the Model resources for which to generate a mapping to artifact URIs. Applicable only to some of the Google-provided custom jobs. Format: To retrieve a specific version of the model, also provide the version ID or version alias. Example: |
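
As a rough illustration of how the fields above fit together, the following sketch assembles a CustomJobSpec-shaped request body as a plain Python dict and serializes it to JSON. The helper name `build_custom_job_spec` and the image URI are hypothetical placeholders, and only fields whose shape is visible in this reference are used; this is not a client library call.

```python
import json


def build_custom_job_spec(image_uri, service_account=None, reserved_ip_ranges=None):
    """Assemble a CustomJobSpec-shaped dict (hypothetical helper, not an SDK call).

    Only workerPoolSpecs is marked Required in the field table; optional
    fields are added only when the caller supplies them.
    """
    spec = {
        "workerPoolSpecs": [
            {
                "containerSpec": {
                    "imageUri": image_uri,
                    "args": ["--epochs", "5"],  # illustrative arguments only
                }
            }
        ],
        "scheduling": {
            "restartJobOnWorkerRestart": False,
            "disableRetries": True,
        },
    }
    if service_account is not None:
        spec["serviceAccount"] = service_account
    if reserved_ip_ranges:
        spec["reservedIpRanges"] = list(reserved_ip_ranges)
    return spec


spec = build_custom_job_spec(
    "example-registry/example-image:latest",  # placeholder image URI
    reserved_ip_ranges=["vertex-ai-ip-range"],
)
body = json.dumps(spec, indent=2)
```

Leaving `serviceAccount` out of the dict, rather than sending an empty string, matches the "If unspecified" wording in the field description.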
WorkerPoolSpec
Represents the spec of a worker pool in a job.
JSON representation |
---|
{ "machineSpec": { object ( |
Fields | |
---|---|
machineSpec |
Optional. Immutable. The specification of a single machine. |
replicaCount |
Optional. The number of worker replicas to use for this worker pool. |
nfsMounts[] |
Optional. A list of NFS mount specs. |
diskSpec |
Disk spec. |
Union field task. The custom task to be executed in this worker pool. task can be only one of the following: |
|
containerSpec |
The custom container task. |
pythonPackageSpec |
The Python packaged task. |
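
Because `task` is a union field, a worker pool should not carry both `containerSpec` and `pythonPackageSpec`. A minimal client-side check might look like the sketch below. The helper name `task_kind` is hypothetical, and returning `None` for a pool with no task is an assumption made to accommodate the empty pools that the CustomJobSpec table allows after the first pool; the service's own validation is not described here.

```python
TASK_FIELDS = ("containerSpec", "pythonPackageSpec")


def task_kind(worker_pool_spec):
    """Return the name of the task field that is set, or None if neither is.

    Raises ValueError when both members of the union are present.
    """
    present = [name for name in TASK_FIELDS if name in worker_pool_spec]
    if len(present) > 1:
        raise ValueError(f"task can be only one of {TASK_FIELDS}, found {present}")
    return present[0] if present else None
```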
ContainerSpec
The spec of a Container.
JSON representation |
---|
{
"imageUri": string,
"command": [
string
],
"args": [
string
],
"env": [
{
object ( |
Fields | |
---|---|
imageUri |
Required. The URI of a container image in the Container Registry that is to be run on each worker replica. |
command[] |
The command to be invoked when the container is started. When provided, it overrides the ENTRYPOINT instruction in the Dockerfile. |
args[] |
The arguments to be passed when starting the container. |
env[] |
Environment variables to be passed to the container. Maximum limit is 100. |
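
The sketch below shows one way to assemble a ContainerSpec-shaped dict while enforcing the 100-entry limit on `env`. The helper name `build_container_spec` is hypothetical. The object type of each `env` entry is cut off in the JSON representation above, so the `name`/`value` shape used here is an assumption to verify against the full reference.

```python
MAX_ENV_VARS = 100  # "Maximum limit is 100" from the env[] description


def build_container_spec(image_uri, command=None, args=None, env=None):
    """Build a ContainerSpec-shaped dict (hypothetical helper).

    env is a plain mapping of variable names to values.
    """
    spec = {"imageUri": image_uri}  # the only Required field
    if command:
        spec["command"] = list(command)
    if args:
        spec["args"] = list(args)
    if env:
        if len(env) > MAX_ENV_VARS:
            raise ValueError(f"at most {MAX_ENV_VARS} env entries, got {len(env)}")
        # Assumption: each entry is an object with "name" and "value" keys.
        spec["env"] = [{"name": k, "value": v} for k, v in env.items()]
    return spec
```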
PythonPackageSpec
The spec of Python packaged code.
JSON representation |
---|
{
"executorImageUri": string,
"packageUris": [
string
],
"pythonModule": string,
"args": [
string
],
"env": [
{
object ( |
Fields | |
---|---|
executorImageUri |
Required. The URI of a container image in Artifact Registry that will run the provided Python package. Vertex AI provides a wide range of executor images with pre-installed packages to meet users' various use cases. See the list of pre-built containers for training. You must use an image from this list. |
packageUris[] |
Required. The Cloud Storage locations of the Python package files that make up the training program and its dependent packages. The maximum number of package URIs is 100. |
pythonModule |
Required. The Python module name to run after installing the packages. |
args[] |
Command line arguments to be passed to the Python task. |
env[] |
Environment variables to be passed to the Python module. Maximum limit is 100. |
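
The constraints in the table above (three required fields, at most 100 package URIs, at most 100 environment variables) can be checked before a request is sent. The following is a minimal sketch with a hypothetical helper name, `python_package_spec_problems`; it mirrors only the limits stated in this reference and does not verify that the executor image comes from the pre-built container list.

```python
MAX_PACKAGE_URIS = 100
MAX_ENV_VARS = 100
REQUIRED_FIELDS = ("executorImageUri", "packageUris", "pythonModule")


def python_package_spec_problems(spec):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = [
        f"missing required field: {name}"
        for name in REQUIRED_FIELDS
        if not spec.get(name)
    ]
    if len(spec.get("packageUris", [])) > MAX_PACKAGE_URIS:
        problems.append(f"packageUris exceeds {MAX_PACKAGE_URIS} entries")
    if len(spec.get("env", [])) > MAX_ENV_VARS:
        problems.append(f"env exceeds {MAX_ENV_VARS} entries")
    return problems
```

Returning a list instead of raising on the first failure lets a caller report every problem in one pass.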
NfsMount
Represents the configuration for mounting a Network File System (NFS) share.
JSON representation |
---|
{ "server": string, "path": string, "mountPoint": string } |
Fields | |
---|---|
server |
Required. IP address of the NFS server. |
path |
Required. Source path exported from the NFS server. Must start with '/'. Combined with the IP address, it indicates the source mount path in the form of |
mountPoint |
Required. Destination mount path. The NFS will be mounted for the user under /mnt/nfs/ |
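
The NfsMount rules stated above (all three fields required, `server` given as an IP address, `path` starting with '/') can be checked locally before submission. The sketch below uses a hypothetical helper name, `validate_nfs_mount`, and relies on Python's standard `ipaddress` module. It does not construct the combined source path or the final mount directory, because those formats are cut off in this copy of the reference.

```python
import ipaddress


def validate_nfs_mount(mount):
    """Validate an NfsMount-shaped dict and return it unchanged.

    Raises ValueError on the first rule that is violated.
    """
    for name in ("server", "path", "mountPoint"):
        if not mount.get(name):
            raise ValueError(f"{name} is required")
    # ip_address raises ValueError for anything that is not an IPv4/IPv6 literal.
    ipaddress.ip_address(mount["server"])
    if not mount["path"].startswith("/"):
        raise ValueError("path must start with '/'")
    return mount
```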
Scheduling
All parameters related to queuing and scheduling of custom jobs.
JSON representation |
---|
{ "timeout": string, "restartJobOnWorkerRestart": boolean, "disableRetries": boolean } |
Fields | |
---|---|
timeout |
The maximum job running time. The default is 7 days. A duration in seconds with up to nine fractional digits, ending with ' |
restartJobOnWorkerRestart |
Restarts the entire CustomJob if a worker gets restarted. This feature can be used by distributed training jobs that are not resilient to workers leaving and joining a job. |
disableRetries |
Optional. Indicates if the job should retry for internal errors after the job starts running. If true, overrides |
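
The `timeout` field is a duration string expressed in seconds with up to nine fractional digits. The sketch below converts a Python `timedelta` into such a string. The helper name `to_duration_string` is hypothetical, and the trailing `s` suffix is an assumption taken from the standard protobuf JSON encoding of durations, since the suffix is cut off in the description above. `timedelta` carries microsecond precision, so at most six fractional digits are produced.

```python
from datetime import timedelta


def to_duration_string(delta):
    """Format a non-negative timedelta as a seconds-based duration string."""
    if delta < timedelta(0):
        raise ValueError("timeout must not be negative")
    total_micro = delta // timedelta(microseconds=1)  # integer microseconds
    seconds, micro = divmod(total_micro, 1_000_000)
    if micro == 0:
        return f"{seconds}s"  # assumed "s" suffix (protobuf JSON Duration)
    fraction = f"{micro:06d}".rstrip("0")
    return f"{seconds}.{fraction}s"


# The documented default of 7 days, expressed in seconds.
default_timeout = to_duration_string(timedelta(days=7))

scheduling = {
    "timeout": default_timeout,
    "restartJobOnWorkerRestart": False,
    "disableRetries": False,
}
```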