Configuring container settings for custom training

When you perform custom training, you must specify what machine learning (ML) code you want AI Platform (Unified) to run. To do this, configure training container settings for either a custom container or a Python training application that runs on a pre-built container.

To determine whether you want to use a custom container or a pre-built container, read Training code requirements.

This document describes the fields of the AI Platform (Unified) API that you must specify in either of the preceding cases.

Where to specify container settings

Specify configuration details within a WorkerPoolSpec. Where you put this WorkerPoolSpec depends on how you perform custom training. For example, the samples in this document that create a CustomJob place it in the job's jobSpec.workerPoolSpecs field.

If you are performing distributed training, you can use different settings for each worker pool.
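As a rough illustration of per-pool settings, the following sketch builds two worker pool specs as plain Python dictionaries, using the same snake_case field names as the Python sample later in this document. The image URIs, the --role argument, the machine type, and the replica counts are hypothetical placeholders, not recommended values.

```python
# Hypothetical placeholder image URIs; substitute your own images.
CHIEF_IMAGE_URI = "gcr.io/my-project/trainer-chief:latest"
WORKER_IMAGE_URI = "gcr.io/my-project/trainer-worker:latest"


def make_worker_pool_spec(image_uri, replica_count, args=None):
    """Return one worker pool spec (a plain dict) with its own container settings."""
    return {
        "machine_spec": {"machine_type": "n1-standard-4"},
        "replica_count": replica_count,
        "container_spec": {
            "image_uri": image_uri,
            # "--role=..." is an illustrative flag, not one defined by the service.
            "args": list(args or []),
        },
    }


# Each entry in the list describes one worker pool and can use different settings.
worker_pool_specs = [
    make_worker_pool_spec(CHIEF_IMAGE_URI, replica_count=1, args=["--role=chief"]),
    make_worker_pool_spec(WORKER_IMAGE_URI, replica_count=2, args=["--role=worker"]),
]
```

A list like this would take the place of the single-element worker_pool_specs list in a job specification.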

Configuring container settings

Depending on whether you are using a pre-built container or a custom container, you must specify different fields within the WorkerPoolSpec. The following sections cover each scenario:

Pre-built container

  1. Select a pre-built container that supports the ML framework you plan to use for training. Specify one of the container image's URIs in the pythonPackageSpec.executorImageUri field.

  2. Specify the Cloud Storage URIs of your Python training application in the pythonPackageSpec.packageUris field.

  3. Specify your training application's entry point module in the pythonPackageSpec.pythonModule field.

  4. Optionally, specify a list of command-line arguments to pass to your training application's entry point module in the pythonPackageSpec.args field.
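The four steps above can be sketched as a small Python helper that assembles a worker pool spec dictionary for a pre-built container. This is a minimal sketch, not an official sample: the helper name, the default machine type, and all example values (image URI placeholder, bucket path, module name, arguments) are assumptions for illustration. The field names are the snake_case forms of the pythonPackageSpec fields named in the steps.

```python
from typing import Dict, List, Optional


def build_prebuilt_worker_pool_spec(
    executor_image_uri: str,
    package_uris: List[str],
    python_module: str,
    args: Optional[List[str]] = None,
    machine_type: str = "n1-standard-4",  # assumed default, for illustration only
    replica_count: int = 1,
) -> Dict:
    """Assemble a worker pool spec (plain dict) that uses a pre-built container."""
    python_package_spec = {
        "executor_image_uri": executor_image_uri,  # step 1
        "package_uris": list(package_uris),        # step 2
        "python_module": python_module,            # step 3
    }
    if args:
        python_package_spec["args"] = list(args)   # step 4 (optional)
    return {
        "machine_spec": {"machine_type": machine_type},
        "replica_count": replica_count,
        "python_package_spec": python_package_spec,
    }


# Example usage with hypothetical placeholder values.
spec = build_prebuilt_worker_pool_spec(
    executor_image_uri="PYTHON_PACKAGE_EXECUTOR_IMAGE_URI",
    package_uris=["gs://my-bucket/trainer-0.1.tar.gz"],
    python_module="trainer.task",
    args=["--epochs=10"],
)
```

The resulting dictionary could be used as one element of a job's worker_pool_specs list when calling the Python client.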

The following examples highlight where you specify these container settings when you create a CustomJob:

Console

In the Google Cloud Console, you cannot create a CustomJob directly. However, you can create a TrainingPipeline that creates a CustomJob. When you create a TrainingPipeline in the Cloud Console, you can specify pre-built container settings in certain fields on the Training container step:

  • pythonPackageSpec.executorImageUri: Use the Model framework and Model framework version drop-down lists.

  • pythonPackageSpec.packageUris: Use the Package location field.

  • pythonPackageSpec.pythonModule: Use the Python module field.

  • pythonPackageSpec.args: Use the Arguments field.

gcloud

gcloud beta ai custom-jobs create \
  --region=LOCATION \
  --display-name=JOB_NAME \
  --python-package-uris=PYTHON_PACKAGE_URIS \
  --worker-pool-spec=machine-type=MACHINE_TYPE,replica-count=REPLICA_COUNT,executor-image-uri=PYTHON_PACKAGE_EXECUTOR_IMAGE_URI,python-module=PYTHON_MODULE

For more context, read the guide to creating a CustomJob.

Custom container

  1. Specify the Artifact Registry, Container Registry, or Docker Hub URI of your custom container in the containerSpec.imageUri field.

  2. Optionally, if you want to override the ENTRYPOINT or CMD instructions in your container, specify the containerSpec.command or containerSpec.args fields. These fields affect how your container runs according to the following rules:

    • If you specify neither field: Your container runs according to its ENTRYPOINT instruction and CMD instruction (if it exists). Refer to the Docker documentation about how CMD and ENTRYPOINT interact.

    • If you specify only containerSpec.command: Your container runs with the value of containerSpec.command replacing its ENTRYPOINT instruction. If the container has a CMD instruction, it is ignored.

    • If you specify only containerSpec.args: Your container runs according to its ENTRYPOINT instruction, with the value of containerSpec.args replacing its CMD instruction.

    • If you specify both fields: Your container runs with containerSpec.command replacing its ENTRYPOINT instruction and containerSpec.args replacing its CMD instruction.
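To make the four rules concrete, the following sketch models them as a pure Python function. It is an illustration of the rules as stated above, not the service's implementation. It assumes exec-form ENTRYPOINT and CMD values (lists of strings) and treats an empty or missing list as "not specified". The function name and the example container values are hypothetical.

```python
from typing import List, Optional, Sequence


def resolve_invocation(
    entrypoint: Sequence[str],
    cmd: Sequence[str],
    command: Optional[Sequence[str]] = None,
    args: Optional[Sequence[str]] = None,
) -> List[str]:
    """Return the argv a container would run with, following the rules above."""
    has_command = bool(command)  # empty list is treated as unspecified (assumption)
    has_args = bool(args)
    if has_command and has_args:
        # Both fields: command replaces ENTRYPOINT, args replaces CMD.
        return list(command) + list(args)
    if has_command:
        # Only command: it replaces ENTRYPOINT, and any CMD is ignored.
        return list(command)
    if has_args:
        # Only args: ENTRYPOINT is kept, args replaces CMD.
        return list(entrypoint) + list(args)
    # Neither field: ENTRYPOINT followed by CMD (if any).
    return list(entrypoint) + list(cmd)


# Hypothetical image with ENTRYPOINT ["python", "train.py"] and CMD ["--epochs=1"].
ENTRYPOINT = ["python", "train.py"]
CMD = ["--epochs=1"]

neither = resolve_invocation(ENTRYPOINT, CMD)
only_command = resolve_invocation(ENTRYPOINT, CMD, command=["bash", "run.sh"])
only_args = resolve_invocation(ENTRYPOINT, CMD, args=["--epochs=10"])
both = resolve_invocation(ENTRYPOINT, CMD, command=["bash", "run.sh"], args=["--epochs=10"])
```

With these example values, only_args evaluates to ["python", "train.py", "--epochs=10"], while only_command evaluates to ["bash", "run.sh"] because the image's CMD is dropped.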

The following examples highlight where you can specify some of these container settings when you create a CustomJob:

Console

In the Google Cloud Console, you cannot create a CustomJob directly. However, you can create a TrainingPipeline that creates a CustomJob. When you create a TrainingPipeline in the Cloud Console, you can specify custom container settings in certain fields on the Training container step:

  • containerSpec.imageUri: Use the Container image field.

  • containerSpec.command: This API field is not configurable in the Cloud Console.

  • containerSpec.args: Use the Arguments field.

gcloud

gcloud beta ai custom-jobs create \
  --region=LOCATION \
  --display-name=JOB_NAME \
  --worker-pool-spec=machine-type=MACHINE_TYPE,replica-count=REPLICA_COUNT,container-image-uri=CUSTOM_CONTAINER_IMAGE_URI

Node.js

/**
 * TODO(developer): Uncomment these variables before running the sample.
 * (Not necessary if passing values as arguments)
 */

// const customJobDisplayName = 'YOUR_CUSTOM_JOB_DISPLAY_NAME';
// const containerImageUri = 'YOUR_CONTAINER_IMAGE_URI';
// const project = 'YOUR_PROJECT_ID';
// const location = 'YOUR_PROJECT_LOCATION';

// Imports the Google Cloud Job Service Client library
const {JobServiceClient} = require('@google-cloud/aiplatform');

// Specifies the location of the api endpoint
const clientOptions = {
  apiEndpoint: 'us-central1-aiplatform.googleapis.com',
};

// Instantiates a client
const jobServiceClient = new JobServiceClient(clientOptions);

async function createCustomJob() {
  // Configure the parent resource
  const parent = `projects/${project}/locations/${location}`;
  const customJob = {
    displayName: customJobDisplayName,
    jobSpec: {
      workerPoolSpecs: [
        {
          machineSpec: {
            machineType: 'n1-standard-4',
            acceleratorType: 'NVIDIA_TESLA_K80',
            acceleratorCount: 1,
          },
          replicaCount: 1,
          containerSpec: {
            imageUri: containerImageUri,
            command: [],
            args: [],
          },
        },
      ],
    },
  };
  const request = {parent, customJob};

  // Create custom job request
  const [response] = await jobServiceClient.createCustomJob(request);

  console.log('Create custom job response');
  console.log(`${JSON.stringify(response)}`);
}
createCustomJob();

Python

from google.cloud import aiplatform


def create_custom_job_sample(
    project: str,
    display_name: str,
    container_image_uri: str,
    location: str = "us-central1",
    api_endpoint: str = "us-central1-aiplatform.googleapis.com",
):
    # The AI Platform services require regional API endpoints.
    client_options = {"api_endpoint": api_endpoint}
    # Initialize client that will be used to create and send requests.
    # This client only needs to be created once, and can be reused for multiple requests.
    client = aiplatform.gapic.JobServiceClient(client_options=client_options)
    custom_job = {
        "display_name": display_name,
        "job_spec": {
            "worker_pool_specs": [
                {
                    "machine_spec": {
                        "machine_type": "n1-standard-4",
                        "accelerator_type": aiplatform.gapic.AcceleratorType.NVIDIA_TESLA_K80,
                        "accelerator_count": 1,
                    },
                    "replica_count": 1,
                    "container_spec": {
                        "image_uri": container_image_uri,
                        "command": [],
                        "args": [],
                    },
                }
            ]
        },
    }
    parent = f"projects/{project}/locations/{location}"
    response = client.create_custom_job(parent=parent, custom_job=custom_job)
    print("response:", response)

For more context, read the guide to creating a CustomJob.

What's next