Configure container settings for custom training

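When you perform custom training, you must specify the ML code that you want Vertex AI to run. To do this, configure training container settings for either a custom container or a Python training application that runs on a pre-built container.

To decide whether to use a custom container or a pre-built container, see Training code requirements.

This document describes the Vertex AI API fields that you must specify in either case.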

Where to specify container settings

Specify the configuration details in a WorkerPoolSpec. Depending on how you perform custom training, put this WorkerPoolSpec in one of the following API fields:

- If you create a CustomJob resource, specify the WorkerPoolSpec in CustomJob.jobSpec.workerPoolSpecs.
  If you use the Google Cloud CLI, specify worker pool options with the --worker-pool-spec flag or the --config flag of the gcloud ai custom-jobs create command.
  For more information, see Create a CustomJob.
- If you create a HyperparameterTuningJob resource, specify the WorkerPoolSpec in HyperparameterTuningJob.trialJobSpec.workerPoolSpecs.
  If you use the gcloud CLI, specify worker pool options with the --config flag of the gcloud ai hp-tuning-jobs create command.
  For more information, see Create a HyperparameterTuningJob.
- If you create a TrainingPipeline resource without hyperparameter tuning, specify the WorkerPoolSpec in TrainingPipeline.trainingTaskInputs.workerPoolSpecs.
  For more information, see Create a custom TrainingPipeline.
- If you create a TrainingPipeline with hyperparameter tuning, specify the WorkerPoolSpec in TrainingPipeline.trainingTaskInputs.trialJobSpec.workerPoolSpecs.

If you perform distributed training, you can use different settings for each worker pool.
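
As a rough illustration of the placements listed above, the following sketch nests a list of worker pool specs under the field path for each resource type. It builds plain Python dictionaries only and does not call any client library. The helper name place_worker_pool_specs and the labels used as lookup keys are hypothetical; only the field paths come from this page.

```python
from typing import Any, Dict, List

# Field paths copied from the list above. The lookup keys are illustrative
# labels chosen for this sketch, not API identifiers.
WORKER_POOL_SPEC_PATHS: Dict[str, List[str]] = {
    "CustomJob": ["jobSpec", "workerPoolSpecs"],
    "HyperparameterTuningJob": ["trialJobSpec", "workerPoolSpecs"],
    "TrainingPipeline": ["trainingTaskInputs", "workerPoolSpecs"],
    "TrainingPipeline with tuning": [
        "trainingTaskInputs",
        "trialJobSpec",
        "workerPoolSpecs",
    ],
}


def place_worker_pool_specs(
    resource_kind: str, worker_pool_specs: List[Dict[str, Any]]
) -> Dict[str, Any]:
    """Return a partial request body with the specs nested at the right path."""
    path = WORKER_POOL_SPEC_PATHS[resource_kind]
    body: Dict[str, Any] = {}
    node = body
    # Create the intermediate objects, then attach the list at the last key.
    for key in path[:-1]:
        node[key] = {}
        node = node[key]
    node[path[-1]] = list(worker_pool_specs)
    return body


if __name__ == "__main__":
    # For distributed training, each list entry describes one worker pool
    # and may carry different settings.
    pools = [{"replicaCount": 1}, {"replicaCount": 2}]
    print(place_worker_pool_specs("CustomJob", pools))
```

The returned dictionary is only the fragment that holds the worker pools; a real request also needs fields such as a display name.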

Configure container settings

The fields that you must specify in a WorkerPoolSpec depend on whether you use a pre-built container or a custom container. See the section that matches your scenario.

Pre-built container

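- Select a pre-built container that supports the ML framework you plan to use for training. Specify the URI of the container image in the pythonPackageSpec.executorImageUri field.
- In the pythonPackageSpec.packageUris field, specify the Cloud Storage URIs of your Python training application.
- In the pythonPackageSpec.pythonModule field, specify the entry point module of your training application.
- Optionally, in the pythonPackageSpec.args field, specify a list of command-line arguments to pass to the entry point module of your training application.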
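
The four fields above can be sketched as a worker pool spec dictionary in the snake_case form that the Python sample later on this page uses for container_spec. This is a minimal sketch rather than a complete request: the helper name build_prebuilt_worker_pool_spec is hypothetical, the default machine type is borrowed from the other samples, and the URIs in the usage lines are placeholders, not real images or buckets.

```python
from typing import Any, Dict, List, Optional


def build_prebuilt_worker_pool_spec(
    executor_image_uri: str,
    package_uris: List[str],
    python_module: str,
    args: Optional[List[str]] = None,
    machine_type: str = "n1-standard-4",
    replica_count: int = 1,
) -> Dict[str, Any]:
    """Assemble one worker pool spec that runs a Python package on a pre-built container."""
    if not package_uris:
        raise ValueError("package_uris must contain at least one Cloud Storage URI")

    python_package_spec: Dict[str, Any] = {
        "executor_image_uri": executor_image_uri,
        "package_uris": list(package_uris),
        "python_module": python_module,
    }
    # args is optional, so add it only when the caller supplies arguments.
    if args:
        python_package_spec["args"] = list(args)

    return {
        "machine_spec": {"machine_type": machine_type},
        "replica_count": replica_count,
        "python_package_spec": python_package_spec,
    }


if __name__ == "__main__":
    spec = build_prebuilt_worker_pool_spec(
        executor_image_uri="PYTHON_PACKAGE_EXECUTOR_IMAGE_URI",  # placeholder
        package_uris=["gs://BUCKET/trainer-0.1.tar.gz"],  # placeholder path
        python_module="trainer.task",  # hypothetical module name
        args=["--epochs=10"],
    )
    print(spec)
```

A dictionary of this shape would take the place of the entry that holds container_spec in the worker_pool_specs list of the Python sample; a worker pool uses either python_package_spec or container_spec, not both.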

The following examples highlight where to specify these container settings when you create a CustomJob.

Console

You can't create a CustomJob directly in the Google Cloud console. However, you can create a TrainingPipeline that creates a CustomJob. When you create a TrainingPipeline in the Google Cloud console, you can specify pre-built container settings in certain fields on the [Training container] step:

pythonPackageSpec.executorImageUri: Use the [Model framework] and [Model framework version] drop-down lists.
pythonPackageSpec.packageUris: Use the [Package location] field.
pythonPackageSpec.pythonModule: Use the [Python module] field.
pythonPackageSpec.args: Use the [Arguments] field.

gcloud

gcloud ai custom-jobs create \
  --region=LOCATION \
  --display-name=JOB_NAME \
  --python-package-uris=PYTHON_PACKAGE_URIS \
  --worker-pool-spec=machine-type=MACHINE_TYPE,replica-count=REPLICA_COUNT,executor-image-uri=PYTHON_PACKAGE_EXECUTOR_IMAGE_URI,python-module=PYTHON_MODULE

For more information, see the guide to creating a CustomJob.

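Custom container

In the containerSpec.imageUri field, specify the Artifact Registry or Docker Hub URI of your custom container.
Optionally, to override the ENTRYPOINT or CMD instruction in your container, specify the containerSpec.command field or the containerSpec.args field. These fields affect how your container runs according to the following rules:
- If you specify neither field: Your container runs according to its ENTRYPOINT instruction and CMD instruction (if it exists). For how CMD and ENTRYPOINT interact, see the Docker documentation.
- If you specify only containerSpec.command: Your container runs with the value of containerSpec.command replacing its ENTRYPOINT instruction. If the container has a CMD instruction, it is ignored.
- If you specify only containerSpec.args: Your container runs according to its ENTRYPOINT instruction, with the value of containerSpec.args replacing its CMD instruction.
- If you specify both fields: Your container runs with containerSpec.command replacing its ENTRYPOINT instruction and containerSpec.args replacing its CMD instruction.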
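
The four rules above can be modeled with a small function that returns the argument vector the container would effectively run. This is only a sketch of the rules as stated on this page, not code from Vertex AI or Docker. The function name resolve_container_invocation is hypothetical, it assumes exec-form ENTRYPOINT and CMD lists, and it assumes that an empty list counts as "not specified" (as with the empty command and args lists in the samples on this page).

```python
from typing import List, Optional


def resolve_container_invocation(
    entrypoint: List[str],
    cmd: List[str],
    command: Optional[List[str]] = None,
    args: Optional[List[str]] = None,
) -> List[str]:
    """Apply the containerSpec.command / containerSpec.args override rules.

    entrypoint and cmd stand for the image's ENTRYPOINT and CMD instructions;
    command and args stand for containerSpec.command and containerSpec.args.
    """
    command = list(command or [])
    args = list(args or [])

    if command and args:
        # Both specified: command replaces ENTRYPOINT, args replaces CMD.
        return command + args
    if command:
        # Only command: ENTRYPOINT is replaced and the image's CMD is ignored.
        return command
    if args:
        # Only args: ENTRYPOINT is kept and CMD is replaced.
        return list(entrypoint) + args
    # Neither specified: the image's own ENTRYPOINT and CMD apply.
    return list(entrypoint) + list(cmd)


if __name__ == "__main__":
    image_entrypoint = ["python", "-m", "trainer.task"]  # hypothetical image
    image_cmd = ["--epochs=1"]
    print(resolve_container_invocation(image_entrypoint, image_cmd))
    print(resolve_container_invocation(image_entrypoint, image_cmd, args=["--epochs=10"]))
    print(resolve_container_invocation(image_entrypoint, image_cmd, command=["bash"]))
```

Note the asymmetry in the second rule: overriding only the command drops the image's CMD, so any default arguments baked into the image must be passed again through containerSpec.args.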

The following examples highlight where to specify some of these container settings when you create a CustomJob.

Console

You can't create a CustomJob directly in the Google Cloud console. However, you can create a TrainingPipeline that creates a CustomJob. When you create a TrainingPipeline in the Google Cloud console, you can specify custom container settings in certain fields on the [Training container] step:

containerSpec.imageUri: Use the [Container image] field.
containerSpec.command: This API field can't be configured in the Google Cloud console.
containerSpec.args: Use the [Arguments] field.

gcloud

gcloud ai custom-jobs create \
  --region=LOCATION \
  --display-name=JOB_NAME \
  --worker-pool-spec=machine-type=MACHINE_TYPE,replica-count=REPLICA_COUNT,container-image-uri=CUSTOM_CONTAINER_IMAGE_URI

Java

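Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Java API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.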


import com.google.cloud.aiplatform.v1.AcceleratorType;
import com.google.cloud.aiplatform.v1.ContainerSpec;
import com.google.cloud.aiplatform.v1.CustomJob;
import com.google.cloud.aiplatform.v1.CustomJobSpec;
import com.google.cloud.aiplatform.v1.JobServiceClient;
import com.google.cloud.aiplatform.v1.JobServiceSettings;
import com.google.cloud.aiplatform.v1.LocationName;
import com.google.cloud.aiplatform.v1.MachineSpec;
import com.google.cloud.aiplatform.v1.WorkerPoolSpec;
import java.io.IOException;

// Create a custom job to run machine learning training code in Vertex AI
public class CreateCustomJobSample {

  public static void main(String[] args) throws IOException {
    // TODO(developer): Replace these variables before running the sample.
    String project = "PROJECT";
    String displayName = "DISPLAY_NAME";

    // Vertex AI runs your training application in a Docker container image. A Docker container
    // image is a self-contained software package that includes code and all dependencies. Learn
    // more about preparing your training application at
    // https://cloud.google.com/vertex-ai/docs/training/overview#prepare_your_training_application
    String containerImageUri = "CONTAINER_IMAGE_URI";
    createCustomJobSample(project, displayName, containerImageUri);
  }

  static void createCustomJobSample(String project, String displayName, String containerImageUri)
      throws IOException {
    JobServiceSettings settings =
        JobServiceSettings.newBuilder()
            .setEndpoint("us-central1-aiplatform.googleapis.com:443")
            .build();
    String location = "us-central1";

    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests.
    try (JobServiceClient client = JobServiceClient.create(settings)) {
      MachineSpec machineSpec =
          MachineSpec.newBuilder()
              .setMachineType("n1-standard-4")
              .setAcceleratorType(AcceleratorType.NVIDIA_TESLA_T4)
              .setAcceleratorCount(1)
              .build();

      ContainerSpec containerSpec =
          ContainerSpec.newBuilder().setImageUri(containerImageUri).build();

      WorkerPoolSpec workerPoolSpec =
          WorkerPoolSpec.newBuilder()
              .setMachineSpec(machineSpec)
              .setReplicaCount(1)
              .setContainerSpec(containerSpec)
              .build();

      CustomJobSpec customJobSpecJobSpec =
          CustomJobSpec.newBuilder().addWorkerPoolSpecs(workerPoolSpec).build();

      CustomJob customJob =
          CustomJob.newBuilder()
              .setDisplayName(displayName)
              .setJobSpec(customJobSpecJobSpec)
              .build();
      LocationName parent = LocationName.of(project, location);
      CustomJob response = client.createCustomJob(parent, customJob);
      System.out.format("response: %s\n", response);
      System.out.format("Name: %s\n", response.getName());
    }
  }
}

Node.js

Before trying this sample, follow the Node.js setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Node.js API reference documentation.

/**
 * TODO(developer): Uncomment these variables before running the sample.
 * (Not necessary if passing values as arguments)
 */

// const customJobDisplayName = 'YOUR_CUSTOM_JOB_DISPLAY_NAME';
// const containerImageUri = 'YOUR_CONTAINER_IMAGE_URI';
// const project = 'YOUR_PROJECT_ID';
// const location = 'YOUR_PROJECT_LOCATION';

// Imports the Google Cloud Job Service Client library
const {JobServiceClient} = require('@google-cloud/aiplatform');

// Specifies the location of the api endpoint
const clientOptions = {
  apiEndpoint: 'us-central1-aiplatform.googleapis.com',
};

// Instantiates a client
const jobServiceClient = new JobServiceClient(clientOptions);

async function createCustomJob() {
  // Configure the parent resource
  const parent = `projects/${project}/locations/${location}`;
  const customJob = {
    displayName: customJobDisplayName,
    jobSpec: {
      workerPoolSpecs: [
        {
          machineSpec: {
            machineType: 'n1-standard-4',
            acceleratorType: 'NVIDIA_TESLA_T4',
            acceleratorCount: 1,
          },
          replicaCount: 1,
          containerSpec: {
            imageUri: containerImageUri,
            command: [],
            args: [],
          },
        },
      ],
    },
  };
  const request = {parent, customJob};

  // Create custom job request
  const [response] = await jobServiceClient.createCustomJob(request);

  console.log('Create custom job response:\n', JSON.stringify(response));
}
createCustomJob();

Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.

from google.cloud import aiplatform


def create_custom_job_sample(
    project: str,
    display_name: str,
    container_image_uri: str,
    location: str = "us-central1",
    api_endpoint: str = "us-central1-aiplatform.googleapis.com",
):
    # The AI Platform services require regional API endpoints.
    client_options = {"api_endpoint": api_endpoint}
    # Initialize client that will be used to create and send requests.
    # This client only needs to be created once, and can be reused for multiple requests.
    client = aiplatform.gapic.JobServiceClient(client_options=client_options)
    custom_job = {
        "display_name": display_name,
        "job_spec": {
            "worker_pool_specs": [
                {
                    "machine_spec": {
                        "machine_type": "n1-standard-4",
                        "accelerator_type": aiplatform.gapic.AcceleratorType.NVIDIA_TESLA_T4,
                        "accelerator_count": 1,
                    },
                    "replica_count": 1,
                    "container_spec": {
                        "image_uri": container_image_uri,
                        "command": [],
                        "args": [],
                    },
                }
            ]
        },
    }
    parent = f"projects/{project}/locations/{location}"
    response = client.create_custom_job(parent=parent, custom_job=custom_job)
    print("response:", response)

For more information, see the guide to creating a CustomJob.

What's next

Learn how to perform custom training by creating a CustomJob.
