```bash
# Specifies base image and tag
FROM image:tag
WORKDIR /root

# Installs additional packages
RUN pip install pkg1 pkg2 pkg3

# Downloads training data
RUN curl https://example-url/path-to-data/data-filename --output /root/data-filename

# Copies the trainer code to the docker image.
COPY your-path-to/model.py /root/model.py
COPY your-path-to/task.py /root/task.py

# Sets up the entry point to invoke the trainer.
ENTRYPOINT ["python", "task.py"]
```
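The `ENTRYPOINT` above invokes a `task.py` script. As a rough, hypothetical sketch of what such an entry point might look like — the `train` function, its flags, and the print-based "training loop" are illustrative assumptions, not part of any Vertex AI API — the script typically parses command-line flags and then starts training:

```python
# task.py -- hypothetical trainer entry point (illustrative only).
import argparse


def train(epochs: int, data_path: str) -> None:
    """Placeholder training loop; real code would load data and fit a model."""
    for epoch in range(epochs):
        print(f"epoch {epoch + 1}/{epochs}: training on {data_path}")


def main() -> None:
    parser = argparse.ArgumentParser(description="Example trainer entry point")
    # Exposing --epochs as a flag makes cheap local smoke tests possible,
    # e.g. `docker run IMAGE --epochs 1`.
    parser.add_argument("--epochs", type=int, default=3)
    parser.add_argument("--data-path", default="/root/data-filename")
    args = parser.parse_args()
    train(args.epochs, args.data_path)


if __name__ == "__main__":
    main()
```

Because any arguments you configure for a custom training job are passed through to the container's entrypoint, exposing knobs such as `--epochs` as flags keeps the same image usable for both quick local runs and full training jobs.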
#### (Optional) Adjust your Dockerfile for TPU VMs

If you want to train a model on Vertex AI using a TPU VM, then you must adjust your Dockerfile to install specially built versions of the `tensorflow` and `libtpu` libraries. Learn more about [adjusting your container for use with a TPU VM](/vertex-ai/docs/training/configure-compute#tpu-requirements).
### Build the container image

Create the correct image URI by using environment variables, and then build the Docker image:

    export PROJECT_ID=$(gcloud config list project --format "value(core.project)")
    export REPO_NAME=REPOSITORY_NAME
    export IMAGE_NAME=IMAGE_NAME
    export IMAGE_TAG=IMAGE_TAG
    export IMAGE_URI=us-central1-docker.pkg.dev/${PROJECT_ID}/${REPO_NAME}/${IMAGE_NAME}:${IMAGE_TAG}

    docker build -f Dockerfile -t ${IMAGE_URI} ./

In these commands, replace the following:

- `REPOSITORY_NAME`: the name of the Artifact Registry repository that you created in the Before you begin section.
- `IMAGE_NAME`: a name of your choice for your container image.
- `IMAGE_TAG`: a tag of your choice for this version of your container image.

Learn more about [Artifact Registry's requirements for naming your container image](/artifact-registry/docs/docker/pushing-and-pulling#tag).

### Run the container locally (optional)

Verify the container image by running it as a container locally. You likely want to run your training code on a smaller dataset, or for fewer iterations, than you plan to run on Vertex AI. For example, if the entrypoint script in your container image accepts an `--epochs` flag to control how many [epochs](https://developers.google.com/machine-learning/glossary#epoch) it runs for, you might run the following command:

    docker run ${IMAGE_URI} --epochs 1

Push the container to Artifact Registry
---------------------------------------

If the local run works, you can push the container to Artifact Registry.

First, run [`gcloud auth configure-docker us-central1-docker.pkg.dev`](/sdk/gcloud/reference/auth/configure-docker) if you have not already done so in your development environment. Then run the following command:

    docker push ${IMAGE_URI}

> **Note:** Models, prediction containers, and training containers are code. It's important to isolate less trusted code from sensitive models and data. Deploy endpoints and training stages in their own projects, use a dedicated service account with very limited permissions, and use VPC Service Controls to isolate them and reduce the impact of access granted to such containers and models.

### Artifact Registry permissions

If you are using an Artifact Registry image from the same Google Cloud project where you're using Vertex AI, then there is no further need to configure permissions. You can immediately [create a custom training job](/vertex-ai/docs/training/create-custom-job) that uses your container image.

However, if you have pushed your container image to Artifact Registry in a different Google Cloud project from the project where you plan to use Vertex AI, then you must grant the Vertex AI Service Agent for your Vertex AI project permission to pull the image from the other project. [Learn more about the Vertex AI Service Agent and how to grant it permissions](/vertex-ai/docs/general/access-control#service-agents).

To learn how to grant your Vertex AI Service Agent access to your Artifact Registry repository, read the Artifact Registry documentation about [granting repository-specific permissions](/artifact-registry/docs/access-control#grant-repo).

What's next
-----------

- Learn more about [the concepts involved in using containers](/vertex-ai/docs/training/containers-overview).
- Learn about additional [training code requirements](/vertex-ai/docs/training/code-requirements) for custom training.
- Learn how to [create a custom training job](/vertex-ai/docs/training/create-custom-job) or a [training pipeline](/vertex-ai/docs/training/create-training-pipeline) that uses your custom container.

*Last updated: 2025-09-04 (UTC).*