# Specifies base image and tag
FROM image:tag
WORKDIR /root

# Installs additional packages
RUN pip install pkg1 pkg2 pkg3

# Downloads training data
RUN curl https://example-url/path-to-data/data-filename --output /root/data-filename

# Copies the trainer code to the docker image.
COPY your-path-to/model.py /root/model.py
COPY your-path-to/task.py /root/task.py

# Sets up the entry point to invoke the trainer.
ENTRYPOINT ["python", "task.py"]
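
Once the Dockerfile is in place, the image can be built, tested locally, and pushed to Artifact Registry. The following is a minimal sketch rather than a definitive procedure: the project, repository, image name, tag, and region values are hypothetical placeholders, and the `--epochs` flag only works if your own `task.py` accepts it. The script prints the image URI by default and runs the Docker commands only when `RUN=1` is set.

```shell
# Hypothetical placeholder values; replace them with your own.
PROJECT_ID="my-project"
REPO_NAME="my-repo"
IMAGE_NAME="my-trainer"
IMAGE_TAG="v1"
REGION="us-central1"   # assumed repository location

# Artifact Registry Docker image URIs have the form
# LOCATION-docker.pkg.dev/PROJECT/REPOSITORY/IMAGE:TAG
IMAGE_URI="${REGION}-docker.pkg.dev/${PROJECT_ID}/${REPO_NAME}/${IMAGE_NAME}:${IMAGE_TAG}"
echo "Image URI: ${IMAGE_URI}"

# Only touch Docker when explicitly requested with RUN=1.
if [ "${RUN:-0}" = "1" ]; then
  # Build the image from the Dockerfile in the current directory.
  docker build -f Dockerfile -t "${IMAGE_URI}" ./

  # Optional local smoke test; --epochs is an assumption about task.py.
  docker run "${IMAGE_URI}" --epochs 1

  # Push the image to Artifact Registry.
  docker push "${IMAGE_URI}"
fi
```

Before pushing, Docker must be authenticated against the registry host, for example with `gcloud auth configure-docker us-central1-docker.pkg.dev` if the repository is in `us-central1`.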
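Artifact Registry permissions

If your Artifact Registry image is in the same Google Cloud project where you use Vertex AI, you do not need to configure any further permissions. You can immediately create a custom training job that uses your container image.

However, if you pushed your container image to Artifact Registry in a different Google Cloud project from the one where you plan to use Vertex AI, you must grant the Vertex AI Service Agent for your Vertex AI project permission to pull the image from the other project. Learn more about the Vertex AI Service Agent and how to grant it permissions.
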
Artifact Registry
To learn how to grant the Vertex AI Service Agent access to your Artifact Registry repository, see the Artifact Registry documentation about granting repository-specific permissions.
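
As an illustration of the cross-project case, the sketch below grants read access on a single repository to the Vertex AI Service Agent. It is not a definitive procedure: the project number, project ID, repository name, and location are hypothetical placeholders, and the service agent address format (`service-PROJECT_NUMBER@gcp-sa-aiplatform.iam.gserviceaccount.com`) is an assumption you should confirm on the IAM page of your Vertex AI project. The script prints what it would do by default and calls `gcloud` only when `RUN=1` is set.

```shell
# Hypothetical placeholder values; replace them with your own.
VERTEX_PROJECT_NUMBER="123456789012"   # project NUMBER of the Vertex AI project
AR_PROJECT_ID="my-artifact-project"    # project that hosts the repository
REPO_NAME="my-repo"
LOCATION="us-central1"

# Assumed address format of the Vertex AI Service Agent; verify it in IAM.
MEMBER="serviceAccount:service-${VERTEX_PROJECT_NUMBER}@gcp-sa-aiplatform.iam.gserviceaccount.com"
ROLE="roles/artifactregistry.reader"   # read-only access, enough to pull images

grant_reader() {
  # Adds a repository-level IAM binding in the project that owns the repository.
  gcloud artifacts repositories add-iam-policy-binding "${REPO_NAME}" \
    --project="${AR_PROJECT_ID}" \
    --location="${LOCATION}" \
    --member="${MEMBER}" \
    --role="${ROLE}"
}

# Dry run unless RUN=1 is set.
if [ "${RUN:-0}" = "1" ]; then
  grant_reader
else
  echo "Would grant ${ROLE} on ${REPO_NAME} to ${MEMBER}"
fi
```

Scoping the binding to one repository, instead of granting the role on the whole project, keeps the service agent's access limited to the images it actually needs to pull.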