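# Run custom training jobs on a persistent resource

This page shows you how to run a custom training job on a persistent resource by using the Google Cloud CLI, the Vertex AI SDK for Python, and the REST API.

Normally, when you [create a custom training job](/vertex-ai/docs/training/create-custom-job), you specify the compute resources that the job creates and runs on. After you create a persistent resource, you can instead configure the custom training job to run on one or more resource pools of that persistent resource. Because the compute resources already exist, running a custom training job on a persistent resource significantly reduces the job startup time that's otherwise spent creating them.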
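
Required roles
--------------

To get the permission that you need to run custom training jobs on a persistent resource, ask your administrator to grant you the [Vertex AI User](/iam/docs/roles-permissions/aiplatform#aiplatform.user) (`roles/aiplatform.user`) IAM role on your project.

For more information about granting roles, see [Manage access to projects, folders, and organizations](/iam/docs/granting-changing-revoking-access).

This predefined role contains the `aiplatform.customJobs.create` permission, which is required to run custom training jobs on a persistent resource.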

You might also be able to get this permission with [custom roles](/iam/docs/creating-custom-roles) or other [predefined roles](/iam/docs/roles-overview#predefined).

Create a training job that runs on a persistent resource
--------------------------------------------------------

To create a custom training job that runs on a persistent resource, make the following modifications to the standard instructions for [creating a custom training job](/vertex-ai/docs/training/create-custom-job):

### gcloud

- Specify the `--persistent-resource-id` flag and set the value to the ID of the persistent resource (`PERSISTENT_RESOURCE_ID`) that you want to use.
- Specify the `--worker-pool-spec` flag such that the values for `machine-type` and `disk-type` match exactly with a corresponding resource pool from the persistent resource. Specify one `--worker-pool-spec` for single node training and multiple for distributed training.
- Specify a `replica-count` less than or equal to the `replica-count` or `max-replica-count` of the corresponding resource pool.

### Python

To learn how to install or update the Vertex AI SDK for Python, see [Install the Vertex AI SDK for Python](/vertex-ai/docs/start/use-vertex-ai-python-sdk).

For more information, see the [Python API reference documentation](/python/docs/reference/aiplatform/latest).

```python
from typing import Optional

from google.cloud import aiplatform


def create_custom_job_on_persistent_resource_sample(
    project: str,
    location: str,
    staging_bucket: str,
    display_name: str,
    container_uri: str,
    persistent_resource_id: str,
    service_account: Optional[str] = None,
) -> None:
    aiplatform.init(
        project=project, location=location, staging_bucket=staging_bucket
    )

    # The machine spec must match a resource pool of the persistent resource.
    worker_pool_specs = [{
        "machine_spec": {
            "machine_type": "n1-standard-4",
            "accelerator_type": "NVIDIA_TESLA_K80",
            "accelerator_count": 1,
        },
        "replica_count": 1,
        "container_spec": {
            "image_uri": container_uri,
            "command": [],
            "args": [],
        },
    }]

    # Passing persistent_resource_id runs the job on the existing resource
    # instead of provisioning new compute resources.
    custom_job = aiplatform.CustomJob(
        display_name=display_name,
        worker_pool_specs=worker_pool_specs,
        persistent_resource_id=persistent_resource_id,
    )

    custom_job.run(service_account=service_account)
```

### REST

- Specify the `persistent_resource_id` parameter and set the value to the ID of the persistent resource (`PERSISTENT_RESOURCE_ID`) that you want to use.
- Specify the `worker_pool_specs` parameter such that the values of `machine_spec` and `disk_spec` for each resource pool match exactly with a corresponding resource pool from the persistent resource. Specify one `machine_spec` for single node training and multiple for distributed training.
- Specify a `replica_count` less than or equal to the `replica_count` or `max_replica_count` of the corresponding resource pool, excluding the replica count of any other jobs running on that resource pool.

What's next
-----------

- [Learn about persistent resources](/vertex-ai/docs/training/persistent-resource-overview).
- [Create and use a persistent resource](/vertex-ai/docs/training/persistent-resource-create).
- [Get information about a persistent resource](/vertex-ai/docs/training/persistent-resource-get).
- [Reboot a persistent resource](/vertex-ai/docs/training/persistent-resource-reboot).
- [Delete a persistent resource](/vertex-ai/docs/training/persistent-resource-delete).

Last updated: 2025-07-16 (UTC)
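
Appendix: checking worker pools before you submit
-------------------------------------------------

The matching rules in the gcloud and REST instructions (identical `machine_spec` and `disk_spec`, and a `replica_count` that fits in the corresponding resource pool) can be checked locally before a job is submitted. The following is a minimal sketch, not part of the Vertex AI SDK: `find_mismatches` is a hypothetical helper, and the resource pool dictionaries (with `id`, `replica_count`, and an optional `max_replica_count` key) are an assumed, simplified shape rather than the API's exact response format. It also does not model service-side defaults, so an omitted `disk_spec` only matches another omitted `disk_spec`.

```python
from typing import Any, Dict, List, Optional

Spec = Dict[str, Any]


def find_mismatches(
    worker_pool_specs: List[Spec],
    resource_pools: List[Spec],
    replicas_in_use: Optional[Dict[str, int]] = None,
) -> List[str]:
    """Return a list of problems; an empty list means the specs look compatible.

    Hypothetical helper. `replicas_in_use` maps a resource pool ID to the
    number of replicas already taken by other jobs on that pool.
    """
    replicas_in_use = replicas_in_use or {}
    problems: List[str] = []

    for index, worker_pool in enumerate(worker_pool_specs):
        # A resource pool corresponds to a worker pool only if the machine
        # and disk specs are identical.
        matches = [
            pool
            for pool in resource_pools
            if pool.get("machine_spec") == worker_pool.get("machine_spec")
            and pool.get("disk_spec") == worker_pool.get("disk_spec")
        ]
        if not matches:
            problems.append(
                f"worker pool {index}: no resource pool with the same "
                "machine_spec and disk_spec"
            )
            continue

        # Free capacity is the pool's replica limit minus replicas used by
        # other jobs. The limit is max_replica_count when present,
        # otherwise replica_count.
        free = max(
            pool.get("max_replica_count", pool["replica_count"])
            - replicas_in_use.get(pool["id"], 0)
            for pool in matches
        )
        requested = worker_pool.get("replica_count", 1)
        if requested > free:
            problems.append(
                f"worker pool {index}: requested {requested} replicas, "
                f"but only {free} are available"
            )

    return problems


# Example values chosen for illustration only.
machine_spec = {
    "machine_type": "n1-standard-4",
    "accelerator_type": "NVIDIA_TESLA_K80",
    "accelerator_count": 1,
}
resource_pools = [
    {"id": "pool-a", "machine_spec": machine_spec, "replica_count": 2}
]
worker_pool_specs = [{"machine_spec": machine_spec, "replica_count": 1}]

print(find_mismatches(worker_pool_specs, resource_pools))
print(find_mismatches(worker_pool_specs, resource_pools, {"pool-a": 2}))
```

The first call reports no problems because one replica fits in the two-replica pool. The second call reports a capacity problem because other jobs already occupy both replicas. Returning a list of messages instead of raising on the first failure lets a caller see every mismatch in a distributed training configuration at once.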