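Configure a pipeline run on a persistent resource

A Vertex AI persistent resource is a long-running cluster that you can use to run custom training jobs and pipeline runs. Running a pipeline on a persistent resource helps ensure that compute resources are available and reduces pipeline task startup time. Persistent resources support all VMs and GPUs that are supported by custom training jobs. For more information about persistent resources, see the overview of persistent resources.

This page shows you how to do the following:

Create a persistent resource
Create a pipeline run that uses the persistent resource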

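Before you begin

Before you can create a pipeline run that uses a persistent resource, you must complete the following prerequisites.

Define and compile a pipeline

Define your pipeline, and then compile the pipeline definition into a YAML file. For more information about defining and compiling a pipeline, see Build a pipeline.

Required IAM roles

To get the permission that you need to create a persistent resource, ask your administrator to grant you the Vertex AI Administrator (roles/aiplatform.admin) IAM role on your project. For more information about granting roles, see Manage access to projects, folders, and organizations.

This predefined role contains the aiplatform.persistentResources.create permission, which is required to create a persistent resource.

You might also be able to get this permission with custom roles or other predefined roles.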

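Create a persistent resource

Use the following samples to create a persistent resource that you can associate with a pipeline run. For more information about creating persistent resources, see Create a persistent resource.

gcloud

To create a persistent resource that you can associate with a pipeline run, use the gcloud ai persistent-resources create command with the --enable-custom-service-account flag.

A persistent resource can have one or more resource pools. To create multiple resource pools in a persistent resource, specify multiple --resource-pool-spec flags.

You can specify all resource pool configurations on the command line, or use the --config flag to specify the path to a YAML file that contains the configurations.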

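Before using any of the command data below, make the following replacements:

PROJECT_ID: The project ID of the Google Cloud project where you want to create the persistent resource.
LOCATION: The region where you want to create the persistent resource. For a list of supported regions, see Feature availability.
PERSISTENT_RESOURCE_ID: A unique, user-defined ID for the persistent resource. It must start with a letter, end with a letter or number, and contain only lowercase letters, numbers, and hyphens (-).
DISPLAY_NAME: Optional. The display name of the persistent resource.
MACHINE_TYPE: The type of virtual machine (VM) to use. For a list of supported VMs, see Machine types. This field corresponds to the machineSpec.machineType field in the ResourcePool API message.
REPLICA_COUNT: Optional. The number of replicas to create for the resource pool, if you don't want to use autoscaling. This field corresponds to the replicaCount field in the ResourcePool API message. You must specify the replica count if you don't specify the MIN_REPLICA_COUNT and MAX_REPLICA_COUNT fields.
MIN_REPLICA_COUNT: Optional. The minimum number of replicas, if you use autoscaling for the resource pool. To use autoscaling, you must specify both MIN_REPLICA_COUNT and MAX_REPLICA_COUNT.
MAX_REPLICA_COUNT: Optional. The maximum number of replicas, if you use autoscaling for the resource pool. To use autoscaling, you must specify both MIN_REPLICA_COUNT and MAX_REPLICA_COUNT.
CONFIG: The path to the persistent resource YAML configuration file, which contains a list of ResourcePool specs. If an option is specified both in the configuration file and in the command-line arguments, the command-line arguments override the configuration file. Note that keys with underscores are treated as invalid.
Example YAML configuration file: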
```
resourcePoolSpecs:
  machineSpec:
    machineType: n1-standard-4
  replicaCount: 1
```

Execute the following command:

Linux, macOS, or Cloud Shell

gcloud ai persistent-resources create \
    --persistent-resource-id=PERSISTENT_RESOURCE_ID \
    --display-name=DISPLAY_NAME \
    --project=PROJECT_ID \
    --region=LOCATION \
    --resource-pool-spec="replica-count=REPLICA_COUNT,machine-type=MACHINE_TYPE,min-replica-count=MIN_REPLICA_COUNT,max-replica-count=MAX_REPLICA_COUNT" \
    --enable-custom-service-account

Windows (PowerShell)

gcloud ai persistent-resources create `
    --persistent-resource-id=PERSISTENT_RESOURCE_ID `
    --display-name=DISPLAY_NAME `
    --project=PROJECT_ID `
    --region=LOCATION `
    --resource-pool-spec="replica-count=REPLICA_COUNT,machine-type=MACHINE_TYPE,min-replica-count=MIN_REPLICA_COUNT,max-replica-count=MAX_REPLICA_COUNT" `
    --enable-custom-service-account

Windows (cmd.exe)

gcloud ai persistent-resources create ^
    --persistent-resource-id=PERSISTENT_RESOURCE_ID ^
    --display-name=DISPLAY_NAME ^
    --project=PROJECT_ID ^
    --region=LOCATION ^
    --resource-pool-spec="replica-count=REPLICA_COUNT,machine-type=MACHINE_TYPE,min-replica-count=MIN_REPLICA_COUNT,max-replica-count=MAX_REPLICA_COUNT" ^
    --enable-custom-service-account

You should receive a response similar to the following:

Using endpoint [https://us-central1-aiplatform.googleapis.com/]
Operation to create PersistentResource [projects/PROJECT_NUMBER/locations/us-central1/persistentResources/mypersistentresource/operations/OPERATION_ID] is submitted successfully.

You can view the status of your PersistentResource create operation with the command

  $ gcloud ai operations describe projects/sample-project/locations/us-central1/operations/OPERATION_ID

Example gcloud command:

gcloud ai persistent-resources create \
    --persistent-resource-id=my-persistent-resource \
    --region=us-central1 \
    --resource-pool-spec="replica-count=4,machine-type=n1-standard-4" \
    --enable-custom-service-account
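
The value passed to --resource-pool-spec in the commands above is a comma-separated list of key=value pairs, and the replica rules from the placeholder list (either a fixed replica count, or both autoscaling bounds together) are easy to get wrong when the string is assembled by hand. The following is a minimal Python sketch that builds the string and enforces those two rules. The function name build_resource_pool_spec is hypothetical; it is not part of the gcloud CLI or the Vertex AI SDK, and it only emits the four keys that appear on this page.

```python
from typing import Optional


def build_resource_pool_spec(
    machine_type: str,
    replica_count: Optional[int] = None,
    min_replica_count: Optional[int] = None,
    max_replica_count: Optional[int] = None,
) -> str:
    """Hypothetical helper: format a value for the --resource-pool-spec flag."""
    has_min = min_replica_count is not None
    has_max = max_replica_count is not None
    # Autoscaling requires both bounds to be specified together.
    if has_min != has_max:
        raise ValueError("Specify both min_replica_count and max_replica_count.")
    # Without autoscaling bounds, a fixed replica count is required.
    if replica_count is None and not has_min:
        raise ValueError("Specify replica_count or both autoscaling bounds.")

    parts = []
    if replica_count is not None:
        parts.append(f"replica-count={replica_count}")
    parts.append(f"machine-type={machine_type}")
    if has_min:
        parts.append(f"min-replica-count={min_replica_count}")
        parts.append(f"max-replica-count={max_replica_count}")
    return ",".join(parts)


# Prints: replica-count=4,machine-type=n1-standard-4
print(build_resource_pool_spec("n1-standard-4", replica_count=4))
```

To describe several resource pools, one such string would be generated per pool and each passed with its own --resource-pool-spec flag.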

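Advanced configuration in `gcloud`

If you want to specify configuration options that aren't available in the preceding examples, you can use the --config flag to specify the path to a config.yaml file in your local environment that contains the fields of persistentResources. For example:

gcloud ai persistent-resources create \
    --persistent-resource-id=PERSISTENT_RESOURCE_ID \
    --project=PROJECT_ID \
    --region=LOCATION \
    --config=CONFIG \
    --enable-custom-service-account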
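
Because keys that contain underscores are treated as invalid in the configuration file, it can help to check a configuration before passing it to --config. The sketch below is an illustration only: find_underscore_keys is a hypothetical helper, and it assumes the configuration has already been loaded into a Python dictionary (for example with yaml.safe_load). It walks the nested structure and reports every key that contains an underscore.

```python
from typing import Any, List


def find_underscore_keys(config: Any, prefix: str = "") -> List[str]:
    """Return dotted paths of mapping keys that contain an underscore."""
    found = []
    if isinstance(config, dict):
        for key, value in config.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            if "_" in str(key):
                found.append(path)
            # Recurse into nested mappings and lists.
            found.extend(find_underscore_keys(value, path))
    elif isinstance(config, list):
        for index, item in enumerate(config):
            found.extend(find_underscore_keys(item, f"{prefix}[{index}]"))
    return found


# A configuration with one snake_case key, used here on purpose.
config = {
    "resourcePoolSpecs": {
        "machineSpec": {"machine_type": "n1-standard-4"},
        "replicaCount": 1,
    }
}

# Prints: ['resourcePoolSpecs.machineSpec.machine_type']
print(find_underscore_keys(config))
```

An empty list means that no key in the structure contains an underscore; it does not guarantee that the file is otherwise valid.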

Python

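Before trying this sample, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

To create a persistent resource that you can use with a pipeline run, set the enable_custom_service_account parameter in the ResourceRuntimeSpec object to True when you create the persistent resource.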

from google.cloud.aiplatform.preview import persistent_resource
from google.cloud.aiplatform_v1beta1.types.persistent_resource import ResourcePool
from google.cloud.aiplatform_v1beta1.types.machine_resources import MachineSpec

my_example_resource = persistent_resource.PersistentResource.create(
    persistent_resource_id='PERSISTENT_RESOURCE_ID',
    display_name='DISPLAY_NAME',
    resource_pools=[
        ResourcePool(
            machine_spec=MachineSpec(
                machine_type='MACHINE_TYPE'
            ),
            replica_count=REPLICA_COUNT
        )
    ],
    enable_custom_service_account=True,
)

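Replace the following:

PERSISTENT_RESOURCE_ID: A unique, user-defined ID for the persistent resource. It can contain only lowercase letters, numbers, and hyphens (-). The first character must be a lowercase letter, and the last character must be a lowercase letter or a number.
DISPLAY_NAME: Optional. The display name of the persistent resource.
MACHINE_TYPE: The type of virtual machine (VM) to use. For a list of supported VMs, see Machine types. This field corresponds to the machineSpec.machineType field in the ResourcePool API message.
REPLICA_COUNT: The number of replicas to create when this resource pool is created.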
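
The character rules for PERSISTENT_RESOURCE_ID can be checked locally before calling the API. The following is a minimal sketch that encodes only the rules stated above as a regular expression. The function name is hypothetical, and the service might enforce additional constraints (such as a maximum length) that this page doesn't list.

```python
import re

# Only lowercase letters, digits, and hyphens; the first character is a
# lowercase letter and the last character is a lowercase letter or a digit.
_ID_PATTERN = re.compile(r"[a-z](?:[a-z0-9-]*[a-z0-9])?")


def is_valid_persistent_resource_id(resource_id: str) -> bool:
    """Hypothetical check for the ID rules described on this page."""
    return _ID_PATTERN.fullmatch(resource_id) is not None


print(is_valid_persistent_resource_id("my-persistent-resource"))  # True
print(is_valid_persistent_resource_id("My-Resource"))             # False
print(is_valid_persistent_resource_id("resource-"))               # False
```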

REST

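To create a PersistentResource resource that you can associate with a pipeline run, send a POST request by using the persistentResources/create method, and set the enable_custom_service_account parameter to true in the request body.

A persistent resource can have one or more resource pools. You can configure each resource pool to use either a fixed number of replicas or autoscaling.

Before using any of the request data, make the following replacements:

PROJECT_ID: The project ID of the Google Cloud project where you want to create the persistent resource.
LOCATION: The region where you want to create the persistent resource. For a list of supported regions, see Feature availability.
PERSISTENT_RESOURCE_ID: A unique, user-defined ID for the persistent resource. It must start with a letter, end with a letter or number, and contain only lowercase letters, numbers, and hyphens (-).
DISPLAY_NAME: Optional. The display name of the persistent resource.
MACHINE_TYPE: The type of virtual machine (VM) to use. For a list of supported VMs, see Machine types. This field corresponds to the machineSpec.machineType field in the ResourcePool API message.
REPLICA_COUNT: Optional. The number of replicas to create for the resource pool, if you don't want to use autoscaling. This field corresponds to the replicaCount field in the ResourcePool API message. You must specify the replica count if you don't specify the MIN_REPLICA_COUNT and MAX_REPLICA_COUNT fields.
MIN_REPLICA_COUNT: Optional. The minimum number of replicas, if you use autoscaling for the resource pool. To use autoscaling, you must specify both MIN_REPLICA_COUNT and MAX_REPLICA_COUNT.
MAX_REPLICA_COUNT: Optional. The maximum number of replicas, if you use autoscaling for the resource pool. To use autoscaling, you must specify both MIN_REPLICA_COUNT and MAX_REPLICA_COUNT.

HTTP method and URL: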

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/persistentResources?persistent_resource_id=PERSISTENT_RESOURCE_ID

Request JSON body:

{
  "display_name": "DISPLAY_NAME",
  "resource_pools": [
    {
      "machine_spec": {
        "machine_type": "MACHINE_TYPE"
      },
      "replica_count": REPLICA_COUNT,
      "autoscaling_spec": {
        "min_replica_count": MIN_REPLICA_COUNT,
        "max_replica_count": MAX_REPLICA_COUNT
      }
    }
  ],
  "resource_runtime_spec": {
    "service_account_spec": {
      "enable_custom_service_account": true
    }
  }
}
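
For scripted use, the URL and the JSON body can be assembled from the same placeholder values. The sketch below builds both for a resource pool with a fixed replica count and doesn't send anything. build_create_request is a hypothetical helper, and the code assumes that the regional host follows the LOCATION-aiplatform.googleapis.com pattern that the Python pipeline sample on this page uses for its API endpoint.

```python
import json


def build_create_request(project_id, location, persistent_resource_id,
                         machine_type, replica_count, display_name=None):
    """Hypothetical helper: return (url, json_body) for the create request."""
    # Assumption: the regional endpoint host is derived from the location.
    url = (
        f"https://{location}-aiplatform.googleapis.com/v1/"
        f"projects/{project_id}/locations/{location}/persistentResources"
        f"?persistent_resource_id={persistent_resource_id}"
    )
    body = {
        "resource_pools": [
            {
                "machine_spec": {"machine_type": machine_type},
                "replica_count": replica_count,
            }
        ],
        # Required so that the resource can be associated with pipeline runs.
        "resource_runtime_spec": {
            "service_account_spec": {"enable_custom_service_account": True}
        },
    }
    if display_name is not None:
        body["display_name"] = display_name
    return url, json.dumps(body, indent=2)


url, body = build_create_request(
    "my-project", "us-central1", "my-persistent-resource", "n1-standard-4", 1
)
print(url)
print(body)
```

The project ID and resource ID in the usage lines are example values only. The resulting body can be saved as request.json and sent with any of the options that follow.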

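To send your request, choose one of these options:

curl (Linux, macOS, or Cloud Shell)

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or that you are using Cloud Shell, which automatically logs you in to the gcloud CLI. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json. Run the following command in the terminal to create or overwrite this file in the current directory: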

cat > request.json << 'EOF'
{
  "display_name": "DISPLAY_NAME",
  "resource_pools": [
    {
      "machine_spec": {
        "machine_type": "MACHINE_TYPE"
      },
      "replica_count": REPLICA_COUNT,
      "autoscaling_spec": {
        "min_replica_count": MIN_REPLICA_COUNT,
        "max_replica_count": MAX_REPLICA_COUNT
      }
    }
  ],
  "resource_runtime_spec": {
    "service_account_spec": {
      "enable_custom_service_account": true
    }
  }
}
EOF

Then execute the following command to send your REST request:

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/persistentResources?persistent_resource_id=PERSISTENT_RESOURCE_ID"

PowerShell (Windows)

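Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json. Run the following command in the terminal to create or overwrite this file in the current directory: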

@'
{
  "display_name": "DISPLAY_NAME",
  "resource_pools": [
    {
      "machine_spec": {
        "machine_type": "MACHINE_TYPE"
      },
      "replica_count": REPLICA_COUNT,
      "autoscaling_spec": {
        "min_replica_count": MIN_REPLICA_COUNT,
        "max_replica_count": MAX_REPLICA_COUNT
      }
    }
  ],
  "resource_runtime_spec": {
    "service_account_spec": {
      "enable_custom_service_account": true
    }
  }
}
'@  | Out-File -FilePath request.json -Encoding utf8

Then execute the following command to send your REST request:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/persistentResources?persistent_resource_id=PERSISTENT_RESOURCE_ID" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

{
  "name": "projects/PROJECT_NUMBER/locations/LOCATION/persistentResources/mypersistentresource/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.aiplatform.v1.CreatePersistentResourceOperationMetadata",
    "genericMetadata": {
      "createTime": "2023-02-08T21:17:15.009668Z",
      "updateTime": "2023-02-08T21:17:15.009668Z"
    }
  }
}
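
The name field in this response identifies the create operation and embeds the persistent resource ID. As an illustration, the following sketch splits such a name into its collection/ID pairs by using only the Python standard library. parse_resource_name is a hypothetical helper, and the response text is the sample shown above, trimmed to its name field.

```python
import json


def parse_resource_name(name: str) -> dict:
    """Split 'collection/id/collection/id/...' into a {collection: id} dict."""
    parts = name.split("/")
    if len(parts) % 2 != 0:
        raise ValueError(f"Unexpected resource name: {name}")
    return dict(zip(parts[0::2], parts[1::2]))


response_text = """
{
  "name": "projects/PROJECT_NUMBER/locations/LOCATION/persistentResources/mypersistentresource/operations/OPERATION_ID"
}
"""

fields = parse_resource_name(json.loads(response_text)["name"])
# Prints: mypersistentresource OPERATION_ID
print(fields["persistentResources"], fields["operations"])
```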

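Create a pipeline run that uses the persistent resource

To create a pipeline job, you must first create a pipeline spec. A pipeline spec is an in-memory object that you create by converting a compiled pipeline definition.

Create a pipeline spec

Follow these instructions to create an in-memory pipeline spec that you can then use to create the pipeline run:

Define a pipeline and compile it into a YAML file. For more information about defining and compiling a pipeline, see Build a pipeline.

Use the following code sample to convert the compiled pipeline YAML file into an in-memory pipeline spec.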

import yaml
with open("COMPILED_PIPELINE_PATH", "r") as stream:
  try:
    pipeline_spec = yaml.safe_load(stream)
    print(pipeline_spec)
  except yaml.YAMLError as exc:
    print(exc)

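Replace COMPILED_PIPELINE_PATH with the local path to your compiled pipeline YAML file.

Create a pipeline run

Use the following Python code sample to create a pipeline run that uses the persistent resource: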

# Import aiplatform and the appropriate API version v1beta1
from google.cloud import aiplatform, aiplatform_v1beta1
from google.cloud.aiplatform_v1beta1.types import pipeline_job as pipeline_job_types

# Initialize the Vertex SDK using PROJECT_ID and LOCATION
aiplatform.init(project='PROJECT_ID', location='LOCATION')

# Create the API endpoint
client_options = {
    "api_endpoint": "LOCATION-aiplatform.googleapis.com"
}

# Initialize the PipelineServiceClient
client = aiplatform_v1beta1.PipelineServiceClient(client_options=client_options)

# Construct the runtime detail
pr_runtime_detail = pipeline_job_types.PipelineJob.RuntimeConfig.PersistentResourceRuntimeDetail(
    persistent_resource_name=(
        f"projects/PROJECT_NUMBER/"
        f"locations/LOCATION/"
        f"persistentResources/PERSISTENT_RESOURCE_ID"
    ),
    task_resource_unavailable_wait_time_ms=WAIT_TIME,
    task_resource_unavailable_timeout_behavior='TIMEOUT_BEHAVIOR',
)

# Construct the default runtime configuration block
default_runtime = pipeline_job_types.PipelineJob.RuntimeConfig.DefaultRuntime(
    persistent_resource_runtime_detail=pr_runtime_detail
)

# Construct the main runtime configuration
runtime_config = pipeline_job_types.PipelineJob.RuntimeConfig(
    gcs_output_directory='PIPELINE_ROOT',
    parameter_values={
        'project_id': 'PROJECT_ID'
    },
    default_runtime=default_runtime
)

# Construct the pipeline job object
pipeline_job = pipeline_job_types.PipelineJob(
    display_name='PIPELINE_DISPLAY_NAME',
    pipeline_spec=PIPELINE_SPEC,
    runtime_config=runtime_config,
)

# Construct the request
parent_path = f"projects/PROJECT_ID/locations/LOCATION"
request = aiplatform_v1beta1.CreatePipelineJobRequest(
    parent=parent_path,
    pipeline_job=pipeline_job,
)

# Make the API Call to create the pipeline job
response = client.create_pipeline_job(request=request)

# Construct the Google Cloud console link
job_id = response.name.split('/')[-1]
console_link = (
    f"https://console.cloud.google.com/vertex-ai/locations/LOCATION"
    f"/pipelines/runs/{job_id}"
    f"?project=PROJECT_ID"
)

# Print the Google Cloud console link to the pipeline run
print(f"View Pipeline Run in Google Cloud console: {console_link}")
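
The sample above writes the persistent resource name and the console link as literal strings with placeholders. When these values come from variables, the same strings can be produced with small formatting helpers. The following is a sketch only: both function names are hypothetical, and the formats are copied from the sample above rather than from a separate specification.

```python
def persistent_resource_name(project_number: str, location: str,
                             persistent_resource_id: str) -> str:
    """Format the value used for persistent_resource_name in the sample."""
    return (
        f"projects/{project_number}/"
        f"locations/{location}/"
        f"persistentResources/{persistent_resource_id}"
    )


def pipeline_run_console_link(job_name: str, project_id: str,
                              location: str) -> str:
    """Format the console link from a pipeline job resource name."""
    # The job ID is the last path segment of the job resource name.
    job_id = job_name.split("/")[-1]
    return (
        f"https://console.cloud.google.com/vertex-ai/locations/{location}"
        f"/pipelines/runs/{job_id}"
        f"?project={project_id}"
    )


# Example values only; replace them with your own.
print(persistent_resource_name("123456789", "us-central1", "my-persistent-resource"))
print(pipeline_run_console_link(
    "projects/123456789/locations/us-central1/pipelineJobs/my-job-1",
    "my-project",
    "us-central1",
))
```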