
Build your own pipeline components

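Write a component to show a Google Cloud console link

When you run a component, you often want to see not only the link to the component job that was launched, but also the links to the underlying cloud resources, such as Vertex AI batch prediction jobs or Dataflow jobs.

The gcp_resource proto is a special parameter that you can use in your component so that the Google Cloud console can show a customized view of the resource's logs and status in the Vertex AI Pipelines console.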

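Output the `gcp_resource` parameter

Using a container-based component

First, define the gcp_resource parameter in your component, as shown in the following example component.py file: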


To learn how to install or update the Vertex AI SDK for Python, see "Install the Vertex AI SDK for Python". For more information, see the Python API reference documentation.

# Copyright 2023 The Kubeflow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import List

from google_cloud_pipeline_components import _image
from google_cloud_pipeline_components import _placeholders
from kfp.dsl import container_component
from kfp.dsl import ContainerSpec
from kfp.dsl import OutputPath


@container_component
def dataflow_python(
    python_module_path: str,
    temp_location: str,
    gcp_resources: OutputPath(str),
    location: str = 'us-central1',
    requirements_file_path: str = '',
    args: List[str] = [],
    project: str = _placeholders.PROJECT_ID_PLACEHOLDER,
):
  # fmt: off
  """Launch a self-executing Beam Python file on Google Cloud using the
  Dataflow Runner.

  Args:
      location: Location of the Dataflow job. If not set, defaults to `'us-central1'`.
      python_module_path: The GCS path to the Python file to run.
      temp_location: A GCS path for Dataflow to stage temporary job files created during the execution of the pipeline.
      requirements_file_path: The GCS path to the pip requirements file.
      args: The list of args to pass to the Python file. Can include additional parameters for the Dataflow Runner.
      project: Project to create the Dataflow job. Defaults to the project in which the PipelineJob is run.

  Returns:
      gcp_resources: Serialized gcp_resources proto tracking the Dataflow job. For more details, see https://github.com/kubeflow/pipelines/blob/master/components/google-cloud/google_cloud_pipeline_components/proto/README.md.
  """
  # fmt: on
  return ContainerSpec(
      image=_image.GCPC_IMAGE_TAG,
      command=[
          'python3',
          '-u',
          '-m',
          'google_cloud_pipeline_components.container.v1.dataflow.dataflow_launcher',
      ],
      args=[
          '--project',
          project,
          '--location',
          location,
          '--python_module_path',
          python_module_path,
          '--temp_location',
          temp_location,
          '--requirements_file_path',
          requirements_file_path,
          '--args',
          args,
          '--gcp_resources',
          gcp_resources,
      ],
  )

Next, install the Google Cloud Pipeline Components package in your container:

pip install --upgrade google-cloud-pipeline-components

Then, in your Python code, define the resource and write it to the gcp_resource output parameter:


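import argparse

from google.protobuf.json_format import MessageToJson
from google_cloud_pipeline_components.proto.gcp_resources_pb2 import GcpResources

# Assumes the ContainerSpec passes the output file path as --gcp_resources, as in component.py above.
parser = argparse.ArgumentParser()
parser.add_argument('--gcp_resources', type=str, required=True)
args, _ = parser.parse_known_args()
gcp_resources = args.gcp_resources

dataflow_resources = GcpResources()
dr = dataflow_resources.resources.add()
dr.resource_type = 'DataflowJob'
dr.resource_uri = 'https://dataflow.googleapis.com/v1b3/projects/[your-project]/locations/us-east1/jobs/[dataflow-job-id]'

# Write the serialized proto to the output path so the console can link to the job.
with open(gcp_resources, 'w') as f:
    f.write(MessageToJson(dataflow_resources))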
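To see roughly what ends up in the `gcp_resources` file, the following stdlib-only sketch builds the same JSON shape without the protobuf dependency. It assumes the default behavior of `MessageToJson`, which converts proto field names to lowerCamelCase (`resourceType`, `resourceUri`). The helper name `gcp_resources_json` and the `my-project`/`my-job-id` values are hypothetical; in a real component, keep using the `GcpResources` proto.

```python
import json
from typing import Iterable, Tuple


def gcp_resources_json(resources: Iterable[Tuple[str, str]]) -> str:
  """Builds a JSON string shaped like a serialized GcpResources message.

  Assumes MessageToJson's default lowerCamelCase field names. This is an
  illustration of the payload shape, not a replacement for the proto.
  """
  payload = {
      'resources': [
          {'resourceType': resource_type, 'resourceUri': resource_uri}
          for resource_type, resource_uri in resources
      ]
  }
  return json.dumps(payload, indent=2)


if __name__ == '__main__':
  # Hypothetical project and job ID, for illustration only.
  print(gcp_resources_json([
      ('DataflowJob',
       'https://dataflow.googleapis.com/v1b3/projects/my-project'
       '/locations/us-east1/jobs/my-job-id'),
  ]))
```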

Using a Python component

Alternatively, you can return the gcp_resources output parameter the same way you would return any string output parameter:

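from typing import NamedTuple

from kfp import dsl


@dsl.component(
    base_image='python:3.9',
    packages_to_install=['google-cloud-pipeline-components==2.19.0'],
)
def launch_dataflow_component(
    project: str, location: str
) -> NamedTuple('Outputs', [('gcp_resources', str)]):
  # KFP runs the function body in isolation, so imports go inside the function.
  from google.protobuf.json_format import MessageToJson
  from google_cloud_pipeline_components.proto.gcp_resources_pb2 import GcpResources

  # Launch the Dataflow job here, then record its ID.
  dataflow_job_id = '[dataflow-job-id]'
  dataflow_resources = GcpResources()
  dr = dataflow_resources.resources.add()
  dr.resource_type = 'DataflowJob'
  dr.resource_uri = f'https://dataflow.googleapis.com/v1b3/projects/{project}/locations/{location}/jobs/{dataflow_job_id}'
  gcp_resources = MessageToJson(dataflow_resources)
  # Return a one-element tuple so it matches the single-field NamedTuple output.
  return (gcp_resources,)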
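Two details of this component are easy to get wrong: the Dataflow job URI format, and the shape of the return value, which has to line up with the `gcp_resources` field of the `NamedTuple` annotation. The following stdlib-only sketch isolates both so they can be checked without KFP. `Outputs`, `dataflow_job_uri`, and `make_outputs` are hypothetical names; the URI template is the one used on this page, and rejecting empty or slash-containing parts is a defensive choice of this sketch, not a documented rule.

```python
from typing import NamedTuple


class Outputs(NamedTuple):
  """Mirrors the component's single-field NamedTuple return annotation."""
  gcp_resources: str


def dataflow_job_uri(project: str, location: str, job_id: str) -> str:
  """Builds the Dataflow v1b3 job URI used with the DataflowJob resource type."""
  for name, value in (('project', project), ('location', location),
                      ('job_id', job_id)):
    # Defensive check (an assumption of this sketch): empty or slash-containing
    # parts would produce a malformed resource URI.
    if not value or '/' in value:
      raise ValueError(f'invalid {name}: {value!r}')
  return (f'https://dataflow.googleapis.com/v1b3/projects/{project}'
          f'/locations/{location}/jobs/{job_id}')


def make_outputs(serialized_resources: str) -> Outputs:
  """Wraps the serialized proto so the return value exposes gcp_resources."""
  return Outputs(gcp_resources=serialized_resources)
```

Inside a component, you would pass the string returned by `MessageToJson(...)` to a helper like `make_outputs`, so the value is read from the `gcp_resources` field rather than from a bare string.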

Supported `resource_type` values

You can set resource_type to any string, but only the following types get a link in the Google Cloud console:

BatchPredictionJob
BigQueryJob
CustomJob
DataflowJob
HyperparameterTuningJob
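Because any string is accepted but only the types listed above get a console link, a small check can catch typos (for example, `DataFlowJob`) that would otherwise silently lose the link. This is a hypothetical helper, not part of Google Cloud Pipeline Components; the set simply copies the list above, and the comparison in this sketch is case-sensitive.

```python
# Resource types that get a link in the Google Cloud console, copied from the
# list on this page. Other strings are accepted but show no link.
CONSOLE_LINKED_RESOURCE_TYPES = frozenset({
    'BatchPredictionJob',
    'BigQueryJob',
    'CustomJob',
    'DataflowJob',
    'HyperparameterTuningJob',
})


def has_console_link(resource_type: str) -> bool:
  """Returns True if this resource type gets a console link (exact match)."""
  return resource_type in CONSOLE_LINKED_RESOURCE_TYPES
```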

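Write a component to cancel the underlying resources

When a pipeline job is canceled, the default behavior is for the underlying Google Cloud resources to keep running; they aren't canceled automatically. To change this behavior, attach a SIGTERM handler to the pipeline job. If the job might run for a long time, a good place to do this is just before the polling loop.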
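The pattern described above can be sketched with Python's standard `signal` module. This is a generic illustration under stated assumptions, not the handler that Google Cloud Pipeline Components uses: `get_state` and `cancel_job` are hypothetical callables that you would back with the relevant Google Cloud API, and the terminal state names are placeholders.

```python
import signal
import sys
import time
from typing import Callable, FrozenSet

# Placeholder state names; real services use their own state enums.
TERMINAL_STATES: FrozenSet[str] = frozenset({'SUCCEEDED', 'FAILED', 'CANCELLED'})


def poll_until_done(
    get_state: Callable[[], str],
    cancel_job: Callable[[], None],
    poll_interval_s: float = 30.0,
) -> str:
  """Polls a job until it reaches a terminal state, canceling it on SIGTERM."""

  def _on_sigterm(signum, frame):
    # The pipeline job was canceled: cancel the underlying resource, then exit.
    cancel_job()
    sys.exit(1)

  # Attach the handler before entering the potentially long polling loop.
  previous_handler = signal.signal(signal.SIGTERM, _on_sigterm)
  try:
    while True:
      state = get_state()
      if state in TERMINAL_STATES:
        return state
      time.sleep(poll_interval_s)
  finally:
    # Restore the handler that was installed before this call, if Python knows it.
    if previous_handler is not None:
      signal.signal(signal.SIGTERM, previous_handler)
```

Restoring the previous handler in `finally` keeps the sketch from leaving a cancel-on-SIGTERM handler installed after the job has finished.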

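Cancellation is already implemented in several Google Cloud Pipeline Components, including:

Batch prediction job
BigQuery ML job
Custom job
Google Cloud Serverless for Apache Spark batch job
Hyperparameter tuning job

For more information, including sample code that shows how to attach a SIGTERM handler, see the following GitHub links: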

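Consider the following when implementing your SIGTERM handler:

Cancellation propagation only takes effect after the component has been running for a few minutes. This is because background startup tasks must be processed before the Python signal handlers are called.
Some Google Cloud resources might not implement cancellation. For example, creating or deleting a Vertex AI endpoint or model can create a long-running operation that accepts a cancellation request through its REST API but doesn't implement the cancellation itself.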
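Because some resources accept a cancellation request without actually stopping, one defensive option is to confirm the outcome after sending the request instead of assuming it worked. The following is an assumption-based sketch: `request_cancel` and `get_state` are hypothetical callables that you would back with the relevant API, the state names are placeholders, and the default timeout and polling interval are arbitrary.

```python
import time
from typing import Callable, Iterable


def cancel_and_confirm(
    request_cancel: Callable[[], None],
    get_state: Callable[[], str],
    stopped_states: Iterable[str] = ('CANCELLED', 'SUCCEEDED', 'FAILED'),
    timeout_s: float = 300.0,
    poll_interval_s: float = 10.0,
) -> bool:
  """Requests cancellation, then waits for the resource to actually stop.

  Returns True if a stopped state is observed before the timeout, or False if
  the resource still appears to be running (for example, because the service
  accepted the request but does not implement cancellation).
  """
  stopped = frozenset(stopped_states)
  request_cancel()
  deadline = time.monotonic() + timeout_s
  while True:
    if get_state() in stopped:
      return True
    if time.monotonic() >= deadline:
      return False
    time.sleep(poll_interval_s)
```

Returning `False` instead of raising leaves the caller free to decide whether to log a warning or fail the component.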
