# Build your own pipeline components

Last updated: 2025-07-08 (UTC)

To learn more, run the "Custom training workflow with prebuilt Pipeline Components and custom components" notebook in one of the following environments:

- [Open in Colab](https://colab.research.google.com/github/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/official/pipelines/google_cloud_pipeline_components_model_train_upload_deploy.ipynb)
- [Open in Colab Enterprise](https://console.cloud.google.com/vertex-ai/colab/import/https%3A%2F%2Fraw.githubusercontent.com%2FGoogleCloudPlatform%2Fvertex-ai-samples%2Fmain%2Fnotebooks%2Fofficial%2Fpipelines%2Fgoogle_cloud_pipeline_components_model_train_upload_deploy.ipynb)
- [Open in Vertex AI Workbench](https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https%3A%2F%2Fraw.githubusercontent.com%2FGoogleCloudPlatform%2Fvertex-ai-samples%2Fmain%2Fnotebooks%2Fofficial%2Fpipelines%2Fgoogle_cloud_pipeline_components_model_train_upload_deploy.ipynb)
- [View on GitHub](https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/official/pipelines/google_cloud_pipeline_components_model_train_upload_deploy.ipynb)

Write a component to show a Google Cloud console link
-----------------------------------------------------

When you run a component, you often want to see not only a link to the component job being launched, but also a link to the underlying cloud resources, such as Vertex AI batch prediction jobs or Dataflow jobs.

The [`gcp_resources` proto](https://github.com/kubeflow/pipelines/tree/master/components/google-cloud/google_cloud_pipeline_components/proto) is a special parameter that you can use in your component so that the Google Cloud console can provide a customized view of the resource's logs and status in the Vertex AI Pipelines console.

### Output the `gcp_resources` parameter

#### Using a container-based component

To learn how to install or update the Vertex AI SDK for Python, see [Install the Vertex AI SDK for Python](/vertex-ai/docs/start/use-vertex-ai-python-sdk). For more information, see the [Python API reference documentation](/python/docs/reference/aiplatform/latest).

First, define the `gcp_resources` parameter in your component, as shown in the following example `component.py` file:

    # Copyright 2023 The Kubeflow Authors. All Rights Reserved.
    #
    # Licensed under the Apache License, Version 2.0 (the "License");
    # you may not use this file except in compliance with the License.
    # You may obtain a copy of the License at
    #
    #      http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.
    from typing import List

    from google_cloud_pipeline_components import _image
    from google_cloud_pipeline_components import _placeholders
    from kfp.dsl import container_component
    from kfp.dsl import ContainerSpec
    from kfp.dsl import OutputPath


    @container_component
    def dataflow_python(
        python_module_path: str,
        temp_location: str,
        gcp_resources: OutputPath(str),
        location: str = 'us-central1',
        requirements_file_path: str = '',
        args: List[str] = [],
        project: str = _placeholders.PROJECT_ID_PLACEHOLDER,
    ):
        # fmt: off
        """Launch a self-executing Beam Python file on Google Cloud using the
        Dataflow Runner.

        Args:
            location: Location of the Dataflow job. If not set, defaults to `'us-central1'`.
            python_module_path: The GCS path to the Python file to run.
            temp_location: A GCS path for Dataflow to stage temporary job files created during the execution of the pipeline.
            requirements_file_path: The GCS path to the pip requirements file.
            args: The list of args to pass to the Python file. Can include additional parameters for the Dataflow Runner.
            project: Project to create the Dataflow job. Defaults to the project in which the PipelineJob is run.

        Returns:
            gcp_resources: Serialized gcp_resources proto tracking the Dataflow job. For more details, see https://github.com/kubeflow/pipelines/blob/master/components/google-cloud/google_cloud_pipeline_components/proto/README.md.
        """
        # fmt: on
        return ContainerSpec(
            image=_image.GCPC_IMAGE_TAG,
            command=[
                'python3',
                '-u',
                '-m',
                'google_cloud_pipeline_components.container.v1.dataflow.dataflow_launcher',
            ],
            args=[
                '--project',
                project,
                '--location',
                location,
                '--python_module_path',
                python_module_path,
                '--temp_location',
                temp_location,
                '--requirements_file_path',
                requirements_file_path,
                '--args',
                args,
                '--gcp_resources',
                gcp_resources,
            ],
        )

Next, inside the container, install the Google Cloud Pipeline Components package:

    pip install --upgrade google-cloud-pipeline-components

Next, in the Python code that runs inside the container, build the resource record and write it to the `gcp_resources` output:

    from google_cloud_pipeline_components.proto.gcp_resources_pb2 import GcpResources
    from google.protobuf.json_format import MessageToJson

    dataflow_resources = GcpResources()
    dr = dataflow_resources.resources.add()
    dr.resource_type = 'DataflowJob'
    dr.resource_uri = 'https://dataflow.googleapis.com/v1b3/projects/[your-project]/locations/us-east1/jobs/[dataflow-job-id]'

    # `gcp_resources` is the output file path that the component receives
    # through the --gcp_resources argument.
    with open(gcp_resources, 'w') as f:
        f.write(MessageToJson(dataflow_resources))

#### Using a Python component

Alternatively, you can return the `gcp_resources` output parameter as you would any string output parameter:

    from typing import NamedTuple

    from kfp import dsl


    @dsl.component(
        base_image='python:3.9',
        packages_to_install=['google-cloud-pipeline-components==2.19.0'],
    )
    def launch_dataflow_component(
        project: str, location: str
    ) -> NamedTuple('Outputs', [('gcp_resources', str)]):
        # Lightweight Python components import their dependencies inside the function body.
        from collections import namedtuple
        from google.protobuf.json_format import MessageToJson
        from google_cloud_pipeline_components.proto.gcp_resources_pb2 import GcpResources

        # Launch the Dataflow job here and record its ID.
        dataflow_job_id = '[dataflow-id]'  # Placeholder for the launched job's ID.
        dataflow_resources = GcpResources()
        dr = dataflow_resources.resources.add()
        dr.resource_type = 'DataflowJob'
        dr.resource_uri = f'https://dataflow.googleapis.com/v1b3/projects/{project}/locations/{location}/jobs/{dataflow_job_id}'
        outputs = namedtuple('Outputs', ['gcp_resources'])
        return outputs(gcp_resources=MessageToJson(dataflow_resources))

#### Supported `resource_type` values

You can set `resource_type` to an arbitrary string, but only the following types have links in the Google Cloud console:

- BatchPredictionJob
- BigQueryJob
- CustomJob
- DataflowJob
- HyperparameterTuningJob

Write a component to cancel the underlying resources
----------------------------------------------------

When a pipeline job is canceled, the underlying Google Cloud resources keep running by default. They are not canceled automatically. To change this behavior, attach a [SIGTERM](https://docs.python.org/3/library/signal.html#signal.SIGTERM) handler to the pipeline job. A good place to do this is just before the polling loop for a job that could run for a long time.

Cancellation has been implemented on several Google Cloud Pipeline Components, including:

- Batch prediction job
- BigQuery ML job
- Custom job
- Dataproc Serverless batch job
- Hyperparameter tuning job

For more information, including sample code that shows how to attach a SIGTERM handler, see the following GitHub links:

- <https://github.com/kubeflow/pipelines/blob/google-cloud-pipeline-components-2.19.0/components/google-cloud/google_cloud_pipeline_components/container/utils/execution_context.py>
- <https://github.com/kubeflow/pipelines/blob/google-cloud-pipeline-components-2.19.0/components/google-cloud/google_cloud_pipeline_components/container/v1/gcp_launcher/job_remote_runner.py#L124>

Consider the following when implementing your SIGTERM handler:

- Cancellation propagation works only after the component has been running for a few minutes. This is typically due to background startup tasks that need to be [processed](https://docs.python.org/3/library/signal.html#execution-of-python-signal-handlers) before the Python signal handlers are called.
- Some Google Cloud resources might not have cancellation implemented. For example, creating or deleting a Vertex AI Endpoint or Model could create a long-running operation that accepts a cancellation request through its REST API, but doesn't implement the cancellation operation itself.
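The cancellation advice in this section, attaching the SIGTERM handler just before the polling loop, can be sketched with only the Python standard library. This is a minimal illustration under stated assumptions, not the implementation used by Google Cloud Pipeline Components (see the GitHub links for that). The names `cancel_on_sigterm` and `poll_until_done`, the `get_state` and `cancel_job` callables, and the set of terminal state strings are all hypothetical stand-ins for whatever client calls your component uses to check and cancel its underlying resource.

```python
import contextlib
import signal
import time
from typing import Callable, Iterator


@contextlib.contextmanager
def cancel_on_sigterm(cancel_job: Callable[[], None]) -> Iterator[dict]:
    """Installs a SIGTERM handler for the duration of the `with` block.

    `cancel_job` is a hypothetical callable that asks the underlying
    resource to stop, for example by calling its cancel REST method.
    Must be used from the main thread, as `signal.signal` requires.
    """
    status = {'cancelled': False}

    def _handler(signum, frame):
        # Record the request and forward the cancellation to the resource.
        status['cancelled'] = True
        cancel_job()

    previous = signal.signal(signal.SIGTERM, _handler)
    try:
        yield status
    finally:
        # Restore the handler that was in place before, if Python knows it.
        if previous is not None:
            signal.signal(signal.SIGTERM, previous)


def poll_until_done(
    get_state: Callable[[], str],
    cancel_job: Callable[[], None],
    poll_interval_seconds: float = 30.0,
) -> str:
    """Polls `get_state` until a terminal state is seen or SIGTERM arrives."""
    # Assumed state names; substitute the ones your resource reports.
    terminal_states = {'SUCCEEDED', 'FAILED', 'CANCELLED'}
    # Attach the handler just before entering the polling loop.
    with cancel_on_sigterm(cancel_job) as status:
        while True:
            state = get_state()
            if state in terminal_states:
                return state
            if status['cancelled']:
                return 'CANCELLED'
            time.sleep(poll_interval_seconds)
```

In this sketch the handler calls `cancel_job` as soon as Python runs it, while the loop notices the flag only on its next pass, after the current `time.sleep` finishes. A shorter polling interval makes the component exit sooner after cancellation, at the cost of more status requests.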