Build your own pipeline components
Write a component to show a Google Cloud console link
When you run a component, it's common to want to see not only the link to the component job being launched, but also the link to the underlying cloud resources, such as Vertex batch prediction jobs or Dataflow jobs.
The gcp_resources proto is a special parameter that you can use in your component to enable the Google Cloud console to provide a customized view of the resource's logs and status in the Vertex AI Pipelines console.
Output the gcp_resources parameter
Using a container-based component
First, you need to define the gcp_resources output parameter in your component, as shown in the following example component.py file:
# Copyright 2023 The Kubeflow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import List
from google_cloud_pipeline_components import _image
from google_cloud_pipeline_components import _placeholders
from kfp.dsl import container_component
from kfp.dsl import ContainerSpec
from kfp.dsl import OutputPath
@container_component
def dataflow_python(
    python_module_path: str,
    temp_location: str,
    gcp_resources: OutputPath(str),
    location: str = 'us-central1',
    requirements_file_path: str = '',
    args: List[str] = [],
    project: str = _placeholders.PROJECT_ID_PLACEHOLDER,
):
    # fmt: off
    """Launch a self-executing Beam Python file on Google Cloud using the
    Dataflow Runner.

    Args:
        location: Location of the Dataflow job. If not set, defaults to `'us-central1'`.
        python_module_path: The GCS path to the Python file to run.
        temp_location: A GCS path for Dataflow to stage temporary job files created during the execution of the pipeline.
        requirements_file_path: The GCS path to the pip requirements file.
        args: The list of args to pass to the Python file. Can include additional parameters for the Dataflow Runner.
        project: Project to create the Dataflow job. Defaults to the project in which the PipelineJob is run.

    Returns:
        gcp_resources: Serialized gcp_resources proto tracking the Dataflow job. For more details, see https://github.com/kubeflow/pipelines/blob/master/components/google-cloud/google_cloud_pipeline_components/proto/README.md.
    """
    # fmt: on
    return ContainerSpec(
        image=_image.GCPC_IMAGE_TAG,
        command=[
            'python3',
            '-u',
            '-m',
            'google_cloud_pipeline_components.container.v1.dataflow.dataflow_launcher',
        ],
        args=[
            '--project',
            project,
            '--location',
            location,
            '--python_module_path',
            python_module_path,
            '--temp_location',
            temp_location,
            '--requirements_file_path',
            requirements_file_path,
            '--args',
            args,
            '--gcp_resources',
            gcp_resources,
        ],
    )
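For context, here is a minimal sketch of how such a container component might be wired into a pipeline and compiled. The pipeline name and GCS paths are hypothetical, and the component is assumed to be importable from the component.py file above; this is not part of the original sample:

from kfp import compiler, dsl

from component import dataflow_python  # the container component defined above (assumed module name)


@dsl.pipeline(name='dataflow-launcher-pipeline')
def pipeline(project: str, location: str = 'us-central1'):
    # The gcp_resources output parameter is populated by the launcher at runtime
    # and is what the Google Cloud console reads to render the Dataflow job link.
    dataflow_python(
        python_module_path='gs://your-bucket/path/to/main.py',  # hypothetical path
        temp_location='gs://your-bucket/tmp',                   # hypothetical path
        project=project,
        location=location,
    )


compiler.Compiler().compile(pipeline, package_path='dataflow_pipeline.yaml')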
Next, inside the container, install the Google Cloud Pipeline Components package:

pip install --upgrade google-cloud-pipeline-components

Next, in the Python code, define the resource as a gcp_resources parameter:
from google_cloud_pipeline_components.proto.gcp_resources_pb2 import GcpResources
from google.protobuf.json_format import MessageToJson

dataflow_resources = GcpResources()
dr = dataflow_resources.resources.add()
dr.resource_type = 'DataflowJob'
dr.resource_uri = 'https://dataflow.googleapis.com/v1b3/projects/[your-project]/locations/us-east1/jobs/[dataflow-job-id]'

with open(gcp_resources, 'w') as f:
    f.write(MessageToJson(dataflow_resources))
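For reference, the file written above contains the JSON serialization of the proto. With MessageToJson's default lowerCamelCase field names, it should look roughly like the following (an illustrative sketch, not output captured from a real run):

{
  "resources": [
    {
      "resourceType": "DataflowJob",
      "resourceUri": "https://dataflow.googleapis.com/v1b3/projects/[your-project]/locations/us-east1/jobs/[dataflow-job-id]"
    }
  ]
}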
Using a Python component
Alternatively, you can return the gcp_resources output parameter as you would any other string output parameter:

from typing import NamedTuple

from kfp import dsl


@dsl.component(
    base_image='python:3.9',
    packages_to_install=['google-cloud-pipeline-components==2.19.0'],
)
def launch_dataflow_component(project: str, location: str) -> NamedTuple("Outputs", [("gcp_resources", str)]):
    from google_cloud_pipeline_components.proto.gcp_resources_pb2 import GcpResources
    from google.protobuf.json_format import MessageToJson

    # Launch the dataflow job
    dataflow_job_id = [dataflow-id]
    dataflow_resources = GcpResources()
    dr = dataflow_resources.resources.add()
    dr.resource_type = 'DataflowJob'
    dr.resource_uri = f'https://dataflow.googleapis.com/v1b3/projects/{project}/locations/{location}/jobs/{dataflow_job_id}'
    gcp_resources = MessageToJson(dataflow_resources)
    return gcp_resources
Supported resource_type values
You can set resource_type to an arbitrary string, but only the following types have links in the Google Cloud console:
BatchPredictionJob
BigQueryJob
CustomJob
DataflowJob
HyperparameterTuningJob
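The same GcpResources pattern shown earlier applies to each of these types; only the resource_type string and resource_uri change. As a sketch, a component tracking a custom job might write something like the following. The URI format here is an assumption modeled on the Vertex AI REST resource name for custom jobs, and project, location, custom_job_id, and gcp_resources are assumed to be defined as in the earlier examples:

from google_cloud_pipeline_components.proto.gcp_resources_pb2 import GcpResources
from google.protobuf.json_format import MessageToJson

custom_job_resources = GcpResources()
cj = custom_job_resources.resources.add()
cj.resource_type = 'CustomJob'
# Assumed URI format, based on the Vertex AI REST resource name for custom jobs.
cj.resource_uri = (
    f'https://{location}-aiplatform.googleapis.com/v1/projects/{project}'
    f'/locations/{location}/customJobs/{custom_job_id}'
)

with open(gcp_resources, 'w') as f:
    f.write(MessageToJson(custom_job_resources))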
Write a component to cancel the underlying resources
When a pipeline job is canceled, the default behavior is for the underlying Google Cloud resources to keep running; they are not canceled automatically. To change this behavior, attach a SIGTERM handler to the pipeline job. A good place to do this is just before a polling loop for a job that could run for a long time.
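As an illustration, the following is a minimal sketch of that pattern. The cancel_job and get_job_state callables are hypothetical stand-ins for whatever cancellation and polling calls your component makes; the actual Google Cloud Pipeline Components use the execution_context utility linked below instead:

import signal
import sys
import time


def wait_with_cancellation(job_id, cancel_job, get_job_state):
    """Polls a long-running job and cancels it if the task receives SIGTERM."""

    def handle_sigterm(signum, frame):
        # The pipeline job was canceled: cancel the underlying resource too,
        # then exit so the task stops polling.
        cancel_job(job_id)
        sys.exit(1)

    # Attach the handler just before entering the polling loop.
    signal.signal(signal.SIGTERM, handle_sigterm)

    while True:
        state = get_job_state(job_id)
        if state in ('SUCCEEDED', 'FAILED', 'CANCELLED'):
            return state
        time.sleep(30)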
Cancellation has been implemented in several Google Cloud Pipeline Components, including:
Batch prediction job
BigQuery ML job
Custom job
Dataproc Serverless batch job
Hyperparameter tuning job
For more information, including sample code that shows how to attach a SIGTERM handler, see the following GitHub links:
https://github.com/kubeflow/pipelines/blob/google-cloud-pipeline-components-2.19.0/components/google-cloud/google_cloud_pipeline_components/container/utils/execution_context.py
https://github.com/kubeflow/pipelines/blob/google-cloud-pipeline-components-2.19.0/components/google-cloud/google_cloud_pipeline_components/container/v1/gcp_launcher/job_remote_runner.py#L124
Consider the following when implementing your SIGTERM handler:
Cancellation propagation works only after the component has been running for a few minutes. This is typically due to background startup tasks that must be processed before the Python signal handlers are called.
Cancellation might not be implemented for some Google Cloud resources. For example, creating or deleting a Vertex AI endpoint or model can create a long-running operation that accepts a cancellation request through its REST API but doesn't implement the cancellation operation itself.