

Build your own pipeline components

Write a component to show a Google Cloud console link

When you run a component, you often want to see more than the link to the component job being launched. You also want links to the underlying cloud resources, such as Vertex AI batch prediction jobs or Dataflow jobs.

The gcp_resources output parameter is a special parameter that holds a serialized GcpResources proto. You can use it in your component so that the Google Cloud console shows a customized view of the resource's logs and status in the Vertex AI Pipelines console.
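To make the payload concrete, the following standard-library sketch builds a JSON string with the same shape that MessageToJson produces for a GcpResources message containing one resource. This is a minimal sketch under two assumptions: that MessageToJson is called with its default options, which convert proto field names such as resource_type to lowerCamelCase (resourceType), and that the Dataflow URI follows the format used in the examples on this page. The helper names dataflow_job_uri and make_gcp_resources_json are hypothetical. In real components, use the GcpResources proto from google-cloud-pipeline-components.

```python
import json


def dataflow_job_uri(project: str, location: str, job_id: str) -> str:
    """Build a Dataflow job URI in the format used by the examples on this page."""
    return (
        'https://dataflow.googleapis.com/v1b3/'
        f'projects/{project}/locations/{location}/jobs/{job_id}'
    )


def make_gcp_resources_json(resource_type: str, resource_uri: str) -> str:
    """Return JSON shaped like MessageToJson(GcpResources) for one resource.

    Assumption: MessageToJson's default options, which emit lowerCamelCase
    field names (resource_type -> resourceType) with an indent of 2.
    """
    payload = {
        'resources': [
            {'resourceType': resource_type, 'resourceUri': resource_uri},
        ]
    }
    return json.dumps(payload, indent=2)


# Example: describe a hypothetical Dataflow job in us-east1.
print(make_gcp_resources_json(
    'DataflowJob', dataflow_job_uri('my-project', 'us-east1', 'my-job-id')))
```

Because the payload is a plain JSON string, a sketch like this can help when you unit test code that reads gcp_resources without installing the proto package.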

Output the `gcp_resources` parameter

Using a container-based component

First, define the gcp_resources output parameter in your component, as shown in the following example component.py file:


To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.

# Copyright 2023 The Kubeflow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import List

from google_cloud_pipeline_components import _image
from google_cloud_pipeline_components import _placeholders
from kfp.dsl import container_component
from kfp.dsl import ContainerSpec
from kfp.dsl import OutputPath


@container_component
def dataflow_python(
    python_module_path: str,
    temp_location: str,
    gcp_resources: OutputPath(str),
    location: str = 'us-central1',
    requirements_file_path: str = '',
    args: List[str] = [],
    project: str = _placeholders.PROJECT_ID_PLACEHOLDER,
):
  # fmt: off
  """Launch a self-executing Beam Python file on Google Cloud using the
  Dataflow Runner.

  Args:
      location: Location of the Dataflow job. If not set, defaults to `'us-central1'`.
      python_module_path: The GCS path to the Python file to run.
      temp_location: A GCS path for Dataflow to stage temporary job files created during the execution of the pipeline.
      requirements_file_path: The GCS path to the pip requirements file.
      args: The list of args to pass to the Python file. Can include additional parameters for the Dataflow Runner.
      project: Project to create the Dataflow job. Defaults to the project in which the PipelineJob is run.

  Returns:
      gcp_resources: Serialized gcp_resources proto tracking the Dataflow job. For more details, see https://github.com/kubeflow/pipelines/blob/master/components/google-cloud/google_cloud_pipeline_components/proto/README.md.
  """
  # fmt: on
  return ContainerSpec(
      image=_image.GCPC_IMAGE_TAG,
      command=[
          'python3',
          '-u',
          '-m',
          'google_cloud_pipeline_components.container.v1.dataflow.dataflow_launcher',
      ],
      args=[
          '--project',
          project,
          '--location',
          location,
          '--python_module_path',
          python_module_path,
          '--temp_location',
          temp_location,
          '--requirements_file_path',
          requirements_file_path,
          '--args',
          args,
          '--gcp_resources',
          gcp_resources,
      ],
  )

Next, inside the container, install the Google Cloud Pipeline Components package:

pip install --upgrade google-cloud-pipeline-components

Then, in your Python code, describe the resource in a GcpResources proto and write it to the gcp_resources output path:


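import argparse

from google.protobuf.json_format import MessageToJson
from google_cloud_pipeline_components.proto.gcp_resources_pb2 import GcpResources

# Assumes the container's args pass the output file path as --gcp_resources,
# as the ContainerSpec above does.
parser = argparse.ArgumentParser()
parser.add_argument('--gcp_resources', type=str, required=True)
parsed_args, _ = parser.parse_known_args()

dataflow_resources = GcpResources()
dr = dataflow_resources.resources.add()
dr.resource_type = 'DataflowJob'
dr.resource_uri = 'https://dataflow.googleapis.com/v1b3/projects/[your-project]/locations/us-east1/jobs/[dataflow-job-id]'

# Write the serialized proto to the output path collected by the pipeline.
with open(parsed_args.gcp_resources, 'w') as f:
    f.write(MessageToJson(dataflow_resources))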

Using a Python component

Alternatively, you can return the gcp_resources output parameter the same way you return any string output parameter:

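from typing import NamedTuple

from kfp import dsl


@dsl.component(
    base_image='python:3.9',
    packages_to_install=['google-cloud-pipeline-components==2.15.0'],
)
def launch_dataflow_component(
    project: str, location: str
) -> NamedTuple('Outputs', [('gcp_resources', str)]):
  # Lightweight components run in their own container, so import inside the function.
  from google.protobuf.json_format import MessageToJson
  from google_cloud_pipeline_components.proto.gcp_resources_pb2 import GcpResources

  # Launch the Dataflow job, then replace this placeholder with its job ID.
  dataflow_job_id = '[dataflow-id]'
  dataflow_resources = GcpResources()
  dr = dataflow_resources.resources.add()
  dr.resource_type = 'DataflowJob'
  dr.resource_uri = f'https://dataflow.googleapis.com/v1b3/projects/{project}/locations/{location}/jobs/{dataflow_job_id}'
  # Return a NamedTuple so that the output is exposed under the name gcp_resources.
  outputs = NamedTuple('Outputs', [('gcp_resources', str)])
  return outputs(gcp_resources=MessageToJson(dataflow_resources))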
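Both approaches produce the same JSON text, so code that consumes gcp_resources can read it with the standard library. The following sketch pulls Dataflow job IDs out of that string. It assumes the default lowerCamelCase field names written by MessageToJson and the .../jobs/{job_id} URI format shown above. The function name dataflow_job_ids is hypothetical.

```python
import json
from typing import List


def dataflow_job_ids(gcp_resources_json: str) -> List[str]:
    """Extract Dataflow job IDs from a serialized gcp_resources string.

    Assumes lowerCamelCase JSON field names (resourceType, resourceUri) and
    URIs ending in /jobs/{job_id}.
    """
    resources = json.loads(gcp_resources_json).get('resources', [])
    return [
        # The job ID is the last path segment of the resource URI.
        r['resourceUri'].rstrip('/').rsplit('/', 1)[-1]
        for r in resources
        if r.get('resourceType') == 'DataflowJob'
    ]
```

Resources of other types are skipped, so the same helper works on payloads that track several kinds of jobs.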

Supported `resource_type` values

You can set resource_type to any string, but only the following types get links in the Google Cloud console:

BatchPredictionJob
BigQueryJob
CustomJob
DataflowJob
HyperparameterTuningJob
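Because any string is accepted, a typo in resource_type fails silently: the component runs, but no console link appears. The following sketch shows one way to catch this early. It checks a value against the types listed above and warns, without failing, when the console will not link the value. The names LINKED_RESOURCE_TYPES and check_resource_type are hypothetical and are not part of google-cloud-pipeline-components.

```python
import warnings

# The resource types listed above as getting a Google Cloud console link.
LINKED_RESOURCE_TYPES = frozenset({
    'BatchPredictionJob',
    'BigQueryJob',
    'CustomJob',
    'DataflowJob',
    'HyperparameterTuningJob',
})


def check_resource_type(resource_type: str) -> bool:
    """Return True if the console links this type; warn (but allow) otherwise."""
    if resource_type in LINKED_RESOURCE_TYPES:
        return True
    warnings.warn(
        f'resource_type {resource_type!r} is allowed but gets no console link')
    return False
```

The check emits a warning instead of raising an error because custom types are still valid. They just don't get a link.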

Write a component to cancel the underlying resources

By default, when a pipeline job is canceled, the underlying Google Cloud resources keep running; they aren't canceled automatically. To change this behavior, attach a SIGTERM handler in your component's code. A good place to do this is right before the polling loop for a job that might run for a long time.
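The following sketch uses Python's standard signal module to illustrate this pattern. It registers a SIGTERM handler that forwards the cancellation to the underlying job, and then polls until the job reaches a terminal state. This is a minimal sketch, not the handler used by Google Cloud Pipeline Components. The names install_sigterm_cancel_handler, wait_for_job, cancel_job, and get_state are hypothetical stand-ins for calls to the service that owns the job, and the state names are assumptions.

```python
import signal
import time
from typing import Callable


def install_sigterm_cancel_handler(cancel_job: Callable[[], None]) -> None:
    """Cancel the underlying cloud job when the process receives SIGTERM."""

    def _on_sigterm(signum, frame):
        # Forward the pipeline cancellation to the underlying resource.
        cancel_job()

    # signal.signal must be called from the main thread.
    signal.signal(signal.SIGTERM, _on_sigterm)


def wait_for_job(get_state: Callable[[], str], poll_interval_s: float = 30.0) -> str:
    """Poll until the job reaches a terminal state and return that state."""
    terminal_states = {'SUCCEEDED', 'FAILED', 'CANCELLED'}  # hypothetical names
    while True:
        state = get_state()
        if state in terminal_states:
            return state
        time.sleep(poll_interval_s)


# Usage: install the handler right before the long-running polling loop.
# install_sigterm_cancel_handler(lambda: my_client.cancel(job_id))
# final_state = wait_for_job(lambda: my_client.get_state(job_id))
```

The handler only requests cancellation and does not exit the process. The polling loop keeps running, observes the canceled state, and returns normally.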

Cancellation is implemented in several Google Cloud Pipeline Components, including:

Batch prediction job
BigQuery ML job
Custom job
Dataproc Serverless batch job
Hyperparameter tuning job

For more information, including sample code that shows how to attach a SIGTERM handler, see the following GitHub links:

Keep the following in mind when you implement your SIGTERM handler:

Cancellation propagation works only after the component has been running for a few minutes. This is usually because background startup tasks must finish before the Python signal handlers are called.
Some Google Cloud resources might not implement cancellation. For example, creating or deleting a Vertex AI endpoint or model can create a long-running operation that accepts a cancellation request through its REST API but doesn't actually implement the cancel operation.
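A successful cancel request doesn't guarantee that the resource stopped. For this reason, a handler can request cancellation and then poll to confirm the result. The following sketch shows this best-effort approach. The request_cancel and get_state callables are hypothetical stand-ins for a service's REST API, and 'CANCELLED' is an assumed state name.

```python
import time
from typing import Callable


def cancel_and_confirm(
    request_cancel: Callable[[], None],
    get_state: Callable[[], str],
    timeout_s: float = 300.0,
    poll_interval_s: float = 10.0,
) -> bool:
    """Request cancellation, then poll to confirm that the resource stopped.

    Returns False if the request fails or the resource never reports
    'CANCELLED' before the timeout, for example when the API accepts the
    request but doesn't implement cancellation.
    """
    try:
        request_cancel()
    except Exception as exc:  # Cancellation may be unsupported for this resource.
        print(f'Cancel request failed: {exc}')
        return False
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if get_state() == 'CANCELLED':
            return True
        time.sleep(poll_interval_s)
    return False
```

The function returns a boolean instead of raising an error, so the caller decides whether an unconfirmed cancellation should be logged or treated as a failure.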
