Track the lineage of pipeline artifacts

Each pipeline run created using Vertex AI Pipelines has several associated artifacts and parameters, such as models, datasets, pipeline templates, and components. The lineage of a pipeline artifact includes the factors that contributed to its creation, as well as artifacts and metadata derived from the artifact. For example, a model's lineage can include the following:

  • The training, test, and evaluation data used to create the model.

  • The hyperparameters used during model training.

  • Metadata recorded from the training and evaluation process, such as the model's accuracy.

  • Artifacts that descend from this model, such as the results of batch predictions.

You can use this metadata to help answer questions like the following:

  • Why did a certain pipeline run produce an especially accurate model?

  • Which pipeline run produced the most accurate model, and what hyperparameters were used to train the model?

  • Depending on the steps in your pipeline, you might be able to answer system governance questions. For example, you could use metadata to determine which version of your model was in production at a given point in time.
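
As a rough sketch of the second question above, the following snippet picks the run with the highest accuracy from a list of run records and returns its hyperparameters. The record layout (a `run_name` key, `metric.`-prefixed metrics, and `param.`-prefixed parameters) and the sample values are assumptions made for illustration. In practice you might load comparable records from Vertex ML Metadata, for example with the Vertex AI SDK's `aiplatform.get_pipeline_df`, whose exact column names may differ.

```python
from typing import Any, Dict, List, Tuple


def best_run(
    runs: List[Dict[str, Any]], metric: str = "metric.accuracy"
) -> Tuple[str, Dict[str, Any]]:
    """Return (run_name, hyperparameters) for the run with the highest metric."""
    # Ignore runs that never reported the metric (for example, failed runs).
    scored = [r for r in runs if r.get(metric) is not None]
    if not scored:
        raise ValueError(f"no run reports {metric!r}")
    top = max(scored, key=lambda r: r[metric])
    # Strip the assumed "param." prefix to get plain hyperparameter names.
    params = {
        key[len("param."):]: value
        for key, value in top.items()
        if key.startswith("param.")
    }
    return top["run_name"], params


# Hypothetical run records; names and values are illustrative only.
SAMPLE_RUNS = [
    {"run_name": "run-a", "param.learning_rate": 0.1, "param.epochs": 5, "metric.accuracy": 0.91},
    {"run_name": "run-b", "param.learning_rate": 0.01, "param.epochs": 10, "metric.accuracy": 0.95},
    {"run_name": "run-c", "param.learning_rate": 0.001, "param.epochs": 10},
]

if __name__ == "__main__":
    print(best_run(SAMPLE_RUNS))
```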

To view and analyze the pipeline artifact lineage, you can use either Vertex ML Metadata or Dataplex.

The following table outlines the differences between Vertex ML Metadata and Dataplex:

| Feature | Vertex ML Metadata | Dataplex |
|---|---|---|
| Types of pipeline metadata captured | All input and output artifacts produced by a pipeline run. | Input and output artifacts that can be mapped to fully qualified names (FQNs) supported by Dataplex, generally by using Google Cloud Pipeline Components. |
| Geography | Single region reads. | Global reads, that is, across multiple regions. |
| Projects | Single project reads. | Organization-wide reads across multiple projects. |
| Integrated services | Integrated with Vertex AI Pipelines, Vertex AI Experiments, Vertex AI Model Registry, and Datasets. | Integrated with multiple Google Cloud products, such as Vertex AI, BigQuery, Cloud Composer, and Dataproc. |
| Opt-in? | No, always on. | Opt-in per project by enabling the Data Lineage API. |

Map Vertex ML Metadata artifacts to Dataplex

To map Vertex ML Metadata artifacts to FQNs in Dataplex, you need to do the following:

  • Use Google Cloud Pipeline Components while creating Vertex AI models and managed datasets.

  • Use custom schema titles (google.VertexDataset or google.VertexModel) while specifying the model or managed dataset resource name in the metadata field, as illustrated in the following sample:

{
  "name": "projects/example-project/locations/us-central1/metadataStores/default/artifacts/example-artifact",
  "displayName": "My dataset",
  "uri": "https://us-central1-aiplatform.googleapis.com/v1/projects/example-project/locations/us-central1/datasets/example-dataset",
   ...
  "schemaTitle": "google.VertexDataset",
  "schemaVersion": "0.0.1",
  "metadata": {
    "resourceName": "projects/example-project/locations/us-central1/datasets/example-dataset"
  }
}
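
The following is a minimal sketch of how the fields in the sample relate to each other: it derives the schema title, the `uri`, and the `metadata.resourceName` field from a single Vertex AI resource name. The helper name `build_artifact` and the validation pattern are hypothetical, and the `schemaVersion` value is copied from the sample rather than looked up.

```python
import re
from typing import Any, Dict

# Custom schema titles named in the text, keyed by resource collection.
SCHEMA_TITLES = {"datasets": "google.VertexDataset", "models": "google.VertexModel"}

# Assumed shape of a model or managed dataset resource name.
_RESOURCE_RE = re.compile(
    r"^projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)/"
    r"(?P<collection>datasets|models)/(?P<id>[^/]+)$"
)


def build_artifact(resource_name: str, display_name: str) -> Dict[str, Any]:
    """Build an artifact body whose metadata carries the resource name."""
    match = _RESOURCE_RE.match(resource_name)
    if match is None:
        raise ValueError(f"not a model or dataset resource name: {resource_name!r}")
    location = match.group("location")
    return {
        "displayName": display_name,
        # Regional endpoint plus resource name, following the sample above.
        "uri": f"https://{location}-aiplatform.googleapis.com/v1/{resource_name}",
        "schemaTitle": SCHEMA_TITLES[match.group("collection")],
        "schemaVersion": "0.0.1",  # taken from the sample, not verified here
        "metadata": {"resourceName": resource_name},
    }


if __name__ == "__main__":
    print(build_artifact(
        "projects/example-project/locations/us-central1/datasets/example-dataset",
        "My dataset",
    ))
```

The resulting dictionary could serve as the artifact body when you record the artifact in Vertex ML Metadata. How you submit it (REST API, SDK, or a pipeline step) is outside the scope of this sketch.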

Analyze the lineage of pipeline artifacts using Vertex ML Metadata

When you run a pipeline using Vertex AI Pipelines, the artifacts and parameters of your pipeline run are stored using Vertex ML Metadata. Because this metadata is recorded for you, you can analyze the lineage of your pipeline's artifacts without having to track the metadata yourself.

If you're new to Vertex ML Metadata, read the introduction to Vertex ML Metadata.

Follow these instructions to view the lineage graph for a pipeline artifact using Vertex ML Metadata:

  1. In the Google Cloud console, in the Vertex AI section, go to the Metadata page.


    The Metadata page lists the artifacts that have been created in the default metadata store.

  2. In the Region drop-down list, select the region that your run was created in.

  3. Click the Display name of an artifact to see its lineage graph.

    A static graph appears that shows the artifacts and executions in this artifact's lineage.

  4. Click an artifact or execution to learn more about it.
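
The lineage graph in the console is built from artifacts, executions, and the events that connect them. As a minimal sketch of how such a graph can be traversed, the following function walks backward from one artifact to every artifact that contributed to it. The event layout (dictionaries with `artifact`, `execution`, and a `type` of `INPUT` or `OUTPUT`) is modeled on the lineage subgraph that the Vertex ML Metadata API returns, but the function name, the sample events, and the shortened resource names are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


def upstream_artifacts(events: Iterable[Dict[str, str]], target: str) -> Set[str]:
    """Return every artifact that directly or indirectly fed into `target`."""
    producers: Dict[str, List[str]] = {}  # artifact -> executions that output it
    inputs: Dict[str, List[str]] = {}     # execution -> artifacts it consumed
    for event in events:
        if event["type"] == "OUTPUT":
            producers.setdefault(event["artifact"], []).append(event["execution"])
        elif event["type"] == "INPUT":
            inputs.setdefault(event["execution"], []).append(event["artifact"])

    # Breadth-first walk: artifact -> producing execution -> its input artifacts.
    seen: Set[str] = set()
    queue = deque([target])
    while queue:
        artifact = queue.popleft()
        for execution in producers.get(artifact, []):
            for parent in inputs.get(execution, []):
                if parent != target and parent not in seen:
                    seen.add(parent)
                    queue.append(parent)
    return seen


# Hypothetical events: raw-data -> preprocess -> dataset -> train -> model
# -> batch-predict -> predictions.
SAMPLE_EVENTS = [
    {"artifact": "raw-data", "execution": "preprocess", "type": "INPUT"},
    {"artifact": "dataset", "execution": "preprocess", "type": "OUTPUT"},
    {"artifact": "dataset", "execution": "train", "type": "INPUT"},
    {"artifact": "model", "execution": "train", "type": "OUTPUT"},
    {"artifact": "model", "execution": "batch-predict", "type": "INPUT"},
    {"artifact": "predictions", "execution": "batch-predict", "type": "OUTPUT"},
]

if __name__ == "__main__":
    print(sorted(upstream_artifacts(SAMPLE_EVENTS, "model")))
```

Swapping the roles of `INPUT` and `OUTPUT` in the two lookup tables would walk the graph forward instead, which corresponds to finding artifacts that descend from a model, such as batch prediction results.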

Analyze the lineage of pipeline artifacts using Dataplex

Dataplex Data Catalog discovers metadata from Google Cloud resources, which include Vertex AI Pipelines artifacts like Vertex AI models, managed datasets, and other Google Cloud resources discoverable in Data Catalog. You can discover these artifacts using the metadata search capability of Data Catalog and view their lineage graphs.

For more information about the Data Catalog metadata search capability, see Search and view data assets with Data Catalog.

Data Catalog might not be available in every region where Vertex AI Pipelines is supported. If Data Catalog is unsupported in your region, use Vertex ML Metadata instead. For details, see the list of supported regions for Data Catalog.

Follow these instructions to view the lineage graph for a pipeline artifact on Dataplex:

  1. To launch a Dataplex search query in the Google Cloud console, go to the Dataplex Search page.


  2. Use the filters to search for the artifacts. For example, you can use the Data types filter to specify the type of artifact, such as model, dataset, or BigQuery table. For more information about Data Catalog search, see Search for data assets.

    You can also define your query in the search field.

  3. To view the lineage of an artifact, click the name of the artifact, and then click the Lineage tab.

    On the lineage graph, Vertex AI processes are preceded by the Vertex AI lineage icon. These include pipeline artifacts, pipeline components, and pipeline templates.

    • To view the details of a process, click the process in the lineage graph.

    • For processes based on pipeline tasks from pipeline runs, you can do the following:

      • View the pipeline run in Vertex AI by clicking Open in Vertex AI in the Details tab. To view the runtime details of the pipeline run, such as states, timestamps, and attributes, click More.

    • For processes based on a pipeline template, you can do the following:

      • View the template details in Vertex AI by clicking Open in Vertex AI in the Details tab.

      • View the list of pipeline tasks created in pipeline runs in the Runs tab. To view the details of the pipeline template in Vertex AI, click More, and then click Open in Vertex AI.

What's next