Vertex ML Metadata organizes resources hierarchically, where every resource belongs to a MetadataStore. You must first have a MetadataStore before you can create Metadata resources.
Vertex ML Metadata terminology
The following introduces the data model and terminology that is used to describe Vertex ML Metadata resources and components.
- A MetadataStore is the top-level container for metadata resources. MetadataStore is regionalized and associated with a specific Google Cloud project. Typically, an organization uses one shared MetadataStore for metadata resources within each project.
- Vertex ML Metadata exposes a graph-like data model for representing metadata produced and consumed from ML workflows. The primary concepts are artifacts, executions, events, and contexts.
- An artifact is a discrete entity or piece of data produced and consumed by a machine learning workflow. Examples of artifacts include datasets, models, input files, and training logs.
- A context is used to group artifacts and executions together under a single, queryable, and typed category. Contexts can be used to represent sets of metadata. An example of a Context would be a run of a machine learning pipeline.
A Vertex AI Pipelines pipeline run. In this case, the context represents one run and each execution represents a step in the ML pipeline.
An experiment run from a Jupyter notebook. In this case, the context could represent the notebook and each execution could represent a cell in that notebook.
- An event describes the relationship between artifacts and executions. Each artifact can be produced by an execution and consumed by other executions. Events help you to determine the provenance of artifacts in their ML workflows by chaining together artifacts and executions.
- An execution is a record of an individual machine learning workflow step, typically annotated with its runtime parameters. Examples of executions include data ingestion, data validation, model training, model evaluation, and model deployment.
- A MetadataSchema describes the schema for particular types of artifacts, executions, or contexts. MetadataSchemas are used to validate the key-value pairs during creation of the corresponding Metadata resources. Schema validation is only performed on matching fields between the resource and the MetadataSchema. Type schemas are represented using OpenAPI Schema Objects, which should be described using YAML.
Type schemas are represented using OpenAPI Schema Objects, which should be described using YAML.
The following is an example of how the predefined
Model system type is
specified in YAML format.
title: system.Model type: object properties: framework: type: string description: "The framework type, for example 'TensorFlow' or 'Scikit-Learn'." framework_version: type: string description: "The framework version, for example '1.15' or '2.1'" payload_format: type: string description: "The format of the Model payload, for example 'SavedModel' or 'TFLite'"
The title of the schema must use the format
Vertex ML Metadata publishes and maintains system-defined schemas for
representing common types widely used in ML workflows. These schemas live under
system, and can be accessed as MetadataSchema resources in the
API. Schemas are always versioned.
The Metadata resources exposed closely mirror those of the open source implementation of ML Metadata (MLMD).