Data model and resources

The following sections introduce the data model and terminology used to describe Vertex ML Metadata resources and components.

Vertex ML Metadata organizes resources hierarchically: every resource belongs to a MetadataStore. A MetadataStore must exist before you can create Metadata resources in it.

MetadataStore

A MetadataStore is the top-level container for metadata resources. Each MetadataStore is regional and is associated with a specific Google Cloud project. Typically, an organization uses one shared MetadataStore for the metadata resources within each project.
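Because every resource belongs to a MetadataStore, and each MetadataStore belongs to one project and one region, a resource's full name encodes that whole hierarchy. The sketch below assumes the Vertex AI REST naming pattern projects/{project}/locations/{location}/metadataStores/{store}/{collection}/{id}. The helper functions, the store ID default, and the example IDs (my-project, us-central1, my-artifact) are hypothetical illustrations, not part of any Google Cloud SDK.

```python
# Hypothetical helpers (not part of any Google Cloud SDK): build and parse
# Vertex ML Metadata resource names. The path pattern is an assumption based
# on the Vertex AI REST API's resource naming.

COLLECTIONS = ("artifacts", "executions", "contexts", "metadataSchemas")


def store_name(project: str, location: str, store_id: str = "default") -> str:
    """Full name of a regional MetadataStore owned by a project."""
    return f"projects/{project}/locations/{location}/metadataStores/{store_id}"


def child_name(store: str, collection: str, resource_id: str) -> str:
    """Full name of a Metadata resource that lives inside a MetadataStore."""
    if collection not in COLLECTIONS:
        raise ValueError(f"unknown collection: {collection!r}")
    return f"{store}/{collection}/{resource_id}"


def parse_store(name: str) -> dict:
    """Recover the project, region, and store ID from any store-scoped name."""
    parts = name.split("/")
    if (len(parts) < 6 or parts[0] != "projects"
            or parts[2] != "locations" or parts[4] != "metadataStores"):
        raise ValueError(f"not a MetadataStore-scoped name: {name!r}")
    return {"project": parts[1], "location": parts[3], "store": parts[5]}


store = store_name("my-project", "us-central1")
artifact = child_name(store, "artifacts", "my-artifact")
print(artifact)
# projects/my-project/locations/us-central1/metadataStores/default/artifacts/my-artifact
print(parse_store(artifact))
# {'project': 'my-project', 'location': 'us-central1', 'store': 'default'}
```

Encoding the project and region in every name is what scopes a MetadataStore, and everything inside it, to a single project and region.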

Metadata resources

Vertex ML Metadata exposes a graph-like data model for representing the metadata that ML workflows produce and consume. The primary concepts are Artifacts, Executions, Events, and Contexts.

Artifact

An Artifact is a discrete entity or piece of data produced and consumed by a machine learning workflow. Examples of Artifacts include input files, transformed datasets, trained models, training logs, and deployed model endpoints.

Execution

An Execution is a record of an individual machine learning workflow step, typically annotated with its runtime parameters. Examples of Executions include data ingestion, data validation, model training, model evaluation, and model deployment.

Event

An Event describes the relationship between Artifacts and Executions. Each Artifact can be produced by an Execution and consumed by other Executions. By chaining Artifacts and Executions together, Events help you determine the provenance of the artifacts in your ML workflows.
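How Events make provenance traceable can be sketched with a small in-memory graph. This is not the Vertex AI SDK: the Event and MetadataGraph classes, their methods, and the string IDs are hypothetical, and only the INPUT/OUTPUT direction of an Event is modeled. Walking from an Artifact to the Execution that produced it (an OUTPUT Event), then to that Execution's inputs (INPUT Events), and repeating, recovers the Artifact's upstream lineage.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """Links one Artifact to one Execution in a given direction."""
    artifact: str   # Artifact ID
    execution: str  # Execution ID
    type: str       # "INPUT" (consumed by) or "OUTPUT" (produced by)


@dataclass
class MetadataGraph:
    """Hypothetical in-memory store of Events; not the Vertex AI SDK."""
    events: list = field(default_factory=list)

    def record(self, execution, inputs=(), outputs=()):
        """Add one Event per Artifact the Execution consumes or produces."""
        self.events += [Event(a, execution, "INPUT") for a in inputs]
        self.events += [Event(a, execution, "OUTPUT") for a in outputs]

    def producer(self, artifact):
        """Return the Execution that produced an Artifact, or None."""
        for e in self.events:
            if e.artifact == artifact and e.type == "OUTPUT":
                return e.execution
        return None

    def inputs_of(self, execution):
        """Return the Artifacts an Execution consumed."""
        return [e.artifact for e in self.events
                if e.execution == execution and e.type == "INPUT"]

    def lineage(self, artifact):
        """Follow Events backwards and return every upstream Artifact."""
        upstream, stack = [], [artifact]
        while stack:
            execution = self.producer(stack.pop())
            if execution is None:
                continue  # a root input, such as a raw file
            for a in self.inputs_of(execution):
                if a not in upstream:
                    upstream.append(a)
                    stack.append(a)
        return upstream


graph = MetadataGraph()
graph.record("ingest", inputs=["raw.csv"], outputs=["dataset"])
graph.record("train", inputs=["dataset"], outputs=["model"])
graph.record("evaluate", inputs=["model", "dataset"], outputs=["metrics"])
print(graph.lineage("metrics"))  # ['model', 'dataset', 'raw.csv']
```

The sketch assumes each Artifact has at most one producing Execution, which is why producer returns the first OUTPUT Event it finds.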

Context

A Context groups Artifacts and Executions together under a single, queryable, and typed category. Contexts can be used to represent sets of metadata; for example, a Context could represent a single run of a machine learning pipeline.
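The grouping role of a Context can be illustrated with a minimal in-memory index. The ContextIndex class and its methods are hypothetical; the names attribute and associate borrow MLMD's terms for linking a Context to an Artifact (an attribution) and to an Execution (an association). The type string system.PipelineRun is used here as an illustrative Context type and is an assumption of this sketch.

```python
from collections import defaultdict


class ContextIndex:
    """Hypothetical in-memory index grouping Artifacts and Executions under typed Contexts."""

    def __init__(self):
        self.context_types = {}          # context ID -> type (schema title)
        self.members = defaultdict(set)  # context ID -> {(kind, resource ID)}

    def add_context(self, context_id: str, context_type: str) -> None:
        self.context_types[context_id] = context_type

    def attribute(self, context_id: str, artifact_id: str) -> None:
        """Group an Artifact under a Context (an "attribution" in MLMD terms)."""
        self._require(context_id)
        self.members[context_id].add(("artifact", artifact_id))

    def associate(self, context_id: str, execution_id: str) -> None:
        """Group an Execution under a Context (an "association" in MLMD terms)."""
        self._require(context_id)
        self.members[context_id].add(("execution", execution_id))

    def contexts_of_type(self, context_type: str) -> list:
        """Sorted IDs of every Context with the given type."""
        return sorted(c for c, t in self.context_types.items() if t == context_type)

    def query(self, context_id: str, kind: str) -> list:
        """Sorted IDs of one kind ("artifact" or "execution") in a Context."""
        return sorted(rid for k, rid in self.members.get(context_id, ()) if k == kind)

    def _require(self, context_id: str) -> None:
        if context_id not in self.context_types:
            raise KeyError(f"unknown context: {context_id}")


index = ContextIndex()
index.add_context("run-1", "system.PipelineRun")
index.add_context("run-2", "system.PipelineRun")
index.associate("run-1", "train")
index.attribute("run-1", "dataset")
index.attribute("run-1", "model")
print(index.contexts_of_type("system.PipelineRun"))  # ['run-1', 'run-2']
print(index.query("run-1", "artifact"))              # ['dataset', 'model']
```

Because every Context carries a type, a question such as "which pipeline runs produced a model?" becomes a filter on type followed by a lookup of the Context's members.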

[Figure: How Artifacts, Executions, and Contexts combine into the Vertex ML Metadata graph data model.]

MetadataSchema

A MetadataSchema describes the schema for a particular type of Artifact, Execution, or Context. MetadataSchemas are used to validate the metadata key-value pairs when the corresponding Metadata resources are created. Schema validation is performed only on fields that appear in both the resource and the MetadataSchema. Type schemas are represented as OpenAPI Schema Objects, which should be written in YAML.
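The rule that validation covers only matching fields can be illustrated with a small sketch. It does not reproduce how Vertex ML Metadata actually validates resources: the validate function is hypothetical, the schema is a Python dict standing in for parsed YAML with properties modeled on the predefined system.Model type, and only the OpenAPI type keyword is checked.

```python
# Hypothetical sketch of "validate only matching fields"; not Vertex AI's
# actual validator. Only the OpenAPI "type" keyword is checked.

OPENAPI_TYPES = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "object": dict,
    "array": list,
}

# A Python dict standing in for a parsed system.Model-style YAML schema.
MODEL_SCHEMA = {
    "title": "system.Model",
    "type": "object",
    "properties": {
        "framework": {"type": "string"},
        "framework_version": {"type": "string"},
        "payload_format": {"type": "string"},
    },
}


def validate(metadata: dict, schema: dict) -> list:
    """Return sorted error messages; an empty list means the metadata passes."""
    errors = []
    properties = schema.get("properties", {})
    # Only keys present in BOTH the metadata and the schema are validated.
    for key in metadata.keys() & properties.keys():
        expected = properties[key].get("type")
        python_type = OPENAPI_TYPES.get(expected)
        if python_type is None:
            continue  # no type, or a type this sketch does not model
        value = metadata[key]
        # bool is a subclass of int in Python, so only accept it for "boolean".
        if isinstance(value, bool) and python_type is not bool:
            errors.append(f"{key}: expected {expected}, got bool")
        elif not isinstance(value, python_type):
            errors.append(f"{key}: expected {expected}, got {type(value).__name__}")
    return sorted(errors)


ok = {"framework": "TensorFlow", "framework_version": "2.1", "accuracy": 0.93}
bad = {"framework_version": 2.1}
print(validate(ok, MODEL_SCHEMA))   # []
print(validate(bad, MODEL_SCHEMA))  # ['framework_version: expected string, got float']
```

The accuracy key passes untouched because the schema does not declare it, while a numeric framework_version is rejected because the schema declares that field as a string.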

The following is an example of how the predefined Model system type is specified in YAML format.

```yaml
title: system.Model
type: object
properties:
  framework:
    type: string
    description: "The framework type, for example 'TensorFlow' or 'Scikit-Learn'."
  framework_version:
    type: string
    description: "The framework version, for example '1.15' or '2.1'"
  payload_format:
    type: string
    description: "The format of the Model payload, for example 'SavedModel' or 'TFLite'"
```

The title of a schema must use the format <namespace>.<type name>. Vertex ML Metadata publishes and maintains system-defined schemas that represent types commonly used in ML workflows. These schemas live under the system namespace and can be accessed as MetadataSchema resources in the API. Schemas are always versioned.
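The title format and versioning rules can be sketched as a small parser and registry. Everything here is hypothetical: parse_title assumes exactly one dot in a title, SchemaRegistry treats a registered (title, version) pair as immutable, and the dotted version strings are illustrative and assumed to be integers separated by dots.

```python
from typing import NamedTuple


class SchemaTitle(NamedTuple):
    namespace: str
    type_name: str

    @property
    def is_system(self) -> bool:
        """True for titles in the system namespace used by system-defined schemas."""
        return self.namespace == "system"


def parse_title(title: str) -> SchemaTitle:
    """Split a title of the form <namespace>.<type name> (exactly one dot assumed)."""
    parts = title.split(".")
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"expected <namespace>.<type name>, got {title!r}")
    return SchemaTitle(*parts)


def version_key(version: str) -> tuple:
    """Order dotted-integer versions numerically, so "0.0.10" sorts after "0.0.2"."""
    return tuple(int(p) for p in version.split("."))


class SchemaRegistry:
    """Hypothetical registry that stores each schema under (title, version)."""

    def __init__(self):
        self._schemas = {}

    def register(self, title: str, version: str, schema: dict) -> None:
        parse_title(title)  # reject malformed titles early
        key = (title, version)
        if key in self._schemas:
            raise ValueError(f"{title} {version} is already registered")
        self._schemas[key] = schema

    def versions(self, title: str) -> list:
        """All registered versions of one schema title, oldest first."""
        return sorted((v for t, v in self._schemas if t == title), key=version_key)

    def get(self, title: str, version: str) -> dict:
        return self._schemas[(title, version)]


title = parse_title("system.Model")
print(title.namespace, title.type_name, title.is_system)  # system Model True

registry = SchemaRegistry()
registry.register("system.Model", "0.0.2", {"type": "object"})
registry.register("system.Model", "0.0.10", {"type": "object"})
print(registry.versions("system.Model"))  # ['0.0.2', '0.0.10']
```

Keying schemas on both title and version lets resources created against an older version keep referring to it after a newer version is published.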

The Metadata resources that Vertex ML Metadata exposes closely mirror those of the open source ML Metadata (MLMD) library.

What's next