System schemas

Each metadata resource is associated with a specific MetadataSchema. To simplify the metadata resource creation process, Vertex ML Metadata publishes predefined types, called system schemas, for common ML concepts. System schemas live under the system namespace. You can access system schemas as MetadataSchema resources in the Vertex ML Metadata API. Schemas are always versioned. The format of system schemas is a subset of the OpenAPI 3.0 specification.

How to use system schemas

Vertex AI uses system schemas to create metadata resources for tracking your ML workflows. You can then filter and group resources in metadata queries by using the schema_title field. For more information about how to use filter functions, see Analyze ML Metadata.

You can also use system schemas through the Vertex ML Metadata API to create metadata resources directly. You can identify a system schema by its schema title and schema version. Fields in system schemas are always considered optional. Users aren't restricted to the predefined fields of system schemas and can also log additional arbitrary metadata to any metadata resource. For more information about using system schemas to create metadata resources, see Track ML Metadata.
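To illustrate how a schema is identified by its title and version, the following Python sketch keeps a hypothetical in-memory registry keyed by (schema_title, schema_version). SYSTEM_SCHEMAS, get_schema, and split_fields are illustrative names, not part of the Vertex ML Metadata API. The entries mirror the YAML examples on this page, and split_fields shows that metadata can mix predefined fields with arbitrary extra fields.

```python
# Hypothetical registry: each schema is keyed by (schema_title, schema_version),
# mirroring how a system schema is identified by its title and version.
SYSTEM_SCHEMAS = {
    ("system.Artifact", "0.0.1"): {"type": "object"},
    ("system.Dataset", "0.0.1"): {
        "type": "object",
        "properties": {
            "container_format": {"type": "string"},
            "payload_format": {"type": "string"},
        },
    },
}


def get_schema(title: str, version: str) -> dict:
    """Return the schema registered under (title, version); raises KeyError if absent."""
    return SYSTEM_SCHEMAS[(title, version)]


def split_fields(title: str, version: str, metadata: dict) -> tuple[set, set]:
    """Split metadata keys into predefined schema fields and arbitrary extra fields.

    All schema fields are optional, so either set may be empty.
    """
    known = set(get_schema(title, version).get("properties", {}))
    keys = set(metadata)
    return keys & known, keys - known


known, extra = split_fields(
    "system.Dataset", "0.0.1", {"payload_format": "CSV", "owner": "team-a"}
)
print(known, extra)
```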

Strict schema matching

Vertex ML Metadata supports two keywords, additionalProperties and required, that let schema authors enforce strict schema matching.

additionalProperties

The additionalProperties value can be true or false. Consistent with JSON Schema, additionalProperties defaults to true. This flag is set at the top level of the schema. If it's set to false, no properties other than those defined under properties are allowed. For example, the following schema accepts only the payload_format and container_format fields in metadata based on this schema.

title: system.Dataset
version: 0.0.1
type: object
additionalProperties: false
properties:
  container_format:
    type: string
  payload_format:
    type: string

The above schema accepts the following metadata:

fields {
  key: 'container_format'
  value: { string_value: 'Text' }
}
fields {
  key: 'payload_format'
  value: { string_value: 'CSV' }
}

However, the following metadata is rejected, because optional_field isn't defined under properties:

fields {
  key: 'container_format'
  value: { string_value: 'Text' }
}
fields {
  key: 'payload_format'
  value: { string_value: 'CSV' }
}
fields {
  key: 'optional_field'
  value: { string_value: 'optional_value' }
}
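The following Python sketch models the additionalProperties check described above, treating both the schema and the metadata Struct as plain Python dicts. allows_metadata is a hypothetical helper that handles only the boolean form of additionalProperties discussed here; it is an illustration of the rule, not the validator that Vertex ML Metadata actually runs.

```python
def allows_metadata(schema: dict, metadata: dict) -> bool:
    """Sketch of the additionalProperties rule for boolean values only."""
    # Consistent with JSON Schema, a missing additionalProperties means true,
    # so any extra key is accepted.
    if schema.get("additionalProperties", True):
        return True
    # When false, every metadata key must be declared under properties.
    allowed = set(schema.get("properties", {}))
    return set(metadata) <= allowed


# The strict system.Dataset example schema from this section, as a dict.
strict_dataset = {
    "title": "system.Dataset",
    "version": "0.0.1",
    "type": "object",
    "additionalProperties": False,
    "properties": {
        "container_format": {"type": "string"},
        "payload_format": {"type": "string"},
    },
}

accepted = {"container_format": "Text", "payload_format": "CSV"}
rejected = {**accepted, "optional_field": "optional_value"}

print(allows_metadata(strict_dataset, accepted))  # True
print(allows_metadata(strict_dataset, rejected))  # False
```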

required

The required keyword takes an array of zero or more strings. Consistent with JSON Schema, the properties defined by the properties keyword are optional by default. You can list the properties that must be present by using the required keyword. For example, the following schema always requires container_format. The required keyword also works on nested properties.

title: system.Dataset
version: 0.0.1
type: object
required: ['container_format']
properties:
  container_format:
    type: string
  payload_format:
    type: string

The above schema accepts the following metadata:

fields {
  key: 'container_format'
  value: { string_value: 'Text' }
}

However, the following metadata is rejected, because it doesn't include container_format:

fields {
  key: 'payload_format'
  value: { string_value: 'CSV' }
}
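A minimal Python sketch of the top-level required check, again modeling the schema and metadata as plain dicts. missing_required is a hypothetical helper name; an empty result means every required property is present.

```python
def missing_required(schema: dict, metadata: dict) -> list[str]:
    """Return the required property names that are absent from metadata."""
    return [name for name in schema.get("required", []) if name not in metadata]


# The system.Dataset example schema with required: ['container_format'].
dataset_schema = {
    "title": "system.Dataset",
    "version": "0.0.1",
    "type": "object",
    "required": ["container_format"],
    "properties": {
        "container_format": {"type": "string"},
        "payload_format": {"type": "string"},
    },
}

print(missing_required(dataset_schema, {"container_format": "Text"}))  # []
print(missing_required(dataset_schema, {"payload_format": "CSV"}))  # ['container_format']
```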

The schema supports nested properties, where a property is itself of type object. A nested object's properties node can have its own required keyword. For example:

title: system.Dataset
version: 0.0.1
type: object
properties:
  container_format:
    type: string
  payload:
    type: string
  nested_property:
    type: object
    required: ['property_1']
    properties:
      property_1:
        type: integer
      property_2:
        type: integer

The above schema accepts the following metadata, since the nested_property field itself is not required.

fields {
  key: 'container_format'
  value: { string_value: 'Text' }
}

The following metadata is also valid:

fields {
  key: 'nested_property'
  value: {
    struct_value {
      fields {
        key: 'property_1'
        value: { number_value: 1 }
      }
      fields {
        key: 'property_2'
        value: { number_value: 1 }
      }
    }
  }
}

However, the following metadata is rejected, because nested_property is present but doesn't include the required property_1:

fields {
  key: 'nested_property'
  value: {
    struct_value {
      fields {
        key: 'property_2'
        value: { number_value: 1 }
      }
    }
  }
}
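The nested behavior above can be sketched in Python as a recursive check: a nested object's required list is enforced only when that object actually appears in the metadata. validate_required is a hypothetical helper, and the dict below mirrors the nested example schema; this illustrates the rule rather than Vertex ML Metadata's own implementation.

```python
def validate_required(schema: dict, metadata: dict, path: str = "") -> list[str]:
    """Recursively collect missing required properties as dotted paths."""
    errors = [
        f"{path}{name}" for name in schema.get("required", []) if name not in metadata
    ]
    for name, prop_schema in schema.get("properties", {}).items():
        value = metadata.get(name)
        # Descend only into nested objects that are present in the metadata,
        # so an absent optional object never triggers its inner required list.
        if prop_schema.get("type") == "object" and isinstance(value, dict):
            errors += validate_required(prop_schema, value, f"{path}{name}.")
    return errors


nested_schema = {
    "title": "system.Dataset",
    "version": "0.0.1",
    "type": "object",
    "properties": {
        "container_format": {"type": "string"},
        "payload": {"type": "string"},
        "nested_property": {
            "type": "object",
            "required": ["property_1"],
            "properties": {
                "property_1": {"type": "integer"},
                "property_2": {"type": "integer"},
            },
        },
    },
}

print(validate_required(nested_schema, {"container_format": "Text"}))  # []
print(validate_required(nested_schema, {"nested_property": {"property_2": 1}}))
# ['nested_property.property_1']
```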

System schema examples

The following examples are common system schemas that are available for immediate use.

Artifact

system.Artifact is a generic schema that can hold metadata about any artifact. No specific fields are defined in this schema.

title: system.Artifact
version: 0.0.1
type: object

Dataset

system.Dataset represents a container of data that was either consumed or produced by an ML workflow step. A dataset can point to either a file location or a query, for example a BigQuery URI.

title: system.Dataset
version: 0.0.1
type: object
properties:
  container_format:
    type: string
    description: "Format of the container. Examples include 'TFRecord', 'Text', or 'Parquet'."
  payload_format:
    type: string
    description: "Format of the payload. For example, 'proto:TFExample', 'CSV', or 'JSON'."

Model

system.Model represents a trained model. The URI of the model can point to a file location (PPP, Cloud Storage bucket, local drive) or to an API resource, such as the Model resource in the Vertex AI API.

title: system.Model
version: 0.0.1
type: object
properties:
  framework:
    type: string
    description: "The framework type. For example: 'TensorFlow' or 'Scikit-Learn'."
  framework_version:
    type: string
    description: "The framework version. For example: '1.15' or '2.1'."
  payload_format:
    type: string
    description: "The format of the Model payload, for example: 'SavedModel' or 'TFLite'."

Metrics

system.Metrics represents evaluation metrics produced during an ML workflow. Metrics are application- and use-case-dependent, and can consist of simple scalar metrics, like accuracy, or complex metrics that are stored elsewhere in the system.

title: system.Metrics
version: 0.0.1
type: object
properties:
  accuracy:
    type: number
    description: "Optional summary metric describing accuracy of a model."
  precision:
    type: number
    description: "Optional summary metric describing precision of a model."
  recall:
    type: number
    description: "Optional summary metric describing the recall of a model."
  f1score:
    type: number
    description: "Optional summary metric describing the f1-score of a model."
  mean_absolute_error:
    type: number
    description: "Optional summary metric describing the mean absolute error of a model."
  mean_squared_error:
    type: number
    description: "Optional summary metric describing the mean-squared error of a model."
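The system.Metrics schema defines field names but not how their values are computed. As an illustration, the following Python sketch computes several of these fields with the standard textbook formulas from hypothetical confusion-matrix counts and regression predictions. classification_metrics and regression_metrics are hypothetical helpers; the returned dicts only reuse the schema's property names so they could be logged as metadata on a system.Metrics artifact.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Binary-classification summary metrics keyed by system.Metrics field names."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1score": f1,
    }


def regression_metrics(y_true: list[float], y_pred: list[float]) -> dict[str, float]:
    """Mean absolute and mean squared error keyed by system.Metrics field names."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    errors = [t - p for t, p in zip(y_true, y_pred)]
    n = len(errors)
    return {
        "mean_absolute_error": sum(abs(e) for e in errors) / n,
        "mean_squared_error": sum(e * e for e in errors) / n,
    }


print(classification_metrics(tp=6, fp=2, fn=3, tn=9))
print(regression_metrics([1.0, 2.0, 3.0], [1.0, 3.0, 5.0]))
```

Returning 0.0 when a denominator is zero is one common convention; other tools may raise an error or report the value as undefined instead.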

What's Next?