Metadata
Metadata is a key concept in Manufacturing Data Engine (MDE). It represents contextual data about facts, such as sensor readings or events. Metadata helps answer questions such as the following:
- What tag emitted a numeric reading?
- What product was being processed at the time a numeric reading was taken?
- What device does a sensor belong to?
- What shift was in progress at the time an event happened?
- What recipe was active at the time of a reading?
MDE distinguishes between two types of metadata based on their pace of change:
- Slowly changing cloud metadata.
- Rapidly changing embedded metadata.
Cloud metadata
Slowly changing metadata represents contextual data that remains unchanged for an extended period of time, for example, asset context that describes the machine, cell, line, and plant of a given sensor. MDE lets you model, manage and explore your slowly changing metadata and link it to records. After metadata is linked to records, you can explore your records using the associated context.
Slowly changing metadata in MDE is called cloud metadata. Cloud metadata serves two functions in the solution:
- To contextualize and categorize records.
- To serve as a source of versioned master data about manufacturing entities, such as sensors, devices, and lines.
MDE allows cloud metadata to be sourced from the edge, created manually using the MDE web interface, or created programmatically using the API. The API lets you source metadata from existing Enterprise Asset Management (EAM) or Master Data Management (MDM) systems.
Metadata buckets
Cloud metadata buckets (also called "buckets" or "metadata buckets") are configuration entities that model a related set of slowly changing contextual data. For example, a bucket can model the attributes of a tag or recipe. Buckets can be thought of as data dimensions in the data analytics domain.
The key attribute of a metadata bucket is its schema. The schema (expressed as a JSON schema object) defines and constrains the structure of metadata instances contained within it. You can create a new metadata bucket version, but new versions must adhere to Cloud metadata bucket versioning rules.
Buckets are global, which means that they can be referenced by any type.
Metadata instances
Metadata instances represent the "contents" of cloud metadata buckets. Each instance describes some entity, such as an asset, process or aspect of the records being captured. Instances have two types of identifiers:
- A system-generated UUID (Universally Unique Identifier) that identifies the instance within MDE.
- A natural key that identifies the entity outside of MDE (for example, a serial number of a sensor).
Metadata instances are versioned on the natural key. That is, MDE keeps track of how the attributes for a given natural key evolve. For example, a tag with the natural key "tag-123" might start off in cell "X" but later be moved to cell "Y". MDE stores and timestamps each instance and gives it a unique UUID. This UUID lets you retrieve the history of instances for a natural key, contextualize records with the right instance at ingestion, and retroactively apply an instance to past records at query time.
MDE only allows adding instances to a bucket that comply with the schema of a specific version of that bucket.
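The versioning behavior described above can be sketched with a small in-memory store. This is an illustration only, not the MDE implementation: the `Instance` and `InstanceStore` names, their fields, and the use of random UUIDs are assumptions made for the example.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Instance:
    """One version of the metadata for a natural key (hypothetical shape)."""
    natural_key: str
    attributes: dict
    instance_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    created_timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


class InstanceStore:
    """Hypothetical in-memory store that versions instances per natural key."""

    def __init__(self):
        # natural key -> list of instances, oldest first
        self._by_key = {}

    def add(self, natural_key, attributes):
        """Store a new timestamped instance with its own UUID."""
        instance = Instance(natural_key, dict(attributes))
        self._by_key.setdefault(natural_key, []).append(instance)
        return instance

    def history(self, natural_key):
        """Return every stored instance for the natural key, oldest first."""
        return list(self._by_key.get(natural_key, []))

    def latest(self, natural_key):
        """Return the most recently added instance, or None if there is none."""
        versions = self._by_key.get(natural_key)
        return versions[-1] if versions else None


store = InstanceStore()
first = store.add("tag-123", {"cell": "X"})   # tag starts in cell X
second = store.add("tag-123", {"cell": "Y"})  # tag is later moved to cell Y
```

Each call to `add` yields a distinct UUID, so records ingested before the move can keep pointing at `first` while newer records point at `second`.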
Metadata bucket schema
Each metadata bucket version contains a schema, and metadata instances can only be added to a specific version of a bucket. Schemas further constrain the structure of metadata instances that can be added to a bucket version.
Metadata bucket schemas are expressed as JSON schema objects in accordance with the 2019-09 version of the JSON schema specification.
For example, if the following schema were added to a bucket version, it would state that each instance object must have a property called deviceName with a string value, and that this property is required:
{
  "$schema": "https://json-schema.org/draft/2019-09/schema#",
  "type": "object",
  "properties": {
    "deviceName": {
      "type": "string"
    }
  },
  "required": ["deviceName"]
}
Metadata instance validation
Metadata instances must comply with the schema defined for a specific metadata bucket version in order to be inserted.
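As a rough illustration of what compliance means for the deviceName schema, the following sketch checks only the `type`, `properties`, and `required` keywords. It is a simplified stand-in, not how MDE validates instances; a full JSON Schema 2019-09 validator covers many more keywords. The `validate_instance` function name is hypothetical.

```python
# Maps the JSON Schema type names handled by this sketch to Python types.
_TYPES = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "object": dict,
    "array": list,
}


def validate_instance(instance, schema):
    """Return a list of error messages; an empty list means the instance complies."""
    if not isinstance(instance, dict):
        return ["instance must be an object"]
    errors = []
    for name in schema.get("required", []):
        if name not in instance:
            errors.append(f"missing required property: {name}")
    for name, rules in schema.get("properties", {}).items():
        if name not in instance:
            continue  # optional property that was not supplied
        type_name = rules.get("type")
        expected = _TYPES.get(type_name)
        value = instance[name]
        # bool is a subclass of int in Python, so reject it for numeric types.
        is_bool_as_number = isinstance(value, bool) and type_name in ("number", "integer")
        if expected and (not isinstance(value, expected) or is_bool_as_number):
            errors.append(f"property {name} must be of type {type_name}")
    return errors


device_schema = {
    "type": "object",
    "properties": {"deviceName": {"type": "string"}},
    "required": ["deviceName"],
}
```

With this schema, `{"deviceName": "example-device"}` produces no errors, while an empty object or a numeric deviceName is rejected.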
Types of buckets
MDE defines three types of buckets.
- Tag buckets
- Record buckets
- Lookup buckets
The type of a bucket is defined when it's created and can't be changed afterward.
Tag buckets
Tag buckets represent buckets that contextualize tags. This means that the natural key of the instances contained within the bucket must be the tag name.
Record buckets
Record buckets represent buckets that can contextualize any group of records that share a common natural key. The natural key of record bucket instances can be any value.
Lookup buckets
Lookup buckets represent buckets that don't contextualize records directly, but instead provide reference data that can be used in the parser. The natural key of lookup bucket instances can be any value.
Lookup bucket instances are never linked to records. Instead, instances can be retrieved from a lookup bucket by calling the mde::lookupByKey function in a Whistle script. The function takes the lookup bucketName, bucketVersion, and naturalKey as arguments, and returns the latest metadata instance for the provided natural key. You can use the instance to populate fields in a proto record in the parser.
Versioning metadata buckets
The schema of a metadata bucket can evolve, but you must create a new version of the bucket to modify the schema. Existing bucket versions and any existing configuration entities that reference prior versions of a bucket are not affected by this operation. To ensure data consistency across the lifetime of a metadata bucket, new metadata bucket schema versions are subject to the following restrictions:
New versions may:
- Add new optional fields.
- Mark a required field optional.
New versions may not:
- Remove fields.
- Change data type of existing fields.
- Mark an optional attribute required.
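The restrictions above can be expressed as a comparison between the old and new schema. The sketch below is an assumption-laden illustration, not the check MDE performs: the `check_new_version` function is hypothetical, it only inspects top-level `properties` and `required`, and it treats a brand-new required field as a violation because only new optional fields are listed as allowed.

```python
def check_new_version(old_schema, new_schema):
    """Return a list of violations of the versioning rules; empty means compatible."""
    violations = []
    old_props = old_schema.get("properties", {})
    new_props = new_schema.get("properties", {})
    old_required = set(old_schema.get("required", []))
    new_required = set(new_schema.get("required", []))

    for name, old_rules in old_props.items():
        if name not in new_props:
            violations.append(f"removed field: {name}")
        elif new_props[name].get("type") != old_rules.get("type"):
            violations.append(f"changed data type of field: {name}")

    # Fields that are required now but were not required before.
    for name in sorted(new_required - old_required):
        if name in old_props:
            violations.append(f"optional field marked required: {name}")
        else:
            violations.append(f"new field must be optional: {name}")
    return violations


v1 = {
    "type": "object",
    "properties": {"deviceName": {"type": "string"}, "line": {"type": "string"}},
    "required": ["deviceName"],
}
# Allowed: adds an optional field and relaxes a required one.
v2 = {
    "type": "object",
    "properties": {
        "deviceName": {"type": "string"},
        "line": {"type": "string"},
        "cell": {"type": "string"},
    },
    "required": [],
}
```

Here `check_new_version(v1, v2)` returns an empty list, whereas dropping `line` or changing its type would be reported.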
Linking cloud metadata instances to records
Adding context to a record involves linking a record to a metadata instance. This is accomplished by storing a reference to the UUID of the metadata instance in the record. MDE provides two ways of creating this link in the parser:
- By providing an instance's natural key.
- By providing a proto metadata instance.
For example, the BigQuery data sink stores references to metadata instances per bucket in a field called cloud_metadata_ref. Here is an example of how a metadata instance reference appears in a BigQuery record:
{
  "id": "e4b66cb9-7c60-4473-b1a1-1954eca92405",
  "tag_name": "primepaintingrobot-01-airpressure",
  "type_version": "1",
  "event_timestamp": "2023-06-20 07:11:59.757000 UTC",
  "value": "762.53",
  "embedded_metadata": {},
  "materialized_cloud_metadata": {
    "device-metadata": {
      "deviceName": "example-device"
    }
  },
  "cloud_metadata_ref": {
    "device-metadata": {
      "bucket_number": 143,
      "bucket_version": 1,
      "instance_id": "50e156a0-dbd9-4f9b-bdc8-1e77574bc4b1"
    }
  },
  "ingest_timestamp": "2023-06-20 07:12:06.335000 UTC",
  "source_message_id": "8434396321424812"
}
Linking a record to a cloud metadata instance using natural key
You can link a record to a metadata instance by providing, in the parser, a reference to a cloud metadata bucket version and the natural key of the instance in the proto record. MDE automatically exchanges the natural key for the instance's UUID, if one exists, and stores the link in the record. If there are multiple instances for the provided natural key, MDE picks the most recent instance (the instance with the most recent created_timestamp).
If the referenced bucket is a TAG bucket, providing a natural key is optional. If the natural key is omitted, MDE uses the value of the tagName field by default.
For information about how to link records to metadata instances using a natural key, see Resolving a metadata instance_id by natural key.
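The resolution rules in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not MDE code: the `resolve_instance_id` function and the dictionary shape of an instance are hypothetical, and raising an error when a non-TAG bucket has no natural key is an assumption, since the text only states that the key is optional for TAG buckets.

```python
from datetime import datetime, timezone


def resolve_instance_id(instances, bucket_type, tag_name, natural_key=None):
    """Return the UUID of the most recent instance for the natural key, or None."""
    if natural_key is None:
        if bucket_type != "TAG":
            # Assumption: only TAG buckets may omit the natural key.
            raise ValueError("a natural key is required for non-TAG buckets")
        natural_key = tag_name  # default to the record's tagName
    matches = [i for i in instances if i["natural_key"] == natural_key]
    if not matches:
        return None  # nothing to link
    newest = max(matches, key=lambda i: i["created_timestamp"])
    return newest["instance_id"]


instances = [
    {
        "natural_key": "primepaintingrobot-01-airpressure",
        "instance_id": "older-instance",  # placeholder ID for illustration
        "created_timestamp": datetime(2023, 6, 1, tzinfo=timezone.utc),
    },
    {
        "natural_key": "primepaintingrobot-01-airpressure",
        "instance_id": "newer-instance",  # placeholder ID for illustration
        "created_timestamp": datetime(2023, 6, 15, tzinfo=timezone.utc),
    },
]
```

Calling `resolve_instance_id(instances, "TAG", "primepaintingrobot-01-airpressure")` falls back to the tag name and selects the instance created on the later date.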
Linking a record to a cloud metadata instance using a proto metadata instance
You can link a record to a metadata instance by providing a reference to a cloud metadata bucket version and supplying a proto metadata instance and, optionally, a natural key in the proto record in the parser. This method of linking metadata instances is particularly useful if source messages already contain contextual information to construct a valid proto instance.
Consider the following when linking a record to a cloud metadata instance using a proto metadata instance:
- If you omit the natural key, MDE automatically picks one for you depending on the bucket type.
  - If you omit the natural key in a proto instance within the context of a TAG bucket, MDE automatically picks the tagName as the natural key.
  - If you omit the natural key in a proto instance within the context of a RECORD bucket, MDE automatically generates a hash value of the message object and uses that as the natural key.
- If the supplied proto instance matches the most recent metadata instance for the provided natural key, MDE exchanges the supplied proto instance for the UUID of the matched instance and stores the UUID in the record.
- If the supplied proto instance doesn't match the most recent metadata instance for the provided natural key, MDE creates a new metadata instance for the provided natural key and stores the UUID of the newly created instance in the record. This behavior lets you dynamically populate metadata buckets with instances generated from source messages.
For information about how to link records to metadata instances using a proto metadata instance, see Resolving a metadata instance ID by instance value.
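The match-or-create behavior described in this section can be sketched as follows. Everything here is illustrative rather than the MDE implementation: the `link_by_proto_instance` function and the dictionary-based store are hypothetical, and the use of SHA-256 over sorted JSON for the RECORD bucket key is an assumption, because the text does not say which hash is used.

```python
import hashlib
import json
import uuid


def link_by_proto_instance(store, bucket_type, proto_instance, tag_name,
                           message=None, natural_key=None):
    """Return the UUID to store in the record, creating an instance if needed."""
    if natural_key is None:
        if bucket_type == "TAG":
            natural_key = tag_name
        elif bucket_type == "RECORD":
            # Assumption: hash of a canonical JSON form of the message object.
            canonical = json.dumps(message, sort_keys=True).encode("utf-8")
            natural_key = hashlib.sha256(canonical).hexdigest()
        else:
            raise ValueError("unsupported bucket type for this sketch")

    versions = store.setdefault(natural_key, [])  # oldest first
    if versions and versions[-1]["attributes"] == proto_instance:
        # The proto instance matches the most recent instance: reuse its UUID.
        return versions[-1]["instance_id"]

    # No match: create a new instance for this natural key.
    created = {"instance_id": str(uuid.uuid4()), "attributes": dict(proto_instance)}
    versions.append(created)
    return created["instance_id"]


store = {}
id_a = link_by_proto_instance(store, "TAG", {"deviceName": "example-device"}, "tag-1")
id_b = link_by_proto_instance(store, "TAG", {"deviceName": "example-device"}, "tag-1")
id_c = link_by_proto_instance(store, "TAG", {"deviceName": "other-device"}, "tag-1")
```

The first two calls return the same UUID because the supplied instance is unchanged, and the third creates a second version for `tag-1`.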
Instance materialization
Instead of just storing the UUID of a metadata instance, records can optionally include the entire instance itself. This is called materialization. This behavior can be configured for each sink at the type level, by setting the value of the materializeCloudMetadata field for a sink to true.
For example, enabling metadata materialization for the BigQuery sink would produce a row like this one for a record that contains a metadata instance reference:
{
  "id": "e4b66cb9-7c60-4473-b1a1-1954eca92405",
  "tag_name": "primepaintingrobot-01-airpressure",
  "type_version": "1",
  "event_timestamp": "2023-06-20 07:11:59.757000 UTC",
  "value": "762.53",
  "embedded_metadata": {},
  "materialized_cloud_metadata": {
    "device-metadata": {
      "deviceName": "example-device"
    }
  },
  "cloud_metadata_ref": {
    "device-metadata": {
      "bucket_number": 143,
      "bucket_version": 1,
      "instance_id": "50e156a0-dbd9-4f9b-bdc8-1e77574bc4b1"
    }
  },
  "ingest_timestamp": "2023-06-20 07:12:06.335000 UTC",
  "source_message_id": "8434396321424812"
}
Embedded metadata
Rapidly changing metadata represents contextual data that changes at a fast pace. Typical examples of rapidly changing metadata include counters and auto-incremented IDs, such as serial numbers or transaction IDs.
MDE lets you structure, harmonize, and transform rapidly changing metadata using Whistle, and embed it directly in the record by populating a field called embeddedMetadata in the proto record in the parser.
All the supported MDE data sinks make embedded metadata available. For example, populating the embeddedMetadata field in the proto record in the parser would produce a row like this one for the resulting record in BigQuery:
{
  "id": "e4b66cb9-7c60-4473-b1a1-1954eca92405",
  "tag_name": "primepaintingrobot-01-airpressure",
  "type_version": "1",
  "event_timestamp": "2023-06-20 07:11:59.757000 UTC",
  "value": "762.53",
  "embedded_metadata": {
    "transactionNumber": "1234"
  },
  "materialized_cloud_metadata": {},
  "cloud_metadata_ref": {},
  "ingest_timestamp": "2023-06-20 07:12:06.335000 UTC",
  "source_message_id": "8434396321424812"
}
Metadata automatic deletion
For both record and tag metadata, MDE keeps track of the changes to each natural key by comparing every new instance against the previous instance. When any of the instance attributes changes, MDE creates a new version and marks it as the latest effective instance. By design, tag and record metadata instances are expected to number in the thousands, and fewer than one hundred thousand. These restrictions allow MDE to index the metadata instances as they arrive from the edge or the API without impacting processing throughput.
Sometimes, because of a configuration error, the parser injects a high-cardinality field such as a timestamp into the metadata instance, which results in a rapid proliferation of versions for each natural key. After a certain threshold, this negatively impacts ingestion performance. In some cases, processing might stop altogether until the solution administrator scales the underlying cloud infrastructure services.
Starting with v1.4.0, MDE enforces a maximum number of instances per natural key to ensure consistent performance. When the number of instances for a natural key approaches this threshold (the default is 200), MDE sends a warning to the new Notifications API to inform the user of the natural keys that have a high number of metadata instance versions. When the number of instances for a natural key exceeds the threshold, MDE automatically deletes the old instances from the internal store. It also sends another notification to the Notifications API informing the user of the natural keys whose instances have been deleted.
Both the warning and the deletion activities are also reported in the log, which you can use to create an alert policy in the project's Cloud Monitoring.
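The threshold behavior can be pictured with the following sketch. It is not the MDE implementation: the `enforce_instance_limit` function is hypothetical, the `warn_ratio` that decides when a count is "approaching" the threshold is an assumed parameter, and trimming down to exactly the newest `max_instances` versions is also an assumption, because the text does not state how many old instances are deleted.

```python
def enforce_instance_limit(versions, max_instances=200, warn_ratio=0.9):
    """Apply the per-natural-key limit to a list of versions ordered oldest first.

    Returns a dict with the versions kept, the versions deleted, and whether
    a warning should be raised because the count is close to the limit.
    """
    count = len(versions)
    if count > max_instances:
        excess = count - max_instances
        # Assumption: drop the oldest versions and keep the newest ones.
        return {
            "kept": versions[excess:],
            "deleted": versions[:excess],
            "warning": False,
        }
    return {
        "kept": list(versions),
        "deleted": [],
        # Assumption: warn once the count reaches warn_ratio of the limit.
        "warning": count >= max_instances * warn_ratio,
    }


# A parser that injects a timestamp creates one version per message.
runaway = list(range(205))  # stand-ins for 205 instance versions
result = enforce_instance_limit(runaway)
```

In this example the five oldest stand-in versions are reported as deleted and the remaining 200 are kept.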
Naming restrictions for Metadata Buckets
A Metadata Bucket name must meet the following requirements:
- It can contain letters (uppercase and lowercase), numbers, and the special characters - and _.
- It can be up to 255 characters long.
You can use the following regular expression for validation: ^[a-z][a-z0-9\\-_]{1,255}$.
If you try to create an entity that violates the naming restrictions, you get a 400 error.
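A client-side check before calling the API might look like the sketch below. It assumes that the doubled backslash in the expression above is string escaping for a single `\-`, and the `is_valid_bucket_name` helper is hypothetical. Note that the expression as written accepts only lowercase letters and requires the name to start with a letter, so it is stricter than the character list above.

```python
import re

# The validation expression from the text, written as a Python raw string.
BUCKET_NAME_PATTERN = re.compile(r"^[a-z][a-z0-9\-_]{1,255}$")


def is_valid_bucket_name(name):
    """Return True if the name matches the documented regular expression."""
    return BUCKET_NAME_PATTERN.fullmatch(name) is not None


candidates = ["device-metadata", "recipe_v2", "1device", "Device"]
valid = [name for name in candidates if is_valid_bucket_name(name)]
```

Running the check locally avoids a round trip that would otherwise end in a 400 error.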