Parser

Parsers are configuration entities that define how messages from a specific source message class are parsed and transformed to records of a specific target type.

A parser configuration has the following three components:

  1. Source message class association: A parser processes source messages from a single source message class. For more information, see Source Message classes.
  2. Type version association: A parser emits proto records of a single type version. The type version configuration defines what fields must be present on the emitted proto record and dictates their structure (schema). For more information, see Type.
  3. Whistle script: Whistle scripts define how to transform source messages into proto records using mapping, parsing, and transformation logic. Whistle scripts are written by users; however, Manufacturing Data Engine (MDE) provides configuration packages for typical use-cases. For more information, see the following section.

Whistle definition

Whistle is a mapping language that can be used to convert complex, nested data from one schema to another. In manufacturing, data models can be complex and can contain many nested and repeated structures. This makes it difficult to express mapping logic in a procedural format. Whistle addresses this issue by providing a declarative language that lets you define mapping and transformation logic in a natural way.

For example, you could use Whistle to harmonize several sensor data models across different factories into a unified MDE type model. The source data might contain nested structures, such as a list of components or a hierarchy of features. Whistle lets you express the mapping logic for these nested structures in a natural way, without having to write procedural code.

Whistle also supports functions, which let you break down complex mappings involving repeated structures into smaller units. This makes your mapping code easier to understand, maintain, and reuse.

The following example shows how a short Whistle script maps a source payload to a proto record.

Given this sample payload:

{
  "payload": {
    "tag": "vibration-sensor"
  },
  "details": {
    "value": 0.24,
    "timestamp": "2023-06-26 12:19:20.046000 UTC"
  }
}

And the following Whistle script:

package mde

[{
    tagName: $root.payload.tag
    timestamps: {
        eventTimestamp: $root.details.timestamp
    }
    data: {
        numeric: $root.details.value
    }
}]

When the parser applies the previous Whistle script, it produces a proto record like the following:

[
  {
    "tagName": "vibration-sensor",
    "timestamps": {
      "eventTimestamp": "2023-06-26 12:19:20.046000 UTC"
    },
    "data": {
      "numeric": 0.24
    }
  }
]
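
For comparison, the same mapping can be written procedurally. The following Python sketch only illustrates what the Whistle script above expresses. The function name `map_message` is hypothetical and is not part of MDE.

```python
import json


def map_message(root: dict) -> list[dict]:
    """Procedural equivalent of the Whistle script above (illustrative only)."""
    return [{
        "tagName": root["payload"]["tag"],
        "timestamps": {"eventTimestamp": root["details"]["timestamp"]},
        "data": {"numeric": root["details"]["value"]},
    }]


# The sample payload from this section.
message = {
    "payload": {"tag": "vibration-sensor"},
    "details": {"value": 0.24, "timestamp": "2023-06-26 12:19:20.046000 UTC"},
}

print(json.dumps(map_message(message), indent=2))
```

Even for this flat payload, the procedural version has to spell out every key access. The Whistle script states the target shape directly.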

For more information about the Whistle language syntax and available functions, see Whistle documentation.

Runtime behavior of parsers

At runtime, the parser receives all messages of the source message class with which it is associated, applies the configured Whistle script to each message, and emits one or more proto records of the configured type.

The emitted proto records must comply with the type configuration. If they don't comply, they are moved to the dead letter queue.
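
The runtime flow described above can be modelled as a short loop. This is a conceptual Python sketch, not MDE's implementation: `run_parser`, `transform`, and `is_valid` are hypothetical names that stand in for the configured Whistle script and the validation against the type configuration.

```python
from typing import Callable

Message = dict
Record = dict


def run_parser(
    messages: list[Message],
    transform: Callable[[Message], list[Record]],
    is_valid: Callable[[Record], bool],
) -> tuple[list[Record], list[Record]]:
    """Apply `transform` to each message and sort the resulting records.

    Returns (emitted, dead_letter): records that pass `is_valid`
    and records that do not.
    """
    emitted: list[Record] = []
    dead_letter: list[Record] = []
    for message in messages:
        for record in transform(message):
            (emitted if is_valid(record) else dead_letter).append(record)
    return emitted, dead_letter


# Example: the second message has no tag, so its record is dead-lettered.
messages = [{"tag": "temp-1", "value": 21.5}, {"value": 3.0}]
emitted, dead_letter = run_parser(
    messages,
    transform=lambda m: [{"tagName": m.get("tag"), "data": {"numeric": m["value"]}}],
    is_valid=lambda r: r["tagName"] is not None,
)
```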

Association rules

A parser can only be associated with a single message class and a single type version. However, a parser may emit one or more records as long as they are of the type version to which the parser is linked. The output of a parser is an array of proto record objects.

Emitting more than one record from a parser is useful in scenarios where a source message contains an array of readings or events that you want to de-aggregate. Parsers allow you to "split" the source message into multiple proto records so that each reading, for example, becomes a row in a record table in BigQuery.
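
As an illustration of de-aggregation, the following Python sketch splits one source message into one record per reading. The message layout (`sensor`, `readings`, `ts`, `value`) and the function name `split_readings` are assumptions made up for this example.

```python
def split_readings(message: dict) -> list[dict]:
    """Emit one proto-record-shaped dict per entry in message["readings"]."""
    return [
        {
            "tagName": message["sensor"],
            "timestamps": {"eventTimestamp": reading["ts"]},
            "data": {"numeric": reading["value"]},
        }
        for reading in message["readings"]
    ]


# Hypothetical source message that aggregates two readings.
message = {
    "sensor": "vibration-sensor",
    "readings": [
        {"ts": "2023-06-26 12:19:20.046000 UTC", "value": 0.24},
        {"ts": "2023-06-26 12:19:21.046000 UTC", "value": 0.27},
    ],
}

records = split_readings(message)  # two records, one per reading
```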

The emitted proto records may reference any tag name. This behavior is a change from v1.1 and v1.2, where tag names were scoped to the type. After v1.3 MDE proto records emitted by any parser can

Proto record schema

The JSON schema that the proto records must conform to depends on:

  1. Archetype: The specific archetype associated with the type.
  2. Type Configuration: The configuration settings defined for the type.

The archetype defines the base schema for records. For example, a type in the discrete archetype family requires that proto records contain values for the following attributes:

  • tagName
  • timestamps.eventTimestamp
  • data.complex

The type can impose further restrictions on the proto records. For example, you can define a schema for the data field, or you can require that proto records provide a reference to a metadata instance.

For more information, see the proto record reference.
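
A presence check for the discrete archetype attributes listed above could look like the following Python sketch. It only tests that the three paths exist. The helper names are hypothetical, and any further restrictions imposed by the type configuration are out of scope here.

```python
REQUIRED_DISCRETE_PATHS = ("tagName", "timestamps.eventTimestamp", "data.complex")


def has_path(record: dict, dotted_path: str) -> bool:
    """Return True if every key along dotted_path exists and the value is not None."""
    current = record
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return False
        current = current[key]
    return current is not None


def missing_paths(record: dict) -> list[str]:
    """List the required paths that the record does not provide."""
    return [p for p in REQUIRED_DISCRETE_PATHS if not has_path(record, p)]


record = {"tagName": "door-check", "timestamps": {}, "data": {"complex": {"ok": True}}}
print(missing_paths(record))  # ['timestamps.eventTimestamp']
```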

Reference data lookup

MDE provides a custom Whistle function to look up a value for a provided key from a lookup bucket.

You can look up a lookup bucket instance by its natural key by calling the mde::lookupByKey function in a Whistle script. The function takes the lookupbucketName, bucketVersion, and naturalKey of the instance as arguments, and returns the latest metadata instance for the provided natural key. You can use the instance to populate fields in a proto record in the parser. For example:

"data" : {
  "complex" : {
    "VIN" : mde::lookupByKey("vin-lookup-bucket", input.vinKey, 1).VIN,
    "vin_registration_time" : mde::lookupByKey("vin-lookup-bucket", input.vinKey, 1).vin_registration_time,
    "ResultValue" : 163.0482614
  }
}
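
To make the lookup semantics concrete, the following Python sketch models a lookup bucket as an in-memory store in which the most recently added instance for a natural key wins. It is a stand-in for illustration only: the class `LookupBucket`, its methods, and the sample values are hypothetical and do not reflect how MDE stores or resolves instances.

```python
from collections import defaultdict
from typing import Optional


class LookupBucket:
    """Toy model: keeps every instance per natural key and returns the newest."""

    def __init__(self, name: str, version: int) -> None:
        self.name = name
        self.version = version
        self._instances: dict[str, list[dict]] = defaultdict(list)

    def add(self, natural_key: str, instance: dict) -> None:
        """Store a new instance; later additions supersede earlier ones."""
        self._instances[natural_key].append(instance)

    def lookup_by_key(self, natural_key: str) -> Optional[dict]:
        """Return the latest instance for natural_key, or None if absent."""
        instances = self._instances.get(natural_key)
        return instances[-1] if instances else None


bucket = LookupBucket("vin-lookup-bucket", 1)
bucket.add("key-1", {"VIN": "VIN-A", "vin_registration_time": "2023-01-01T00:00:00Z"})
bucket.add("key-1", {"VIN": "VIN-B", "vin_registration_time": "2023-06-01T00:00:00Z"})

# Populate record fields from the latest instance, as the parser example does.
instance = bucket.lookup_by_key("key-1")
data = {"complex": {"VIN": instance["VIN"],
                    "vin_registration_time": instance["vin_registration_time"]}}
```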

Naming restrictions for Parsers

A Parser name must meet the following requirements:

  • It can contain letters (uppercase and lowercase), numbers, and the special characters - and _.
  • It can be up to 255 characters long.

You can use the following regular expression for validation: ^[a-z][a-z0-9\-_]{1,255}$.
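
The pattern can be applied on the client side before a create request is sent. The following Python sketch writes the expression as a raw string with a single backslash before the hyphen. The function name `is_valid_parser_name` is hypothetical. As written, the expression accepts only lowercase letters, requires a letter as the first character, and matches names of 2 to 256 characters.

```python
import re

# The validation pattern from this section, as a Python raw string.
NAME_PATTERN = re.compile(r"^[a-z][a-z0-9\-_]{1,255}$")


def is_valid_parser_name(name: str) -> bool:
    """Return True if the whole name matches the validation pattern."""
    # fullmatch avoids accepting a name that ends with a trailing newline.
    return NAME_PATTERN.fullmatch(name) is not None


for candidate in ("vibration-parser_1", "Parser", "1parser", "a"):
    print(candidate, is_valid_parser_name(candidate))
```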

If you try to create an entity that violates the naming restrictions, the request fails with a 400 error.

You can use the mde::sanitizeTagName() function to ensure that your name complies with the naming restrictions. For more details, see the Whistle MDE functions.