BigQuery schemas
Manufacturing Data Engine (MDE) distinguishes between three different types of tables in BigQuery:
- Record tables
- Metadata tables
- System tables (which include configuration, logs, and raw messages)
Record tables
MDE creates a record table in BigQuery for
each Type. Record tables are stored in the mde_data
dataset. The
BigQuery schema of record tables depends on the Archetype.
Nr | Archetype* | Field | Data Type | Description |
---|---|---|---|---|
1 | All | id | string | Row identifier |
2 | All | tag_name | string | Name of the tag configuration entity/ natural key in tag buckets |
3 | All | type_version | integer | Type version |
4 | All | embedded_metadata | JSON | Stores rapidly changing data in a record |
5 | All | cloud_metadata_ref | JSON | Stores a foreign key (UUIDs) to a metadata instance for bucket |
6 | All | materialized_cloud_metadata | JSON | Stores metadata instances in the record that are retrieved from the foreign keys stored in cloud_metadata_ref |
7 | All | ingest_timestamp | timestamp | The timestamp when a record was written to BigQuery |
8 | All | source_message_id | string | Source message Id (Pub/Sub assigned) |
9 | DDS-S, DDS-C, CDS-S, CDS-C | data | JSON | Object holding the data payload |
10 | NDS | value | numeric | Measurement value |
11 | DDS-S, DDS-C, NDS-S, NDS-C | event_timestamp | timestamp | Event timestamp |
12 | CDS-S, CDS-C | event_timestamp_start | timestamp | Event timestamp start |
13 | CDS-S, CDS-C | event_timestamp_end | timestamp | Event timestamp end |
14 | CDS-S, CDS-C | duration | integer | Duration |
15 | DDS-C, NDS-C, CDS-C | clustered_column_1 | string | Generic cluster column |
16 | DDS-C, NDS-C, CDS-C | clustered_column_2 | string | Generic cluster column |
17 | DDS-C, NDS-C, CDS-C | clustered_column_3 | string | Generic cluster column |
18 | DDS-C, NDS-C, CDS-C | clustered_column_4 | string | Generic cluster column |
*Archetype reference:
- DDS-S: Discrete Data Series Standard.
- DDS-C: Discrete Data Series Clustered.
- DDS-S: Discrete Data Series Standard.
- DDS-C: Discrete Data Series Clustered.
- NDS-S: Numeric Data Series Standard.
Partitioning
Record tables in the Discrete Data Series and Numeric Data Series archetype
families are partitioned by day on the event_timestamp
field.
Record tables in the Continuous Data Series archetype family are partitioned by
day on the event_timestamp_start
field.
Clustering
Record tables of clustered archetypes are automatically clustered by
clustered_column_1
, clustered_column_2
, clustered_column_3
, and
clustered_column_4
in that order upon creation. You can populate these fields
in the parser by assigning values to the corresponding fields in the proto
record.
Metadata tables
Metadata instances for all buckets are stored in a single metadata-store
table
in the metadata_dimension
dataset. The following describes the
BigQuery schema of table:
Nr | Field | Data Type | Description |
---|---|---|---|
1 | instance_id | string | Unique metadata instance identifier - UUID (PK) |
2 | bucket_number | integer | Bucket identifier |
3 | bucket_name | string | Bucket name |
4 | bucket_version | integer | Bucket version |
5 | natural_key | string | Natural key of the metadata |
6 | instance | JSON | Metadata instance object |
7 | created_timestamp | timestamp | Timestamp when the instance was created |
Partitioning
The metadata-store
table is partitioned on bucket_number
.
System tables
The mde_system
dataset exposes the following sets of tables:
- Type and bucket configuration tables for the MDE configuration model
- Operations logs
- Raw messages
Bucket specification
The bucket-specification
table exposes bucket configuration data. The table
has the following schema:
Nr | Field | Data Type |
---|---|---|
1 | id | string |
2 | bucket_type | string |
3 | bucket_number | integer |
4 | bucket_name | string |
5 | schema | JSON |
6 | bucket_version | integer |
7 | created_timestamp | timestamp |
8 | provider | string |
8 | bucket_attributes | JSON |
Type specification
The type-specification
table exposes bucket configuration data. The table has
the following schema:
Nr | Field | Data Type |
---|---|---|
1 | id | string |
2 | type_name | string |
3 | type_version | integer |
4 | data_schema | JSON |
5 | storage_specs | JSON |
6 | metadata_bucket_references | JSON |
7 | created_timestamp | timestamp |
Operations logs
The operations-log
table exposes MDE operations log data.
The logging level can be configured. See the
Operations guide
for more details. The table has the following schema:
Nr | Field | Data Type | Description |
---|---|---|---|
1 | message_id | string | Row identifier |
2 | source_message_id | string | Source message Id (PubSub assigned) |
3 | step | string | Processing pipeline step |
4 | error_message | STRING | Error cause |
5 | event_timestamp | timestamp | Event timestamp |
6 | step_timestamp | timestamp | Step timestamp |
7 | headers | JSON | Message headers at the processing step |
8 | payload | JSON | Message payload at the processing step |
Raw messages
MDE stores every incoming message in the raw-messages
table.
The table has the following schema:
Nr | Field | Data Type |
---|---|---|
1 | subscription_name | string |
2 | message_id | string |
3 | publish_time | timestamp |
4 | data | string |
5 | attributes | timestamp |