BigQuery queries
This guide explains how to query data in BigQuery for typical Manufacturing Data Engine (MDE) use-cases.
Records join with cloud metadata
If cloud metadata materialization is disabled, you can access cloud metadata
instances by joining relevant record table with the metadata-store
on the
metadata instance_id
with the following SQL query:
SELECT
dnr.*,
ms.instance
FROM
mde_data.`RECORD_TABLE_NAME` AS dnr
LEFT JOIN
mde_dimension.`metadata-store` AS ms
ON
ms.instance_id = JSON_VALUE(cloud_metadata_ref, "$.BUCKET_NAME.instance_id")
WHERE
DATE(event_timestamp) = 'EVENT_TIMESTAMP'
LIMIT 100
Replace the following:
RECORD_TABLE_NAME
: the name of the record table.BUCKET_NAME
: the name of the cloud metadata bucket.EVENT_TIMESTAMP
: the timestamp of the event.
To improve query performance and since the metadata-store
is partitioned on
bucket number, you can optionally specify the bucket number in the ON
clause,
as the following SQL query:
SELECT
dnr.*,
ms.instance
FROM
mde_data.`<RECORD_TABLE_NAME>` AS dnr
LEFT JOIN
mde_dimension.`metadata-store` AS ms
ON
ms.instance_id = JSON_VALUE(cloud_metadata_ref, "$.BUCKET_NAME.instance_id")
AND ms.bucket_number = <BUCKET_NUMBER>
WHERE
DATE(event_timestamp) = 'EVENT_TIMESTAMP'
LIMIT 100
Replace the following:
BUCKET_NAME
: the name of the cloud metadata bucket.EVENT_TIMESTAMP
: the timestamp of the event.
Cloud metadata instance attributes access
You can access metadata instance attributes using the JSON dot notation which
always returns a JSON object, or using one of the BigQuery JSON functions,
such as JSON_VALUE
to extract strings or other data types. See the following
example:
SELECT
dnr.*,
ms.instance.deviceName -- this returns a double quoted JSON string
JSON_VALUE(ms.instance, '$.deviceName') -- this returns a string
FROM
mde_data.`example-record-tbl` AS dnr
LEFT JOIN
mde_dimension.`metadata-store` AS ms
ON
ms.instance_id = JSON_VALUE(cloud_metadata_ref, "$.bucket.instance_id")
WHERE
DATE(event_timestamp) = '2023-01-01'
LIMIT 100
Similarly, if cloud metadata materialization is enabled, you can access metadata instance attributes directly from the record. See the following example:
SELECT
* (EXCEPT materialized_cloud_metadata),
materialized_cloud_metadata.device.deviceName -- this returns a double quoted JSON string
JSON_VALUE(materialized_cloud_metadata., '$.device.deviceName') -- this returns a string
FROM
mde_data.`example-record-tbl`
WHERE
DATE(event_timestamp) = '2023-01-01'
LIMIT 100
Obtaining a list of all instance IDs contained in cloud_metadata_ref
To obtain an array of all metadata instance IDs contained in
the cloud_metadata_ref
field of a record follow these steps:
Create the user defined function (UDF) with the following SQL query:
CREATE OR REPLACE FUNCTION `mde_data.get_instance_ids`(input JSON) RETURNS ARRAY<STRING> LANGUAGE js AS R""" return input ? Object.keys(input).map(bucketName => input[bucketName].instance_id).filter(instance_id => instance_id != null) : []; """;
Execute the function in a query:
SELECT mde_data.get_instance_ids(cloud_metadata_ref) as metadata_instance_ids, *, FROM mde_data.`RECORD_TABLE_NAME` WHERE DATE(event_timestamp) = 'EVENT_TIMESTAMP' LIMIT 100
Replace the following:
RECORD_TABLE_NAME
: the name of the record table.EVENT_TIMESTAMP
: the timestamp of the event.