The Slowly changing dimensions package contains common data models for creating type 2 slowly changing dimensions tables from mutable data sources in Dataform.
Slowly changing dimensions tables are incremental tables that contain data that can change unpredictably, not on a regular schedule, such as customers or products. In a type 2 slowly changing dimensions table, new data is appended in a new row without overwriting existing table rows. Table history is preserved in multiple records for a given key in the slowly changing dimension key. Each record has a unique key.
The Slowly changing dimensions package creates the following relations in
BigQuery for a given NAME
:
NAME
- a view withscd_valid_from
andscd_valid_to
fieldsNAME_updates
- an incremental table that stores the change history of the source table
Dataform updates slowly changing dimensions every time it executes a slowly changing dimensions incremental table. You might want to schedule slowly changing dimensions table to run every day or every hour, depending on the granularity of changes you want to capture.
To learn how to schedule Dataform executions, see Schedule executions with Cloud Composer or Schedule executions with Workflows and Cloud Scheduler.
Before you begin
- On the Slowly changing dimensions releases page,
copy the
.tar.gz
URL of the newest release. - Create a Dataform repository.
- Create and initialize a workspace in the repository.
- Install the Slowly changing dimensions package in your Dataform repository.
Required roles
To get the permissions that you need to configure a package,
ask your administrator to grant you the
Dataform Editor (roles/dataform.editor
) IAM role on workspaces.
For more information about granting roles, see Manage access to projects, folders, and organizations.
You might also be able to get the required permissions through custom roles or other predefined roles.
Create a slowly changing dimensions table with the Slowly changing dimensions package
To create a slowly changing dimensions table with the Slowly changing dimensions package package in Dataform, follow these steps:
In the Files pane, next to
definitions/
, click the More menu.Click Create file.
In the Create new file pane, do the following:
In the Add a file path field, after
definitions/
, enter the name of the file followed by.js
. For example,definitions/definitions.js
.Click Create file.
In the Files pane, select the newly created
.js
. file.Import the package to the file in the following format:
const CONSTANT-NAME = require("dataform-scd");
Replace CONSTANT-NAME with a name for the constant, for example,
scd
.Create slowly changing dimensions tables in the following format:
scd("source_data_scd", { uniqueKey: "UNIQUE_ID", timestamp: "UPDATED_AT", // A field that stores a timestamp or date of when the row was last changed. source: { schema: "SOURCE_SCHEMA", // The source table to build slowly changing dimensions from. name: "SOURCE_SCHEMA_NAME", }, incrementalConfig: { // Any configuration parameters to apply to the incremental table that will be created. bigquery: { partitionBy: "UPDATED_AT", }, }, });
Replace the following:
- UNIQUE_ID: a unique identifier for rows in the table
- UPDATED_AT: a name for the field that
stores a timestamp or date of when the row was last changed, for example,
updated_at
- SOURCE_SCHEMA: the schema of the source
table, for example,
dataform_scd_example
- SOURCE_SCHEMA_NAME: the name of the source
table, for example,
source_data
Optional: Click Format.
The following code sample shows a slowly changing dimensions table definition created with the Slowly changing dimensions package:
const scd = require("dataform-scd");
/**
* Create an SCD table on top of the table defined in source_data.sqlx.
*/
const { updates, view } = scd("source_data_scd", {
// A unique identifier for rows in the table.
uniqueKey: "user_id",
// A field that stores a timestamp or date of when the row was last changed.
timestamp: "updated_at",
// The source table to build slowly changing dimensions from.
source: {
schema: "dataform_scd_example",
name: "source_data",
},
// Any tags that will be added to actions.
tags: ["slowly-changing-dimensions"],
// Documentation of table columns
columns: {user_id: "User ID", some_field: "Data Field", updated_at: "Timestamp for updates"},
// Configuration parameters to apply to the incremental table that will be created.
incrementalConfig: {
bigquery: {
partitionBy: "updated_at",
},
},
});
// Additional customization of the created models can be done by using the returned actions objects.
updates.config({
description: "Updates table for SCD",
});
What's next
- To learn how to create incremental tables in Dataform, see Configure an incremental table.
- To learn more about packages in Dataform with JavaScript includes, see Introduction to JavaScript in Dataform.
- To learn how to create your own package and use it in Dataform, see Create a package in Dataform.