Dataform core sample scripts

This document shows example Dataform core and JavaScript scripts that you can use to create a SQL workflow in Dataform.

Creating tables

Creating a view with Dataform core

The following code sample shows definition of a view called new_view in the definitions/new_view.sqlx file:

config { type: "view" }

SELECT * FROM source_data

Creating a materialized view with Dataform core

The following code sample shows definition of a materialized view called new_materialized_view in the definitions/new_materialized_view.sqlx file:

config {
  type: "view",
  materialized: true
}

SELECT * FROM source_data

Creating a table with Dataform core

The following code sample shows definition of a table called new_table in the definitions/new_table.sqlx file:

config { type: "table" }

SELECT * FROM source_data

Creating an incremental table with Dataform core

The following code sample shows an incremental table that incrementally processes rows of the productiondb.logs table:

config { type: "incremental" }

SELECT timestamp, message FROM ${ref("productiondb", "logs")}

${when(incremental(), `WHERE timestamp > (SELECT MAX(timestamp) FROM ${self()})`) }

Using the ref function to reference tables with Dataform core

The following code sample shows the ref function used to reference the source_data table in the definitions/new_table_with_ref.sqlx table definition file:

config { type: "table" }

SELECT * FROM ${ref("source_data")}

Adding documentation to a table, view, or declaration with Dataform core

The following code sample shows table and columns descriptions in the definitions/documented_table.sqlx table definition file:

config { type: "table",
         description: "This table is an example",
         columns:{
             user_name: "Name of the user",
             user_id: "ID of the user"
      }
  }

SELECT user_name, user_id FROM ${ref("source_data")}

Configuring incremental tables

Adding new table rows for new dates in source data with Dataform core

The following code sample shows a configuration of an incremental table in the definitions/incremental_table.sqlx file. In this configuration, Dataform appends a new row to the incremental_table for each new date:

config { type: "incremental" }

SELECT date(timestamp) AS date, action
FROM weblogs.user_actions

${ when(incremental(), `WHERE timestamp > (select max(date) FROM ${self()})`)

Taking a snapshot of a table periodically with Dataform core

The following code sample shows a configuration of an incremental table in the definitions/snapshots_table.sqlx file. In this configuration, Dataform creates snapshots_table with a snapshot of productiondb.customers at the specified date:

config { type: "incremental" }

SELECT current_date() AS snapshot_date, customer_id, name, account_settings FROM productiondb.customers

${ when(incremental(), `WHERE snapshot_date > (SELECT max(snapshot_date) FROM ${self()})`) }

Building a rolling 30-days table that updates incrementally with Dataform core

The following code sample shows a configuration of an incremental table in the definitions/incremental_example.sqlx file. In this configuration, Dataform creates a temporary incremental_example that updates incrementally, and deletes the table after 30 days from its creation:

config {type: "incremental"}

post_operations {
  delete FROM ${self()} WHERE date < (date_add(Day, -30, CURRENT_DATE))
}

SELECT
 date(timestamp) AS date,
 order_id,
FROM source_table
  ${ when(incremental(), `WHERE timestamp > (SELECT max(date) FROM ${self()})`) }

Creating custom SQL operations

Running several SQL operations in a SQLX file with Dataform core

The following code sample shows ; used to separate multiple SQL operations defined in definitions/operations.sqlx:

config { type: "operations" }

DELETE FROM datatable where country = 'GB';
DELETE FROM datatable where country = 'FR';

Running custom SQL before creating a table with Dataform core

The following code sample shows a custom SQL operation defined in the pre_operations block of the definitions/table_with_preops.sqlx table definition file:

config {type: "table"}

SELECT * FROM ...

pre_operations {
  INSERT INTO table ...
}

Running custom SQL after creating a table with Dataform core

The following code sample shows a custom SQL operation defined in the post_operations block of the definitions/table_with_postops.sqlx table definition file:

config {type: "table"}

SELECT * FROM ...

post_operations {
  GRANT SELECT on ${self()} to group "allusers@examplecompany.com";
  GRANT SELECT on ${self()} to group "allotherusers@examplecompany.com";
}

Validating tables

Adding assertions to a table, view, or declaration with Dataform core

The following code sample shows uniqueKey, nonNull, and rowConditions assertions added to the definitions/tested_table.sqlx table definition file:

config {
  type: "table",
  assertions: {
    uniqueKey: ["user_id"],
    nonNull: ["user_id", "customer_id"],
    rowConditions: [
      'signup_date is null or signup_date > "2022-01-01"',
      'email like "%@%.%"'
    ]
  }
}
SELECT ...

Adding a custom assertion with Dataform core

The following code sample shows a custom assertion in a table definition file that validates if columns a, or b, or c from source_data are null:

config { type: "assertion" }

SELECT
  *
FROM
  ${ref("source_data")}
WHERE
  a is null
  or b is null
  or c is null

Developing with JavaScript

Using inline variables and functions with JavaScript

The following code sample shows the foo variable defined in a js block and then used inline in a SQLX file:

js {
 const foo = 1;
 function bar(number){
     return number+1;
 }
}

SELECT
 ${foo} AS one,
 ${bar(foo)} AS two

Generating one table per country with JavaScript

The following code sample shows the use of the forEach function to generate one table per each country defined in countries in the definitions/one_table_per_country.js file:

const countries = ["GB", "US", "FR", "TH", "NG"];

countries.forEach(country => {
  publish("reporting_" + country)
    .dependencies(["source_table"])
    .query(
      ctx => `
      SELECT '${country}' AS country
      `
    );
});

Declaring multiple sources within one file with JavaScript

The following code sample shows declaration of multiple data sources in the definitions/external_dependencies.js file:

declare({
  schema: "stripe",
  name: "charges"
});

declare({
  schema: "shopify",
  name: "orders"
});

declare({
  schema: "salesforce",
  name: "accounts"
});

Declaring multiple sources within one file using forEach

The following code sample shows declaration of multiple data sources with the forEach function in the definitions/external_dependencies.js file:

["charges", "subscriptions", "line_items", "invoices"]
  .forEach(source => declare({
      schema: "stripe",
      name: source
    })
  );

Deleting sensitive information in all tables containing PII with JavaScript

The following code sample shows a function in the definitions/delete_pii.js file that deletes selected information in all tables that contain personal identifiable information (PII):

const pii_tables = ["users", "customers", "leads"];
pii_tables.forEach(table =>
  operate(`gdpr_cleanup: ${table}`,
    ctx => `
      DELETE FROM raw_data.${table}
      WHERE user_id in (SELECT * FROM users_who_requested_deletion)`)
      .tags(["gdpr_deletion"]))
);

Adding preOps and postOps with JavaScript

The following code sample shows the publish function used to create a query with preOps and postOps in the definitions/pre_and_post_ops_example.js table:

publish("example")
  .query(ctx => `SELECT * FROM ${ctx.ref("other_table")}`)
  .preOps(ctx => `DELETE ${ctx.self()}`)
  .postOps(ctx => `GRANT SELECT on ${ctx.self()} to role`)

Creating incremental tables with JavaScript

The following code sample shows the publish function used to create an incremental table in the definitions/incremental_example.js file:

publish("incremental_example", {
  type: "incremental"
}).query(ctx => `
  SELECT * FROM ${ctx.ref("other_table")}
  ${ctx.when(ctx.incremental(),`WHERE timestamp > (SELECT MAX(date) FROM ${ctx.self()}`)}
`)

Backfilling a daily table with JavaScript

The following code sample shows backfilling a table that updates daily in the definitions/backfill_daily_data.js file:

var getDateArray = function(start, end) {
  var startDate = new Date(start); //YYYY-MM-DD
  var endDate = new Date(end); //YYYY-MM-DD

  var arr = new Array();
  var dt = new Date(startDate);
  while (dt <= endDate) {
    arr.push(new Date(dt).toISOString().split("T")[0]);
    dt.setDate(dt.getDate() + 1);
  }
  return arr;
};

var dateArr = getDateArray("2020-03-01", "2020-04-01");

// step 1: create table
operate(`create table`, 'create table if not exists backfill_table (`fields`) `);
// step 2: insert into the table

dateArr.forEach((day, i) =>
  operate(`backfill ${day}`
   `insert into backfill_table select fields where day = '${day}'`)
);

Reusing code with includes

Using global variables with JavaScript

The following code sample shows definition of project_id and first_date constants in the includes/constants.js:

const project_id = "project_id";
const first_date = "'1970-01-01'";
module.exports = {
  project_id,
  first_date
};

The following code sample shows the first_date constant referenced in the definitions/new_table.sqlx file:

config {type: "table"}

SELECT * FROM source_table WHERE date > ${constants.first_date}

Creating a country mapping with JavaScript

The following code sample shows the country_group custom function defined in the includes/mapping.js file:

function country_group(country){
  return `
  case
    when ${country} in ('US', 'CA') then 'NA'
    when ${country} in ('GB', 'FR', 'DE', 'IT', 'PL', 'SE') then 'EU'
    when ${country} in ('AU') then ${country}
    else 'Other'
  end`;

The following code sample shows a table definition that uses the country_group function in the definitions/new_table.sqlx table definition file:

config { type: "table"}

SELECT
  country AS country,
  ${mapping.country_group("country")} AS country_group,
  device_type AS device_type,
  sum(revenue) AS revenue,
  sum(pageviews) AS pageviews,
  sum(sessions) AS sessions

FROM ${ref("source_table")}

GROUP BY 1, 2, 3

The following code sample shows the query defined in definitions/new_table.sqlx compiled to SQL:

SELECT
  country AS country,
  case
    when country in ('US', 'CA') then 'NA'
    when country in ('GB', 'FR', 'DE', 'IT', 'PL', 'SE') then 'EU'
    when country in ('AU') then country
    else 'Other'
  end AS country_group,
  device_type AS device_type,
  sum(revenue) AS revenue,
  sum(pageviews) AS pageviews,
  sum(sessions) AS sessions

FROM "dataform"."source_table"

GROUP BY 1, 2, 3

Generating a SQL script with a custom JavaScript function

The following code sample shows the render_script custom function defined in includes/script_builder.js:

function render_script(table, dimensions, metrics) {
  return `
      SELECT
      ${dimensions.map(field => `${field} AS ${field}`).join(",")},
      ${metrics.map(field => `sum(${field}) AS ${field}`).join(",\n")}
      FROM ${table}
      GROUP BY ${dimensions.map((field, i) => `${i + 1}`).join(", ")}
    `;
}

module.exports = { render_script };

The following code sample shows a table definition that uses the render_script function in the definitions/new_table.sqlx table definition file:

config {
    type: "table",
    tags: ["advanced", "hourly"],
    disabled: true
}

${script_builder.render_script(ref("source_table"),
                               ["country", "device_type"],
                               ["revenue", "pageviews", "sessions"]
                               )}

The following code sample shows the query defined in definitions/new_table.sqlx compiled to SQL:

SELECT
  country AS country,
  device_type AS device_type,
  sum(revenue) AS revenue,
  sum(pageviews) AS pageviews,
  sum(sessions) AS sessions

FROM "dataform"."source_table"

GROUP BY 1, 2