Test tables with assertions

This document shows you how to use Dataform core to create Dataform table assertions and test your workflow code.

About assertions

An assertion is a data quality test query that finds rows that violate one or more conditions specified in the query. If the query returns any rows, the assertion fails. Dataform runs assertions every time it updates your SQL workflow and it alerts you if any assertions fail.

Dataform automatically creates views in BigQuery that contain the results of compiled assertion queries. As configured in your workflow settings file, Dataform creates these views in an assertions schema where you can inspect assertion results.

For example, for the default dataform_assertions schema, Dataform creates a view in BigQuery in the following format: dataform_assertions.assertion_name.

You can create assertions for all Dataform table types: tables, incremental tables, views, and materialized views.

You can create assertions in the following ways:

Before you begin

  1. In the Google Cloud console, go to the Dataform page.

    Go to the Dataform page

  2. Select or create a repository.

  3. Select or create a development workspace.

  4. Define a table.

Required roles

To get the permissions that you need to create assertions, ask your administrator to grant you the Dataform Editor (roles/dataform.editor) IAM role on workspaces. For more information about granting roles, see Manage access to projects, folders, and organizations.

You might also be able to get the required permissions through custom roles or other predefined roles.

Create built-in assertions

You can add built-in Dataform assertions to the config block of a table. Dataform runs these assertions after table creation. After Dataform creates the table, you can see if the assertion passed in the Workflow execution logs tab of your workspace.

You can create the following assertions in the config block of a table:

  • nonNull

    This condition asserts that the specified columns are not null across all table rows. This condition is used for columns that can never be null.

    The following code sample shows a nonNull assertion in the config block of a table:

config {
  type: "table",
  assertions: {
    nonNull: ["user_id", "customer_id", "email"]
  }
}
SELECT ...
  • rowConditions

    This condition asserts that all table rows follow the custom logic you define. Each row condition is a custom SQL expression, and each table row is evaluated against each row condition. The assertion fails if any table row results in false.

    The following code sample shows a custom rowConditions assertion in the config block of an incremental table:

config {
  type: "incremental",
  assertions: {
    rowConditions: [
      'signup_date is null or signup_date > "2022-08-01"',
      'email like "%@%.%"'
    ]
  }
}
SELECT ...
  • uniqueKey

    This condition asserts that, in a specified column, no table rows have the same value.

    The following code sample shows a uniqueKey assertion in the config block of a view:

config {
  type: "view",
  assertions: {
    uniqueKey: ["user_id"]
  }
}
SELECT ...
  • uniqueKeys

    This condition asserts that, in the specified columns, no table rows have the same value. The assertion fails if there is more than one row in the table with the same values for all the specified columns.

    The following code sample shows a uniqueKeys assertion in the config block of a table:

config {
  type: "table",
  assertions: {
    uniqueKeys: [["user_id"], ["signup_date", "customer_id"]]
  }
}
SELECT ...

Add assertions to the config block

To add assertions to the config block of a table, follow these steps:

  1. In your development workspace, in the Files pane, select a table definition SQLX file.
  2. In the config block of the table file, enter assertions: {}.
  3. Inside assertions: {}, add your assertions.
  4. Optional: Click Format.

The following code sample shows the conditions added in the config block:

config {
  type: "table",
  assertions: {
    uniqueKey: ["user_id"],
    nonNull: ["user_id", "customer_id"],
    rowConditions: [
      'signup_date is null or signup_date > "2019-01-01"',
      'email like "%@%.%"'
    ]
  }
}
SELECT ...

Create manual assertions with SQLX

Manual assertions are SQL queries that you write in a dedicated SQLX file. A manual assertion SQL query must return zero rows. If the query returns rows when executed, the assertion fails.

To add manual assertions in a new SQLX file, follow these steps:

  1. In the Files pane, next to definitions/, click the More menu.
  2. Click Create file.
  3. In the Add a file path field, enter the name of the file followed by .sqlx. For example, definitions/custom_assertion.sqlx.

    Filenames can only include numbers, letters, hyphens, and underscores.

  4. Click Create file.

  5. In the Files pane, click the new file.

  6. In the file, enter:

    config {
      type: "assertion"
    }
    
  7. Below the config block, write your SQL query or multiple queries.

  8. Optional: Click Format.

The following code sample shows a manual assertion in a SQLX file that asserts that fields A, B, and c are never NULL in sometable:

config { type: "assertion" }

SELECT
  *
FROM
  ${ref("sometable")}
WHERE
  a IS NULL
  OR b IS NULL
  OR c IS NULL

Set assertions as dependencies

When workflow action B depends on workflow action A that has assertions, failure of assertions of action A does not block Dataform from executing action B. To execute action B only if assertions of action A pass, you need to set assertions of action A as dependencies of action B.

You can set assertions as dependencies of a selected action in the following ways:

Set selected assertions as dependencies

You can manually set selected assertions as dependencies by adding them to dependencies: [ "" ] in the config block of the edited action.

For example, if action B depends on action A, and you want action B to depend only on selected assertions of action A, you can add those selected assertions to the config block of action B.

You can manually set selected assertions as dependencies for all action types except data source declarations.

Set assertions of a selected dependency action as dependencies

You can set the includeDependentAssertions parameter to automatically set all direct assertions of a selected dependency workflow action as dependencies of the edited action. Dataform adds these assertions as dependencies during each compilation of the action to ensure that dependencies are up to date if assertions of the dependency action change.

For example, if action C depends on actions A and B, but you only want action C to depend on assertions of action A, you can edit action C and set the includeDependentAssertions parameter to automatically set all assertions of action A as dependencies of action C.

You can set the includeDependentAssertions parameter for actions of the following types:

  • table
  • view
  • operations
Set assertions of all dependency actions as dependencies

You can set the dependOnDependencyAssertions parameter to automatically set all direct assertions from all dependency actions of the edited action as additional dependencies of the edited action. Dataform adds these assertions as dependencies during each compilation of the action to ensure that dependencies are up to date if assertions of the dependency action change.

For example, if action C depends on actions A and B, you can edit action C and set the dependOnDependencyAssertions parameter to automatically set all assertions of actions A and B as dependencies of action C.

You can set the dependOnDependencyAssertions parameter for actions of the following types:

  • table
  • view
  • operations

When you set the dependOnDependencyAssertions parameter and the includeDependentAssertions parameters in a single file, the includeDependentAssertions parameter takes priority. For example, if you set dependOnDependencyAssertions to true, but also set includeDependentAssertions to false for a selected dependency action, Dataform won't add assertions of that action to dependencies.

The following code sample shows the dependOnDependencyAssertions and includeDependentAssertions parameters set in the same table definition file:

// filename is tableName.sqlx

config {
type: "table",
dependOnDependencyAssertions: true,
dependencies: [ "actionA", {name: "actionB", includeDependentAssertions: false} ]
}

SELECT * FROM ${ref("actionC")}

In the preceding code sample, Dataform adds all direct assertions of actionA and actionC to dependencies of tableName during compilation.

Set selected assertions as dependencies

To execute a workflow action only when selected assertions pass, you can add the selected assertion to dependencies: [ "" ] in the config block of the edited action.

To set a selected assertion as a dependency of a selected workflow action, follow these steps:

  1. In your development workspace, in the Files pane, expand definitions/.
  2. Select a workflow action SQLX file.
  3. In the config block of the action file, enter dependencies: [ "" ].
  4. Inside dependencies: [ "" ], enter the name of the action assertion or filename of the manual assertion that you want to set as a dependency in one of the following formats:

    nonNull

    config {
      type: "ACTION_TYPE",
      dependencies: [ "ACTION_DATASET_NAME_ACTION_NAME_assertions_nonNull"]
    }
    

    Replace the following:

    • ACTION_TYPE: the type of the workflow action: table, view, or operations.
    • ACTION_DATASET_NAME: the name of the dataset in which the action is defined. The default dataset is defined in the workflow settings file.
    • ACTION_NAME: the name of the action in which the assertion is defined.

    rowConditions

    config {
      type: "ACTION_TYPE",
      dependencies: [ "ACTION_DATASET_NAME_ACTION_NAME_assertions_rowConditions"]
    }
    

    Replace the following:

    • ACTION_TYPE: the type of the workflow action: table, view, or operations.
    • DATASET_NAME: the name of the dataset in which the action is defined. The default dataset is defined in the workflow settings file.
    • ACTION_NAME: the name of the action in which the assertion is defined.

    uniqueKey

    config {
      type: "ACTION_TYPE",
      dependencies: [ "ACTION_DATASET_NAME_ACTION_NAME_assertions_uniqueKey_INDEX"]
    }
    

    Replace the following:

    • ACTION_TYPE: the type of the workflow action: table, view, or operations.
    • DATASET_NAME: the name of the dataset in which the table is defined. The default dataset is defined in the workflow settings file.
    • ACTION_NAME: the name of the table in which the assertion is defined.
    • INDEX: the index of the array of keys defined in the uniqueKey assertion which you want to add as a dependency. For example, 0 or 1. If only one array of keys is defined in the assertion, the index is 0.

    uniqueKeys

    config {
      type: "ACTION_TYPE",
      dependencies: [ "ACTION_DATASET_NAME_ACTION_NAME_assertions_uniqueKeys_INDEX"]
    }
    

    Replace the following:

    • ACTION_TYPE: the type of the workflow action: table, view, or operations.
    • DATASET_NAME: the name of the dataset in which the table is defined. The default dataset is defined in the workflow settings file.
    • ACTION_NAME: the name of the table in which the assertion is defined.
    • INDEX: the index of the array of keys defined in the uniqueKeys assertion which you want to add as a dependency— for example, 0 or 1. If only one array of keys is defined in the assertion, the index is 0.

    manual assertion

    config {
      type: "ACTION_TYPE",
      dependencies: [ "MANUAL_ASSERTION_NAME"]
    }
    

    Replace the following:

    • ACTION_TYPE: the type of the workflow action: table, view, or operations.
    • MANUAL_ASSERTION_NAME the name of the manual assertion.
  5. To add another assertion as a dependency to the edited table, repeat Step 4.

  6. Optional: Click Format.

The following code sample shows assertions added to table A, defined in the dataform dataset:

config {
  type: "table",
  assertions: {
    uniqueKey: ["user_id"],
    nonNull: ["user_id", "customer_id"],
  }
}

The following code sample shows table A assertions added as dependencies to table B:

config {
  type: "table",
  dependencies: [ "dataform_A_assertions_uniqueKey_0",  "dataform_A_assertions_nonNull"]
}

The following code sample show a manual assertion defined in the manualAssertion.sqlx file, added as a dependency to a view:

config {
  type: "view",
  dependencies: [ "manualAssertion"]
}

The following code sample shows the manual_assertion file and the assertions of the sometable table added as dependencies to a table:

config {
  type: "table",
  dependencies: [ "manual_assertion",  "dataform_sometable_assertions_nonNull" ,  "dataform_sometable_assertions_rowConditions"]
}

SELECT * FROM ${ref("referenced_table")} LEFT JOIN ...

Set assertions of a selected action as dependencies

To execute a workflow action only when all direct assertions of a selected dependency action pass, set the includeDependentAssertions parameter to true in the edited action. Dataform automatically adds direct assertions of the selected dependency action to dependencies during compilation. The default value is false.

To set all assertions of a selected dependency action as a dependencies, follow these steps:

  1. In your development workspace, in the Files pane, expand definitions/.
  2. Select a workflow action SQLX file.
  3. In the file, set the includeDependentAssertions parameter to true in one of the following ways:

    In the config block

    config {
    type: "ACTION_TYPE",
    dependencies: [{name: "dEPENDENCY_ACTION_NAME", includeDependentAssertions: true}]
    }
    

    Replace the following:

    • ACTION_TYPE: the type of the workflow action: table, view, or operations.
    • DEPENDENCY_ACTION_NAME: the name of the dependency action which assertions you want to set as dependencies of the edited action.

    In the SELECT statement

      config { type: "ACTION_TYPE" }
    
      SELECT * FROM ${ref({name: "DEPENDENCY_ACTION_NAME", includeDependentAssertions: true})}
    

    Replace the following:

    • ACTION_TYPE: the type of the workflow action: table, view, or operations.
    • DEPENDENCY_ACTION_NAME: the name of the dependency action which assertions you want to set as dependencies of the edited action.
  4. Optional: Click Format.

The following code sample shows tableC that depends on viewA, tableB, and all assertions of tableB:

// filename is tableC.sqlx

config {
type: "table",
dependencies: ["viewA", {name: "tableB", includeDependentAssertions: true}]
}

SELECT * FROM ...

In the preceding code sample, Dataform automatically adds all direct assertions of tableB as dependencies to tableC during compilation.

Set assertions of all dependency actions as dependencies

To execute a workflow action only when all direct assertions of all dependency actions pass, set the dependOnDependencyAssertions parameter to true in the edited action. Dataform automatically adds direct assertions of dependency actions as dependencies during compilation. The default value is false.

When you set the dependOnDependencyAssertions parameter and the includeDependentAssertions parameters in a single file, the includeDependentAssertions parameter takes priority for the dependency action for which it is set.

To set all assertions of a selected dependency action as a dependencies, follow these steps:

  1. In your development workspace, in the Files pane, expand definitions/.
  2. Select a workflow action SQLX file.
  3. In the file, set the dependOnDependencyAssertions parameter to true in the following format:

    config {
    type: "ACTION_TYPE",
    dependOnDependencyAssertions: true,
    dependencies: [ "dependency1", "dependency2" ]
    }
    

    Replace ACTION_TYPE: the type of the workflow action. Supported values include table, view, and operations.

  4. Optional: Click Format.

The following code sample shows sometableE that depends on sometableA, sometabletableB, sometableC, and sometableD and all direct assertions of dependency tables:

// filename is sometableE.sqlx

config {
type: "table",
dependOnDependencyAssertions: true,
dependencies: [ "sometableA", "sometableB" ]
}

SELECT * FROM ${ref("sometableC")}
SELECT * FROM ${ref("sometableD")}

In the preceding code sample, Dataform automatically adds all direct assertions of sometableA, sometableB, sometableC, and sometableD as dependencies to sometableE during compilation.

What's next