Test tables with assertions

This document shows you how to use Dataform core to create Dataform table assertions and test your workflow code.

About assertions

An assertion is a data quality test query that finds rows that violate one or more rules specified in the query. If the query returns any rows, the assertion fails. Dataform runs assertions every time it updates your SQL workflow and it alerts you if any assertions fail.

Dataform automatically creates views in BigQuery that contain the results of compiled assertion queries. As configured in your dataform.json file, Dataform creates these views in an assertions schema where you can inspect assertion results.

For example, for the default dataform_assertions schema, Dataform creates a view in BigQuery in the following format: dataform_assertions.assertion_name.

You can create assertions for all Dataform table types: tables, incremental tables, views, and materialized views.

You can create assertions in the following ways:

Before you begin

  1. In the Google Cloud console, go to the Dataform page.

    Go to the Dataform page

  2. Select or create a repository.

  3. Select or create a development workspace.

  4. Define a table.

Required roles

To get the permissions that you need to create assertions, ask your administrator to grant you the Dataform Editor (roles/dataform.editor) IAM role on workspaces. For more information about granting roles, see Manage access.

You might also be able to get the required permissions through custom roles or other predefined roles.

Create built-in assertions

You can add built-in Dataform assertions to the config block of a table. Dataform runs these assertions after table creation. After Dataform publishes the table, you can inspect the assertion.

You can create the following assertions in the config block of a table:

  • nonNull

    This condition asserts that the specified columns are not null across all table rows. This condition is used for columns that can never be null.

    The following code sample shows a nonNull assertion in the config block of a table:

config {
  type: "table",
  assertions: {
    nonNull: ["user_id", "customer_id", "email"]
  }
}
SELECT ...
  • rowConditions

    This condition asserts that all table rows follow the custom logic you define. Each row condition is a custom SQL expression, and each table row is evaluated against each row condition. The assertion fails if any table row violates any row condition.

    The following code sample shows a custom rowConditions assertion in the config block of an incremental table:

config {
  type: "incremental",
  assertions: {
    rowConditions: [
      'signup_date is null or signup_date > "2022-08-01"',
      'email like "%@%.%"'
    ]
  }
}
SELECT ...
  • uniqueKey

    This condition asserts that, in a specified column, no table rows have the same value.

    The following code sample shows a uniqueKey assertion in the config block of a view:

config {
  type: "view",
  assertions: {
    uniqueKey: ["user_id"]
  }
}
SELECT ...
  • uniqueKeys

    This condition asserts that, in the specified columns, no table rows have the same value. The assertion fails if there is more than one row in the table with the same values for all the specified columns.

    The following code sample shows a uniqueKeys assertion in the config block of a table:

config {
  type: "table",
  assertions: {
    uniqueKeys: [["user_id"], ["signup_date", "customer_id"]]
  }
}
SELECT ...

Add assertions to the config block

To add assertions to the config block of a table, follow these steps:

  1. In your development workspace, in the Files pane, select a table definition SQLX file.
  2. In the config block of the table file, enter assertions: {}.
  3. Inside assertions: {}, add your assertions.
  4. Optional: Click Format.

The following code sample shows the conditions added in the config block:

config {
  type: "table",
  assertions: {
    uniqueKey: ["user_id"],
    nonNull: ["user_id", "customer_id"],
    rowConditions: [
      'signup_date is null or signup_date > "2019-01-01"',
      'email like "%@%.%"'
    ]
  }
}
SELECT ...

Create manual assertions with SQLX

Manual assertions are SQL queries that you write in a dedicated SQLX file. A manual assertion SQL query must return zero rows. If the query returns rows when executed, the assertion fails.

To add manual assertions in a new SQLX file, follow these steps:

  1. In the Files pane, next to definitions/, click the More menu.
  2. Click Create file.
  3. In the Add a file path field, enter the name of the file followed by .sqlx. For example, definitions/custom_assertion.sqlx.

    Filenames can only include numbers, letters, hyphens, and underscores.

  4. Click Create file.

  5. In the Files pane, click the new file.

  6. In the file, enter:

    config {
      type: "assertion"
    }
    
  7. Below the config block, write your SQL query or multiple queries.

  8. Optional: Click Format.

The following code sample shows a manual assertion in a SQLX file that asserts that fields a, b, and c are never NULL in sometable:

config { type: "assertion" }

SELECT
  *
FROM
  ${ref("sometable")}
WHERE
  a IS NULL
  OR b IS NULL
  OR c IS NULL

Add assertions as dependencies

When table B depends on table A that has assertions, failure of table A assertions does not block Dataform from creating table B. To execute table B only if table A assertions pass, add these assertions as dependencies to dependencies: [ "" ] in the config block of table B.

The following code sample shows assertions added to table A in the dataform dataset:

config {
  type: "table",
  assertions: {
    uniqueKey: ["user_id"],
    nonNull: ["user_id", "customer_id"],
  }
}

The following code sample shows table A assertions added as dependencies to table B:

config {
  type: "table",
  dependencies: [ "dataform_A_assertions_uniqueKey_0",  "dataform_A_assertions_nonNull"]
}

To execute a table only when certain manual assertions pass, you can add the manual assertion SQLX file to dependencies: [ "" ] in the config block.

The following code sample show a manual assertion defined in the manualAssertion file, added as a dependency to a view:

config {
  type: "view",
  dependencies: [ "manualAssertion"]
}

To add an assertion file or a table with assertions in its config block as a table dependency, follow these steps:

  1. In your development workspace, in the Files pane, expand definitions/.
  2. Select a table SQLX file.
  3. In the config block of the table file, enter dependencies: [ "" ].
  4. Inside dependencies: [ "" ], enter the name of the table assertion or filename of the manual assertion that you want to set as a dependency in one of the following formats:

nonNull

config {
  type: "table",
  dependencies: [ "TABLE_DATASET_NAME_TABLE_NAME_assertions_nonNull"]
}

Replace the following:

  • DATASET_NAME with the name of the dataset in which the table is defined. The default dataset is defined in the workflow_settings.yaml file.
  • TABLE_NAME with the name of the table in which the assertion is defined.

rowConditions

config {
  type: "table",
  dependencies: [ "TABLE_DATASET_NAME_TABLE_NAME_assertions_rowConditions"]
}

Replace the following:

  • DATASET_NAME with the name of the dataset in which the table is defined. The default dataset is defined in the workflow_settings.yaml file.
  • TABLE_NAME with the name of the table in which the assertion is defined.

uniqueKey

config {
  type: "table",
  dependencies: [ "TABLE_DATASET_NAME_TABLE_NAME_assertions_uniqueKey_INDEX"]
}

Replace the following:

  • DATASET_NAME with the name of the dataset in which the table is defined. The default dataset is defined in the workflow_settings.yaml file.
  • TABLE_NAME with the name of the table in which the assertion is defined.
  • INDEX with the index of the array of keys defined in the uniqueKey assertion which you want to add as a dependency. For example, 0 or 1. If only one array of keys is defined in the assertion, the index is 0.

uniqueKeys

config {
  type: "table",
  dependencies: [ "TABLE_DATASET_NAME_TABLE_NAME_assertions_uniqueKeys_INDEX"]
}

Replace the following:

  • DATASET_NAME with the name of the dataset in which the table is defined. The default dataset is defined in the workflow_settings.yaml file.
  • TABLE_NAME with the name of the table in which the assertion is defined.
  • INDEX with the index of the array of keys defined in the uniqueKeys assertion which you want to add as a dependency— for example, 0 or 1. If only one array of keys is defined in the assertion, the index is 0.

manual assertion

config {
  type: "table",
  dependencies: [ "MANUAL_ASSERTION_NAME"]
}

Replace the following:

  • MANUAL_ASSERTION_NAME the name of the manual assertion.
  1. To add another assertion as a dependency to the edited table, repeat Step 4.
  2. Optional: Click Format.

The following code sample shows the manual_assertion file and the assertions of the sometable table added as dependencies to a table:

config {
  type: "table",
  dependencies: [ "manual_assertion",  "dataform_sometable_assertions_nonNull" ,  "dataform_sometable_assertions_rowConditions"]
}

SELECT * FROM ${ref("referenced_table")} LEFT JOIN ...

What's next