About assertions
An assertion is a data quality test query that finds rows that violate one or more rules specified in the query. If the query returns any rows, the assertion fails. Dataform runs assertions every time it updates your SQL workflow and it alerts you if any assertions fail.
Dataform automatically creates views in BigQuery that contain the results of compiled assertion queries. As configured in your workflow settings file, Dataform creates these views in an assertions schema where you can inspect assertion results.
For example, for the default dataform_assertions
schema, Dataform
creates a view in BigQuery in the following format:
dataform_assertions.assertion_name
.
You can create assertions for all Dataform table types: tables, incremental tables, views, and materialized views.
You can create assertions in the following ways:
Add built-in assertions to the config block of a table.
You can add built-in assertions to the
config
block of a table and specify their conditions.Add manual assertions in a separate SQLX file.
You manually write custom assertions in a separate SQLX file for advanced use cases or for datasets not created by Dataform.
Before you begin
In the Google Cloud console, go to the Dataform page.
Select or create a repository.
Select or create a development workspace.
Required roles
To get the permissions that you need to create assertions,
ask your administrator to grant you the
Dataform Editor (roles/dataform.editor
) IAM role on workspaces.
For more information about granting roles, see Manage access to projects, folders, and organizations.
You might also be able to get the required permissions through custom roles or other predefined roles.
Create built-in assertions
You can add built-in Dataform assertions to the config
block of a
table. Dataform runs these assertions after table creation. After
Dataform publishes the table, you can inspect the assertion.
You can create the following assertions in the config
block of a table:
nonNull
This condition asserts that the specified columns are not null across all table rows. This condition is used for columns that can never be null.
The following code sample shows a
nonNull
assertion in theconfig
block of a table:
config {
type: "table",
assertions: {
nonNull: ["user_id", "customer_id", "email"]
}
}
SELECT ...
rowConditions
This condition asserts that all table rows follow the custom logic you define. Each row condition is a custom SQL expression, and each table row is evaluated against each row condition. The assertion fails if any table row violates any row condition.
The following code sample shows a custom
rowConditions
assertion in theconfig
block of an incremental table:
config {
type: "incremental",
assertions: {
rowConditions: [
'signup_date is null or signup_date > "2022-08-01"',
'email like "%@%.%"'
]
}
}
SELECT ...
uniqueKey
This condition asserts that, in a specified column, no table rows have the same value.
The following code sample shows a
uniqueKey
assertion in theconfig
block of a view:
config {
type: "view",
assertions: {
uniqueKey: ["user_id"]
}
}
SELECT ...
uniqueKeys
This condition asserts that, in the specified columns, no table rows have the same value. The assertion fails if there is more than one row in the table with the same values for all the specified columns.
The following code sample shows a
uniqueKeys
assertion in theconfig
block of a table:
config {
type: "table",
assertions: {
uniqueKeys: [["user_id"], ["signup_date", "customer_id"]]
}
}
SELECT ...
Add assertions to the config
block
To add assertions to the config block of a table, follow these steps:
- In your development workspace, in the Files pane, select a table definition SQLX file.
- In the
config
block of the table file, enterassertions: {}
. - Inside
assertions: {}
, add your assertions. - Optional: Click Format.
The following code sample shows the conditions added in the config
block:
config {
type: "table",
assertions: {
uniqueKey: ["user_id"],
nonNull: ["user_id", "customer_id"],
rowConditions: [
'signup_date is null or signup_date > "2019-01-01"',
'email like "%@%.%"'
]
}
}
SELECT ...
Create manual assertions with SQLX
Manual assertions are SQL queries that you write in a dedicated SQLX file. A manual assertion SQL query must return zero rows. If the query returns rows when executed, the assertion fails.
To add manual assertions in a new SQLX file, follow these steps:
- In the Files pane, next to
definitions/
, click the More menu. - Click Create file.
In the Add a file path field, enter the name of the file followed by
.sqlx
. For example,definitions/custom_assertion.sqlx
.Filenames can only include numbers, letters, hyphens, and underscores.
Click Create file.
In the Files pane, click the new file.
In the file, enter:
config { type: "assertion" }
Below the
config
block, write your SQL query or multiple queries.Optional: Click Format.
The following code sample shows a manual assertion in a SQLX file that asserts
that fields A, B, and c
are never NULL
in sometable
:
config { type: "assertion" }
SELECT
*
FROM
${ref("sometable")}
WHERE
a IS NULL
OR b IS NULL
OR c IS NULL
Set assertions as dependencies
When workflow action B depends on workflow action A that has assertions, failure of assertions of action A does not block Dataform from executing action B. To execute action B only if assertions of action A pass, you need to set assertions of action A as dependencies of action B.
You can set assertions as dependencies of a selected action in the following ways:
- Set selected assertions as dependencies
You can manually set selected assertions as dependencies by adding them to
dependencies: [ "" ]
in theconfig
block of the edited action.For example, if action B depends on action A, and you want action B to depend only on selected assertions of action A, you can add those selected assertions to the
config
block of action B.You can manually set selected assertions as dependencies for all action types except data source declarations.
- Set assertions of a selected dependency action as dependencies
You can set the
includeDependentAssertions
parameter to automatically set all direct assertions of a selected dependency workflow action as dependencies of the edited action. Dataform adds these assertions as dependencies during each compilation of the action to ensure that dependencies are up to date if assertions of the dependency action change.For example, if action C depends on actions A and B, but you only want action C to depend on assertions of action A, you can edit action C and set the
includeDependentAssertions
parameter to automatically set all assertions of action A as dependencies of action C.You can set the
includeDependentAssertions
parameter for actions of the following types:table
view
operations
- Set assertions of all dependency actions as dependencies
You can set the
dependOnDependencyAssertions
parameter to automatically set all direct assertions from all dependency actions of the edited action as additional dependencies of the edited action. Dataform adds these assertions as dependencies during each compilation of the action to ensure that dependencies are up to date if assertions of the dependency action change.For example, if action C depends on actions A and B, you can edit action C and set the
dependOnDependencyAssertions
parameter to automatically set all assertions of actions A and B as dependencies of action C.You can set the
dependOnDependencyAssertions
parameter for actions of the following types:table
view
operations
When you set the dependOnDependencyAssertions
parameter and the
includeDependentAssertions
parameters in a single file,
the includeDependentAssertions
parameter takes priority.
For example, if you set dependOnDependencyAssertions
to true
,
but also set includeDependentAssertions
to false
for a selected dependency
action, Dataform won't add assertions of that action to dependencies.
The following code sample shows the dependOnDependencyAssertions
and
includeDependentAssertions
parameters set in the same table definition file:
// filename is tableName.sqlx
config {
type: "table",
dependOnDependencyAssertions: true,
dependencies: [ "actionA", {name: "actionB", includeDependentAssertions: false} ]
}
SELECT * FROM ${ref("actionC")}
In the preceding code sample, Dataform adds all direct assertions
of actionA
and actionC
to dependencies of tableName
during compilation.
Set selected assertions as dependencies
To execute a workflow action only when selected assertions pass,
you can add the selected assertion to dependencies: [ "" ]
in the config
block of the edited action.
To set a selected assertion as a dependency of a selected workflow action, follow these steps:
- In your development workspace, in the Files pane, expand
definitions/
. - Select a workflow action SQLX file.
- In the
config
block of the action file, enterdependencies: [ "" ]
. Inside
dependencies: [ "" ]
, enter the name of the action assertion or filename of the manual assertion that you want to set as a dependency in one of the following formats:nonNull
config { type: "ACTION_TYPE", dependencies: [ "ACTION_DATASET_NAME_ACTION_NAME_assertions_nonNull"] }
Replace the following:
- ACTION_TYPE: the type of the workflow action:
table
,view
, oroperations
. - ACTION_DATASET_NAME: the name of the dataset in which the action is defined. The default dataset is defined in the workflow settings file.
- ACTION_NAME: the name of the action in which the assertion is defined.
rowConditions
config { type: "ACTION_TYPE", dependencies: [ "ACTION_DATASET_NAME_ACTION_NAME_assertions_rowConditions"] }
Replace the following:
- ACTION_TYPE: the type of the workflow action:
table
,view
, oroperations
. - DATASET_NAME: the name of the dataset in which the action is defined. The default dataset is defined in the workflow settings file.
- ACTION_NAME: the name of the action in which the assertion is defined.
uniqueKey
config { type: "ACTION_TYPE", dependencies: [ "ACTION_DATASET_NAME_ACTION_NAME_assertions_uniqueKey_INDEX"] }
Replace the following:
- ACTION_TYPE: the type of the workflow action:
table
,view
, oroperations
. - DATASET_NAME: the name of the dataset in which the table is defined. The default dataset is defined in the workflow settings file.
- ACTION_NAME: the name of the table in which the assertion is defined.
- INDEX: the index of the array of keys defined in the
uniqueKey
assertion which you want to add as a dependency. For example,0
or1
. If only one array of keys is defined in the assertion, the index is0
.
uniqueKeys
config { type: "ACTION_TYPE", dependencies: [ "ACTION_DATASET_NAME_ACTION_NAME_assertions_uniqueKeys_INDEX"] }
Replace the following:
- ACTION_TYPE: the type of the workflow action:
table
,view
, oroperations
. - DATASET_NAME: the name of the dataset in which the table is defined. The default dataset is defined in the workflow settings file.
- ACTION_NAME: the name of the table in which the assertion is defined.
- INDEX: the index of the array of keys defined in the
uniqueKeys
assertion which you want to add as a dependency— for example,0
or1
. If only one array of keys is defined in the assertion, the index is0
.
manual assertion
config { type: "ACTION_TYPE", dependencies: [ "MANUAL_ASSERTION_NAME"] }
Replace the following:
- ACTION_TYPE: the type of the workflow action:
table
,view
, oroperations
. - MANUAL_ASSERTION_NAME the name of the manual assertion.
- ACTION_TYPE: the type of the workflow action:
To add another assertion as a dependency to the edited table, repeat Step 4.
Optional: Click Format.
The following code sample shows assertions added to table A,
defined in the dataform
dataset:
config {
type: "table",
assertions: {
uniqueKey: ["user_id"],
nonNull: ["user_id", "customer_id"],
}
}
The following code sample shows table A assertions added as dependencies to table B:
config {
type: "table",
dependencies: [ "dataform_A_assertions_uniqueKey_0", "dataform_A_assertions_nonNull"]
}
The following code sample show a manual assertion defined in the
manualAssertion.sqlx
file, added as a dependency to a view:
config {
type: "view",
dependencies: [ "manualAssertion"]
}
The following code sample shows the manual_assertion
file and the
assertions of the sometable
table added as dependencies to a table:
config {
type: "table",
dependencies: [ "manual_assertion", "dataform_sometable_assertions_nonNull" , "dataform_sometable_assertions_rowConditions"]
}
SELECT * FROM ${ref("referenced_table")} LEFT JOIN ...
Set assertions of a selected action as dependencies
To execute a workflow action only when all direct assertions
of a selected dependency action pass,
set the includeDependentAssertions
parameter to true
in the edited action.
Dataform automatically adds direct assertions of the selected dependency
action to dependencies during compilation. The default value is false
.
To set all assertions of a selected dependency action as a dependencies, follow these steps:
- In your development workspace, in the Files pane, expand
definitions/
. - Select a workflow action SQLX file.
In the file, set the
includeDependentAssertions
parameter totrue
in one of the following ways:In the
config
blockconfig { type: "ACTION_TYPE", dependencies: [{name: "dEPENDENCY_ACTION_NAME", includeDependentAssertions: true}] }
Replace the following:
- ACTION_TYPE: the type of the workflow action:
table
,view
, oroperations
. - DEPENDENCY_ACTION_NAME: the name of the dependency action which assertions you want to set as dependencies of the edited action.
In the
SELECT
statementconfig { type: "ACTION_TYPE" } SELECT * FROM ${ref({name: "DEPENDENCY_ACTION_NAME", includeDependentAssertions: true})}
Replace the following:
- ACTION_TYPE: the type of the workflow action:
table
,view
, oroperations
. - DEPENDENCY_ACTION_NAME: the name of the dependency action which assertions you want to set as dependencies of the edited action.
- ACTION_TYPE: the type of the workflow action:
Optional: Click Format.
The following code sample shows tableC
that depends on viewA
, tableB
,
and all assertions of tableB
:
// filename is tableC.sqlx
config {
type: "table",
dependencies: ["viewA", {name: "tableB", includeDependentAssertions: true}]
}
SELECT * FROM ...
In the preceding code sample, Dataform automatically adds all
direct assertions of tableB
as dependencies to tableC
during compilation.
Set assertions of all dependency actions as dependencies
To execute a workflow action only when all direct assertions
of all dependency actions pass,
set the dependOnDependencyAssertions
parameter to true
in the edited action.
Dataform automatically adds direct assertions of dependency
actions as dependencies during compilation. The default value is false
.
When you set the dependOnDependencyAssertions
parameter and the
includeDependentAssertions
parameters in a single file,
the includeDependentAssertions
parameter takes priority for the dependency
action for which it is set.
To set all assertions of a selected dependency action as a dependencies, follow these steps:
- In your development workspace, in the Files pane, expand
definitions/
. - Select a workflow action SQLX file.
In the file, set the
dependOnDependencyAssertions
parameter totrue
in the following format:config { type: "ACTION_TYPE", dependOnDependencyAssertions: true, dependencies: [ "dependency1", "dependency2" ] }
Replace ACTION_TYPE: the type of the workflow action. Supported values include
table
,view
, andoperations
.Optional: Click Format.
The following code sample shows sometableE
that depends on sometableA
,
sometabletableB
, sometableC
, and sometableD
and all direct assertions of
dependency tables:
// filename is sometableE.sqlx
config {
type: "table",
dependOnDependencyAssertions: true,
dependencies: [ "sometableA", "sometableB" ]
}
SELECT * FROM ${ref("sometableC")}
SELECT * FROM ${ref("sometableD")}
In the preceding code sample, Dataform automatically adds all
direct assertions of sometableA
, sometableB
, sometableC
, and sometableD
as dependencies to sometableE
during compilation.
What's next
- To learn more about assertion types, see Dataform API.
- To learn how to define assertions with JavaScript, see Create SQL workflows with JavaScript.
- To learn how to manually execute workflows, see Trigger execution.