This document shows you how to define the relationship between objects in your SQL workflow in Dataform by declaring dependencies.
You can define a dependency relationship between objects of a SQL workflow. In a dependency relationship, the execution of the dependent object depends on the execution of the dependency object. This means that Dataform executes the dependent after the dependency. You define the relationship by declaring dependencies inside the SQLX definition file of the dependent object.
The dependency declarations make up a dependency tree of your SQL workflow that determines the order in which Dataform executes your SQL workflow objects.
You can define the dependency relationship between the following SQL workflow objects:
- Data source declarations
- Declarations of BigQuery data sources that let you reference these data sources in Dataform table definitions and SQL operations. You can set a data source declaration as a dependency, but not as a dependent.
- Tables
- Tables that you create in Dataform based on the declared data sources or other tables in your SQL workflow. Dataform supports the following table types: table, incremental table, view, and materialized view. You can set a table as a dependency and as a dependent.
- Custom SQL operations
- SQL statements that Dataform runs in BigQuery as they are, without modification. You can set a custom SQL operation defined in a type: operations file as a dependency and as a dependent. To declare a custom SQL operation as a dependency in the ref function, you need to set the hasOutput property to true in the custom SQL operation SQLX definition file.
- Assertions
- Data quality test queries that you can use to test table data. Dataform runs assertions every time it updates your SQL workflow and it alerts you if any assertions fail. You can set an assertion defined in a type: assertion file as a dependency and as a dependent by declaring dependencies in the config block.
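As a minimal sketch of the hasOutput requirement for custom SQL operations, an operation that other objects can reference with ref might look like the following. The statement itself is an illustrative assumption, as is the use of source_data as its input. self() is the Dataform core function that resolves to the name of the object that the current file defines.

```sqlx
config {
  type: "operations",
  hasOutput: true
}

-- Hypothetical operation: the query and its source_data input are assumptions.
-- The output is created under the name returned by self(), so that ref()
-- calls in other files can resolve to it.
CREATE OR REPLACE TABLE ${self()} AS
SELECT
  *
FROM
  ${ref("source_data")}
```

In this sketch, the operation is both a dependent (of source_data) and a possible dependency of any object that references it with ref.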
You can define the dependency relationship in the following ways:
- Declare a dependency by using the Dataform core ref function to reference the dependency in a SELECT statement.
- Declare a list of dependencies in the config block of a SQLX definition file.
Before you begin
- Create and initialize a development workspace in your repository.
- Optional: Declare a data source.
- Create at least two SQL workflow objects: tables, assertions, data source declarations, or operations.
Required roles
To get the permissions that you need to declare dependencies for tables, assertions, data source declarations, and custom SQL operations, ask your administrator to grant you the Dataform Editor (roles/dataform.editor) IAM role on workspaces.
For more information about granting roles, see Manage access to projects, folders, and organizations.
You might also be able to get the required permissions through custom roles or other predefined roles.
Declare a dependency as an argument of the ref function
To reference and automatically declare a dependency in a SELECT statement, add the dependency as an argument of the ref function.
The ref function is a Dataform core built-in function that lets you reference and automatically depend on any table, data source declaration, or custom SQL operation with the hasOutput property set to true in your SQL workflow.
For more information about the ref function, see Dataform core context methods reference.
For more information about using the ref function in a table definition, see About table definitions.
The following code sample shows the source_data data source declaration added as an argument of the ref function in the incremental_table.sqlx SQLX definition file of an incremental table:
// filename is incremental_table.sqlx
config { type: "incremental" }
SELECT * FROM ${ref("source_data")}
In the preceding code sample, source_data is automatically declared a dependency of incremental_table.
The following code sample shows the some_table table added as an argument of the ref function in the custom_assertion.sqlx SQLX definition file of an assertion:
// filename is custom_assertion.sqlx
config { type: "assertion" }
SELECT
*
FROM
${ref("some_table")}
WHERE
a is null
or b is null
or c is null
In the preceding code sample, some_table is automatically declared a dependency of custom_assertion. During execution, Dataform executes some_table first, and then executes custom_assertion once some_table is created.
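The ref function can also point at a custom SQL operation. The following sketch assumes a hypothetical operation file named custom_operation.sqlx that sets hasOutput to true in its config block. Both that file name and the view defined here are illustrative only.

```sqlx
config { type: "view" }

-- custom_operation is a hypothetical type: operations file that sets
-- hasOutput: true. Referencing it with ref() declares it as a dependency.
SELECT
  *
FROM
  ${ref("custom_operation")}
```

Under that assumption, Dataform would execute custom_operation before it creates this view.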
Declare dependencies in the config block
To declare dependencies that are not referenced in the SQL statement definition of the dependent, but need to be executed before the table, assertion, or custom SQL operation, follow these steps:
- In your development workspace, in the Files pane, expand the definitions/ directory.
- Select the table, assertion, or custom SQL operation SQLX file that you want to edit.
- In the config block of the file, enter the following code snippet:
  dependencies: [ "DEPENDENCY", ]
  Replace DEPENDENCY with the filename of the table, assertion, data source declaration, or custom SQL operation that you want to add as a dependency. You can enter multiple filenames, separated by commas.
- Optional: Click Format.
The following code sample shows the some_table table and some_assertion assertion added as dependencies to the config block of a table definition file:
config { dependencies: [ "some_table", "some_assertion" ] }
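To make the difference between the two approaches concrete, the following sketch combines them in one table definition. It assumes that source_data, some_table, and some_assertion exist in the SQL workflow under those names, as in the earlier samples. The query itself is illustrative.

```sqlx
config {
  type: "table",
  dependencies: [ "some_table", "some_assertion" ]
}

-- source_data is referenced in the query, so ref() declares it automatically.
-- some_table and some_assertion are not referenced in the query, so they are
-- listed in the dependencies property of the config block instead.
SELECT
  *
FROM
  ${ref("source_data")}
```

In this sketch, Dataform would execute all three dependencies before it creates the table, even though only source_data appears in the SELECT statement.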
What's next
- To learn how to set assertions as dependencies, see Test tables with assertions.
- To learn how to declare a data source, see Declare a data source.
- To learn how to define custom SQL operations, see Add custom SQL operations.
- To learn how to reuse code across your SQL workflow with includes, see Reuse variables and functions with includes.