Introduction to SQL workflows

This document helps you understand the architecture and execution of SQL workflows in Dataform.

You can use Dataform to develop, test, and version control SQL workflows that you can execute in BigQuery to transform data for analytics purposes. You can develop SQL workflows with Dataform core, using SQLX files and optionally JavaScript files, or with JavaScript.

A SQL workflow can consist of the following objects:

Data source declarations
Declarations of BigQuery data sources that let you reference these data sources in Dataform table definitions and SQL operations.
Tables
Tables that you create in Dataform based on the declared data sources or other tables in your SQL workflow. Dataform supports the following table types: table, incremental table, view, and materialized view.
Assertions
Data quality test queries that you can use to validate table data. Dataform runs assertions every time it updates your SQL workflow and it alerts you if any assertions fail.
Custom SQL operations
SQL statements that Dataform runs in BigQuery as they are, without modification.
Includes
JavaScript files with definitions of variables and functions that you can reuse across your SQL workflow.

Visualisation of a SQL workflow

You can view your SQL workflow visualized in the form of a Directed Acyclic Graph (DAG). The DAG displays all objects of the SQL workflow defined in your workspace, and the relationships between them. You can zoom in and out, and use drag and drop to navigate the DAG. If there are compilation errors in your SQL workflow, Dataform displays an error message instead of the DAG.

To view the DAG of your SQL workflow, in your workspace, click Compiled graph.

Execution of a SQL workflow

In your development workspace, you can manually trigger execution of your entire SQL workflow, a selection of actions, or a selection of tags.

You can schedule executions either with Cloud Composer, or with Workflows and Cloud Scheduler.

During execution, Dataform executes SQL queries in BigQuery, following the order of object dependencies in your SQL workflow. After execution, you can use your defined tables and views for all your analytics purposes in BigQuery.

Execution configuration options

To execute a specific group of your SQL workflow objects, you can add Dataform execution tags to the files you select. You can then execute only the files with a selected tag when you manually trigger execution.

To isolate data executed at different stages of the development lifecycle, you can configure staging and production workflow execution settings with compilation overrides.

What's next