Introduction to SQL workflows

This document helps you understand the architecture and execution of SQL workflows in Dataform.

You can use Dataform to develop, test, and version control SQL workflows that you can execute in BigQuery to transform data for analytics purposes. You can develop SQL workflows with Dataform core, using SQLX files and optionally JavaScript files, or with JavaScript.

A SQL workflow can consist of the following objects:

Data source declarations
Declarations of BigQuery data sources that let you reference these data sources in Dataform table definitions and SQL operations.
Tables
Tables that you create in Dataform based on the declared data sources or other tables in your SQL workflow. Dataform supports the following table types: table, incremental table, view, and materialized view.
Assertions
Data quality test queries that you can use to validate table data. Dataform runs assertions every time it updates your SQL workflow and it alerts you if any assertions fail.
Custom SQL operations
SQL statements that Dataform runs in BigQuery as they are, without modification.
Includes
JavaScript files with definitions of variables and functions that you can reuse across your SQL workflow.

Visualisation of a SQL workflow

You can view your SQL workflow visualized in the form of a Directed Acyclic Graph (DAG). The DAG displays all objects of the SQL workflow defined in your workspace, and the relationships between them. You can zoom in and out, and use drag and drop to navigate the DAG. If there are compilation errors in your SQL workflow, Dataform displays an error message instead of the DAG.

To view the DAG of your SQL workflow, in your workspace, click Compiled graph.

Execution of a SQL workflow

In your development workspace, you can manually trigger execution of your entire SQL workflow, a selection of actions, or a selection of tags.

You can schedule executions with Dataform release configurations and workflow configurations. First, create a release configuration to create compilation results of your repository. Then, create a workflow configuration, select a release configuration, select SQL workflow actions that you want to execute, and set the execution schedule.

Alternatively, you can schedule executions either with Cloud Composer, or with Workflows and Cloud Scheduler.

During execution, Dataform executes SQL queries in BigQuery, following the order of object dependencies in your SQL workflow. After execution, you can use your defined tables and views for all your analytics purposes in BigQuery.

Execution configuration options

To execute a specific group of your SQL workflow objects, you can add Dataform execution tags to the files you select. You can then execute only the files with a selected tag when you manually trigger execution.

By default, Dataform executes your SQL workflow with the execution settings defined in the dataform.json file. You can override these execution settings with compilation overrides.

With workspace compilation overrides, you can turn workspaces into isolated execution environments. This means that when you manually trigger execution in a workspace, Dataform executes the output in an isolated location in BigQuery.

To create and execute a single compilation result with compilation overrides you can pass requests with the Dataform API.

With release configurations, you can configure compilation overrides for your whole repository, as well as the frequency of creating compilation results with the applied settings.

To learn more about ways to configure compilation and code lifecycle in Dataform, see Introduction to code lifecycle in Dataform.

What's next