Dataform is a serverless service that lets data analysts develop and deploy tables, incremental tables, and views in BigQuery. Dataform offers a web environment for SQL workflow development, connections to GitHub, GitLab, Azure DevOps Services, and Bitbucket, continuous integration, continuous deployment, and workflow execution.
Repositories
Each Dataform project is stored in a repository. A Dataform repository houses a collection of JSON configuration files, SQLX files, and JavaScript files.
Dataform repositories contain the following types of files:
Config files
JSON or SQLX config files let you configure your SQL workflows. They contain general configuration, execution schedules, or schemas for creating new tables and views.
Definitions
Definitions are SQLX and JavaScript files that define new tables, views, and additional SQL operations to run in BigQuery.
Includes
Includes are JavaScript files where you can define variables and functions to use in your project.
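As a minimal sketch of an include, assuming a hypothetical file named includes/constants.js, the following JavaScript defines one variable and one function and exports them. The names DEFAULT_DATASET and countryGroup, and the country codes, are illustrative assumptions and not part of Dataform.

```javascript
// Hypothetical include file, for example includes/constants.js.
// Every name and value below is an illustrative assumption.

// A variable to reuse across the project.
const DEFAULT_DATASET = "analytics";

// A function that returns a reusable SQL snippet as a string.
function countryGroup(countryColumn) {
  return `CASE
    WHEN ${countryColumn} IN ('US', 'CA') THEN 'North America'
    WHEN ${countryColumn} IN ('GB', 'DE', 'FR') THEN 'Europe'
    ELSE 'Other'
  END`;
}

// Export the names so that other files in the repository can use them.
module.exports = { DEFAULT_DATASET, countryGroup };
```

In a SQLX file, an include is typically referenced by its file name, for example `${constants.countryGroup("country_code")}`.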
Each Dataform repository is connected to a service account. You can select a service account when you create a repository or edit the service account later.
By default, Dataform uses a service account derived from your project number in the following format:
service-YOUR_PROJECT_NUMBER@gcp-sa-dataform.iam.gserviceaccount.com
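The format above can be expressed as a small helper. This is only a sketch: the function name defaultDataformServiceAccount and the sample project number are hypothetical, and the helper only builds the string; it does not call any Google Cloud API.

```javascript
// Builds the default Dataform service account address from a project number,
// following the format shown above. The function name is a hypothetical helper.
function defaultDataformServiceAccount(projectNumber) {
  const value = String(projectNumber);
  // Google Cloud project numbers are numeric, so reject anything else.
  if (!/^\d+$/.test(value)) {
    throw new Error(`Project number must contain only digits: ${value}`);
  }
  return `service-${value}@gcp-sa-dataform.iam.gserviceaccount.com`;
}

// Example with a made-up project number.
console.log(defaultDataformServiceAccount("123456789012"));
```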
Version control
Dataform uses the Git version control system to maintain a record of each change made to project files and to manage file versions.
Each Dataform repository can manage its own Git repository, or be connected to a remote third-party Git repository. You can connect a Dataform repository to a GitHub, GitLab, Azure DevOps Services, or Bitbucket repository.
Users version control their SQL workflow code inside Dataform workspaces. In a Dataform workspace, you can pull changes from the repository, commit all or selected changes, and push them to Git branches of the repository.
Workflow development
In Dataform, you make changes to files and directories inside a development workspace. A development workspace is a virtual, editable copy of the contents of a Git repository. Dataform preserves the state of files in your development workspace between sessions.
In a development workspace, you can develop SQL workflow actions by using Dataform core with SQLX and JavaScript, or exclusively with JavaScript. You can automatically format your Dataform core or JavaScript code.
Each element of a Dataform SQL workflow, such as a table or assertion, corresponds to an action that Dataform performs in BigQuery. For example, a table definition file corresponds to an action that creates or updates the table in BigQuery.
In a Dataform workspace, you can develop the following SQL workflow actions:
- Source data declarations
- Tables and views
- Incremental tables
- Table partitions and clusters
- Dependencies between actions
- Documentation of tables
- Custom SQL operations
- BigQuery labels
- BigQuery policy tags
- Dataform tags
- Data quality tests, called assertions
You can use JavaScript to reuse your Dataform SQL workflow code in the following ways:
- Across a file with code encapsulation
- Across a repository with includes
- Across repositories with packages
Dataform compiles the SQL workflow code in your workspace in real time. In your workspace, you can view the compiled queries and details of actions in each file. You can also view the compilation status and errors in the edited file or in the repository.
To test the output of a compiled SQL query before you execute it in BigQuery, you can run a preview of the query in your Dataform workspace.
To inspect the entire SQL workflow defined in your workspace, you can view an interactive compiled graph that shows all compiled actions in your SQL workflow and relationships between them.
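To make the idea of a compiled graph concrete, the following sketch models actions and their dependencies as a plain object and derives an order in which every action comes after the actions it depends on. It is a conceptual model only, not Dataform's implementation; the action names and the executionOrder function are hypothetical.

```javascript
// Conceptual model of a compiled graph: each action lists the actions it depends on.
// The action names are hypothetical.
const actions = {
  raw_orders: [],
  raw_customers: [],
  orders_clean: ["raw_orders"],
  customer_orders: ["orders_clean", "raw_customers"],
  customer_orders_not_null: ["customer_orders"],
};

// Returns action names ordered so that every action comes after its dependencies
// (a depth-first topological sort).
function executionOrder(graph) {
  const ordered = [];
  const state = {}; // undefined = unvisited, otherwise "visiting" or "done"

  function visit(name) {
    if (state[name] === "done") return;
    if (state[name] === "visiting") {
      throw new Error(`Circular dependency involving ${name}`);
    }
    if (!(name in graph)) {
      throw new Error(`Unknown action: ${name}`);
    }
    state[name] = "visiting";
    for (const dependency of graph[name]) visit(dependency);
    state[name] = "done";
    ordered.push(name);
  }

  for (const name of Object.keys(graph)) visit(name);
  return ordered;
}

console.log(executionOrder(actions));
```

The depth-first approach also surfaces circular dependencies as errors, which a workflow of dependent actions cannot tolerate.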
Workflow compilation
Dataform uses default compilation settings, configured in the workflow settings file, to compile the SQL workflow code in your workspace to SQL in real time, creating a compilation result of the workspace.
You can override compilation settings to customize how Dataform compiles your SQL workflow into a compilation result.
Workspace compilation overrides apply to all workspaces in a repository. You can set dynamic workspace overrides to create a custom compilation result for each workspace, turning workspaces into isolated development environments. You can override the Google Cloud project in which Dataform executes the contents of a workspace, add a prefix to the names of all compiled tables, and add a suffix to the default schema.
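The effect of these three overrides on a compiled table's location can be sketched as follows. This is an illustration under stated assumptions, not Dataform's code: the function applyOverrides, the field names defaultDatabase, tablePrefix, and schemaSuffix, the underscore separator, and all sample values are assumptions chosen for the example.

```javascript
// Sketch of how overrides could change where a compiled table lands.
// Field names and the underscore separator are assumptions for illustration.
function applyOverrides(target, overrides = {}) {
  const { defaultDatabase, tablePrefix, schemaSuffix } = overrides;
  return {
    // Override the Google Cloud project, if one is given.
    database: defaultDatabase || target.database,
    // Append the suffix to the schema name.
    schema: schemaSuffix ? `${target.schema}_${schemaSuffix}` : target.schema,
    // Prepend the prefix to the table name.
    name: tablePrefix ? `${tablePrefix}_${target.name}` : target.name,
  };
}

// A hypothetical table as it would compile without overrides.
const compiled = { database: "prod-project", schema: "analytics", name: "orders" };

// A per-workspace override, as in an isolated development environment.
const workspaceTarget = applyOverrides(compiled, {
  defaultDatabase: "dev-project",
  schemaSuffix: "alice_workspace",
});

console.log(workspaceTarget);
```

Because the suffix differs per workspace in this sketch, two developers compiling the same table write to different schemas and do not overwrite each other's output.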
With release configurations, you can configure templates of compilation settings for creating compilation results of a Dataform repository. In a release configuration, you can override the Google Cloud project in which Dataform executes compilation results, add a prefix to the names of all compiled tables, add a suffix to the default schema, and add compilation variables. You can also set the frequency of creating compilation results. To schedule executions of compilation results created in a selected release configuration, you can create a workflow configuration.
Workflow execution
During workflow execution, Dataform executes compilation results of SQL workflows to create or update assets in BigQuery.
To create or refresh the tables and views defined in your SQL workflow in BigQuery, you can start a workflow execution manually in a development workspace or schedule executions.
You can schedule Dataform executions in BigQuery in the following ways:
- Create workflow configurations to schedule executions of compilation results created in release configurations
- Schedule executions with Cloud Composer
- Schedule executions with Workflows and Cloud Scheduler
To debug errors, you can monitor executions in the following ways:
- View detailed Dataform execution logs
- View audit logs for Dataform
- View Cloud Logging logs for Dataform
What's next
- To learn more about Dataform core, see Overview of Dataform core.
- To learn more about Dataform repositories, see Introduction to repositories.
- To learn more about Dataform workspaces, see Introduction to developing in a workspace.
- To learn more about developing SQL workflows in Dataform, see Introduction to SQL workflows.
- To learn more about using JavaScript in Dataform, see Introduction to JavaScript in Dataform.
- To learn more about code lifecycle in Dataform, see Introduction to code lifecycle in Dataform.