Introduction to code lifecycle in Dataform

This document describes code lifecycle in Dataform and ways to configure compilation and execution within Dataform.

About code lifecycle in Dataform

The Dataform code lifecycle consists of the following phases:

Development
You develop a SQL workflow in a Dataform workspace.
Compilation

Dataform compiles the SQL workflow code in your workspace to SQL in real time, creating a compilation result of the workspace that you can execute in BigQuery. Dataform uses settings that you defined in the workflow_settings.yaml file to create the compilation result.

Dataform compilation is hermetic to ensure compilation consistency, meaning that the same code compiles to the same SQL compilation result every time. Dataform compiles your code in a sandbox environment with no internet access. No additional actions, such as calling external APIs, are available during compilation.

Execution

In a workflow invocation, Dataform executes the workspace compilation result in BigQuery.

To tailor the Dataform code lifecycle to your needs, you can configure the compilation result to influence where and how Dataform executes your SQL workflow. Then, you can manually trigger or schedule executions to influence when Dataform executes your whole SQL workflow or selected elements of it.

Ways to configure Dataform compilation

By default, Dataform uses settings in the workflow_settings.yaml file to create compilation results. You can override the default settings with compilation overrides to create custom compilation results. You can then manually trigger execution of a custom compilation result, or schedule executions.

Dataform provides the following options for configuring compilation results:

Workspace compilation overrides
You can configure compilation overrides that apply to all workspaces in a repository. You can use workspace compilation overrides to create isolated development environments.
Release configurations
You can create release configurations to configure templates for creating compilation results of a Dataform repository. You can then create a workflow configuration to schedule executions of compilation results created in a selected release configuration.
Dataform API compilation overrides
You can pass Dataform API requests in the terminal to create and execute a single compilation result with compilation overrides.

Configure workspace compilation overrides

With workspace compilation overrides, you can create compilation overrides for all workspaces in a Dataform repository. You can create one configuration of workspace compilation overrides per repository.

When you manually trigger execution in a workspace in a repository with workspace compilation overrides, Dataform applies these overrides to the compilation result of the workspace.

You can configure the following workspace compilation overrides:

  • Google Cloud project in which Dataform executes the contents of the workspace
  • Table prefix
  • Schema suffix

You can use workspace compilation overrides to create isolated development environments by isolating workspace compilation results in BigQuery with dynamic compilation overrides. Dynamic table prefix and schema suffix compilation overrides contain the ${workspaceName} variable. When you trigger execution in a workspace, Dataform replaces the ${workspaceName} variable with the name of the current workspace, creating compilation overrides unique to the workspace.
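The substitution described above can be sketched in Python. This is a minimal illustration, not Dataform's implementation: the helper names `resolve_override` and `qualified_table` are hypothetical, and the sketch assumes that a table prefix is prepended to the table name and a schema suffix is appended to the schema name, each joined with an underscore.

```python
WORKSPACE_VARIABLE = "${workspaceName}"


def resolve_override(template: str, workspace_name: str) -> str:
    """Replace the ${workspaceName} variable with the current workspace name."""
    return template.replace(WORKSPACE_VARIABLE, workspace_name)


def qualified_table(schema: str, table: str, workspace_name: str,
                    table_prefix: str = "", schema_suffix: str = "") -> str:
    """Return schema.table after applying workspace compilation overrides.

    Assumption: the prefix is prepended to the table name and the suffix is
    appended to the schema name, each joined with an underscore.
    """
    prefix = resolve_override(table_prefix, workspace_name)
    suffix = resolve_override(schema_suffix, workspace_name)
    if prefix:
        table = f"{prefix}_{table}"
    if suffix:
        schema = f"{schema}_{suffix}"
    return f"{schema}.{table}"


# Two workspaces compile the same table into separate schemas.
for workspace in ("alice", "bob"):
    print(qualified_table("analytics", "orders", workspace,
                          schema_suffix="${workspaceName}"))
# prints "analytics_alice.orders" then "analytics_bob.orders"
```

Because the override is resolved per workspace, developers who work in different workspaces of the same repository do not overwrite each other's tables.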

Keep in mind that you cannot schedule executions of compilation results created with workspace compilation overrides.

Create release configurations

With release configurations, you can configure templates of settings for creating compilation results of repositories.

In a release configuration, you can configure compilation overrides of workflow_settings.yaml settings, compilation variables, and the frequency of creating compilation results of your whole repository.

In a release configuration, you can configure the following compilation overrides:

You can create multiple release configurations in a Dataform repository, one for each stage of your development lifecycle, creating isolated repository compilation results.

You can then create workflow configurations to schedule executions of compilation results created in a selected release configuration.

You can also manually trigger execution of a compilation result in a selected release configuration.
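As a rough sketch of one release configuration per lifecycle stage, the following Python builds request bodies shaped like the Dataform API ReleaseConfig resource. The field names (`gitCommitish`, `codeCompilationConfig`, `cronSchedule`, `timeZone`, `defaultDatabase`, `schemaSuffix`, `vars`) reflect that resource as commonly documented and should be verified against the current API reference. The helper `release_config`, the branch names, the project IDs, and the `env` variable are hypothetical.

```python
def release_config(git_commitish, project, schema_suffix=None,
                   variables=None, cron_schedule=None, time_zone=None):
    """Build a ReleaseConfig-shaped body (assumed field names, see lead-in)."""
    code_config = {"defaultDatabase": project}
    if schema_suffix:
        code_config["schemaSuffix"] = schema_suffix
    if variables:
        code_config["vars"] = dict(variables)

    body = {"gitCommitish": git_commitish, "codeCompilationConfig": code_config}
    # The schedule controls how often Dataform creates a new compilation result.
    if cron_schedule:
        body["cronSchedule"] = cron_schedule
    if time_zone:
        body["timeZone"] = time_zone
    return body


# Hypothetical stages: each compiles a different branch into a different project.
stages = {
    "staging": release_config("staging", "my-project-staging",
                              schema_suffix="staging",
                              variables={"env": "staging"},
                              cron_schedule="0 * * * *", time_zone="Etc/UTC"),
    "production": release_config("main", "my-project-prod",
                                 variables={"env": "production"},
                                 cron_schedule="0 6 * * *", time_zone="Etc/UTC"),
}

for name, body in stages.items():
    print(name, body["gitCommitish"], body["codeCompilationConfig"])
```

Keeping the stage-specific values in the release configuration, rather than in the repository code, means the same code compiles into isolated results for each stage.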

Configure a single compilation result with Dataform API compilation overrides

By passing Dataform API requests in the terminal, you can configure compilation overrides for a single compilation result.

In the compilationResults.create request, you can create a single compilation result of a Dataform workspace or a specified Git commitish.

In the CodeCompilationConfig object of the compilationResults.create request, you can configure compilation overrides for the compilation request.

You can configure the following Dataform API compilation overrides:

Keep in mind that Dataform API compilation overrides apply to a single compilation result and a single execution. You cannot use them to schedule Dataform executions.

You can execute a compilation result in the workflowInvocations.create request.
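The two requests can be sketched as follows. The Python below only builds the URL and JSON body for each call and does not send anything, because sending requires authenticated credentials. The `v1beta1` API version, the body field names (`gitCommitish`, `codeCompilationConfig`, `compilationResult`, and the override keys), and all project, repository, and result identifiers are assumptions for illustration; check them against the current Dataform API reference.

```python
import json

# Assumption: the v1beta1 REST endpoint; a newer API version may be available.
API_ROOT = "https://dataform.googleapis.com/v1beta1"


def repository_path(project, location, repository):
    """Return the resource name of a Dataform repository."""
    return f"projects/{project}/locations/{location}/repositories/{repository}"


def compilation_result_request(repo_path, git_commitish, overrides):
    """Build the compilationResults.create call with compilation overrides."""
    url = f"{API_ROOT}/{repo_path}/compilationResults"
    body = {
        "gitCommitish": git_commitish,
        "codeCompilationConfig": dict(overrides),
    }
    return url, body


def workflow_invocation_request(repo_path, compilation_result_name):
    """Build the workflowInvocations.create call for one compilation result."""
    url = f"{API_ROOT}/{repo_path}/workflowInvocations"
    body = {"compilationResult": compilation_result_name}
    return url, body


# Hypothetical identifiers, used only to show the shape of the requests.
repo = repository_path("my-project", "us-central1", "my-repository")
compile_url, compile_body = compilation_result_request(
    repo, "main", {"schemaSuffix": "adhoc", "tablePrefix": "test"})
# In practice, the result name comes from the response to the first request.
invoke_url, invoke_body = workflow_invocation_request(
    repo, f"{repo}/compilationResults/RESULT_ID")

print("POST", compile_url)
print(json.dumps(compile_body, indent=2))
print("POST", invoke_url)
print(json.dumps(invoke_body, indent=2))
```

The second request references the compilation result by name, which is why these overrides affect a single compilation result and a single execution.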

Ways to configure Dataform execution

Dataform provides the following options for configuring execution:

Manual execution in a workspace
You can manually trigger instant execution of a SQL workflow in a Dataform workspace, outside of any schedule. You can execute selected actions in the SQL workflow.
Workflow configurations
You can schedule executions of compilation results created in a selected release configuration. You can select SQL workflow actions to execute, and set the frequency and time zone of executions.

Trigger instant execution in a workspace

In a Dataform workspace, you can manually trigger instant execution of the SQL workflow in your workspace, outside of any schedule.

You can manually execute the following elements of the SQL workflow in your workspace:

If your repository contains workspace compilation overrides, you can view what compilation overrides Dataform will apply to the workspace compilation result.

Create workflow configurations

With workflow configurations, you can schedule executions of compilation results from a selected release configuration. You can create multiple workflow configurations in a Dataform repository.

In a workflow configuration, you can configure the following execution settings:

  • Release configuration whose compilation results are executed
  • Selection of SQL workflow actions to be executed
  • Schedule and time zone of executions

You can select the following SQL workflow actions to be executed:

  • All actions
  • Selected actions
  • Actions with selected tags

Then, during a scheduled execution of your workflow configuration, Dataform deploys your selection of actions from the applied compilation result to BigQuery.
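The three selection modes can be illustrated with a small Python sketch. The `Action` class and the `select_actions` function are hypothetical and only model the selection logic: no names and no tags means all actions, otherwise an action is selected if it is named explicitly or carries a selected tag. The sketch does not model dependencies between actions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Action:
    """A simplified stand-in for a SQL workflow action."""
    name: str
    tags: frozenset = field(default_factory=frozenset)


def select_actions(actions, names=None, tags=None):
    """Return all actions, or only those matching the given names or tags."""
    if names is None and tags is None:
        return list(actions)
    wanted_names = set(names or ())
    wanted_tags = set(tags or ())
    return [a for a in actions
            if a.name in wanted_names or wanted_tags & a.tags]


workflow = [
    Action("stg_orders", frozenset({"staging"})),
    Action("stg_customers", frozenset({"staging"})),
    Action("daily_revenue", frozenset({"reporting", "daily"})),
]

print([a.name for a in select_actions(workflow)])
print([a.name for a in select_actions(workflow, names=["stg_orders"])])
print([a.name for a in select_actions(workflow, tags=["daily"])])
```

Selecting by tag lets a single workflow configuration keep picking up new actions as long as they carry the tag.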

Dataform release configurations and workflow configurations let you configure compilation and schedule executions within Dataform, without the need to rely on additional services.

Expiration of lifecycle resources

Dataform stores compilation results and workflow invocations for a specific period of time.

Expiration of workflow invocations

Workflow invocations expire after 90 days, or when you manually delete them.

In a workflow configuration, you can view a list of the most recent workflow invocations created by the configuration. When a workflow invocation created by a workflow configuration expires, Dataform removes that workflow invocation from the list of recent invocations.

Expiration of compilation results

Expiration of compilation results depends on the way they are created: in a development workspace, in a release configuration, or by a workflow invocation.

When you develop a SQL workflow in a Dataform workspace, Dataform compiles your code into a compilation result in real time to provide query validation. Compilation results created this way expire after 24 hours.

In a release configuration, the latest compilation result becomes the live compilation result. A new compilation result replaces the current live compilation result. Dataform retains the live compilation result until it is replaced with a new compilation result. A replaced compilation result expires in up to 24 hours.

Dataform removes expired compilation results from the list of past compilation results on the Details page of a release configuration.

Dataform retains compilation results created by workflow invocations for the whole life of the workflow invocation, and for up to 24 hours after the workflow invocation expires or is deleted.
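The retention periods above can be summarized as date arithmetic. The following Python sketch is illustrative only: the function names are hypothetical, and it assumes that the 90-day and 24-hour periods are counted from the creation time of the resource. Where the text says "up to 24 hours", the functions return the latest possible expiration time.

```python
from datetime import datetime, timedelta, timezone

WORKFLOW_INVOCATION_TTL = timedelta(days=90)
COMPILATION_RESULT_TTL = timedelta(hours=24)


def invocation_expiry(created_at):
    """Workflow invocations expire 90 days after creation (assumed start)."""
    return created_at + WORKFLOW_INVOCATION_TTL


def workspace_result_expiry(created_at):
    """Compilation results created in a workspace expire after 24 hours."""
    return created_at + COMPILATION_RESULT_TTL


def replaced_release_result_latest_expiry(replaced_at):
    """A replaced release compilation result expires within 24 hours."""
    return replaced_at + COMPILATION_RESULT_TTL


def invocation_result_latest_expiry(invocation_created_at, deleted_at=None):
    """A result created by an invocation outlives it by at most 24 hours."""
    end_of_life = invocation_expiry(invocation_created_at)
    if deleted_at is not None:
        end_of_life = min(end_of_life, deleted_at)
    return end_of_life + COMPILATION_RESULT_TTL


created = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(invocation_expiry(created))                # 2024-03-31 00:00:00+00:00
print(workspace_result_expiry(created))          # 2024-01-02 00:00:00+00:00
print(invocation_result_latest_expiry(created))  # 2024-04-01 00:00:00+00:00
```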
