This page shows you how to edit the Dataform SQL workflow processing settings for a specific repository.
You might want to edit the settings file to rename the schemas or add custom compilation variables to your repository.
About repository settings
Each Dataform repository contains a unique workflow settings file. The file contains the Google Cloud project ID and the schema in which Dataform publishes assets in BigQuery. Dataform uses default settings that you can override to best suit your needs by editing the settings file.
As of Dataform core 3.0.0,
workflow settings are stored in the workflow_settings.yaml file by default.
In earlier versions of Dataform core,
workflow settings are stored in the dataform.json
file.
The Dataform core 3.0 workflow_settings.yaml
file is
backward compatible with dataform.json
. You can continue to use
dataform.json
to store workflow settings. As a best practice, you should
migrate your repository workflow settings
to the workflow_settings.yaml
format for future compatibility.
About workflow_settings.yaml
The workflow_settings.yaml
file, introduced in
Dataform core 3.0,
stores Dataform workflow settings in the YAML
format.
The following code sample shows a sample workflow_settings.yaml
file:
defaultProject: my-gcp-project-id
defaultDataset: dataform
defaultLocation: australia-southeast2
defaultAssertionDataset: dataform_assertions
In the preceding code sample, the key-value pairs are described as follows:
defaultProject
: Your BigQuery Google Cloud project ID.defaultDataset
: The BigQuery dataset in which Dataform creates assets, calleddataform
by default.defaultLocation
: Your default BigQuery dataset region. For more information on BigQuery dataset locations, see Dataset locations In this location, Dataform processes your code and stores executed data. This processing region has to match the location of your BigQuery datasets, but does not need to match the Dataform repository region.defaultAssertionDataset
: The BigQuery dataset in which Dataform creates views with assertion results, calleddataform_assertions
by default.
For more information about workflow_settings.yaml
properties, see
configs reference for workflow settings
in GitHub.
You can access the properties defined in workflow_settings.yaml
in your
Dataform code as properties of the
dataform.projectConfig
object.
The following mappings from workflow_settings.yaml
options to the code accessible
dataform.projectConfig
options apply:
defaultProject
=>defaultDatabase
.defaultDataset
=>defaultSchema
.defaultAssertionDataset
=>assertionSchema
.projectSuffix
=>databaseSuffix
.datasetSuffix
=>schemaSuffix
.namePrefix
=>tablePrefix
.
The following code sample shows the dataform.projectConfig
object referenced
in a SELECT
statement in a view:
config { type: "view" }
SELECT ${when(
!dataform.projectConfig.tablePrefix,
"table prefix is set!",
"table prefix is not set!"
)}
About dataform.json
The dataform.json
file stores Dataform workflow settings
in the JSON
format.
The following code sample shows a sample dataform.json
file:
{
"warehouse": "bigquery",
"defaultDatabase": "my-gcp-project-id",
"defaultSchema": "dataform",
"defaultLocation": "australia-southeast2",
"assertionSchema": "dataform_assertions"
}
In the preceding code sample, the key-value pairs are described as follows:
warehouse
: Pointer to BigQuery, where Dataform creates assets.defaultDatabase
: Your BigQuery Google Cloud project ID.defaultSchema
: The BigQuery dataset in which Dataform creates assets.defaultLocation
: Your default BigQuery dataset region. For more information on BigQuery dataset locations, see Dataset locations. In this location, Dataform processes your code and stores executed data. This processing region has to match the location of your BigQuery datasets, but does not need to match the Dataform repository region.assertionSchema
: The BigQuery dataset in which Dataform creates views with assertion results, calleddataform_assertions
by default.
You can access the properties defined in dataform.json
in your project code as
properties of the dataform.projectConfig
object.
Before you begin
Required roles
To get the permissions that you need to configure Dataform settings,
ask your administrator to grant you the
Dataform Admin (roles/dataform.admin
) IAM role on repositories.
For more information about granting roles, see Manage access to projects, folders, and organizations.
You might also be able to get the required permissions through custom roles or other predefined roles.
Configure schema names
To configure schema names, you need to edit the defaultDataset
and
defaultAssertionSchema
properties in
the workflow_settings.yaml
file, or the defaultSchema
and assertionSchema
properties in the dataform.json
file.
To configure the name of a schema, follow these steps:
workflow_settings.yaml
In your development workspace, in the Files pane, click the
workflow_settings.yaml
file.Edit the value of
defaultDataset
,defaultAssertionSchema
, or both.
The following code sample shows a custom defaultDataset
name defined in
the workflow_settings.yaml
file:
...
defaultDataset: mytables
...
dataform.json
In your development workspace, in the Files pane, click the
dataform.json
file.Edit the value of
defaultSchema
,assertionSchema
, or both.
The following code sample shows a custom defaultSchema
name defined in the
dataform.json
file:
{
...
"defaultSchema": "mytables",
...
}
Create custom compilation variables
Compilation variables contain values that you can modify with compilation overrides in a release configuration or in a Dataform API request.
After you define a compilation variable in workflow_settings.yaml
and add it to
selected tables, you can use modify its value in a release configuration or
Dataform API compilation overrides
to execute tables conditionally.
For more information about executing tables conditionally by using compilation variables, see Introduction to code lifecycle in Dataform.
To create a compilation variable that you can use across a repository, follow these steps:
workflow_settings.yaml
- Go to your Dataform development workspace.
- In the Files pane, select the
workflow_settings.yaml
file. Enter the following code snippet:
"vars": { "YOUR_VARIABLE":"VALUE" }
Replace the following:
- YOUR_VARIABLE with a name for the variable.
- VALUE with the default value of the compilation variable.
The following code sample shows the myVariableName
compilation variable set
to myVariableValue
in the workflow_settings.yaml
file:
...
vars:
myVariableName: myVariableValue
...
The following code sample shows the workflow_settings.yaml
file with the
executionSetting
compilation variable set to dev
:
defaultProject: default_bigquery_database
defaultLocation: us-west1
defaultDataset: dataform_data,
vars:
executionSetting: dev
dataform.json
- Go to your Dataform development workspace.
- In the Files pane, select the
dataform.json
file. Enter the following code snippet:
"vars": { "YOUR_VARIABLE":"VALUE" }
Replace the following:
- YOUR_VARIABLE with a name for the variable.
- VALUE with the default value of the compilation variable.
The following code sample shows the myVariableName
compilation variable set
to myVariableValue
in the dataform.json
file:
{
...
"vars": {
"myVariableName": "myVariableValue"
},
...
}
The following code sample shows the dataform.json
file with the
executionSetting
compilation variable set to dev
:
{
"warehouse": "bigquery",
"defaultSchema": "dataform_data",
"defaultDatabase": "default_bigquery_database".
"defaultLocation":"us-west-1",
"vars": {
"executionSetting":"dev"
}
}
Add a compilation variable to a table
To add a compilation variable to a SQLX table definition file, follow these steps:
- Go to your Dataform development workspace.
- In the Files pane, select a SQLX table definition file.
In the file, enter a
when
clause in the following format:${when(dataform.projectConfig.vars.YOUR_VARIABLE === "SET_VALUE", "CONDITION")}
Replace the following:
- YOUR_VARIABLE with the name of your variable,
for example
executionSetting
. - SET_VALUE with a value for the variable,
for example,
staging
. - CONDITION with a condition for execution of the table.
- YOUR_VARIABLE with the name of your variable,
for example
The following code sample shows a table definition SQLX file with a when
clause and the executionSetting
variable that executes 10% of data in the
staging execution setting:
select
*
from ${ref("data")}
${when(
dataform.projectConfig.vars.executionSetting === "staging",
"where mod(farm_fingerprint(id) / 10) = 0",
)}
The following code sample shows a view definition SQLX file with a when
clause and the myVariableName
variable:
config { type: "view" }
SELECT ${when(
dataform.projectConfig.vars.myVariableName === "myVariableValue",
"myVariableName is set to myVariableValue!",
"myVariableName is not set to myVariableValue!"
)}
Migrate workflow settings to workflow_settings.yaml
To ensure your workflow settings file is compatible with future
Dataform core framework versions, you should migrate your workflow
settings from the dataform.json
file to the workflow_settings.yaml
file.
The workflow_settings.yaml
file replaces the dataform.json
file.
If Dataform core is the only dependency package in your repository,
then the workflow_settings.yaml
file also replaces the package.json
file.
For more information about replacing package.json
with
workflow_settings.yaml
, see
Manage the Dataform core package.
The following table shows mapping of workflow settings properties from
dataform.json
to workflow_settings.yaml
:
Property in dataform.json |
Property in workflow_settings.yaml |
Description |
---|---|---|
assertionSchema |
defaultAssertionDataset |
Required. The default dataset for assertions. |
defaultDatabase |
defaultProject |
Required. The default Google Cloud project name. |
defaultLocation |
defaultLocation |
Required. The default BigQuery location where Dataform will create tables. For more information on BigQuery locations, see Dataset locations. |
defaultSchema |
defaultDataset |
Required. The default dataset. |
databaseSuffix |
projectSuffix |
Optional. The suffix to append to all Google Cloud project references. |
schemaSuffix |
datasetSuffix |
Optional. The suffix to append to all action datasets. |
tablePrefix |
namePrefix |
Optional. The prefix to prepend to all action names. |
vars |
vars |
Optional. User-defined variables that are made available to project code during compilation. An object containing a list of key: value pairs. |
warehouse |
- | Required in dataform.json . Must be set to bigquery . Not available in workflow_settings.yaml . |
- | dataformCoreVersion |
Not available in dataform.json . Istalled in workflow_settings.yaml by default for new repositories. For more information, see Manage the Dataform core package. |
To migrate workflow settings to workflow_settings.yaml
, follow these steps:
In the Google Cloud console, go to the Dataform page.
Select a repository, and then select a workspace.
In the Files pane, click addAdd, and then click Create file.
In the Add a file path field, enter
workflow_settings.yaml
.Click Create file.
In the
workflow_settings.yaml
file, add the settings from yourdataform.json
file, mapped to the YAML format.In the Files pane, click the More menu next to
dataform.json
, and then click Delete.To confirm deletion of
dataform.json
, click Delete.
The following code sample shows workflow settings defined in
a dataform.json
file:
{
"warehouse": "bigquery",
"defaultDatabase": "dataform-demos",
"defaultLocation": "US",
"defaultSchema": "dataform",
"assertionSchema": "dataform_assertions"
"vars": {
"environmentName": "development"
}
}
The following code sample shows the preceding dataform.json
file
converted to workflow_settings.yaml
:
defaultProject: dataform-demos
defaultLocation: US
defaultDataset: dataform
defaultAssertionDataset: dataform_assertions
vars:
environmentName: "development"
What's next
To learn about migrating the Dataform core package to
workflow_settings.yaml
, see Manage the Dataform core package.To learn more about Dataform project settings, see
IProjectConfig
reference.