Configure Dataform workflow settings

This page shows you how to edit the Dataform SQL workflow processing settings for a specific repository.

You might want to edit the settings file to rename the schemas or add custom compilation variables to your repository.

About repository settings

Each Dataform repository contains a unique workflow settings file. The file contains the Google Cloud project ID and the schema in which Dataform publishes assets in BigQuery. Dataform uses default settings that you can override to best suit your needs by editing the settings file.

As of Dataform core 3.0, by default, workflow settings are stored in the workflow_settings.yaml file. In earlier versions of Dataform core, workflow settings are stored in the dataform.json file. The Dataform core 3.0 workflow_settings.yaml file is backward compatible with dataform.json. You can continue to use dataform.json to configure Dataform settings.

About workflow_settings.yaml

The workflow_settings.yaml file, introduced in Dataform core 3.0, stores Dataform workflow settings in the YAML format.

The following code sample shows a sample workflow_settings.yaml file:

  defaultProject: my-gcp-project-id
  defaultDataset: dataform
  defaultLocation: australia-southeast2
  defaultAssertionDataset: dataform_assertions

In the preceding code sample, the key-value pairs are described as follows:

  • defaultProject: Your BigQuery Google Cloud project ID.
  • defaultDataset: The BigQuery dataset in which Dataform creates assets, called dataform by default.
  • defaultLocation: Your default BigQuery dataset region. For more information on BigQuery dataset locations, see Dataset locations In this location, Dataform processes your code and stores executed data. This processing region has to match the location of your BigQuery datasets, but does not need to match the Dataform repository region.
  • defaultAssertionDataset: The BigQuery dataset in which Dataform creates views with assertion results, called dataform_assertions by default.

For more information about workflow_settings.yaml properties, see configs reference for workflow settings in GitHub.

You can access the properties defined in workflow_settings.yaml in your Dataform code as properties of the dataform.projectConfig object.

The following mappings from workflow_settings.yaml options to the code accessible dataform.projectConfig options apply:

  • defaultProject => defaultDatabase.
  • defaultDataset => defaultSchema.
  • defaultAssertionDataset => assertionSchema.
  • projectSuffix => databaseSuffix.
  • datasetSuffix => schemaSuffix.
  • namePrefix => tablePrefix.

The following code sample shows the dataform.projectConfig object called in a SELECT statement in a view:

  config { type: "view" }
  SELECT ${when(
    !dataform.projectConfig.tablePrefix,
    "table prefix is set!",
    "table prefix is not set!"
  )}

About dataform.json

The dataform.json file stores Dataform workflow settings in the JSON format.

The following code sample shows a sample dataform.json file:

  {
    "warehouse": "bigquery",
    "defaultDatabase": "my-gcp-project-id",
    "defaultSchema": "dataform",
    "defaultLocation": "australia-southeast2",
    "assertionSchema": "dataform_assertions"
  }

In the preceding code sample, the key-value pairs are described as follows:

  • warehouse: Pointer to BigQuery, where Dataform creates assets.
  • defaultDatabase: Your BigQuery Google Cloud project ID.
  • defaultSchema: The BigQuery dataset in which Dataform creates assets, called dataform by default.
  • defaultLocation: Your default BigQuery dataset region. For more information on BigQuery dataset locations, see Dataset locations In this location, Dataform processes your code and stores executed data. This processing region has to match the location of your BigQuery datasets, but does not need to match the Dataform repository region.
  • assertionSchema: The BigQuery dataset in which Dataform creates views with assertion results, called dataform_assertions by default.

You can access the properties defined in dataform.json in your project code as properties of the dataform.projectConfig object.

Before you begin

  1. Create a repository.

  2. Create and initialize a development workspace in your repository.

Required roles

To get the permissions that you need to configure Dataform settings, ask your administrator to grant you the Dataform Admin (roles/dataform.admin) IAM role on repositories. For more information about granting roles, see Manage access.

You might also be able to get the required permissions through custom roles or other predefined roles.

Configure schema names

To configure schema names, you need to edit the defaultDataset and defaultAssertionSchema properties in the workflow_settings.yaml file, or the defaultSchema and assertionSchema properties in the dataform.json file.

To configure the name of a schema, follow these steps:

workflow_settings.yaml

  1. In your development workspace, in the Files pane, click the workflow_settings.yaml file.

  2. Edit the value of defaultDataset, defaultAssertionSchema, or both.

The following code sample shows a custom defaultDataset name defined in the workflow_settings.yaml file:

  ...
  defaultDataset: mytables
  ...

dataform.json

  1. In your development workspace, in the Files pane, click the dataform.json file.

  2. Edit the value of defaultSchema, assertionSchema, or both.

The following code sample shows a custom defaultSchema name defined in the dataform.json file:

{
  ...
  "defaultSchema": "mytables",
  ...
}

Create custom compilation variables

Compilation variables contain values that you can modify with compilation overrides in a release configuration or in a Dataform API request.

After you define a compilation variable in workflow_settings.yaml and add it to selected tables, you can use modify its value in a release configuration or Dataform API compilation overrides to execute tables conditionally.

For more information about executing tables conditionally by using compilation variables, see Introduction to code lifecycle in Dataform.

To create a compilation variable that you can use across a repository, follow these steps:

workflow_settings.yaml

  1. Go to your Dataform development workspace.
  2. In the Files pane, select the workflow_settings.yaml file.
  3. Enter the following code snippet:

    "vars": {
      "YOUR_VARIABLE":"VALUE"
    }
    

    Replace the following:

    • YOUR_VARIABLE with a name for the variable.
    • VALUE with the default value of the compilation variable.

The following code sample shows the myVariableName compilation variable set to myVariableValue in the workflow_settings.yaml file:

...
vars:
  myVariableName: myVariableValue
...

The following code sample shows the workflow_settings.yaml file with the executionSetting compilation variable set to dev:

defaultProject: default_bigquery_database
defaultLocation: us-west1
defaultDataset: dataform_data,
vars:
executionSetting: dev

dataform.json

  1. Go to your Dataform development workspace.
  2. In the Files pane, select the dataform.json file.
  3. Enter the following code snippet:

    "vars": {
      "YOUR_VARIABLE":"VALUE"
    }
    

    Replace the following:

    • YOUR_VARIABLE with a name for the variable.
    • VALUE with the default value of the compilation variable.

The following code sample shows the myVariableName compilation variable set to myVariableValue in the dataform.json file:

{
  ...
  "vars": {
    "myVariableName": "myVariableValue"
  },
  ...
}

The following code sample shows the dataform.json file with the executionSetting compilation variable set to dev:

{
"warehouse": "bigquery",
"defaultSchema": "dataform_data",
"defaultDatabase": "default_bigquery_database".
"defaultLocation":"us-west-1",
"vars": {
"executionSetting":"dev"
}
}

Add a compilation variable to a table

To add a compilation variable to a SQLX table definition file, follow these steps:

  1. Go to your Dataform development workspace.
  2. In the Files pane, select a SQLX table definition file.
  3. In the file, enter a when clause in the following format:

    ${when(dataform.projectConfig.vars.YOUR_VARIABLE === "SET_VALUE", "CONDITION")}
    

    Replace the following:

    • YOUR_VARIABLE with the name of your variable, for example executionSetting.
    • SET_VALUE with a value for the variable, for example, staging.
    • CONDITION with a condition for execution of the table.

The following code sample shows a table definition SQLX file with a when clause and the executionSetting variable that executes 10% of data in the staging execution setting:

  select
    *
  from ${ref("data")}
  ${when(
    dataform.projectConfig.vars.executionSetting === "staging",
    "where mod(farm_fingerprint(id) / 10) = 0",
  )}

The following code sample shows a view definition SQLX file with a when clause and the myVariableName variable:

  config { type: "view" }
  SELECT ${when(
    dataform.projectConfig.vars.myVariableName === "myVariableValue",
    "myVariableName is set to myVariableValue!",
    "myVariableName is not set to myVariableValue!"
  )}

What's next