Configure compilation overrides with the Dataform API

This document shows you how to create and execute a compilation result with compilation overrides by using the Dataform API.

About Dataform API compilation overrides

To execute your SQL workflow, Dataform compiles your code to SQL to create a compilation result. Then, during a workflow invocation, Dataform executes the compilation result in BigQuery.

By default, Dataform uses settings in the workflow settings file to create the compilation result. To isolate data executed at different stages of your development lifecycle, you can override the default settings with compilation overrides.

By passing Dataform API requests in the terminal, you can create and execute a single compilation result with compilation overrides. You can create a compilation result of a workspace or of a selected Git commitish.

To create a compilation result with compilation overrides, send a Dataform API compilationResults.create request. In the request, specify a source, either a workspace or a Git commitish, for Dataform to compile into the compilation result. You configure compilation overrides in the CodeCompilationConfig object of the compilationResults.create request.

You can then execute the created compilation result in a Dataform API workflowInvocations.create request.
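The compilationResults.create request described above can be sketched in Python as a URL plus a JSON body. This is a minimal sketch, not a definitive client: the REST base URL and API version in API_BASE are assumptions to check against the API reference, the helper names are hypothetical, and authentication and the actual HTTP call are left out.

```python
import json

# Assumption: REST base URL and API version. Verify against the Dataform API reference.
API_BASE = "https://dataform.googleapis.com/v1beta1"


def compilation_results_url(project: str, location: str, repository: str) -> str:
    """Return the collection URL that a compilationResults.create request posts to."""
    parent = f"projects/{project}/locations/{location}/repositories/{repository}"
    return f"{API_BASE}/{parent}/compilationResults"


def compilation_request_body(source: dict, overrides: dict) -> str:
    """Combine a source (workspace or gitCommitish) with overrides into a JSON body."""
    body = dict(source)
    if overrides:
        # Overrides live in the codeCompilationConfig object of the request.
        body["codeCompilationConfig"] = overrides
    return json.dumps(body, indent=2)


url = compilation_results_url("analytics", "europe-west4", "sales")
body = compilation_request_body({"gitCommitish": "staging"}, {"schemaSuffix": "_staging"})
print(url)
print(body)
```

The printed body can be sent with any HTTP client that attaches valid Google Cloud credentials.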

You can configure the following compilation overrides by using the Dataform API:

Google Cloud project
Google Cloud project in which Dataform executes the compilation result, set in workflow_settings.yaml as defaultProject or in dataform.json as defaultDatabase.

Table prefix
Custom prefix added to all table names in the compilation result.

Schema suffix
Custom suffix appended to the schema of tables defined in defaultDataset in workflow_settings.yaml, defaultSchema in dataform.json, or in the schema parameter in the config block of a table.

Value of a compilation variable
Value of a compilation variable to be used in the compilation result. You can use compilation variables to execute tables conditionally.

Dataform API compilation overrides apply to a single compilation result only. As an alternative, you can configure workspace compilation overrides in the Google Cloud console.

To learn about alternative ways to configure compilation overrides in Dataform, see Introduction to code lifecycle.

Before you begin

  1. In the Google Cloud console, go to the Dataform page.

  2. Select or create a repository.

  3. Select or create a development workspace.

Set a compilation result source

To send the Dataform API compilationResults.create request, you need to specify a source for the compilation result.

You can set a Dataform workspace or a Git branch, Git tag, or Git commit SHA as the source in the compilationResults.create request.

Set a workspace as a compilation result source

  • In the compilationResults.create request, populate the workspace property with the path of a selected Dataform workspace in the following format:
{
  "workspace": "projects/PROJECT_NAME/locations/LOCATION/repositories/REPOSITORY_NAME/workspaces/WORKSPACE_NAME"
}

Replace the following:

  • PROJECT_NAME with the name of your Google Cloud project.
  • LOCATION with the location of your Dataform repository, set in workflow settings.
  • REPOSITORY_NAME with the name of your Dataform repository.
  • WORKSPACE_NAME with the name of your Dataform workspace.

The following code sample shows the workspace property in the compilationResults.create request set to a workspace called "sales-test":

{
  "workspace": "projects/analytics/locations/europe-west4/repositories/sales/workspaces/sales-test"
}
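Because the workspace path is assembled from four placeholders, a small helper can reduce copy-and-paste mistakes. The following sketch uses a hypothetical function name, workspace_path, and reproduces the "sales-test" sample shown above:

```python
def workspace_path(project: str, location: str, repository: str, workspace: str) -> str:
    """Build the resource path expected by the workspace property."""
    return (
        f"projects/{project}/locations/{location}"
        f"/repositories/{repository}/workspaces/{workspace}"
    )


# Same values as the "sales-test" sample.
source = {"workspace": workspace_path("analytics", "europe-west4", "sales", "sales-test")}
print(source["workspace"])
```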

Set a Git commitish as a compilation result source

  • In the compilationResults.create request, populate the gitCommitish property with the selected Git branch, tag, or commit SHA in the following format:

    {
      "gitCommitish": "GIT_COMMITISH"
    }
    

Replace GIT_COMMITISH with the selected Git branch, Git tag, or a Git commit SHA for the compilation result.

The following code sample shows the gitCommitish property in the compilationResults.create request set to "staging":

{
  "gitCommitish": "staging"
}
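A request names one source: either a workspace or a Git commitish. The sketch below enforces that choice before a request body is built. The function name compilation_source is hypothetical, and the exactly-one check reflects the "workspace or Git commitish" wording of this document rather than a confirmed server-side validation rule.

```python
from typing import Optional


def compilation_source(workspace: Optional[str] = None,
                       git_commitish: Optional[str] = None) -> dict:
    """Return the source part of a request body; exactly one source must be given."""
    if (workspace is None) == (git_commitish is None):
        raise ValueError("Specify exactly one of workspace or git_commitish")
    if workspace is not None:
        return {"workspace": workspace}
    # A branch name, a tag, or a commit SHA all go into the same property.
    return {"gitCommitish": git_commitish}


print(compilation_source(git_commitish="staging"))
```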

Override the default Google Cloud project

To create staging or production tables in a Google Cloud project separate from the project used for development, you can pass a different Google Cloud project ID in the CodeCompilationConfig object in the Dataform API compilationResults.create request.

Passing a separate default project ID in the compilationResults.create request overrides the default Google Cloud project ID configured in the workflow settings file, but does not override Google Cloud project IDs configured in individual tables.

  • To override the default Google Cloud project ID, set the defaultDatabase property to the selected Google Cloud project ID in the CodeCompilationConfig object in the following format:

    {
      "codeCompilationConfig": {
        "defaultDatabase": "PROJECT_NAME"
      }
    }
    

Replace PROJECT_NAME with the Google Cloud project ID that you want to set for the compilation result.
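The precedence rule described in this section (the override replaces the workflow settings default but leaves per-table project IDs alone) can be modeled in a few lines. This is an illustrative model only, not Dataform's implementation; the function name and the project IDs are hypothetical.

```python
from typing import Optional


def effective_project(table_project: Optional[str],
                      settings_default: str,
                      override: Optional[str] = None) -> str:
    """Illustrative model: resolve the project a table is created in."""
    if table_project is not None:
        # A project configured on an individual table is not affected by the override.
        return table_project
    # Otherwise the override, when present, replaces the workflow settings default.
    return override if override is not None else settings_default


# Hypothetical project IDs, for illustration only.
print(effective_project(None, "dev-project", override="prod-project"))
print(effective_project("shared-project", "dev-project", override="prod-project"))
```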

Add a table prefix

To quickly identify tables from the compilation result, you can add a prefix to all table names in the compilation result by passing the table prefix in the CodeCompilationConfig object in the Dataform API compilationResults.create request.

  • To add a table prefix, set the tablePrefix property in the CodeCompilationConfig object in the following format:
{
  "codeCompilationConfig": {
    "tablePrefix": "PREFIX"
  }
}

Replace PREFIX with the prefix that you want to add, for example, staging. Dataform adds the prefix to the names of all tables in the compilation result.

Append a schema suffix

To separate development, staging, and production data, you can append a suffix to schemas in a compilation result by passing the schema suffix in the CodeCompilationConfig object in the Dataform API compilationResults.create request.

  • To append a schema suffix, set the schemaSuffix property in the CodeCompilationConfig object in the following format:
{
  "codeCompilationConfig": {
    "schemaSuffix": "SUFFIX"
  }
}

Replace SUFFIX with the suffix you want to append, for example, _staging. For example, if your defaultDataset in workflow_settings.yaml is set to dataform, Dataform will create tables in the dataform_staging schema.
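The project, table prefix, and schema suffix overrides all go into the same codeCompilationConfig object, so they can be combined in one request. As a minimal sketch, the hypothetical helper below includes only the overrides that are actually set and serializes them with json.dumps, which emits strict JSON.

```python
import json
from typing import Optional


def code_compilation_config(default_database: Optional[str] = None,
                            table_prefix: Optional[str] = None,
                            schema_suffix: Optional[str] = None) -> dict:
    """Build a codeCompilationConfig fragment, omitting overrides that are not set."""
    candidates = {
        "defaultDatabase": default_database,
        "tablePrefix": table_prefix,
        "schemaSuffix": schema_suffix,
    }
    config = {key: value for key, value in candidates.items() if value is not None}
    return {"codeCompilationConfig": config}


body = code_compilation_config(schema_suffix="_staging")
print(json.dumps(body, indent=2))
```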

Note: The CodeCompilationConfig schemaSuffix parameter also applies to schemas configured in the config block of individual files.

Execute selected files conditionally with compilation variables

To execute a selected table only in a specific execution setting, you can create a compilation variable for the execution setting and then pass its value in the CodeCompilationConfig object in the Dataform API compilationResults.create request.

To execute a table conditionally in a specific execution setting by using the Dataform API, follow these steps:

  1. Create a compilation variable and add it to selected tables.
  2. Set the YOUR_VARIABLE and VALUE key-value pair in the codeCompilationConfig block of a Dataform API compilation request in the following format:

    {
     "codeCompilationConfig": {
       "vars": {
         "YOUR_VARIABLE": "VALUE"
       }
     }
    }
    
  3. Replace YOUR_VARIABLE with the name of your variable, for example executionSetting.

  4. Replace VALUE with the value of the variable for this compilation result that fulfills the when condition set in selected tables.

The following code sample shows the executionSetting variable passed to a Dataform API compilation request:

{
  "gitCommitish": "staging",
  "codeCompilationConfig": {
    "vars": {
      "executionSetting": "staging"
    }
  }
}
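The request body above can also be generated programmatically. The sketch below uses a hypothetical helper, with_vars, and converts variable values to strings. That conversion assumes the API expects string values in vars, which is worth confirming in the CodeCompilationConfig reference.

```python
import json


def with_vars(source: dict, variables: dict) -> dict:
    """Attach compilation variables to a request body, coercing values to strings."""
    body = dict(source)
    body["codeCompilationConfig"] = {
        "vars": {str(name): str(value) for name, value in variables.items()}
    }
    return body


body = with_vars({"gitCommitish": "staging"}, {"executionSetting": "staging"})
print(json.dumps(body, indent=2))
```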

Execute a compilation result with compilation overrides

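To execute a compilation result that you created with compilation overrides, pass the compilation result ID in the compilationResult property of a Dataform API workflowInvocations.create request.

The following code sample shows a compilation result ID passed in a workflowInvocations.create request: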

{
  "compilationResult": "projects/my-project-name/locations/europe-west4/repositories/my-repository-name/compilationResults/7646b4ed-ac8e-447f-93cf-63c43249ff11"
}
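The compilation result path already contains the repository path, so the target of the workflowInvocations.create request can be derived from it. The following sketch assumes the REST base URL and API version in API_BASE and uses a hypothetical helper name. It builds the request but does not send it.

```python
# Assumption: REST base URL and API version. Verify against the Dataform API reference.
API_BASE = "https://dataform.googleapis.com/v1beta1"


def workflow_invocation_request(compilation_result: str) -> tuple:
    """Return (url, body) for a workflowInvocations.create request."""
    # The repository path is everything before "/compilationResults/".
    repository, separator, result_id = compilation_result.partition("/compilationResults/")
    if not separator or not result_id:
        raise ValueError("Expected a path that contains /compilationResults/ID")
    url = f"{API_BASE}/{repository}/workflowInvocations"
    return url, {"compilationResult": compilation_result}


name = (
    "projects/my-project-name/locations/europe-west4/repositories/my-repository-name"
    "/compilationResults/7646b4ed-ac8e-447f-93cf-63c43249ff11"
)
url, body = workflow_invocation_request(name)
print(url)
print(body)
```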

What's next