Add custom SQL operations

This document shows you how to define custom SQL operations in a dedicated SQLX file.

Dataform can execute custom SQL operations that don't fit into the Dataform model of publishing a table or writing an assertion. You can define custom SQL commands for Dataform to execute in BigQuery.

Before you begin

  1. In the Google Cloud console, go to the Dataform page.

    Go to the Dataform page

  2. Select or create a repository.

  3. Select or create a development workspace.

Required roles

To get the permissions that you need to define a custom SQL operation, ask your administrator to grant you the Dataform Editor (roles/dataform.editor) IAM role on workspaces. For more information about granting roles, see Manage access to projects, folders, and organizations.

You might also be able to get the required permissions through custom roles or other predefined roles.

Create a file for custom operation definition

Store custom operation definition SQLX files in the definitions/ directory. To create a new SQLX file in the definitions/ directory, follow these steps:

  1. Go to your development workspace.
  2. In the Files pane, next to definitions/, click the More menu.
  3. Click Create file.
  4. In the Add a file path field, enter the name of the file followed by .sqlx after definitions/. For example, definitions/sample-operation.sqlx.

    Filenames can only include numbers, letters, hyphens, and underscores.

  5. Click Create file.

Define a custom SQL operation

You can define a custom SQL operation in a SQLX file of type: operations. You can write any BigQuery SQL statement in an operations file. Dataform runs your custom SQL operations in BigQuery without modification.

You can define multiple custom SQL operations in one SQLX file. BigQuery runs all operations in a file in the same context, and creates the executed SQL by joining all operations with a semicolon ;.

To define multiple custom SQL operations when using the open source Dataform framework with a data warehouse other than BigQuery, separate operations with ---.

To define a custom SQL operation in a dedicated SQLX file, follow these steps:

  1. In your development workspace, select the SQLX file for custom operation definition.
  2. In the file, enter the following code snippet:

    config { type: "operations" }
    
  3. Outside the config block, write your SQL operation.

  4. Optional: Click Format.

The following code sample shows multiple custom SQL operations defined in an operations file:

config { type: "operations" }

DELETE FROM dataset.table WHERE country = 'GB';

DELETE FROM dataset.table WHERE country = 'FR';

The following code sample shows a custom SQL operation that manually creates a view:

config { type: "operations" }
CREATE OR REPLACE VIEW dataset.table AS (SELECT 1 AS TEST)

Create a referenceable output table

You can manually create a table in a custom SQL operation that you can reference in other scripts. To create a table available to other scripts, you need to declare that the operation has an output.

To make the name of the output table match the name of the operations file, you can use the self function in the CREATE operation.

To create a table in a custom operation and make it available to other scripts, follow these steps:

  1. In your development workspace, select the SQLX file for custom operation definition.
  2. In the SQLX file, enter the following code snippet:

    config {
     type: "operations",
     hasOutput: true
     }
    
  3. Optional: To match the name of the output table with the name of the file, write a SQL CREATE operation with the self function in the following format:

    CREATE OR REPLACE TABLE ${self()} AS (CUSTOM_SQL_QUERY)
    

    Replace CUSTOM_SQL_QUERY with your table definition SQL SELECTstatement.

  4. Optional: Click Format.

Reference a custom SQL operation output table

  • To reference a custom SQL operation output table in a SQLX definition of a different table, enter the output table filename in the ref function.

The following code sample shows a custom SQL operation in a custom_SQL_operation_table.sqlx file that creates a referenceable table called custom_SQL_operation_table:

// filename is custom_SQL_operation_table.sqlx
config {
type: "operations",
hasOutput: true
}
CREATE OR REPLACE VIEW ${self()} AS (SELECT 1 AS TEST)

The following code sample shows referencing the custom\_SQL\_operation\_table table in a table definition SQLX file:

config { type: "table" }
SELECT * FROM ${ref("custom_SQL_operation_table")}

Create an empty table

You might want to create an empty table so that a different service can populate it with data. You can create an empty table in a custom SQL operation with the CREATE TABLE function. To be able to reference the empty table in other SQL workflow object definitions, for example, tables and views, you can add the hasOutput:true property to the config block of the empty table operation.

  • To create an empty table, use the CREATE TABLE function in a type: "operations" file in the following format:
config {
  type: "operations",
  hasOutput: true  // optional, lets you reference the empty table
}

CREATE TABLE ${self()} (

)

The following code sample shows a custom SQL operation that creates an empty table with an integer and a string column. The created empty table cannot be referenced by other SQL workflow objects:

config {
  type: "operations"
}

CREATE TABLE ${self()} (
  x INT64,
  y STRING
)

What's next