Use the open-source Dataform CLI

This document shows you how to use the open-source Dataform command line interface (CLI) to locally develop SQL workflows by using the terminal.

With the open-source Dataform CLI, you can initialize, compile, test, and run Dataform core locally, outside of Google Cloud to deploy assets to the following data warehouses:

  • BigQuery
  • Snowflake
  • Redshift
  • Azure SQL Data Warehouse
  • Postgres

Dataform distributes a Docker image which you can use to run the equivalent of Dataform CLI commands.

Before you begin

Before installing the Dataform CLI, install NPM.

Install Dataform CLI

  • To install Dataform CLI, run the following command:

    npm i -g @dataform/cli@^2.3.2
    

Initialize a Dataform project

  • To initialize a new Dataform project, run the following command inside your project directory:

    dataform init
    

Update Dataform

  • To update the Dataform framework, run the following command from inside the project directory that contains your dataform.json file:

    npm i @dataform/core@^2.3.2
    

Update Dataform CLI

  • To update the Dataform CLI tool, run the following command:

    npm i -g @dataform/cli@^2.3.2
    

Create a credentials file

Dataform requires a credentials file to connect to your warehouse and create the .df-credentials.json file on your disk.

To create the credentials file, follow these steps:

  1. Run the following command:

    dataform init-creds WAREHOUSE
    

    Replace WAREHOUSE with your warehouse: bigquery, postgres, redshift, snowflake, or sqldatawarehouse.

  2. Follow the init-creds wizard that walks you through credentials file creation.

Create a project

An empty Dataform project has the following structure:

   project-dir
   ├── definitions
   ├── includes
   ├── package.json
   └── dataform.json
  • To create a Dataform project to deploy assets to BigQuery, run the following command:

    dataform init bigquery PROJECT_NAME --default-database YOUR_GOOGLE_CLOUD_PROJECT_ID
    

    Replace PROJECT_NAME with the name of your project and YOUR_GOOGLE_CLOUD_PROJECT_ID with your Google Cloud project ID.

  • To create a Dataform project to deploy assets to Postgres, Redshift, Snowflake, or Azure SQL Data Warehouse, run the following command:

    dataform init WAREHOUSE PROJECT_NAME
    

    Replace PROJECT_NAME with the name of your project and WAREHOUSE with your warehouse: postgres, redshift, snowflake, or sqldatawarehouse.

Clone a project

To clone an existing Dataform project from a third-party Git repository, follow the instructions from your Git provider.

  • Once the repository is cloned, run the following command inside the cloned repository directory:

    dataform install
    

Define a table

Store definitions in the definitions/ folder.

  • To define a table, run the following command:

    echo "config { type: 'TABLE_TYPE' } SELECT_STATEMENT" > definitions/FILE.sqlx
    

    Replace the following:

    • TABLE_TYPE with the type of the table: table, incremental, or view.
    • SELECT_STATEMENT with a SELECT statement that defines the table.
    • FILE with the name for the table definition file.

The following code sample defines a view in the example SQLX file.

echo "config { type: 'view' } SELECT 1 AS test" > definitions/example.sqlx

Define a manual assertion

Store definitions in the definitions/ folder.

  • To define a manual assertion, run the following command:

    echo "config { type: 'assertion' } SELECT_STATEMENT" > definitions/FILE.sqlx
    

    Replace the following:

    • SELECT_STATEMENT with a SELECT statement that defines the assertion.
    • FILE with the name for the custom SQL operation definition file.

Define a custom SQL operation

Store definitions in the definitions/ folder.

  • To define a custom SQL operation, run the following command:

    echo "config { type: 'operations' } SQL_QUERY" > definitions/FILE.sqlx
    

    Replace the following:

    • SQL_QUERY with your custom SQL operation.
    • FILE with the name for the custom SQL operation definition file.

View compilation output

Dataform compiles your code in real time.

  • To view the output of the compilation process in the terminal, run the following command:

    dataform compile
    
  • To view the output of the compilation process as a JSON object, run the following command:

    dataform compile --json
    
  • To view the output of the compilation with custom compilation variables, run the following command:

    dataform compile --vars=SAMPLE_VAR=SAMPLE_VALUE,foo=bar
    

    Replace the following:

    • SAMPLE_VAR with your custom compilation variable.
    • SAMPLE_VALUE with the value of your custom compilation variable.

Execute code

To execute your code, Dataform accesses your data warehouse to determine its current state and tailor the resulting SQL accordingly.

  • To execute the code of your Dataform project in your data warehouse, run the following command:

    dataform run
    
  • To execute the code of your Dataform project in your data warehouse with custom compilation variables, run the following command:

    dataform run --vars=SAMPLE_VAR=SAMPLE_VALUE,sampleVar2=sampleValue2
    

    Replace the following:

    • SAMPLE_VAR with your custom compilation variable.
    • SAMPLE_VALUE with the value of your custom compilation variable.
  • To execute the code of your Dataform project in your data warehouse and rebuild all tables from scratch, run the following command:

    dataform run --full-refresh
    

Without --full-refresh, Dataform updates incremental tables without rebuilding them from scratch.

  • To see the final compiled SQL code tailored to the current state of your warehouse, without executing it inside your warehouse, run the following command:

    dataform run --dry-run
    

Get help

  • To list all of the available commands and options, run the following command:

    dataform help
    
  • To view a description of a specific command, run the following command:

    dataform help COMMAND
    

    Replace COMMAND with the command you want to learn about.

What's next