With the open-source Dataform CLI, you can initialize, compile, test, and run Dataform core locally, outside of Google Cloud to deploy assets to the following data warehouses:
- BigQuery
- Snowflake
- Redshift
- Azure SQL Data Warehouse
- Postgres
Dataform distributes a Docker image which you can use to run the equivalent of Dataform CLI commands.
Before you begin
Before installing the Dataform CLI, install NPM.
Install Dataform CLI
To install Dataform CLI, run the following command:
npm i -g @dataform/cli@^2.3.2
Initialize a Dataform project
To initialize a new Dataform project, run the following command inside your project directory:
dataform init
Update Dataform
To update the Dataform framework, run the following command from inside the project directory that contains your
dataform.json
file:npm i @dataform/core@^2.3.2
Update Dataform CLI
To update the Dataform CLI tool, run the following command:
npm i -g @dataform/cli@^2.3.2
Create a credentials file
Dataform requires a credentials file to connect to your warehouse
and create the .df-credentials.json
file on your disk.
To create the credentials file, follow these steps:
Run the following command:
dataform init-creds WAREHOUSE
Replace WAREHOUSE with your warehouse:
bigquery
,postgres
,redshift
,snowflake
, orsqldatawarehouse
.Follow the
init-creds
wizard that walks you through credentials file creation.
Create a project
An empty Dataform project has the following structure:
project-dir
├── definitions
├── includes
├── package.json
└── dataform.json
To create a Dataform project to deploy assets to BigQuery, run the following command:
dataform init bigquery PROJECT_NAME --default-database YOUR_GOOGLE_CLOUD_PROJECT_ID
Replace PROJECT_NAME with the name of your project and YOUR_GOOGLE_CLOUD_PROJECT_ID with your Google Cloud project ID.
To create a Dataform project to deploy assets to Postgres, Redshift, Snowflake, or Azure SQL Data Warehouse, run the following command:
dataform init WAREHOUSE PROJECT_NAME
Replace PROJECT_NAME with the name of your project and WAREHOUSE with your warehouse:
postgres
,redshift
,snowflake
, orsqldatawarehouse
.
Clone a project
To clone an existing Dataform project from a third-party Git repository, follow the instructions from your Git provider.
Once the repository is cloned, run the following command inside the cloned repository directory:
dataform install
Define a table
Store definitions in the definitions/
folder.
To define a table, run the following command:
echo "config { type: 'TABLE_TYPE' } SELECT_STATEMENT" > definitions/FILE.sqlx
Replace the following:
- TABLE_TYPE with the type of the table:
table
,incremental
, orview
. - SELECT_STATEMENT with a
SELECT
statement that defines the table. - FILE with the name for the table definition file.
- TABLE_TYPE with the type of the table:
The following code sample defines a view in the example
SQLX file.
echo "config { type: 'view' } SELECT 1 AS test" > definitions/example.sqlx
Define a manual assertion
Store definitions in the definitions/
folder.
To define a manual assertion, run the following command:
echo "config { type: 'assertion' } SELECT_STATEMENT" > definitions/FILE.sqlx
Replace the following:
- SELECT_STATEMENT with a
SELECT
statement that defines the assertion. - FILE with the name for the custom SQL operation definition file.
- SELECT_STATEMENT with a
Define a custom SQL operation
Store definitions in the definitions/
folder.
To define a custom SQL operation, run the following command:
echo "config { type: 'operations' } SQL_QUERY" > definitions/FILE.sqlx
Replace the following:
- SQL_QUERY with your custom SQL operation.
- FILE with the name for the custom SQL operation definition file.
View compilation output
Dataform compiles your code in real time.
To view the output of the compilation process in the terminal, run the following command:
dataform compile
To view the output of the compilation process as a JSON object, run the following command:
dataform compile --json
To view the output of the compilation with custom compilation variables, run the following command:
dataform compile --vars=SAMPLE_VAR=SAMPLE_VALUE,foo=bar
Replace the following:
- SAMPLE_VAR with your custom compilation variable.
- SAMPLE_VALUE with the value of your custom compilation variable.
Execute code
To execute your code, Dataform accesses your data warehouse to determine its current state and tailor the resulting SQL accordingly.
To execute the code of your Dataform project in your data warehouse, run the following command:
dataform run
To execute the code of your Dataform project in your data warehouse with custom compilation variables, run the following command:
dataform run --vars=SAMPLE_VAR=SAMPLE_VALUE,sampleVar2=sampleValue2
Replace the following:
- SAMPLE_VAR with your custom compilation variable.
- SAMPLE_VALUE with the value of your custom compilation variable.
To execute the code of your Dataform project in your data warehouse and rebuild all tables from scratch, run the following command:
dataform run --full-refresh
Without --full-refresh
, Dataform updates incremental tables
without rebuilding them from scratch.
To see the final compiled SQL code tailored to the current state of your warehouse, without executing it inside your warehouse, run the following command:
dataform run --dry-run
Get help
To list all of the available commands and options, run the following command:
dataform help
To view a description of a specific command, run the following command:
dataform help COMMAND
Replace COMMAND with the command you want to learn about.
What's next
- To learn more about Dataform CLI, see Dataform CLI reference
- To learn more about Dataform, see
Dataform overview.