With the open-source Dataform CLI, you can initialize, compile, test, and run Dataform core locally, outside of Google Cloud.
The Dataform CLI supports Application Default Credentials (ADC). With ADC, you can make credentials available to your application in a variety of environments, such as local development or production, without needing to modify your application code. To use ADC, you must first provide your credentials to ADC.
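One common way to provide user credentials to ADC for local development is the Google Cloud CLI. The following is a minimal sketch that assumes gcloud is installed; the function name provide_adc_credentials is illustrative and is not part of Dataform or gcloud.

```shell
# Sketch: provide user credentials to ADC for local development.
# Assumes the Google Cloud CLI (gcloud) is installed. The login command
# opens a browser sign-in flow, so it is wrapped in a function and only
# runs when you call it.
provide_adc_credentials() {
  if command -v gcloud >/dev/null 2>&1; then
    gcloud auth application-default login
  else
    echo "gcloud not found; install the Google Cloud CLI first" >&2
    return 1
  fi
}

# Usage:
# provide_adc_credentials
```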
Before you begin
Before installing the Dataform CLI, install NPM.
Install Dataform CLI
To install Dataform CLI, run the following command:
npm i -g @dataform/cli@^3.0.0-beta
Initialize a Dataform project
To initialize a new Dataform project, run the following command inside your project directory:
dataform init . PROJECT_NAME DEFAULT_LOCATION
Replace the following:
- PROJECT_NAME: the name of your project.
- DEFAULT_LOCATION: the region where you want Dataform to write BigQuery data. For more information about BigQuery regions, see BigQuery locations.
Update Dataform
To update the Dataform framework, update the dataformCoreVersion in the workflow_settings.yaml file, then re-run NPM install:
npm i
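The version bump can also be scripted. The following is a rough sketch, not an official Dataform command: the helper name set_core_version and the version 3.0.0 in the usage line are placeholders, and it assumes the settings file already contains a dataformCoreVersion line.

```shell
# Sketch: rewrite the dataformCoreVersion line in a workflow settings file.
# set_core_version is a hypothetical helper, not part of the Dataform CLI.
set_core_version() {
  settings_file="$1"
  new_version="$2"
  # Write to a temporary file first so this works with both GNU and BSD sed.
  sed "s/^dataformCoreVersion:.*/dataformCoreVersion: ${new_version}/" \
    "$settings_file" > "$settings_file.tmp" \
    && mv "$settings_file.tmp" "$settings_file"
}

# Usage from the project root (the version is a placeholder):
# set_core_version workflow_settings.yaml 3.0.0 && npm i
```

Only the dataformCoreVersion line is rewritten; other settings in the file are left as they are.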
Update Dataform CLI
To update the Dataform CLI tool, run the following command:
npm i -g @dataform/cli@^3.0.0-beta.2
Create a credentials file
Dataform requires a credentials file to connect to remote services. The init-creds command creates this file, .df-credentials.json, on your disk.
To create the credentials file, follow these steps:
1. Run the following command:
dataform init-creds
2. Follow the init-creds wizard, which walks you through creating the credentials file.
Create a project
An empty Dataform project in Dataform core 3.0.0-beta.0 or later has the following structure:
project-dir
├── definitions
├── includes
└── workflow_settings.yaml
To create a Dataform project to deploy assets to BigQuery, run the following command:
dataform init PROJECT_NAME --default-project YOUR_GOOGLE_CLOUD_PROJECT_ID
Replace the following:
- PROJECT_NAME: the name of your project.
- YOUR_GOOGLE_CLOUD_PROJECT_ID: your Google Cloud project ID.
Clone a project
To clone an existing Dataform project from a third-party Git repository, follow the instructions from your Git provider.
Once the repository is cloned, run the following command inside the cloned repository directory:
dataform install
Define a table
Store definitions in the definitions/ folder.
To define a table, run the following command:
echo "config { type: 'TABLE_TYPE' } SELECT_STATEMENT" > definitions/FILE.sqlx
Replace the following:
- TABLE_TYPE: the type of the table: table, incremental, or view.
- SELECT_STATEMENT: a SELECT statement that defines the table.
- FILE: the name for the table definition file.
The following code sample defines a view in the example SQLX file.
echo "config { type: 'view' } SELECT 1 AS test" > definitions/example.sqlx
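For definitions that span several lines or use Dataform's ${...} syntax, a quoted here-document can be easier than echo, because a double-quoted echo string would make the shell try to expand ${...} itself. The following is a minimal sketch: the file name example_summary.sqlx is illustrative, and it assumes the example view from the previous command exists.

```shell
# Run from the project root. The file name example_summary.sqlx is illustrative.
mkdir -p definitions

# The quoted 'EOF' delimiter stops the shell from expanding ${ref(...)},
# so the text reaches the SQLX file unchanged.
cat > definitions/example_summary.sqlx <<'EOF'
config { type: "view" }

SELECT test
FROM ${ref("example")}
EOF
```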
Define a manual assertion
Store definitions in the definitions/ folder.
To define a manual assertion, run the following command:
echo "config { type: 'assertion' } SELECT_STATEMENT" > definitions/FILE.sqlx
Replace the following:
- SELECT_STATEMENT: a SELECT statement that defines the assertion.
- FILE: the name for the assertion definition file.
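A manual assertion is written as a query that looks for rows that break a rule; if the query returns any rows, the assertion fails. As a hedged illustration, the following sketch checks the test column of the example view for NULL values. The file name example_not_null.sqlx is hypothetical.

```shell
# Run from the project root. The file name example_not_null.sqlx is illustrative.
mkdir -p definitions

# Quoted 'EOF' keeps ${ref(...)} literal so Dataform, not the shell, resolves it.
cat > definitions/example_not_null.sqlx <<'EOF'
config { type: "assertion" }

-- Any row returned here counts as a failure of the assertion.
SELECT *
FROM ${ref("example")}
WHERE test IS NULL
EOF
```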
Define a custom SQL operation
Store definitions in the definitions/ folder.
To define a custom SQL operation, run the following command:
echo "config { type: 'operations' } SQL_QUERY" > definitions/FILE.sqlx
Replace the following:
- SQL_QUERY: your custom SQL operation.
- FILE: the name for the custom SQL operation definition file.
View compilation output
Dataform compiles your code in real time.
To view the output of the compilation process in the terminal, run the following command:
dataform compile
To view the output of the compilation process as a JSON object, run the following command:
dataform compile --json
To view the output of the compilation with custom compilation variables, run the following command:
dataform compile --vars=SAMPLE_VAR=SAMPLE_VALUE,foo=bar
Replace the following:
- SAMPLE_VAR: your custom compilation variable.
- SAMPLE_VALUE: the value of your custom compilation variable.
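Inside the project, compilation variables are typically read through dataform.projectConfig.vars. The following sketch writes a view that reads SAMPLE_VAR; the file name vars_example.sqlx is illustrative, and the query is only meant to show where the value ends up in the compiled SQL.

```shell
# Run from the project root. The file name vars_example.sqlx is illustrative.
mkdir -p definitions

# Quoted 'EOF' keeps the ${...} expression literal for Dataform to evaluate.
cat > definitions/vars_example.sqlx <<'EOF'
config { type: "view" }

SELECT '${dataform.projectConfig.vars.SAMPLE_VAR}' AS sample_value
EOF

# Compile with a value for the variable (requires the Dataform CLI):
# dataform compile --vars=SAMPLE_VAR=SAMPLE_VALUE
```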
Execute code
To execute your code, Dataform accesses BigQuery to determine its current state and tailor the resulting SQL accordingly.
To execute the code of your Dataform project, run the following command:
dataform run
To execute the code of your Dataform project in BigQuery with custom compilation variables, run the following command:
dataform run --vars=SAMPLE_VAR=SAMPLE_VALUE,sampleVar2=sampleValue2
Replace the following:
- SAMPLE_VAR: your custom compilation variable.
- SAMPLE_VALUE: the value of your custom compilation variable.
To execute the code of your Dataform project in BigQuery and rebuild all tables from scratch, run the following command:
dataform run --full-refresh
Without --full-refresh, Dataform updates incremental tables without rebuilding them from scratch.
To see the final compiled SQL code tailored to the current state of BigQuery, without executing it inside BigQuery, run the following command:
dataform run --dry-run
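The commands in this section can be chained so that nothing runs in BigQuery until compilation and a dry run have succeeded. This is one possible workflow, not a prescribed one; the function name run_checked is hypothetical.

```shell
# Sketch of a cautious run: compile, preview with a dry run, then execute.
# Each step runs only if the previous one succeeded.
# Extra arguments (for example --full-refresh) are passed to both run steps.
run_checked() {
  dataform compile \
    && dataform run --dry-run "$@" \
    && dataform run "$@"
}

# Usage:
# run_checked
# run_checked --full-refresh
```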
Get help
To list all of the available commands and options, run the following command:
dataform help
To view a description of a specific command, run the following command:
dataform help COMMAND
Replace COMMAND with the command you want to learn about.
What's next
- To learn more about Dataform CLI, see Dataform CLI reference.
- To learn more about Dataform, see Dataform overview.