This document helps you to do the following:
- Understand the concept of workspaces in Dataform and their capabilities.
- Create a Dataform development workspace.
- Initialize an empty Dataform development workspace.
- Delete a Dataform development workspace.
Overview of development workspaces
In Dataform, you interact with your workflow code in a development workspace, which is your own editable copy of a repository. In a workspace, you create, edit, or delete the contents of the repository without affecting others who work on the same repository, and then commit and push your changes to the repository.
You can develop your workflow with Dataform core or JavaScript. Dataform compiles your code into SQL queries in real time.
In a development workspace, you can perform the following tasks:
- Configure Dataform settings.
- Create tables of all supported types.
- Document tables.
- Partition tables.
- Add BigQuery labels and policy tags to tables.
- Override table settings.
- Configure additional actions.
- Declare data sources for your workflow.
- Add custom SQL operations and tags to your workflow.
- Trigger execution of your workflow or its selected actions.
- Monitor execution logs.
Components of a development workspace
A Dataform development workspace consists of the tabs described in the following sections.
Code tab
You can use the development environment in the Code tab to explore files, develop queries, version control your changes, and trigger executions in BigQuery. The tab contains the following components that let you develop workflows:
- Files pane: displays the file structure of your workspace and autosaves your edits to the file structure. Selected files open in the tabs of the development pane. In the Type to search field, you can search for files or directories in your workspace by their path.
- Version control pane: displays the version control status of your workspace and the version control options.
- Development pane: displays open files in tabs and autosaves your changes to edited files.
- Run button: runs a preview of the compiled SQL query.
- Format button: formats your Dataform core or JavaScript code.
- Compilation status: displays the compilation status of the code in your workspace.
- Metadata pane: displays the metadata of the edited file.
- Compiled queries pane: compiles the queries in the edited file to SQL. If compilation fails, the pane displays the compilation errors. If compilation succeeds, the pane displays the compiled queries and validates them through dry runs against BigQuery. If validation fails, the pane displays the validation errors. If validation succeeds, the pane displays how many bytes the queries would process when executed in BigQuery.
- Start execution menu: displays the options for triggering the execution of the entire workflow in your workspace or its selected elements.
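The order of checks that the Compiled queries pane performs (compile, then dry-run, then report either errors or bytes) can be sketched as follows. This is a minimal illustration, not Dataform's implementation: `compile_fn` and `dry_run_fn` are hypothetical stand-ins for Dataform compilation and a BigQuery dry run, and `PaneResult` is a hypothetical type. With the BigQuery Python client, a dry run is typically requested with `bigquery.QueryJobConfig(dry_run=True)`, and the estimate is read from the job's `total_bytes_processed` property.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class PaneResult:
    """What the hypothetical pane would display for one edited file."""
    compiled_sql: List[str] = field(default_factory=list)
    compilation_errors: List[str] = field(default_factory=list)
    validation_errors: List[str] = field(default_factory=list)
    bytes_processed: Optional[int] = None


def compiled_queries_pane(
    source: str,
    compile_fn: Callable[[str], List[str]],
    dry_run_fn: Callable[[str], int],
) -> PaneResult:
    """Compile the file, then dry-run each compiled query.

    compile_fn (hypothetical) raises ValueError on a compilation error.
    dry_run_fn (hypothetical) raises ValueError on a validation error and
    otherwise returns the number of bytes the query would process.
    """
    try:
        queries = compile_fn(source)
    except ValueError as err:
        # Compilation failed: only the compilation errors are shown.
        return PaneResult(compilation_errors=[str(err)])

    result = PaneResult(compiled_sql=queries)
    total = 0
    for sql in queries:
        try:
            total += dry_run_fn(sql)
        except ValueError as err:
            result.validation_errors.append(str(err))
    if not result.validation_errors:
        # Every dry run passed: report the estimated bytes for all queries.
        result.bytes_processed = total
    return result
```

Injecting the compiler and the dry-run call keeps the decision logic testable without a Google Cloud project.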
Compiled graph tab
In the Compiled graph tab, you can see the compiled workflow defined in your workspace as an interactive directed acyclic graph (DAG). The graph shows all the actions in your workflow and the relationships between them. If there are compilation errors in your workflow, the tab displays an error message instead of the graph.
You can zoom in and out and use drag-and-drop navigation in the graph.
You can filter the graph by the following properties:
- Name
- Tag
- Type:
  - Assertion
  - Declaration
  - Incremental table
  - Materialized view
  - Operations
  - Table
  - Unknown
  - View
You can select multiple filters at once. Dataform combines them with the OR condition, so the graph shows the actions that match any of the selected filters.
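The OR semantics of combined graph filters can be sketched as follows. This is a hypothetical model, not Dataform code: the `Action` type and the `by_name`, `by_tag`, and `by_type` helpers are illustrative names, and the case-insensitive substring match for names is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class Action:
    """Hypothetical stand-in for an action in the compiled graph."""
    name: str
    type: str
    tags: Set[str] = field(default_factory=set)


Filter = Callable[[Action], bool]


def by_name(text: str) -> Filter:
    # Assumption: name filtering is a case-insensitive substring match.
    return lambda a: text.lower() in a.name.lower()


def by_tag(tag: str) -> Filter:
    return lambda a: tag in a.tags


def by_type(action_type: str) -> Filter:
    return lambda a: a.type == action_type


def apply_filters(actions: List[Action], filters: List[Filter]) -> List[Action]:
    """Keep an action if it matches ANY selected filter (OR condition)."""
    if not filters:
        # No filter selected: the whole graph is shown.
        return list(actions)
    return [a for a in actions if any(f(a) for f in filters)]
```

Under OR, adding a filter can only widen the result, which is why selecting both a type and a tag shows the union of the two matching sets.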
You can click an action in the graph to view its details in the following panes:
- Metadata pane: displays the full name, type, dependencies, and dependents of the selected action.
- Query pane: displays the compiled SQL query of the selected action.
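The dependencies and dependents shown in the Metadata pane are two views of the same directed acyclic graph. The following sketch, under the assumption that a workflow is represented as a hypothetical map from each action name to the names of its dependencies, derives the dependents by inverting that map and uses Kahn's algorithm to produce an order in which every action follows its dependencies, raising an error if the graph has a cycle.

```python
from collections import deque
from typing import Dict, List


def dependents_of(deps: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert a dependency map: for each action, list the actions that depend on it."""
    dependents: Dict[str, List[str]] = {name: [] for name in deps}
    for action, parents in deps.items():
        for parent in parents:
            # setdefault covers actions that appear only as a dependency.
            dependents.setdefault(parent, []).append(action)
    return dependents


def topological_order(deps: Dict[str, List[str]]) -> List[str]:
    """Order actions so each comes after its dependencies (Kahn's algorithm).

    Raises ValueError if the graph contains a cycle, that is, if it is not a DAG.
    """
    dependents = dependents_of(deps)
    # Number of dependencies of each action that are not yet ordered.
    pending = {name: len(deps.get(name, [])) for name in dependents}
    ready = deque(sorted(name for name, count in pending.items() if count == 0))
    order: List[str] = []
    while ready:
        name = ready.popleft()
        order.append(name)
        for child in dependents[name]:
            pending[child] -= 1
            if pending[child] == 0:
                ready.append(child)
    if len(order) != len(pending):
        raise ValueError("the workflow graph contains a cycle")
    return order
```

For example, with `{"raw_orders": [], "orders": ["raw_orders"], "orders_summary": ["orders"], "orders_check": ["orders"]}`, the dependents of `orders` are `orders_summary` and `orders_check`.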
Executions tab
This tab displays the logs of all the executions triggered in the workspace. You can also view actions and their details for a selected execution. For more information, see Monitor execution logs.
Version control in a workspace
In a workspace, you interact with the repository through Git-based version control. Edits you make to files in your workspace start off as uncommitted local changes that you can commit and then push to your repository so that they become available to all repository users.
If your repository is connected to a remote Git provider, you push changes from your workspace to your main development branch. You select the main branch when you connect your repository to the remote Git provider. In a non-connected Dataform repository, you push changes to the default branch.
You can also pull any changes from the Dataform repository or the remote repository and merge them with local changes in your development workspace. For more information about version control in Dataform, see Version control your code.
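The flow from uncommitted local changes to a push, and the rule for which branch receives the push, can be modeled as follows. This is a hypothetical sketch of the behavior described above, not Dataform's implementation: the `Repository` and `Workspace` classes are illustrative, and the `"main"` value for the default branch of a non-connected repository is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Repository:
    connected_to_remote: bool
    # Selected when the repository is connected to a remote Git provider.
    main_branch: Optional[str] = None
    # Assumption: hypothetical name of the default branch when not connected.
    default_branch: str = "main"
    # Branch name -> pushed commits, each commit a {path: content} snapshot.
    branches: Dict[str, List[Dict[str, str]]] = field(default_factory=dict)


def push_target(repo: Repository) -> str:
    """Connected repositories push to the main branch; others to the default branch."""
    if repo.connected_to_remote:
        if repo.main_branch is None:
            raise ValueError("no main branch selected for the remote connection")
        return repo.main_branch
    return repo.default_branch


@dataclass
class Workspace:
    repo: Repository
    uncommitted: Dict[str, str] = field(default_factory=dict)
    unpushed: List[Dict[str, str]] = field(default_factory=list)

    def edit(self, path: str, content: str) -> None:
        # Edits start as uncommitted local changes.
        self.uncommitted[path] = content

    def commit(self) -> None:
        if not self.uncommitted:
            raise ValueError("nothing to commit")
        self.unpushed.append(dict(self.uncommitted))  # copy before clearing
        self.uncommitted.clear()

    def push(self) -> str:
        # Pushed commits become visible to all repository users.
        branch = push_target(self.repo)
        self.repo.branches.setdefault(branch, []).extend(self.unpushed)
        self.unpushed.clear()
        return branch
```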
Before you begin
If you haven't done so already, create a Dataform repository.
Required roles
To get the permissions that you need to create, initialize, and delete a development workspace, ask your administrator to grant you the Dataform Editor (roles/dataform.editor) IAM role on workspaces.
For more information about granting roles, see Manage access to projects, folders, and organizations.
You might also be able to get the required permissions through custom roles or other predefined roles.
Create a Dataform development workspace
When you create a development workspace in a new repository for the first time, Dataform prompts you to initialize the development workspace with a set of configuration files that are required for Dataform to work.
An initialized development workspace contains the following directories and files:
- definitions/: a directory for asset definitions, in Dataform core or JavaScript.
- includes/: an empty directory for scripts and variables that you can reuse across the repository.
- workflow_settings.yaml: the default Dataform configuration file containing the Google Cloud project ID and BigQuery schema to publish assets in. You can override the default settings to customize them to your needs, but customization isn't required to begin using Dataform.
- package.json: the default Dataform dependencies configuration file with the latest version of @dataform/core. You can use this file to import packages.
- definitions/sample.sqlx: a sample SQLX file to help you get started.
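If you work with a local copy of a workspace, a quick check against the layout listed above can tell you whether it looks initialized. The following is a minimal sketch: `INITIALIZED_LAYOUT` and `missing_entries` are hypothetical names, and because definitions/sample.sqlx is only a sample that you might delete, a missing sample file doesn't necessarily mean the workspace is uninitialized.

```python
from typing import Iterable, List

# Entries listed above for an initialized workspace.
# Directory entries end with "/".
INITIALIZED_LAYOUT = [
    "definitions/",
    "includes/",
    "workflow_settings.yaml",
    "package.json",
    "definitions/sample.sqlx",
]


def missing_entries(paths: Iterable[str]) -> List[str]:
    """Return the expected entries that are absent from the given relative paths."""
    present = set()
    for path in paths:
        present.add(path)
        # A file path implies that each of its parent directories exists.
        parts = path.split("/")
        for i in range(1, len(parts)):
            present.add("/".join(parts[:i]) + "/")
    return [entry for entry in INITIALIZED_LAYOUT if entry not in present]
```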
To create a new Dataform development workspace, follow these steps:
In the Google Cloud console, go to the Dataform page.
Select the repository where you want to create the new development workspace.
On your repository page, click Create development workspace.
In the Create development workspace window, do the following:
In the Workspace ID field, enter a unique ID for the development workspace.
IDs can only include numbers, letters, hyphens, and underscores.
Click Create.
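Before you create a workspace from a script, you might want to check the ID against the rule above. The following sketch assumes that "letters" means ASCII letters and builds the resource name in the projects/PROJECT/locations/LOCATION/repositories/REPOSITORY/workspaces/WORKSPACE_ID form that the Dataform API uses to identify workspaces. The function names and the example project, location, and repository values are hypothetical.

```python
import re

# Assumption: letters are ASCII letters; digits, hyphens, and underscores are allowed.
WORKSPACE_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_workspace_id(workspace_id: str) -> bool:
    """True if the ID contains only letters, numbers, hyphens, and underscores."""
    return WORKSPACE_ID_PATTERN.fullmatch(workspace_id) is not None


def workspace_resource_name(project: str, location: str, repository: str, workspace_id: str) -> str:
    """Build a workspace resource name after validating the workspace ID."""
    if not is_valid_workspace_id(workspace_id):
        raise ValueError(f"invalid workspace ID: {workspace_id!r}")
    return (
        f"projects/{project}/locations/{location}"
        f"/repositories/{repository}/workspaces/{workspace_id}"
    )
```

For example, `workspace_resource_name("my-project", "us-central1", "my-repo", "dev_ws-1")` returns `projects/my-project/locations/us-central1/repositories/my-repo/workspaces/dev_ws-1`, while an ID such as `dev ws` is rejected because it contains a space.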
Initialize the workspace in a Dataform repository
To initialize a Dataform development workspace, follow these steps:
In the Google Cloud console, go to the Dataform page.
Select the repository where you want to initialize the empty workspace.
In the Development workspaces tab, click the name of the empty workspace you want to initialize.
On the development workspace page, click Initialize workspace.
Delete a Dataform workspace
Creating Dataform development workspaces doesn't incur costs. To delete a development workspace, follow these steps:
In the Google Cloud console, go to the Dataform page.
Select the repository that contains the workspace that you want to delete.
To avoid losing changes that exist only in the workspace, commit them and push them to your repository before you delete the workspace.
In the Development workspaces tab, select the development workspace that you want to delete, and then click Delete selected workspaces.
You can delete multiple workspaces at once.
To confirm, click Delete.
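Because you can delete several workspaces at once, it can help to check each selected workspace for work that would be lost before you confirm. The following is a hypothetical pre-deletion check, not a Dataform feature: `WorkspaceStatus` and `plan_deletion` are illustrative names, and the uncommitted-change and unpushed-commit counts are assumed to come from the Version control pane or your own tooling.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class WorkspaceStatus:
    """Hypothetical summary of a workspace's version control state."""
    workspace_id: str
    uncommitted_changes: int
    unpushed_commits: int


def plan_deletion(selected: List[WorkspaceStatus]) -> Tuple[List[str], List[str]]:
    """Split selected workspaces into (safe to delete, would lose work)."""
    safe: List[str] = []
    blocked: List[str] = []
    for ws in selected:
        # Uncommitted edits and unpushed commits exist only in the workspace.
        if ws.uncommitted_changes or ws.unpushed_commits:
            blocked.append(ws.workspace_id)
        else:
            safe.append(ws.workspace_id)
    return safe, blocked
```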
What's next
- To learn how to version control your Dataform code, see Version control your code.
- To learn how to create a table, see Create a table.
- To learn how to run a preview to test a SQL query, see Preview SQL queries.