This document helps you understand the concept of repositories in Dataform.
Each Dataform repository houses a collection of SQLX and JavaScript files that make up your SQL workflow, as well as Dataform configuration files and packages. You interact with the contents of your repository in a development workspace.
Dataform lists your repositories on the Dataform page in alphabetical order by repository ID. You can sort and filter the list.
Each Dataform repository is connected to a service account. You can select a service account when you create a repository, or edit the service account later.
By default, Dataform uses a service account derived from your project number in the following format:
service-YOUR_PROJECT_NUMBER@gcp-sa-dataform.iam.gserviceaccount.com
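The address format above can be sketched as a small helper. This is only an illustration of the naming pattern: the function name `defaultDataformServiceAccount` and the project number used below are hypothetical, and the helper is not part of any Dataform API.

```javascript
// Hypothetical helper (not a Dataform API): builds the default service
// account address from a Google Cloud project number, following the
// format shown above.
function defaultDataformServiceAccount(projectNumber) {
  const value = String(projectNumber);
  // A project number is numeric; a project ID such as "my-project" is
  // rejected here because it is a different identifier.
  if (!/^\d+$/.test(value)) {
    throw new Error(`Expected a numeric project number, got: ${value}`);
  }
  return `service-${value}@gcp-sa-dataform.iam.gserviceaccount.com`;
}

// 123456789012 is a placeholder project number.
console.log(defaultDataformServiceAccount(123456789012));
// prints "service-123456789012@gcp-sa-dataform.iam.gserviceaccount.com"
```

Note that the format uses the project number, not the project ID, so the sketch rejects non-numeric input instead of producing a malformed address.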
Dataform uses Git to record changes and manage file versions. Each Dataform repository corresponds with a Git repository. After you create a Dataform repository, you can connect it to a remote GitHub, GitLab, or Bitbucket repository.
If a Dataform repository is not connected to a remote repository, Dataform stores the repository code. If it is connected, the third-party repository stores the code, and Dataform interacts with that repository so that you can edit and execute its contents in a Dataform development workspace.
A Dataform repository page consists of the following components:
- Development workspaces tab
  - Displays the development workspaces created in the repository.
- Release configurations tab
  - Lets you inspect, create, edit, and delete release configurations.
- Workflow execution logs tab
  - Displays Dataform workflow execution logs.
- Workflow configurations tab
  - Lets you inspect, create, edit, and delete workflow configurations.
- Settings tab
  - Displays the name and location of the repository.
  - For a repository connected to a third-party Git repository, displays the third-party repository source, default branch name, and secret token.
  - Displays the buttons to connect the repository to a third-party Git repository and to edit the Git connection.
- Create development workspace button
  - Lets you create a development workspace.
After you create and initialize a development workspace, you can edit your workflow settings file to configure the following Dataform settings of your repository:
- The default database (Google Cloud project ID)
- The default schema (BigQuery dataset ID)
- The default BigQuery location
- The default schema (BigQuery dataset ID) for assertions
- The warehouse, which must be set to bigquery
- User-defined variables that are made available to project code during compilation
For more information about Dataform repository settings, see IProjectConfig in the Dataform core reference.
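As a rough sketch, the settings listed above map onto an object like the following, using field names from IProjectConfig. Every value is a placeholder, and the `checkProjectConfig` function is a hypothetical sanity check written for this illustration, not something Dataform provides. In a repository, these settings live in the workflow settings file rather than in a script.

```javascript
// Illustrative settings object. All values are placeholders, not real resources.
const projectConfig = {
  warehouse: "bigquery",                  // must be set to "bigquery"
  defaultDatabase: "my-project-id",       // placeholder Google Cloud project ID
  defaultSchema: "dataform",              // placeholder BigQuery dataset ID
  defaultLocation: "US",                  // placeholder BigQuery location
  assertionSchema: "dataform_assertions", // placeholder dataset ID for assertions
  vars: { environment: "development" },   // placeholder user-defined variable
};

// Hypothetical check (not a Dataform API): returns a list of problems
// found in a settings object, or an empty list if none are found.
function checkProjectConfig(config) {
  const problems = [];
  if (config.warehouse !== "bigquery") {
    problems.push('warehouse must be set to "bigquery"');
  }
  // Reporting unset defaults is a convention of this sketch only.
  for (const key of ["defaultDatabase", "defaultSchema", "defaultLocation"]) {
    if (!config[key]) {
      problems.push(`${key} is not set`);
    }
  }
  return problems;
}

// Print the settings as JSON, then the list of problems found.
console.log(JSON.stringify(projectConfig, null, 2));
console.log(checkProjectConfig(projectConfig));
```

The entries under `vars` are the user-defined variables that project code can read during compilation.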
What's next
- To learn how to create and initialize a workspace, see Create a workspace.
- To learn how to configure Dataform repository settings, see Configure Dataform settings.
- To learn how to connect a Dataform repository to a third-party Git repository, see Connect to a third-party Git repository.
- To learn how to view workflow execution logs, see Monitor execution logs.
- To learn how to create Dataform compilation releases, see Create a compilation release.
- To learn more about how repository size impacts development in Dataform, see Overview of repository size.
- To learn how to schedule Dataform executions with workflow configurations, see Schedule executions with workflow configurations.
- To learn more about splitting a repository in Dataform, see Introduction to splitting repositories.