Create a Dataform repository

This document shows you how to create a repository, set and edit the repository service account, and delete a repository in Dataform.

When you create a Dataform repository, you need to set the following repository settings:

Repository ID

A unique ID of the repository. IDs can only include numbers, letters, hyphens, and underscores.

Region

Dataform region for storing the repository and its contents.

This storage region can be different from the processing region, where Dataform processes your code and stores the output of executions. By default, the processing region is set to your default BigQuery dataset region. You can edit the processing region in the workflow settings file after creating the repository. For more information, see Configure Dataform settings.

Service account

Service account associated with the repository. You can select the default Dataform service account, a service account associated with your Google Cloud project, or manually enter a different service account. By default, Dataform uses a service account derived from your project number in the following format:

service-YOUR_PROJECT_NUMBER@gcp-sa-dataform.iam.gserviceaccount.com

Unless you select a different service account, Dataform uses the default service account for all repository operations. If you select a different service account, Dataform uses it only to execute workflows in your repository; the default service account still performs all other repository operations.

Encryption

Encryption method for the repository. You can use the default encryption, a unique customer-managed Cloud KMS encryption key, or a default Dataform CMEK key. For more information about using customer-managed encryption keys (CMEK) in Dataform, see Use customer-managed encryption keys.
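
The ID rule and the default service account format described above can be checked before you open the console. The following is a minimal sketch, not an official validator: the function names are hypothetical, and it assumes that "letters" means ASCII letters, which this document does not state.

```python
import re

# Assumption: "letters" means ASCII letters. The document only says that IDs
# can include numbers, letters, hyphens, and underscores.
_REPOSITORY_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_repository_id(repository_id: str) -> bool:
    """Return True if the ID uses only the characters the document allows."""
    return _REPOSITORY_ID_PATTERN.fullmatch(repository_id) is not None


def default_dataform_service_account(project_number: int) -> str:
    """Build the default service account email from a project number."""
    return f"service-{project_number}@gcp-sa-dataform.iam.gserviceaccount.com"


print(is_valid_repository_id("sales_reports-2024"))
print(is_valid_repository_id("sales reports"))
print(default_dataform_service_account(123456789012))
```

The sketch does not check uniqueness or length limits; Dataform enforces those when you create the repository.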

After you create a repository, you can connect it to GitHub or GitLab.

Before you begin

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Go to project selector

  3. Make sure that billing is enabled for your Google Cloud project.

  4. Enable the BigQuery and Dataform APIs.

    Enable the APIs

  5. To use CMEK encryption for the repository, enable CMEK encryption of Dataform repositories.

Required roles

To get the permissions that you need to create and delete a repository, ask your administrator to grant you the Dataform Admin (roles/dataform.admin) IAM role on repositories. For more information about granting roles, see Manage access to projects, folders, and organizations.

You might also be able to get the required permissions through custom roles or other predefined roles.

To use a service account other than the default Dataform service account, grant access to the custom service account.

After you create a Dataform repository, Dataform automatically grants you the Dataform Admin role on that repository.
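
For administrators who manage access as code, the role grant described above corresponds to a binding in an IAM policy. The following sketch only edits a policy held in a Python dictionary and sends nothing to Google Cloud. The helper name and the example member are hypothetical, and the sketch ignores conditional bindings.

```python
import copy


def add_role_binding(policy: dict, role: str, member: str) -> dict:
    """Return a copy of an IAM policy dict with `member` added to `role`."""
    updated = copy.deepcopy(policy)
    bindings = updated.setdefault("bindings", [])
    for binding in bindings:
        # Reuse an existing binding for the role instead of duplicating it.
        if binding.get("role") == role:
            if member not in binding.setdefault("members", []):
                binding["members"].append(member)
            return updated
    bindings.append({"role": role, "members": [member]})
    return updated


# Hypothetical member; replace it with the principal that needs access.
policy = add_role_binding(
    {"bindings": []}, "roles/dataform.admin", "user:analyst@example.com"
)
print(policy)
```

Returning a copy keeps the original policy unchanged, so you can compare the two before you apply anything.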

Create a repository

To create a Dataform repository, follow these steps:

  1. In the Google Cloud console, go to the Dataform page.

    Go to Dataform

  2. Click Create repository.

  3. On the Create repository page, in the Repository ID field, enter a unique ID.

    IDs can only include numbers, letters, hyphens, and underscores.

  4. In the Region drop-down list, select a Dataform region for storing the repository and its contents. Select the Dataform region nearest to your location.

    For a list of available Dataform regions, see Locations. The repository region does not have to match the location of your BigQuery datasets.

    In the workflow_settings.yaml file, you can set the processing region where Dataform processes your code and stores the output of executions. The processing region has to match the location of your BigQuery datasets, but does not need to match the repository region. For more information, see Configure Dataform settings.

  5. In the Service account drop-down, select a service account for the repository.

    In the drop-down, you can select the default Dataform service account or any service account associated with your Google Cloud project that you have access to. Keep in mind that custom service accounts are used only for workflow execution. All other repository operations are still performed by the default Dataform service account.

    1. Optional: To select a service account that is not displayed in the drop-down, click Enter manually and enter a service account ID.
  6. Configure the encryption mechanism for the repository by following the steps for one of these options:

    Default CMEK key

    Dataform displays the Use the default KMS key checkbox and selects it by default.

    • To encrypt the repository with the default Dataform CMEK key, leave the Use the default KMS key checkbox selected.

    Unique CMEK key

    To encrypt the repository with a unique CMEK key, do the following:

    1. If the Use the default KMS key checkbox is selected by default, deselect the checkbox.
    2. In the Encryption section, select the Customer-managed encryption keys (CMEK) option.
    3. In the Select a customer-managed key drop-down, select a unique CMEK key.

    Encryption at rest

    • To use the default encryption, in the Encryption section, select the Google-managed encryption key option.
  7. Click Create, and then click Done.
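
The console steps above map onto a single create call in the Dataform REST API. The following sketch builds that request without sending it, so you can see which setting goes where. It assumes the v1beta1 endpoint and the `repositoryId`, `serviceAccount`, and `kmsKeyName` field names; verify them against the current API reference. The function name and the example values are hypothetical.

```python
import json
from urllib.parse import urlencode

# Assumption: the v1beta1 API surface. Check the API reference for the
# version that your project uses.
API_ROOT = "https://dataform.googleapis.com/v1beta1"


def build_create_repository_request(project_id, location, repository_id,
                                    service_account=None, kms_key_name=None):
    """Return (method, url, body) for a create call. Nothing is sent."""
    parent = f"projects/{project_id}/locations/{location}"
    query = urlencode({"repositoryId": repository_id})
    url = f"{API_ROOT}/{parent}/repositories?{query}"
    repository = {}
    if service_account:
        # Custom service account, used only for workflow execution.
        repository["serviceAccount"] = service_account
    if kms_key_name:
        # Unique CMEK key; omit it to keep the default encryption.
        repository["kmsKeyName"] = kms_key_name
    return "POST", url, json.dumps(repository)


method, url, body = build_create_repository_request(
    "my-project", "us-central1", "sales_reports",
    service_account="etl@my-project.iam.gserviceaccount.com",
)
print(method, url)
print(body)
```

To send the request, you would attach an OAuth 2.0 access token for a principal that has the Dataform Admin role. Authentication is left out of the sketch on purpose.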

Edit the service account

You can associate a custom service account with a Dataform repository for workflow execution. All other repository operations are still performed by the default Dataform service account.

To edit the service account for a Dataform repository, follow these steps:

  1. In the Google Cloud console, go to the Dataform page.

    Go to Dataform

  2. Select a repository, and then click Settings.

  3. By the Service account field, click Edit Service account.

  4. In the Service account drop-down, select a service account for the repository.

    In the drop-down, you can select the default Dataform service account or any service account associated with your Google Cloud project that you have access to.

    1. Optional: To select a service account that is not displayed in the drop-down, click Enter manually and enter a service account ID.
  5. Click Save.
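
Outside the console, the same change can be expressed as a patch request that updates only the service account field. This sketch builds the request and sends nothing. It assumes the v1beta1 endpoint, the `updateMask` query parameter, and the `serviceAccount` field name; the function name and the example values are hypothetical.

```python
import json
from urllib.parse import urlencode

# Assumption: the v1beta1 API surface. Verify it against the API reference.
API_ROOT = "https://dataform.googleapis.com/v1beta1"


def build_update_service_account_request(project_id, location,
                                         repository_id, service_account):
    """Return (method, url, body) for a patch call. Nothing is sent."""
    name = (f"projects/{project_id}/locations/{location}"
            f"/repositories/{repository_id}")
    # The mask limits the update to the service account field.
    query = urlencode({"updateMask": "serviceAccount"})
    url = f"{API_ROOT}/{name}?{query}"
    return "PATCH", url, json.dumps({"serviceAccount": service_account})


method, url, body = build_update_service_account_request(
    "my-project", "us-central1", "sales_reports",
    "etl@my-project.iam.gserviceaccount.com",
)
print(method, url)
print(body)
```

Limiting the mask to one field leaves the other repository settings, such as encryption, untouched.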

Delete a repository

To delete a repository and all its contents, follow these steps:

  1. In the Google Cloud console, go to the Dataform page.

    Go to Dataform

  2. By the repository that you want to delete, click the More menu, and then select Delete.

  3. In the Delete repository window, enter the name of the repository to confirm deletion.

  4. Click Delete.
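
For scripted cleanup, deletion corresponds to a delete request on the repository resource. This sketch builds the request and sends nothing. It assumes the v1beta1 endpoint and a `force` query parameter that also removes the repository's child resources, which appears to match the console behavior of deleting a repository with all its contents. Verify both against the API reference. The function name and the example values are hypothetical.

```python
from urllib.parse import urlencode

# Assumption: the v1beta1 API surface. Verify it against the API reference.
API_ROOT = "https://dataform.googleapis.com/v1beta1"


def build_delete_repository_request(project_id, location, repository_id,
                                    force=False):
    """Return (method, url) for a delete call. Nothing is sent."""
    name = (f"projects/{project_id}/locations/{location}"
            f"/repositories/{repository_id}")
    url = f"{API_ROOT}/{name}"
    if force:
        # Assumed flag: also delete the repository's child resources.
        url += "?" + urlencode({"force": "true"})
    return "DELETE", url


method, url = build_delete_repository_request(
    "my-project", "us-central1", "sales_reports", force=True
)
print(method, url)
```

Because deletion cannot be undone, the sketch defaults `force` to `False`, and the caller has to opt in explicitly.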
