This document shows you how to link a Dataform repository to a third-party remote Git repository.
After you link the repositories, the changes you make in a Dataform development workspace can be pushed to and pulled from the remote Git repository.
You can link a Dataform repository to a remote Git repository hosted by the following Git providers:
Azure DevOps Services
Bitbucket Cloud
GitHub
GitLab
To link a third-party remote repository to a Dataform repository, you need to first authenticate it. You can authenticate a remote repository in Dataform through HTTPS or SSH.
For GitHub and GitLab remote repositories, you can use either HTTPS or SSH for authentication. For Azure DevOps Services and Bitbucket Cloud remote repositories, you must use SSH.
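The provider and authentication rules above can be summarized as a small lookup table. This is only an illustrative sketch: the names SUPPORTED_AUTH and can_authenticate are hypothetical and are not part of Dataform or any Google Cloud library.

```python
# Authentication methods per Git provider, as stated in the text above.
# SUPPORTED_AUTH and can_authenticate are hypothetical names used for illustration.
SUPPORTED_AUTH = {
    "Azure DevOps Services": {"ssh"},
    "Bitbucket Cloud": {"ssh"},
    "GitHub": {"https", "ssh"},
    "GitLab": {"https", "ssh"},
}


def can_authenticate(provider: str, method: str) -> bool:
    """Return True if the provider is listed as supporting the given method."""
    return method.lower() in SUPPORTED_AUTH.get(provider, set())
```

For example, can_authenticate("Bitbucket Cloud", "https") returns False, because Bitbucket Cloud repositories must use SSH.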
Before you begin
If you haven't done so already, create a Dataform repository. You need it later to share a secret with your Dataform service account.
Required roles
To get the permissions that you need to link a Dataform repository to a remote Git repository, ask your administrator to grant you the Dataform Admin (roles/dataform.admin) IAM role on repositories.
For more information about granting roles, see Manage access.
You might also be able to get the required permissions through custom roles or other predefined roles.
Authenticate a remote repository through HTTPS
You can authenticate GitHub and GitLab repositories through HTTPS by creating a Secret Manager secret with a personal access token, and sharing the secret with your Dataform service account.
Dataform then uses the access token to sign in to your Git provider to commit changes on behalf of the developers. Dataform makes these commits using the developer's Google Cloud email address so you can tell who made each commit.
To authenticate a GitHub repository, create a classic personal access token or a fine-grained personal access token that lets you customize token permissions.
To authenticate a GitLab repository, create a classic personal access token.
To authenticate a GitHub or a GitLab repository in Dataform through HTTPS, follow these steps:
In GitHub or GitLab, create a personal access token.
When you create a GitHub personal access token, do the following:
Grant Dataform the repo permission.
Make sure to set a token expiration time appropriate to your needs.
If your organization uses SAML single sign-on (SSO), authorize the token.
Optional: When you create a GitHub fine-grained personal access token, do the following:
Select repository access to only selected repositories, then select the repository that you want to connect to.
Grant read and write access on contents of the repository.
Make sure to set a token expiration time appropriate to your needs.
If your organization uses SAML single sign-on (SSO), authorize the token.
When you create a GitLab personal access token, do the following:
Name the token dataform. The GitLab personal access token must be named dataform.
Grant Dataform the api, read_repository, and write_repository permissions.
Make sure to set a token expiration time appropriate to your needs.
In Secret Manager, create a secret containing the personal access token that you created for your Git provider.
Grant access to the secret to your Dataform service account.
Your Dataform service account is in the following format:
service-PROJECT_NUMBER@gcp-sa-dataform.iam.gserviceaccount.com
- When granting access, make sure to grant the roles/secretmanager.secretAccessor role to your Dataform service account.
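As a rough illustration of the service account format and role named above, the following sketch builds the service account address and an IAM-style binding for it. The helper names and the project number in the usage note are hypothetical placeholders, and the sketch only constructs values; it does not call Secret Manager or change any policy.

```python
def dataform_service_account(project_number: str) -> str:
    """Build the Dataform service account address from a project number."""
    return f"service-{project_number}@gcp-sa-dataform.iam.gserviceaccount.com"


def secret_accessor_binding(project_number: str) -> dict:
    """Build an IAM binding that gives the service account access to a secret.

    Hypothetical helper: it returns a role/members mapping in the shape used
    by IAM policies, but does not apply it anywhere.
    """
    member = f"serviceAccount:{dataform_service_account(project_number)}"
    return {
        "role": "roles/secretmanager.secretAccessor",
        "members": [member],
    }
```

Calling secret_accessor_binding("123456789012"), where 123456789012 is a made-up project number, yields a binding whose single member is the Dataform service account of that project.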
Authenticate a remote repository through SSH
You can authenticate Azure DevOps Services, Bitbucket Cloud, GitHub, and GitLab repositories through SSH by generating an SSH key and a Secret Manager secret.
The SSH key consists of a public SSH key and a private SSH key. You need to share the public SSH key with your Git provider, and create a Secret Manager secret with the private SSH key. Then, share the secret with your Dataform service account.
Dataform uses the secret with the private SSH key to sign in to your Git provider to commit changes on behalf of the developers. Dataform makes these commits using the developer's Google Cloud email address so you can tell who made each commit.
To authenticate an Azure DevOps Services, Bitbucket Cloud, GitHub, or GitLab repository in Dataform through SSH, follow these steps:
In Azure DevOps Services, Bitbucket Cloud, GitHub, or GitLab, create an SSH key.
For instructions to create an Azure DevOps Services SSH key, see Use SSH key authentication.
For instructions to create a Bitbucket Cloud SSH key, see Configure SSH.
For instructions to create a GitHub SSH key, see Generating a new SSH key.
For instructions to create a GitLab SSH key, see Generate an SSH key pair.
Upload the public SSH key to your third-party Git account.
For instructions to upload a public SSH key to Azure DevOps Services, see Use SSH key authentication.
For instructions to upload a public SSH key to Bitbucket Cloud, see Configure SSH.
For instructions to upload a public SSH key to GitHub, see Adding a new SSH key to your GitHub account.
For instructions to upload a public SSH key to GitLab, see Add an SSH key to your GitLab account.
In Secret Manager, create a secret with the private SSH key as the secret value.
Grant access to the secret to your Dataform service account.
Your Dataform service account is in the following format:
service-PROJECT_NUMBER@gcp-sa-dataform.iam.gserviceaccount.com
- When granting access, make sure to grant the roles/secretmanager.secretAccessor role to your Dataform service account.
Connect a Dataform repository
To link a Dataform repository to a remote Git repository, follow these steps:
In the Google Cloud console, go to the Dataform page.
Select the repository you want to connect.
On the repository page, click Settings > Connect with Git.
In the Link to remote repository pane, in the Remote Git repository URL field, enter the URL of the remote Git repository, ending with .git.
For HTTPS authentication, the URL of the remote Git repository cannot contain usernames or passwords.
For SSH authentication, the URL of the remote Git repository must be in one of the following formats:
- Absolute URL: ssh://git@{host_name}[:{port}]/{repository_path}, where port is optional.
- SCP-like URL: git@{host_name}:{repository_path}.
In the Default remote branch name field, enter the name of the main development branch of the remote Git repository.
In the Secret drop-down, select your secret for the remote Git repository.
If you used SSH authentication for the remote repository, in the SSH public host key value field, enter a single public host key of your Git provider.
The SSH public host key value must be in the format of a known_hosts file. The value must contain an algorithm and a public key encoded in the base64 format, but without the hostname or IP, in the following format: ALGORITHM BASE64_KEY_VALUE
For the Azure DevOps Services public host key, see Use SSH key authentication.
For the Bitbucket Cloud public host key, see Configure SSH.
For the GitHub public host key, see GitHub's SSH key fingerprints.
For the GitLab public host key, see SSH known_hosts entries.
Click Link.
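Before you paste a URL into the Remote Git repository URL field, you can check it locally against the rules in the steps above. The following is a minimal sketch using only the Python standard library. The function name classify_remote_url and the regular expression are assumptions made for illustration, and Dataform's own validation may be stricter or differ in detail.

```python
import re
from urllib.parse import urlsplit

# SCP-like form from the steps above: git@{host_name}:{repository_path}
_SCP_LIKE = re.compile(r"^git@[^/:@\s]+:\S+$")


def classify_remote_url(url: str) -> str:
    """Return "https" or "ssh" for an acceptable URL, else raise ValueError.

    Hypothetical helper that mirrors the documented rules only.
    """
    if not url.endswith(".git"):
        raise ValueError("the URL must end with .git")
    if url.startswith("https://"):
        parts = urlsplit(url)
        if parts.username is not None or parts.password is not None:
            raise ValueError("HTTPS URLs cannot contain usernames or passwords")
        if not parts.hostname:
            raise ValueError("the URL has no host name")
        return "https"
    if url.startswith("ssh://"):
        # Absolute form: ssh://git@{host_name}[:{port}]/{repository_path}
        parts = urlsplit(url)
        _ = parts.port  # raises ValueError if the optional port is not valid
        if parts.username != "git" or not parts.hostname or not parts.path.strip("/"):
            raise ValueError("expected ssh://git@{host_name}[:{port}]/{repository_path}")
        return "ssh"
    if _SCP_LIKE.match(url):
        return "ssh"
    raise ValueError("unrecognized remote repository URL format")
```

With this sketch, a URL such as git@bitbucket.org:example-org/example-repo.git (a made-up repository path) is classified as "ssh", while an HTTPS URL with embedded credentials raises ValueError.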
Edit the remote repository connection
To edit a connection between a Dataform repository and a remote Git repository, follow these steps:
In the Google Cloud console, go to the Dataform page.
Click the repository that you want to edit.
On the repository page, click Settings > Edit Git connection.
In the Link to remote repository pane, edit any of the following options:
In the Remote Git repository URL field, edit the URL of the linked remote Git repository.
The URL of the remote Git repository cannot contain usernames or passwords.
In the Default remote branch name field, edit the name of the main development branch of the remote Git repository.
In the Secret drop-down, select your secret for the remote Git repository.
If you used SSH authentication for the remote repository, in the SSH public host key value field, enter the public host key of your Git provider.
The SSH public host key value must be in the format of a known_hosts file. The value must contain an algorithm and a public key encoded in the base64 format, but without the hostname or IP, in the following format: ALGORITHM BASE64_KEY_VALUE
For the Bitbucket Cloud public host key, see Configure SSH.
For the GitHub public host key, see GitHub's SSH key fingerprints.
For the GitLab public host key, see SSH known_hosts entries.
Click Update.
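Git providers usually publish host keys as complete known_hosts entries that begin with a hostname, while the SSH public host key value field expects only ALGORITHM BASE64_KEY_VALUE. The following sketch shows one way to drop the hostname from a simple entry. The helper name host_key_value is hypothetical, and the sketch assumes a plain HOST ALGORITHM KEY line without leading markers such as @cert-authority.

```python
import base64
import binascii


def host_key_value(known_hosts_line: str) -> str:
    """Return 'ALGORITHM BASE64_KEY_VALUE' from a simple known_hosts entry.

    Hypothetical helper: assumes the line is 'HOST ALGORITHM KEY [comment]'.
    """
    fields = known_hosts_line.split()
    if len(fields) < 3:
        raise ValueError("expected: HOST ALGORITHM BASE64_KEY_VALUE [comment]")
    algorithm, key = fields[1], fields[2]
    try:
        # Check only that the key field is well-formed base64.
        base64.b64decode(key, validate=True)
    except binascii.Error as exc:
        raise ValueError("the key field is not valid base64") from exc
    return f"{algorithm} {key}"


# Usage with a dummy key; this is NOT a real host key of any provider.
dummy_key = base64.b64encode(b"not-a-real-host-key").decode("ascii")
value = host_key_value(f"git.example.com ssh-ed25519 {dummy_key}")
```

Here value holds the algorithm name followed by the base64 key, with the git.example.com hostname removed. Always copy the real host key from your Git provider's documentation rather than from an untrusted source.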
What's next
To learn more about Dataform repositories, see Introduction to repositories.
To create a development workspace, see Create a workspace.
To learn more about Dataform, see Dataform overview.