Source Control Management in Cloud Data Fusion lets you manage pipeline versions through GitHub repositories.
By integrating Cloud Data Fusion with GitHub, you can do the following:
- Manage your pipelines in a central Git repository.
- Review and audit pipeline changes.
- Revert pipeline changes.
- Collaborate effectively with your team while retaining central control.
Before you begin
Limitations
- Cloud Data Fusion only supports GitHub, and not other Git providers.
- OAuth is not supported.
- Source Control Management scales and performs best in RBAC-enabled instances.
- Source Control Management only supports batch pipelines.
- Source Control Management doesn't support pipeline configurations.
- The default size limit of the linked repository is 5 GB.
Required roles and permissions
Operation | datafusion.accessor | datafusion.viewer | datafusion.operator | datafusion.developer | datafusion.editor | datafusion.admin |
---|---|---|---|---|---|---|
Configure source control repository | No | No | Yes | No | Yes | Yes |
Push or pull pipeline from namespace | No | No | Yes | Yes | Yes | Yes |
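As a quick reference, the table above can be encoded as a lookup. This is only an illustrative sketch: the operation keys (`configure_repository`, `push_or_pull_pipeline`) and the `can_perform` helper are hypothetical names, not part of any Cloud Data Fusion API. Only the role names and Yes/No values come from the table.

```python
# Roles that may perform each operation, transcribed from the table above.
# The operation keys are hypothetical labels chosen for this sketch.
ALLOWED_ROLES = {
    "configure_repository": {
        "datafusion.operator",
        "datafusion.editor",
        "datafusion.admin",
    },
    "push_or_pull_pipeline": {
        "datafusion.operator",
        "datafusion.developer",
        "datafusion.editor",
        "datafusion.admin",
    },
}


def can_perform(operation: str, roles: set[str]) -> bool:
    """Return True if at least one of the given roles permits the operation."""
    return bool(ALLOWED_ROLES[operation] & roles)


# A developer can push or pull pipelines but cannot configure the repository.
print(can_perform("push_or_pull_pipeline", {"datafusion.developer"}))
print(can_perform("configure_repository", {"datafusion.developer"}))
```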
Configure a Git repository
Cloud Data Fusion lets you configure a Git repository for each namespace. After the Git repository is configured for a namespace, you can push deployed pipelines from the Cloud Data Fusion namespace to the Git repository, or pull and deploy pipelines from the Git repository to the Cloud Data Fusion namespace.
You can link a Git repository to multiple namespaces, but a namespace can be associated with only one Git repository.
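The cardinality rule above (one repository per namespace, many namespaces per repository) can be modeled as a mapping keyed by namespace. The following is a minimal sketch of that rule only. The `RepositoryLinks` class, its methods, and the example URL are hypothetical and do not reflect how Cloud Data Fusion stores this information.

```python
class RepositoryLinks:
    """Hypothetical model: each namespace maps to at most one repository URL."""

    def __init__(self) -> None:
        self._repo_by_namespace: dict[str, str] = {}

    def link(self, namespace: str, repo_url: str) -> None:
        # A namespace that is already linked to a different repository
        # cannot take a second link.
        existing = self._repo_by_namespace.get(namespace)
        if existing is not None and existing != repo_url:
            raise ValueError(f"{namespace} is already linked to {existing}")
        self._repo_by_namespace[namespace] = repo_url

    def namespaces_for(self, repo_url: str) -> list[str]:
        # The same repository may be linked to several namespaces.
        return sorted(
            ns for ns, url in self._repo_by_namespace.items() if url == repo_url
        )


links = RepositoryLinks()
links.link("dev", "https://github.com/example-org/pipelines")   # placeholder URL
links.link("prod", "https://github.com/example-org/pipelines")  # same repo, second namespace
print(links.namespaces_for("https://github.com/example-org/pipelines"))
```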
Link a Git repository with a namespace
To link a Git repository with a namespace, follow these steps:
- In the Cloud Data Fusion web interface, click Menu.
- Click Namespace Admin.
- On the Namespace Admin page, click the Source Control Management tab.
- Click Link Repository.
Enter the following details:
- Provider (required)
- Repository URL (required)
- Default branch (optional)
- Path prefix (optional)
- Authentication type (required)
- Token name (required)
- Token (required)
- User name (optional)
For more information about creating a Git repository, see Create a repo.
For more information about personal access tokens, see Creating a personal access token and Creating a fine-grained personal access token.
Click Validate. Wait for the connection to be verified.
When the configuration is complete, click Save and Close to confirm the configuration changes.
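To keep track of which fields in the Link Repository form are required, the list above can be written down as a small data class. This is a sketch for planning purposes, not a Cloud Data Fusion API: the `RepositoryConfig` class, its field names, and every example value (including the authentication type string) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RepositoryConfig:
    """Hypothetical container mirroring the Link Repository form fields."""

    # Required fields
    provider: str
    repository_url: str
    auth_type: str
    token_name: str
    token: str
    # Optional fields
    default_branch: Optional[str] = None
    path_prefix: Optional[str] = None
    user_name: Optional[str] = None

    def missing_required(self) -> list[str]:
        """Return the names of required fields that are empty or blank."""
        required = ("provider", "repository_url", "auth_type", "token_name", "token")
        return [name for name in required if not (getattr(self, name) or "").strip()]


# All values below are placeholders.
config = RepositoryConfig(
    provider="GitHub",
    repository_url="https://github.com/example-org/pipelines",
    auth_type="personal access token",
    token_name="",
    token="example-token-value",
)
print(config.missing_required())
```

Here the blank `token_name` is reported as missing, which mirrors the form refusing an incomplete configuration.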
Update the Git configuration
To update an existing Git configuration, follow these steps:
- In the Cloud Data Fusion web interface, click Menu.
- Click Namespace Admin.
- On the Namespace Admin page, click the Source Control Management tab.
- For the Git configuration you want to update, click > Edit.
- Update the Git repository details as required, and click Validate.
- Click Save and Close to save the new configuration.
Delete the Git configuration
To delete the Git configuration from a namespace, follow these steps:
- In the Cloud Data Fusion web interface, click Menu.
- Click Namespace Admin.
- On the Namespace Admin page, click the Source Control Management tab.
- For the Git configuration you want to delete, click > Delete.
Sync pipelines
After you configure a Git repository with a namespace, you can use the Sync Pipelines option to push pipelines from Cloud Data Fusion to GitHub, or pull and deploy pipelines from GitHub to Cloud Data Fusion.
Push pipelines from Cloud Data Fusion to GitHub
To sync a deployed pipeline from a namespace to GitHub, follow these steps:
- In the Cloud Data Fusion web interface, click Menu.
- Click Namespace Admin.
- On the Namespace Admin page, click the Source Control Management tab.
- Find the Git repository that you want to sync with, and click Sync Pipelines.
- Click the Local Pipelines tab.
Search for or select the pipeline that you want to push to GitHub. You can push only one pipeline at a time.
If the latest version of the pipeline is pushed to or pulled from GitHub, the Connected to Git status shows Connected. If the pipeline has never been pushed to GitHub, the Connected to Git status shows blank (-).
If you deploy a newer version of a pipeline that is already synced with GitHub, the Connected to Git status changes from Connected to blank (-).
Click Push to remote.
Enter a Commit Message, and click OK.
When the sync is complete, you see a green checkmark on the Local Pipelines page, and the Connected to Git status for the pushed pipeline shows Connected. The Git repository path is attached to the pipeline.
If the push fails, check the pipeline in GitHub to see if it's the latest version.
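The Connected to Git status rules described in this procedure can be summarized as a small function. This is a sketch of the documented behavior only. The `connected_to_git_status` function and the idea of comparing version identifiers are assumptions made for illustration, not the product's implementation.

```python
from typing import Optional


def connected_to_git_status(deployed_version: str, synced_version: Optional[str]) -> str:
    """Return the status shown for a pipeline, following the rules above.

    deployed_version: identifier of the latest version deployed in the namespace.
    synced_version: identifier of the version last pushed to or pulled from
        GitHub, or None if the pipeline has never been synced.
    (Both parameters are hypothetical; they exist only to express the rule.)
    """
    if synced_version is not None and synced_version == deployed_version:
        return "Connected"
    # Never synced, or a newer version was deployed after the last sync.
    return "-"


print(connected_to_git_status("v1", None))   # never pushed
print(connected_to_git_status("v1", "v1"))   # latest version is synced
print(connected_to_git_status("v2", "v1"))   # newer version deployed after sync
```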
You can also push the deployed pipelines from a namespace to GitHub in the following ways:
From the pipeline details page
- In the Cloud Data Fusion web interface, click Menu.
- Click List.
- Click the pipeline you want to push to GitHub.
- On the pipeline details page, click Actions > Push to remote.
- Enter a Commit Message, and click OK.
From the List page
- In the Cloud Data Fusion web interface, click Menu.
- Click List.
- For the pipeline that you want to push to GitHub, click > Push to remote.
- Enter a Commit Message, and click OK.
Pull GitHub pipelines into Cloud Data Fusion
If you're managing pipeline versions in GitHub manually, you can pull and deploy GitHub pipelines into Cloud Data Fusion.
- In the Cloud Data Fusion web interface, click Menu.
- Click Namespace Admin.
- On the Namespace Admin page, click the Source Control Management tab.
- Find the Git repository that you want to sync with, and click Sync Pipelines.
- Click the Remote Pipelines tab. All of the pipelines stored in GitHub are displayed.
- Search for or select the pipeline that you want to pull from GitHub into Cloud Data Fusion. You can pull only one pipeline at a time.
Click Pull to namespace.
Cloud Data Fusion looks for JSON files under the configured path, and pulls and deploys them as pipelines to Cloud Data Fusion.
When the sync is complete, you see a green checkmark on the Remote Pipelines page. Cloud Data Fusion automatically deploys the pipeline.
To run a pipeline, go to the List page, click Deployed, and then run it.
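Before pulling, it can help to check which files in a local clone of the repository would be candidates. The sketch below lists JSON files under a path prefix. It is an approximation under stated assumptions: the `find_pipeline_files` helper is hypothetical, and it looks only at files directly inside the prefix directory, which may not match how Cloud Data Fusion scans the configured path.

```python
from pathlib import Path


def find_pipeline_files(repo_root: str, path_prefix: str = "") -> list[Path]:
    """List the .json files directly under repo_root/path_prefix.

    Assumption: only the top level of the prefix directory is scanned.
    """
    base = Path(repo_root) / path_prefix
    return sorted(p for p in base.glob("*.json") if p.is_file())
```

For example, `find_pipeline_files("/path/to/clone", "pipelines")` returns the JSON files in the `pipelines` directory of a clone at that placeholder location.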
You can also pull remote pipelines from GitHub to a namespace in the following ways:
From the pipeline details page
- In the Cloud Data Fusion web interface, click Menu.
- Click List.
- Click the pipeline that you want to pull from GitHub.
- On the pipeline details page, click Actions > Pull to namespace.
From the List view
- In the Cloud Data Fusion web interface, click Menu.
- Click List.
- For the pipeline that you want to pull from GitHub, click > Pull to namespace.
Using the button
- In the Cloud Data Fusion web interface, click Menu.
- Click List.
- Click .
- Click Pull from remote.
- Search for or select the pipeline that you want to pull from GitHub into Cloud Data Fusion. You can pull only one pipeline at a time.
- Click Pull to namespace.