This page describes how to upgrade the version of your Cloud Data Fusion instances and batch pipelines.
Upgrade your Cloud Data Fusion instances and batch pipelines to the latest platform and plugin versions for the latest features, bug fixes, and performance improvements.
Before you begin
- Plan a scheduled downtime for the upgrade. The process takes up to an hour.
- In the Google Cloud console, activate Cloud Shell.
Limitations
After you create a Cloud Data Fusion instance, you cannot change its edition, even through an upgrade operation.
Don't trigger an upgrade with Terraform, as it deletes and recreates the instance, instead of performing an in-place upgrade. This issue results in the loss of any existing data within the instance.
Upgrading real-time pipelines isn't supported, except in pipelines created in version 6.8.0 with a Kafka real-time source. For a workaround, see Upgrade real-time pipelines.
Cloud Data Fusion doesn't restart pipelines that stop as a result of the upgrade operation.
Upgrade Cloud Data Fusion instances
To upgrade a Cloud Data Fusion instance to a new Cloud Data Fusion version, go to the Instance details page:
In the Google Cloud console, go to the Cloud Data Fusion page.
Click Instances, and then click the instance's name to go to the Instance details page.
Then perform the upgrade using either the Google Cloud console or the gcloud CLI:
Console
Click Upgrade for a list of available versions.
Select a version.
Click Upgrade.
Verify that the upgrade was successful:
Refresh the Instance details page.
Click View instance to access the upgraded instance in the Cloud Data Fusion web interface.
Click System Admin in the menu bar.
The new version number appears at the top of the page.
To prevent your pipelines from getting stuck when you run them in the new version, grant the required roles in your upgraded instance.
gcloud
To upgrade to a new Cloud Data Fusion version, run the following gcloud CLI command from a local terminal or a Cloud Shell session:

gcloud beta data-fusion instances update INSTANCE_ID \
    --project=PROJECT_ID \
    --location=LOCATION_NAME \
    --version=AVAILABLE_INSTANCE_VERSION
Optional: If applicable for your instance, add the --enable_stackdriver_logging, --enable_stackdriver_monitoring, and --labels flags.

Optional: You can pass CDAP properties, such as enable.unrecoverable.reset, as --options.
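As a sketch, an update command that also sets these optional flags might look like the following; the instance name, project, location, version, and flag values are all placeholders, not recommendations:

```shell
# Hypothetical example: upgrade an instance and set the optional flags
# described above. All values are placeholders.
gcloud beta data-fusion instances update my-instance \
    --project=my-project \
    --location=us-central1 \
    --version=6.10.1 \
    --enable_stackdriver_logging \
    --enable_stackdriver_monitoring \
    --labels=env=prod,team=data \
    --options=enable.unrecoverable.reset=true
```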
Verify that the upgrade was successful by following these steps:
In the Google Cloud console, go to the Cloud Data Fusion Instances page.
Click View instance to access the upgraded instance in the Cloud Data Fusion web interface.
Click System Admin in the menu bar.
The new version number appears at the top of the page.
To prevent your pipelines from getting stuck when you run them in the new version, grant the required roles in your upgraded instance.
Upgrade batch pipelines
To upgrade your Cloud Data Fusion batch pipelines to use the latest plugin versions:
Recommended: Back up all pipelines. You can back up pipelines in one of two ways:
Download the zip file by following these steps:
- To trigger a zip file download, back up all pipelines with the following command:
echo $CDAP_ENDPOINT/v3/export/apps
- Copy the URL output to your browser.
- Extract the downloaded file, then confirm that all pipelines were exported. The pipelines are organized by namespace.
Back up pipelines using Source Control Management (SCM), available in version 6.9 and later. SCM provides GitHub integration, which you can use to back up pipelines.
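As an alternative to pasting the export URL into a browser, the same endpoint can be fetched from the terminal. This is a sketch that assumes $CDAP_ENDPOINT is already set and that you are authenticated with gcloud:

```shell
# Sketch: download the pipeline export zip directly, instead of
# pasting the URL into a browser. Assumes $CDAP_ENDPOINT is set.
curl -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    "$CDAP_ENDPOINT/v3/export/apps" -o all_pipelines.zip

# List the archive contents to confirm that all pipelines were exported.
unzip -l all_pipelines.zip
```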
Upgrade pipelines by following these steps:
Create a variable that points to the pipeline_upgrade.json file, which you create in the next step to save the list of pipelines:

export PIPELINE_LIST=PATH/pipeline_upgrade.json

Replace PATH with the path to the file.
Create a list of all pipelines for an instance and namespace with the following command. The result is stored in the $PIPELINE_LIST file in JSON format. You can edit the list to remove pipelines that don't need upgrades.

curl -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    ${CDAP_ENDPOINT}/v3/namespaces/NAMESPACE_ID/apps -o $PIPELINE_LIST

Replace NAMESPACE_ID with the namespace where you want the upgrade to happen.
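One way to prune the list before upgrading is a small script over the saved file. The file contents below are a hypothetical stand-in, since the actual response shape comes from your instance, and the pipeline names are made up:

```shell
# Hypothetical sample of the saved pipeline list (names are made up).
cat > pipeline_upgrade.json <<'EOF'
[{"name": "daily_ingest", "artifact": {"version": "6.7.0"}},
 {"name": "skip_me", "artifact": {"version": "6.9.1"}}]
EOF

# Remove pipelines that don't need upgrades, writing the result back.
python3 - <<'EOF'
import json

with open("pipeline_upgrade.json") as f:
    apps = json.load(f)

skip = {"skip_me"}  # pipelines to exclude from the upgrade
apps = [a for a in apps if a["name"] not in skip]

with open("pipeline_upgrade.json", "w") as f:
    json.dump(apps, f, indent=2)

print([a["name"] for a in apps])  # prints: ['daily_ingest']
EOF
```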
Upgrade the pipelines listed in pipeline_upgrade.json. The command displays a list of the upgraded pipelines with their upgrade status.

curl -N -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    ${CDAP_ENDPOINT}/v3/namespaces/NAMESPACE_ID/upgrade --data @$PIPELINE_LIST

Replace NAMESPACE_ID with the namespace ID of the pipelines that are being upgraded.
To prevent your pipelines from getting stuck when you run them in the new version, grant the required roles in your upgraded instance.
Upgrade real-time pipelines
Upgrading real-time pipelines is not supported, except in pipelines created in version 6.8.0 with a Kafka real-time source.
For all other real-time pipelines, do the following instead:
- Stop and export the pipelines.
- Upgrade the instance.
- Import the real-time pipelines into your upgraded instance.
Upgrade to enable Replication
Replication can be enabled in Cloud Data Fusion environments in version 6.3.0 or later. If you have version 6.2.3, upgrade to 6.3.0, then upgrade to the latest version. You can then enable Replication.
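Sketched with the gcloud command from earlier on this page, the two-step path might look like this; the instance name, project, location, and final target version are placeholders:

```shell
# Hypothetical two-step upgrade to enable Replication. The final
# version is a placeholder; pick it from the available-versions list.
gcloud beta data-fusion instances update my-instance \
    --project=my-project --location=us-central1 --version=6.3.0

# After the first upgrade completes, move to the latest version.
gcloud beta data-fusion instances update my-instance \
    --project=my-project --location=us-central1 --version=6.10.1
```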
Grant roles for upgraded instances
After the upgrade completes, grant the Cloud Data Fusion Runner role (roles/datafusion.runner) and the Cloud Storage Admin role (roles/storage.admin) to the Dataproc service account in your project.
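A sketch of granting both roles with the gcloud CLI follows. The service account shown is the Compute Engine default service account, which Dataproc uses unless you configured a custom one; PROJECT_ID and PROJECT_NUMBER are placeholders:

```shell
# Sketch: grant the two roles to the Dataproc service account.
# Assumes Dataproc uses the Compute Engine default service account;
# substitute your custom service account if you use one.
SA="PROJECT_NUMBER-compute@developer.gserviceaccount.com"

gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="serviceAccount:$SA" --role="roles/datafusion.runner"

gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="serviceAccount:$SA" --role="roles/storage.admin"
```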
What's next
- Manage patch revisions for Cloud Data Fusion instances.
- Learn about versioning in Cloud Data Fusion.
- Refer to the available version and patch revision upgrades.
- Troubleshoot upgrades.