Upgrade your Cloud Data Fusion instances and batch pipelines to the latest platform and plugin versions to get new features, bug fixes, and performance improvements. The upgrade process involves instance and pipeline downtime (see Before you start).
Before you start
Schedule downtime for the upgrade. The process takes up to an hour.
Recommended: Before you upgrade, stop any running pipelines and disable any upstream triggers, such as Cloud Composer triggers. When the upgrade begins, all running pipelines stop. If you upgrade to version 6.3 or later, Cloud Data Fusion doesn't restart pipelines that were running before the upgrade. In earlier versions, Cloud Data Fusion attempts to restart them.
Install curl.
Upgrade Cloud Data Fusion instances
To upgrade a Cloud Data Fusion instance to a new Cloud Data Fusion version, go to the Instance details page:
In the Google Cloud console, go to the Cloud Data Fusion page.
Click Instances, and then click the instance's name to go to the Instance details page.
Then perform the upgrade using either the Google Cloud console or Google Cloud CLI:
Console
Click Upgrade for a list of available versions.
Select a version.
Click Upgrade.
Click View instance to access the upgraded instance.
Verify that the upgrade was successful by reloading the Instance details page, and then clicking System admin in the menu bar. The new version number appears at the top of the page.
To prevent your pipelines from getting stuck when you run them in the new version:
Grant the required roles in your upgraded instance.
If you have upgraded to version 6.2.0 or later and your Dataproc cluster gets stuck in provisioning state, see Adding network tags.
gcloud
To upgrade to a new Cloud Data Fusion version, run the following gcloud CLI command from a local terminal or a Cloud Shell session. Add the --enable_stackdriver_logging, --enable_stackdriver_monitoring, and --labels flags if they apply to your instance.
gcloud beta data-fusion instances update \
    --project=PROJECT_ID \
    --location=REGION \
    --version=NEW_VERSION_NUMBER INSTANCE_ID
After the command completes, verify that the upgrade was successful. From the Google Cloud console, reload the Instance details page, and then click System admin in the menu bar. The new version number appears at the top of the page.
To prevent your pipelines from getting stuck when you run them in the new version:
Grant the required roles in your upgraded instance.
If you have upgraded to version 6.2.0 or later and your Dataproc cluster gets stuck in provisioning state, see Adding network tags.
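The version check described in the gcloud procedure can also be done from the terminal instead of the console. The following is a minimal sketch, assuming that the `version` field of the instance resource reports the running version as a plain string such as `6.8.3`. The function names `instance_version` and `check_upgrade` are hypothetical helpers, not part of the gcloud CLI.

```shell
#!/usr/bin/env bash
# Sketch only: compare the version an instance reports with the version you
# expected after the upgrade. Function names are illustrative.

# Print the version reported for an instance.
# Usage: instance_version INSTANCE_ID REGION PROJECT_ID
instance_version() {
  local instance_id="$1" region="$2" project_id="$3"
  gcloud beta data-fusion instances describe "$instance_id" \
    --project="$project_id" \
    --location="$region" \
    --format="value(version)"
}

# Return 0 if the instance reports the expected version, 1 otherwise.
# Usage: check_upgrade EXPECTED_VERSION INSTANCE_ID REGION PROJECT_ID
check_upgrade() {
  local expected="$1"
  shift
  local actual
  actual="$(instance_version "$@")" || return 1
  if [ "$actual" = "$expected" ]; then
    echo "OK: instance is on $actual"
  else
    echo "MISMATCH: expected $expected, found $actual"
    return 1
  fi
}
```

For example, `check_upgrade NEW_VERSION_NUMBER INSTANCE_ID REGION PROJECT_ID` exits with a nonzero status if the upgrade has not taken effect, which makes it usable in a script that waits before restarting pipelines.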
Upgrade batch pipelines
To upgrade your Cloud Data Fusion batch pipelines to use the latest plugin versions:
Recommended: Back up all pipelines.
To trigger the zip file download, run the following command, then copy the URL output to your browser.
echo $CDAP_ENDPOINT/v3/export/apps
Extract the downloaded file, then confirm that all pipelines were exported. The pipelines are organized by namespace.
Upgrade pipelines.
Create a variable that points to the pipeline_upgrade.json file that you will create in the next step to save a list of pipelines (insert the PATH to the file).
export PIPELINE_LIST=PATH/pipeline_upgrade.json
Create a list of all pipelines for an instance and namespace using the following command. The result is stored in the $PIPELINE_LIST file in JSON format. You can edit the list to remove pipelines that don't need to be upgraded. Set the NAMESPACE_ID field to the namespace where you want the upgrade to happen.
curl -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type: application/json" ${CDAP_ENDPOINT}/v3/namespaces/NAMESPACE_ID/apps -o $PIPELINE_LIST
Upgrade the pipelines listed in pipeline_upgrade.json. Insert the NAMESPACE_ID of pipelines to be upgraded. The command displays a list of upgraded pipelines with their upgrade status.
curl -N -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type: application/json" ${CDAP_ENDPOINT}/v3/namespaces/NAMESPACE_ID/upgrade --data @$PIPELINE_LIST
Prevent your pipelines from getting stuck when you run them in the new version:
Grant the required roles in your upgraded instance.
If you have upgraded to version 6.2.0 or later and your Dataproc cluster gets stuck in provisioning state, see Adding network tags.
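The curl commands in this section read a CDAP_ENDPOINT variable that this page does not define, and they leave editing the pipeline list to you. The following sketch shows one possible way to handle both from the shell. It assumes that the instance's `apiEndpoint` field is the value intended for CDAP_ENDPOINT, that jq 1.6 or later is installed, and that each entry in the list file is a JSON object with a `name` field. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only. Function names are illustrative, not part of any official tool.

# Print the API endpoint of an instance (assumed to be the CDAP_ENDPOINT value).
# Usage: cdap_endpoint INSTANCE_ID REGION
cdap_endpoint() {
  gcloud beta data-fusion instances describe "$1" \
    --location="$2" \
    --format="value(apiEndpoint)"
}

# Remove the named pipelines from the list file so that they are not upgraded.
# Assumes each list entry is a JSON object with a "name" field.
# Usage: exclude_pipelines LIST_FILE NAME [NAME...]
exclude_pipelines() {
  local list_file="$1"
  shift
  local tmp
  tmp="$(mktemp)" || return 1
  # The names arrive in jq as the array $ARGS.positional; keep every entry
  # whose name does not appear in that array.
  if jq 'map(select(.name as $n | $ARGS.positional | index($n) | not))' \
      "$list_file" --args "$@" > "$tmp"; then
    mv "$tmp" "$list_file"
  else
    rm -f "$tmp"
    return 1
  fi
}
```

With these helpers, `export CDAP_ENDPOINT="$(cdap_endpoint INSTANCE_ID REGION)"` sets the variable, and `exclude_pipelines "$PIPELINE_LIST" PIPELINE_NAME` removes one pipeline from the list before you run the upgrade command. Writing to a temporary file first means the original list is left untouched if jq fails.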
Upgrade to enable Replication
Replication can be enabled in Cloud Data Fusion environments in version 6.3.0 or later. If you have version 6.2.3, upgrade to 6.3.0, and then enable Replication.
Grant roles for upgraded instances
If you upgrade an instance from Cloud Data Fusion version 6.1.x to version 6.2.0 or later, after the upgrade completes, grant the Cloud Data Fusion Runner role (roles/datafusion.runner) and the Cloud Storage Admin role (roles/storage.admin) to the Dataproc service account in your project.
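The two role grants can be scripted with the gcloud CLI. The following is a minimal sketch, assuming you already know the email address of the Dataproc service account that your pipelines use. The function name `grant_upgrade_roles` and its arguments are hypothetical; `gcloud projects add-iam-policy-binding` is the standard command for adding a project-level IAM binding.

```shell
#!/usr/bin/env bash
# Sketch only: grant both roles named in this section to one service account.

# Usage: grant_upgrade_roles PROJECT_ID SERVICE_ACCOUNT_EMAIL
grant_upgrade_roles() {
  local project_id="$1" sa_email="$2" role
  for role in roles/datafusion.runner roles/storage.admin; do
    # Stop at the first failure so a partial grant is noticed.
    gcloud projects add-iam-policy-binding "$project_id" \
      --member="serviceAccount:${sa_email}" \
      --role="$role" || return 1
  done
}
```

Call it as `grant_upgrade_roles PROJECT_ID SERVICE_ACCOUNT_EMAIL` after the instance upgrade completes.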
Add network tags
Network tags are preserved in your compute profiles when you upgrade from Cloud Data Fusion versions 6.2.x or later to a higher version.
If you upgrade from version 6.1.x to version 6.2.0 or later, network tags are not preserved. This might cause your Dataproc cluster to get stuck in provisioning state, especially if your environment has restrictive networking and security policies.
In that case, in each upgraded instance, manually add your network tags to each compute profile that it uses.
To add the network tags to a compute profile:
In the Google Cloud console, open the Cloud Data Fusion Instances page.
Click View Instance.
Click System Admin.
Click the Configuration tab.
Expand the System Compute Profiles box.
Click Create New Profile. A page of provisioners opens.
Click Dataproc.
Enter your desired profile information, including your network tags.
Click Create.
After you add the tags, use the updated profile in your pipeline. The new tags are preserved in future releases.
Available versions for your upgrade
When you upgrade, use the latest version of Cloud Data Fusion so that your instances run in a supported environment as long as possible. For more information, see the Version support policy. Depending on your original version, upgrades to some versions might not be available. In those cases, upgrade to a version that supports upgrades to your desired version.
Cloud Data Fusion supports the following version upgrades:
Your Cloud Data Fusion version | Available upgrades |
---|---|
6.8.2 | 6.8.3 (latest) |
6.8.1 | 6.8.3 |
6.8.0 | 6.8.3 |
6.7.3 | 6.8.3 |
6.7.2 | 6.7.3 |
6.7.1 | 6.7.3 |
6.7.0 | 6.7.3 |
6.6.0 | 6.7.3, 6.8.3 |
6.5.1 | 6.6.0, 6.7.3, 6.8.3 |
6.5.0 | 6.5.1 |
6.4.1 | 6.5.1, 6.6.0, 6.7.3, 6.8.3 |
6.4.0 | 6.4.1 |
6.3.1 | 6.5.1, 6.6.0, 6.7.3, 6.8.3 |
6.3.0 | 6.3.1 |
6.2.3 | 6.5.1, 6.6.0, 6.7.3, 6.8.3 |
6.2.2 | 6.2.3 |
6.2.1 | 6.2.3 |
6.2.0 | 6.2.3 |
6.1.4 | 6.5.1, 6.6.0, 6.7.3, 6.8.3 |
6.1.3 | 6.1.4, 6.3.1 |
6.1.2 | 6.1.4 |
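When scripting upgrades across several instances, it can help to encode the table above as a lookup. The following sketch transcribes the table as it appears on this page; the function names `available_upgrades` and `can_upgrade` are hypothetical, and the mapping goes out of date as new versions are released.

```shell
#!/usr/bin/env bash
# Sketch only: the upgrade table from this page as a shell lookup.

# Print the space-separated versions that CURRENT_VERSION can upgrade to.
# Returns 1 if the version is not in the table.
# Usage: available_upgrades CURRENT_VERSION
available_upgrades() {
  case "$1" in
    6.8.2|6.8.1|6.8.0|6.7.3) echo "6.8.3" ;;
    6.7.2|6.7.1|6.7.0)       echo "6.7.3" ;;
    6.6.0)                   echo "6.7.3 6.8.3" ;;
    6.5.1)                   echo "6.6.0 6.7.3 6.8.3" ;;
    6.5.0)                   echo "6.5.1" ;;
    6.4.1|6.3.1|6.2.3|6.1.4) echo "6.5.1 6.6.0 6.7.3 6.8.3" ;;
    6.4.0)                   echo "6.4.1" ;;
    6.3.0)                   echo "6.3.1" ;;
    6.2.2|6.2.1|6.2.0)       echo "6.2.3" ;;
    6.1.3)                   echo "6.1.4 6.3.1" ;;
    6.1.2)                   echo "6.1.4" ;;
    *)                       return 1 ;;
  esac
}

# Return 0 if a direct upgrade from CURRENT_VERSION to TARGET_VERSION is listed.
# Usage: can_upgrade CURRENT_VERSION TARGET_VERSION
can_upgrade() {
  local target
  for target in $(available_upgrades "$1"); do
    [ "$target" = "$2" ] && return 0
  done
  return 1
}
```

For example, `can_upgrade 6.7.2 6.8.3` fails because 6.7.2 can only move to 6.7.3 directly, so reaching 6.8.3 takes two upgrades: 6.7.2 to 6.7.3, then 6.7.3 to 6.8.3.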
Troubleshooting
When you upgrade to version 6.4, there is a known issue with the Joiner plugin where you cannot see join conditions. For more information, see the Troubleshooting page.