You can upgrade your Cloud Data Fusion instances and batch pipelines to the latest platform and plugin versions to obtain the latest features, bug fixes, and performance improvements. The upgrade process involves instance and pipeline downtime (see Before you start).
Before you start
Plan a scheduled downtime for the upgrade. The process takes up to an hour.
Recommended: Before you upgrade, stop any running pipelines and disable any upstream triggers, such as Cloud Composer triggers. When the upgrade begins, all running pipelines stop. If you upgrade to version 6.3 or above, Cloud Data Fusion doesn't restart pipelines that were running before the upgrade. In earlier versions, Cloud Data Fusion attempts to restart them.
Install curl.
Upgrade Cloud Data Fusion instances
To upgrade a Cloud Data Fusion instance to a new Cloud Data Fusion version:
In the Google Cloud console, open the Instances page.
Click on Instance Name to open the Instance details page. This page lists instance information, including the instance ID, region, current Cloud Data Fusion version, logging and monitoring settings, and any instance labels.
Then perform the upgrade using either the Google Cloud console or Google Cloud CLI:
Console
Click Upgrade for a list of available versions.
Select the version that you prefer.
Click Upgrade.
Click View instance to access the upgraded instance.
Verify that the upgrade was successful by reloading the Instance details page, and then clicking System admin in the menu bar. The new version number appears at the top of the page.
To prevent your pipelines from getting stuck when you run them in the new version:
Grant the required roles in your upgraded instance.
If you have upgraded to version 6.2.0 or above and your Dataproc cluster gets stuck in provisioning state, see Adding network tags.
gcloud
Run the following gcloud command from a local terminal or Cloud Shell session to upgrade to a new Cloud Data Fusion version. Add the --enable_stackdriver_logging, --enable_stackdriver_monitoring, and --labels flags if they apply to your instance.
gcloud beta data-fusion instances update \
  --project=PROJECT_ID \
  --location=REGION \
  --version=NEW_VERSION_NUMBER INSTANCE_ID
After the command completes, verify that the upgrade was successful. From the Google Cloud console, reload the Instance details page, and then click System admin in the menu bar. The new version number appears at the top of the page.
To prevent your pipelines from getting stuck when you run them in the new version:
Grant the required roles in your upgraded instance.
If you have upgraded to version 6.2.0 or above and your Dataproc cluster gets stuck in provisioning state, see Adding network tags.
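The gcloud step above can be wrapped in a small helper that assembles the command before you run it. This is a minimal sketch, not an official tool: `build_upgrade_cmd` is a hypothetical function name, and the helper only prints the command so that you can review it first. The optional flags are passed through exactly as you supply them.

```shell
# Hypothetical helper: print the instance upgrade command for review.
# Arguments: PROJECT_ID REGION NEW_VERSION_NUMBER INSTANCE_ID [optional flags...]
build_upgrade_cmd() (
  cmd="gcloud beta data-fusion instances update --project=$1 --location=$2 --version=$3"
  instance_id="$4"
  shift 4
  # Append any optional flags (for example --labels=...) unchanged.
  for flag in "$@"; do
    cmd="$cmd $flag"
  done
  # The instance ID goes last, as in the command shown in the text.
  printf '%s %s\n' "$cmd" "$instance_id"
)
```

For example, `build_upgrade_cmd my-project us-central1 6.8.0 my-instance --enable_stackdriver_logging` prints the full command with the logging flag placed before the instance ID. The body runs in a subshell, so its variables don't leak into your session.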
Upgrade batch pipelines
To upgrade your Cloud Data Fusion batch pipelines to use the latest plugin versions:
Recommended: Back up all pipelines.
Run the following command, then copy the URL output to your browser to trigger a zip file download.
echo $CDAP_ENDPOINT/v3/export/apps
Unzip the downloaded file, then confirm that all pipelines were exported. The pipelines are organized by namespace.
Upgrade pipelines.
Create a variable that points to the pipeline_upgrade.json file that you will create in the next step to save a list of pipelines (insert the PATH to the file).
export PIPELINE_LIST=PATH/pipeline_upgrade.json
Create a list of all of the pipelines for an instance and namespace using the following command. The result is stored in the $PIPELINE_LIST file in JSON format. You can edit the list to remove pipelines that do not need to be upgraded. Set the NAMESPACE_ID field to the namespace where you want the upgrade to happen.
curl -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type: application/json" ${CDAP_ENDPOINT}/v3/namespaces/NAMESPACE_ID/apps -o $PIPELINE_LIST
Upgrade the pipelines listed in pipeline_upgrade.json. Insert the NAMESPACE_ID of pipelines to be upgraded. The command displays a list of upgraded pipelines with their upgrade status.
curl -N -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type: application/json" ${CDAP_ENDPOINT}/v3/namespaces/NAMESPACE_ID/upgrade --data @$PIPELINE_LIST
To prevent your pipelines from getting stuck when you run them in the new version:
Grant the required roles in your upgraded instance.
If you have upgraded to version 6.2.0 or above and your Dataproc cluster gets stuck in provisioning state, see Adding network tags.
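The list and upgrade steps in this section can be sketched as one script. The function names (`apps_url`, `upgrade_url`, `upgrade_namespace`) are hypothetical, and the sketch assumes that CDAP_ENDPOINT is already set to your instance's API endpoint and that you don't need to edit the pipeline list between the two calls. If you do want to remove pipelines from the list, run the two curl commands separately instead.

```shell
# Hypothetical helpers that build the two REST URLs used in the steps above.
# Assumption: CDAP_ENDPOINT holds the instance's API endpoint.
apps_url() (
  : "${CDAP_ENDPOINT:?set CDAP_ENDPOINT first}"
  # Strip one trailing slash so the joined URL has no double slash.
  printf '%s/v3/namespaces/%s/apps\n' "${CDAP_ENDPOINT%/}" "$1"
)

upgrade_url() (
  : "${CDAP_ENDPOINT:?set CDAP_ENDPOINT first}"
  printf '%s/v3/namespaces/%s/upgrade\n' "${CDAP_ENDPOINT%/}" "$1"
)

# Arguments: NAMESPACE_ID PIPELINE_LIST_FILE
upgrade_namespace() (
  token="$(gcloud auth print-access-token)" || exit 1
  # Step 1: save the list of pipelines in the namespace to the file.
  curl -H "Authorization: Bearer $token" \
       -H "Content-Type: application/json" \
       "$(apps_url "$1")" -o "$2" || exit 1
  # Step 2: upgrade every pipeline in the saved list and stream the status.
  curl -N -H "Authorization: Bearer $token" \
       -H "Content-Type: application/json" \
       "$(upgrade_url "$1")" --data "@$2"
)
```

Calling `upgrade_namespace NAMESPACE_ID "$PIPELINE_LIST"` runs both requests in order. Quoting the file path guards against spaces in PATH.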
Upgrade to enable Replication
Replication can be enabled in Cloud Data Fusion environments in version 6.3.0 or above. If you have version 6.2.3, upgrade to 6.3.0, and then enable Replication.
Grant roles for upgraded instances
If you upgrade an instance from Cloud Data Fusion version 6.1.x to version 6.2.0 or above, after the upgrade completes, grant the Cloud Data Fusion runner role and the Cloud Storage admin role to the Dataproc service account in your project.
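One way to grant both roles from the command line is sketched below. The text names the roles but not their IDs, so `roles/datafusion.runner` and `roles/storage.admin` are assumptions about which IAM role IDs correspond to them; confirm them in your project before running anything. `grant_role_cmds` is a hypothetical helper that only prints the `gcloud projects add-iam-policy-binding` commands for review.

```shell
# Hypothetical helper: print one IAM binding command per role.
# Arguments: PROJECT_ID DATAPROC_SERVICE_ACCOUNT_EMAIL
# Assumption: the role IDs below match the two roles named in the text.
grant_role_cmds() (
  for role in roles/datafusion.runner roles/storage.admin; do
    printf 'gcloud projects add-iam-policy-binding %s --member=serviceAccount:%s --role=%s\n' \
      "$1" "$2" "$role"
  done
)
```

After you review the output, you can run the printed commands one at a time.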
Add network tags
Network tags are preserved in your compute profiles when you upgrade from Cloud Data Fusion versions 6.2.x and above to a higher version.
If you upgrade from version 6.1.x to version 6.2.0 and above, network tags are not preserved. This might cause your Dataproc cluster to get stuck in provisioning state, especially if your environment has restrictive networking and security policies.
In that case, in each upgraded instance, manually add your network tags to each compute profile that it uses.
To add the network tags to a compute profile:
In the Google Cloud console, open the Cloud Data Fusion Instances page.
Click View Instance.
Click System Admin.
Click the Configuration tab.
Expand the System Compute Profiles box.
Click Create New Profile. A page of provisioners opens.
Click Dataproc.
Enter your desired profile information, including your network tags.
Click Create.
After you add the tags, use the updated profile in your pipeline. The new tags are preserved in future releases.
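The version rule in this section (tags are lost only when you move from 6.1.x to 6.2.0 or above) can be expressed as a small check. This is a sketch under that reading of the text; `needs_manual_tags` and its helper functions are hypothetical names, and the check only parses version strings, it does not contact your instance.

```shell
# Hypothetical helpers: extract the major and minor parts of a version string.
major_of() ( printf '%s\n' "${1%%.*}" )
minor_of() ( rest="${1#*.}"; printf '%s\n' "${rest%%.*}" )

# Arguments: FROM_VERSION TO_VERSION (for example 6.1.4 6.8.0)
# Succeeds (exit 0) when the upgrade goes from 6.1.x to 6.2.0 or above,
# which is the case where the text says network tags are not preserved.
needs_manual_tags() (
  from_major="$(major_of "$1")"; from_minor="$(minor_of "$1")"
  to_major="$(major_of "$2")";   to_minor="$(minor_of "$2")"
  # Only upgrades that start on 6.1.x are affected.
  [ "$from_major" -eq 6 ] && [ "$from_minor" -eq 1 ] || exit 1
  # The target must be 6.2.0 or above.
  [ "$to_major" -gt 6 ] || { [ "$to_major" -eq 6 ] && [ "$to_minor" -ge 2 ]; }
)
```

For example, `if needs_manual_tags 6.1.4 6.8.0; then echo "re-add network tags"; fi` prints the reminder, while the same check for 6.2.3 to 6.8.0 prints nothing.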
Available versions for your upgrade
In general, when you upgrade, we recommend using the latest version of Cloud Data Fusion so that your instances run in a supported environment for the longest possible time frame. For more information, see the Version support policy. Depending on your original version, upgrades to some versions might not be available. In those cases, first upgrade to an intermediate version that supports an upgrade to your desired version.
Cloud Data Fusion supports the following version upgrades:
Your Cloud Data Fusion version | Available upgrades |
---|---|
6.7.2 | 6.8.0 |
6.7.1 | 6.7.2 |
6.7.0 | 6.7.2 |
6.6.0 | 6.7.2, 6.8.0 |
6.5.1 | 6.6.0, 6.7.2, 6.8.0 |
6.5.0 | 6.5.1 |
6.4.1 | 6.5.1, 6.6.0, 6.7.2, 6.8.0 |
6.4.0 | 6.4.1 |
6.3.1 | 6.4.1, 6.5.1, 6.6.0, 6.7.2, 6.8.0 |
6.3.0 | 6.4.1 |
6.2.3 | 6.4.1, 6.5.1, 6.6.0, 6.7.2, 6.8.0 |
6.2.2 | 6.2.3 |
6.2.1 | 6.2.3 |
6.2.0 | 6.2.3 |
6.1.4 | 6.4.1, 6.5.1, 6.6.0, 6.7.2, 6.8.0 |
6.1.3 | 6.1.4, 6.3.1 |
6.1.2 | 6.1.4 |
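To check an upgrade before you schedule downtime, the table above can be encoded as a lookup. This sketch simply transcribes the rows of the table; `available_upgrades` and `can_upgrade_directly` are hypothetical helper names, and the data is only as current as the table itself.

```shell
# Hypothetical helper: print the available upgrades for a version,
# transcribed from the table above. Fails for versions not in the table.
available_upgrades() {
  case "$1" in
    6.7.2)             echo "6.8.0" ;;
    6.7.1|6.7.0)       echo "6.7.2" ;;
    6.6.0)             echo "6.7.2 6.8.0" ;;
    6.5.1)             echo "6.6.0 6.7.2 6.8.0" ;;
    6.5.0)             echo "6.5.1" ;;
    6.4.1)             echo "6.5.1 6.6.0 6.7.2 6.8.0" ;;
    6.4.0|6.3.0)       echo "6.4.1" ;;
    6.3.1|6.2.3|6.1.4) echo "6.4.1 6.5.1 6.6.0 6.7.2 6.8.0" ;;
    6.2.2|6.2.1|6.2.0) echo "6.2.3" ;;
    6.1.3)             echo "6.1.4 6.3.1" ;;
    6.1.2)             echo "6.1.4" ;;
    *)                 return 1 ;;
  esac
}

# Arguments: CURRENT_VERSION TARGET_VERSION
# Succeeds when the target is listed as a direct upgrade from the current version.
can_upgrade_directly() {
  for candidate in $(available_upgrades "$1"); do
    [ "$candidate" = "$2" ] && return 0
  done
  return 1
}
```

If `can_upgrade_directly` fails for your target, look up an intermediate version with `available_upgrades` and upgrade in two steps. For example, 6.2.0 can reach 6.8.0 by way of 6.2.3.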
Troubleshooting
When you upgrade to version 6.4, there is a known issue with the Joiner plugin where you cannot see join conditions. For more information, see the Troubleshooting page.