Upgrading instances and pipelines

You can upgrade your Cloud Data Fusion instances and batch pipelines to the latest platform and plugin versions to obtain the latest features, bug fixes, and performance improvements. The upgrade process involves instance and pipeline downtime (see Before you start).

Before you start

  • Schedule downtime for the upgrade. The process takes up to an hour.

  • Recommended: Before you upgrade, stop any running pipelines and disable any upstream triggers, such as Cloud Composer triggers. All running pipelines stop when the upgrade begins. If you upgrade to version 6.3 or above, Cloud Data Fusion doesn't restart pipelines that were running beforehand. In earlier versions, Cloud Data Fusion attempts to restart them.

  • Install Cloud SDK.

  • Install curl.
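
As a quick check of the two tool prerequisites above, the following sketch reports which of them are missing from your PATH. The function name missing_tools is a hypothetical helper, not part of the Cloud SDK.

```shell
# Print the name of each given tool that is not found on PATH.
# missing_tools is a hypothetical helper name used for illustration.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Check the tools that the upgrade steps rely on.
missing=$(missing_tools gcloud curl)
if [ -n "$missing" ]; then
  echo "Install before upgrading:"
  echo "$missing"
else
  echo "gcloud and curl are available"
fi
```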

Upgrading Cloud Data Fusion instances

To upgrade a Cloud Data Fusion instance to a new Cloud Data Fusion version:

  1. In the Cloud Console, open the Instances page.

  2. Click the instance name to open the Instance details page. This page lists instance information, including the instance ID, region, current Cloud Data Fusion version, logging and monitoring settings, and any instance labels.

Then perform the upgrade using either the Cloud Console or the gcloud command-line tool:

Console

  1. Click Upgrade for a list of available versions.

  2. Select the version that you prefer.

  3. Click Upgrade.

  4. Click View instance to access the upgraded instance.

  5. Verify that the upgrade was successful by reloading the Instance details page, and then clicking System admin in the menu bar. The new version number appears at the top of the page.

  6. To prevent your pipelines from getting stuck when you run them in the new version:

    1. Grant the required roles in your upgraded instance.

    2. If you have upgraded to version 6.2.0 or above and your Dataproc cluster gets stuck in provisioning state, see Adding network tags.

gcloud

  1. Run the following gcloud command from a local terminal or a Cloud Shell session to upgrade to a new Cloud Data Fusion version. Add the --enable_stackdriver_logging, --enable_stackdriver_monitoring, and --labels flags if they apply to your instance.

    gcloud beta data-fusion instances update \
        --project=PROJECT_ID \
        --location=REGION \
        --version=NEW_VERSION_NUMBER INSTANCE_ID
    

  2. After the command completes, verify that the upgrade was successful. From the Cloud Console, reload the Instance details page, and then click System admin in the menu bar. The new version number appears at the top of the page.

  3. To prevent your pipelines from getting stuck when you run them in the new version:

    1. Grant the required roles in your upgraded instance.

    2. If you have upgraded to version 6.2.0 or above and your Dataproc cluster gets stuck in provisioning state, see Adding network tags.
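
If you prefer to verify the new version from the command line instead of the Cloud Console, a minimal sketch follows. It assumes that the describe command reports the instance version in a field named version, and that this value matches the string you passed to --version. The function names instance_version and check_upgraded are hypothetical helpers.

```shell
# Print the version of an instance.
# Arguments: INSTANCE_ID REGION PROJECT_ID
# Assumption: the describe output exposes the version in a "version" field.
instance_version() {
  gcloud beta data-fusion instances describe "$1" \
    --location="$2" \
    --project="$3" \
    --format="value(version)"
}

# Succeed only if the reported version equals the expected one.
# Arguments: EXPECTED_VERSION INSTANCE_ID REGION PROJECT_ID
check_upgraded() {
  expected=$1
  shift
  actual=$(instance_version "$@") || return 1
  if [ "$actual" = "$expected" ]; then
    echo "Upgrade verified: version $actual"
  else
    echo "Expected version $expected but found ${actual:-nothing}" >&2
    return 1
  fi
}

# Example call (replace the placeholders with your own values):
# check_upgraded NEW_VERSION_NUMBER INSTANCE_ID REGION PROJECT_ID
```

The check compares strings exactly, so pass the version in the same form that the instance reports it.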

Upgrading batch pipelines

To upgrade your Cloud Data Fusion batch pipelines to use the latest plugin versions:

  1. Set environment variables.

  2. Recommended: Back up all pipelines.

    1. Run the following command, then copy the URL output to your browser to trigger a zip file download.

      echo $CDAP_ENDPOINT/v3/export/apps
      

    2. Unzip the downloaded file, then confirm that all pipelines were exported. The pipelines are organized by namespace.

  3. Upgrade pipelines.

    1. Create a variable that points to the pipeline_upgrade.json file, which you create in the next step to save a list of pipelines. Replace PATH with the path to the file.

      export PIPELINE_LIST=PATH/pipeline_upgrade.json
      

    2. Create a list of all of the pipelines for an instance and namespace using the following command. The result is stored in the $PIPELINE_LIST file in JSON format. You can edit the list to remove pipelines that do not need to be upgraded. Replace NAMESPACE_ID with the namespace whose pipelines you want to upgrade.

      curl -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type: application/json" "${CDAP_ENDPOINT}/v3/namespaces/NAMESPACE_ID/apps" -o "$PIPELINE_LIST"
      

    3. Upgrade the pipelines listed in pipeline_upgrade.json. Insert the NAMESPACE_ID of pipelines to be upgraded. The command displays a list of upgraded pipelines with their upgrade status.

      curl -N -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type: application/json" "${CDAP_ENDPOINT}/v3/namespaces/NAMESPACE_ID/upgrade" --data "@$PIPELINE_LIST"
      

  4. To prevent your pipelines from getting stuck when you run them in the new version:

    1. Grant the required roles in your upgraded instance.

    2. If you have upgraded to version 6.2.0 or above and your Dataproc cluster gets stuck in provisioning state, see Adding network tags.
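
Before you send the list to the upgrade endpoint, it can help to confirm that the file holds a pipeline list rather than an error response, for example after an expired access token. The sketch below assumes that a valid list is a JSON array, so it only checks that the file is non-empty and starts with "[". It does not validate the JSON itself. The function name looks_like_pipeline_list is a hypothetical helper.

```shell
# Succeed if the file exists, is non-empty, and its first non-blank
# character is "[", which is how a JSON array starts.
# Assumption: a valid pipeline list is a JSON array.
looks_like_pipeline_list() {
  [ -s "$1" ] || return 1
  first=$(tr -d ' \t\r\n' < "$1" | cut -c1)
  [ "$first" = "[" ]
}

# Example call, using the variable set in the steps above:
# if looks_like_pipeline_list "$PIPELINE_LIST"; then
#   echo "Pipeline list looks valid"
# else
#   echo "Check the contents of $PIPELINE_LIST before upgrading" >&2
# fi
```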

Upgrading to enable Replication

Replication can be enabled in Cloud Data Fusion environments in version 6.3.0 or above. If you have version 6.2.3, upgrade to 6.3.0, and then enable Replication.

Granting roles for upgraded instances

If you upgrade an instance from Cloud Data Fusion version 6.1.x to version 6.2.0 or above, after the upgrade completes, grant the Cloud Data Fusion runner role and the Cloud Storage admin role to the Dataproc service account in your project.
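
One way to grant both roles is with gcloud, as in the sketch below. It assumes that the role IDs are roles/datafusion.runner and roles/storage.admin, and that you know the email address of the Dataproc service account, which is often the Compute Engine default service account. Confirm both against your project before you run it. The function name grant_dataproc_roles is a hypothetical helper.

```shell
# Grant the two roles to a service account at the project level.
# Arguments: PROJECT_ID SERVICE_ACCOUNT_EMAIL
# Assumption: the role IDs below correspond to the Cloud Data Fusion
# runner role and the Cloud Storage admin role.
grant_dataproc_roles() {
  for role in roles/datafusion.runner roles/storage.admin; do
    gcloud projects add-iam-policy-binding "$1" \
      --member="serviceAccount:$2" \
      --role="$role" || return 1
  done
}

# Example call (replace the placeholders with your own values):
# grant_dataproc_roles PROJECT_ID SERVICE_ACCOUNT_EMAIL
```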

Adding network tags

Network tags are preserved in your compute profiles when you upgrade from Cloud Data Fusion versions 6.2.x and above to a higher version.

If you upgrade from version 6.1.x to version 6.2.0 and above, network tags are not preserved. This might cause your Dataproc cluster to get stuck in provisioning state, especially if your environment has restrictive networking and security policies.

Instead, in each upgraded instance, manually add your network tags to each of the compute profiles it uses.

To add the network tags to a compute profile:

  1. In the Google Cloud Console, open the Cloud Data Fusion Instances page.

  2. Click View Instance.

  3. Click System Admin.

  4. Click the Configuration tab.

  5. Expand the System Compute Profiles box.

  6. Click Create New Profile. A page of provisioners opens.

  7. Click Dataproc.

  8. Enter your desired profile information, including your network tags.

  9. Click Create.

After you add the tags, use the updated profile in your pipeline. The new tags are preserved in future releases.

Available versions for your upgrade

Depending on your original version, upgrades to some versions might not be available. To get a list of available versions for your upgrade, follow the steps for upgrading your instances in the Google Cloud Console.

Troubleshooting

When you upgrade to version 6.4, there is a known issue with the Joiner plugin where you cannot see join conditions. For more information, see the Troubleshooting page.