Upgrading instances and pipelines

You can upgrade your Cloud Data Fusion instances and pipelines to the latest platform and plugin versions to obtain the latest features, bug fixes, and performance improvements. The upgrade process involves instance and pipeline downtime (see Before you start).

Before you start

  • Schedule downtime for the upgrade. The process can take up to an hour.

  • Recommended: Before you upgrade, stop any running pipelines and disable any upstream triggers, such as Cloud Composer triggers. When the upgrade begins, all running pipelines stop. If you upgrade to version 6.3 or later, Cloud Data Fusion doesn't restart pipelines that were running before the upgrade. If you upgrade to an earlier version, Cloud Data Fusion attempts to restart them.

  • Install the Cloud SDK.

  • Install curl.
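Before the upgrade, a sketch like the following checks that gcloud and curl are on the PATH and stops a running batch pipeline through the CDAP REST API. It assumes that CDAP_ENDPOINT is already set (as in the batch pipeline steps later on this page) and that a deployed batch pipeline runs as a CDAP workflow named DataPipelineWorkflow; both function names are hypothetical. It does not disable upstream triggers such as Cloud Composer triggers, which you still need to disable separately.

```shell
#!/usr/bin/env bash
# Pre-upgrade helpers (sketch). Assumptions: CDAP_ENDPOINT is already set,
# and a deployed batch pipeline runs as the CDAP workflow named
# DataPipelineWorkflow.

# Report any required command-line tool that is not on the PATH.
# Returns 0 if all tools are found, 1 otherwise.
check_upgrade_tools() {
  local tool missing=0
  for tool in gcloud curl; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing required tool: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Ask CDAP to stop the running workflow of one batch pipeline.
# Usage: stop_batch_pipeline NAMESPACE_ID PIPELINE_NAME
stop_batch_pipeline() {
  local ns="$1" pipeline="$2"
  : "${CDAP_ENDPOINT:?set CDAP_ENDPOINT first}"
  curl --fail -sS -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    "${CDAP_ENDPOINT}/v3/namespaces/${ns}/apps/${pipeline}/workflows/DataPipelineWorkflow/stop"
}
```

For example, run check_upgrade_tools && stop_batch_pipeline default my-pipeline, where default and my-pipeline stand in for your own namespace and pipeline names.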

Upgrading Cloud Data Fusion instances

To upgrade a Cloud Data Fusion instance to a new Cloud Data Fusion version:

  1. In the Cloud Console, open the Instances page.

  2. Click the instance name to open the Instance details page. This page lists instance information, including the instance ID, region, current Cloud Data Fusion version, logging and monitoring settings, and any instance labels.
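If you prefer the command line, a sketch like the following reads the same details with gcloud beta data-fusion instances describe. This is handy before a gcloud upgrade, because it shows whether logging, monitoring, or labels are set on the instance. INSTANCE_ID and REGION are placeholders, show_upgrade_settings is a hypothetical helper name, and the field names in the --format projection are assumed from the Cloud Data Fusion instance resource.

```shell
#!/usr/bin/env bash
# Print the instance settings that matter for an upgrade (sketch).
# The field names in the --format projection are assumed from the
# Cloud Data Fusion instance resource.

# Usage: show_upgrade_settings INSTANCE_ID REGION
show_upgrade_settings() {
  local instance="$1" region="$2"
  gcloud beta data-fusion instances describe "$instance" \
    --location="$region" \
    --format="yaml(version,enableStackdriverLogging,enableStackdriverMonitoring,labels)"
}
```

For example, show_upgrade_settings my-instance us-central1 (hypothetical instance and region) prints the current version together with the logging, monitoring, and label settings.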

Then perform the upgrade using either the Cloud Console or the gcloud command-line tool:

Console

  1. Click Upgrade to see a list of available versions.

  2. Select the version that you prefer.

  3. Click Upgrade.

  4. Click View instance to access the upgraded instance.

  5. Verify that the upgrade was successful by reloading the Instance details page, and then clicking System admin in the menu bar. The new version number appears at the top of the page.

gcloud

  1. Run the following gcloud command from a local terminal or a Cloud Shell session to upgrade to a new Cloud Data Fusion version. Add the --enable_stackdriver_logging, --enable_stackdriver_monitoring, and --labels flags if they apply to your instance.

    gcloud beta data-fusion instances update \
        --project=PROJECT_ID \
        --location=REGION \
        --version=NEW_VERSION_NUMBER \
        INSTANCE_ID
    

  2. After the command completes, verify that the upgrade was successful. From the Cloud Console, reload the Instance details page, and then click System admin in the menu bar. The new version number appears at the top of the page.
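As a command-line alternative to checking System admin, the following sketch compares the version reported by gcloud beta data-fusion instances describe with the version you upgraded to. It assumes the version is exposed in the instance resource's version field; check_cdf_version is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Confirm from the command line that an instance reports the expected
# version after the upgrade (sketch). Assumes the version is exposed in
# the instance resource's `version` field.

# Usage: check_cdf_version INSTANCE_ID REGION EXPECTED_VERSION
# Returns 0 on a match, 1 on a mismatch or a failed gcloud call.
check_cdf_version() {
  local instance="$1" region="$2" expected="$3" actual
  actual=$(gcloud beta data-fusion instances describe "$instance" \
    --location="$region" \
    --format="value(version)") || return 1
  if [ "$actual" = "$expected" ]; then
    echo "OK: $instance reports version $actual"
  else
    echo "Mismatch: $instance reports version '$actual', expected '$expected'" >&2
    return 1
  fi
}
```

For example, check_cdf_version my-instance us-central1 NEW_VERSION_NUMBER prints an OK line when the instance reports the requested version and returns a non-zero status otherwise, so it can gate later steps in a script.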

Upgrading batch pipelines

To upgrade your Cloud Data Fusion batch pipelines to use the latest plugin versions:

  1. Set environment variables, including CDAP_ENDPOINT, which the following commands use as the API endpoint of your instance.

  2. Recommended: Back up all pipelines.

    1. Run the following command, then copy the URL it prints into your browser to download a zip file of your pipelines.

      echo $CDAP_ENDPOINT/v3/export/apps
      

    2. Unzip the downloaded file, then confirm that all pipelines were exported. The pipelines are organized by namespace.

  3. Upgrade pipelines.

    1. Create a variable that points to pipeline_upgrade.json, the file where the next step saves the list of pipelines. Replace PATH with the directory that will hold the file.

      export PIPELINE_LIST=PATH/pipeline_upgrade.json
      

    2. Run the following command to save a list of all pipelines in a namespace to the $PIPELINE_LIST file in JSON format. Replace NAMESPACE_ID with the namespace where you want the upgrade to happen. You can edit the list to remove pipelines that don't need to be upgraded.

      curl -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type: application/json" ${CDAP_ENDPOINT}/v3/namespaces/NAMESPACE_ID/apps -o $PIPELINE_LIST
      

    3. Upgrade the pipelines listed in pipeline_upgrade.json. Replace NAMESPACE_ID with the namespace of the pipelines to upgrade. The command displays each upgraded pipeline with its upgrade status.

      curl -N -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type: application/json" ${CDAP_ENDPOINT}/v3/namespaces/NAMESPACE_ID/upgrade --data @$PIPELINE_LIST
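The batch upgrade steps above can be scripted. The following sketch wraps three of them in shell functions: looking up the instance's API endpoint for step 1, downloading the export archive with curl instead of a browser for step 2, and trimming the saved pipeline list for step 3. It assumes that the instance resource exposes its endpoint in an apiEndpoint field, that jq is installed, and that batch pipelines are the applications built from the cdap-data-pipeline artifact; all function names are hypothetical.

```shell
#!/usr/bin/env bash
# Helpers for the batch pipeline upgrade steps (sketch).
# Assumptions: the instance resource exposes its CDAP endpoint in the
# `apiEndpoint` field, jq is installed, and batch pipelines are the
# applications created from the `cdap-data-pipeline` artifact.

# Step 1: print the CDAP API endpoint of an instance.
# Usage: export CDAP_ENDPOINT=$(cdf_endpoint INSTANCE_ID REGION)
cdf_endpoint() {
  gcloud beta data-fusion instances describe "$1" \
    --location="$2" \
    --format="value(apiEndpoint)"
}

# Step 2: download the pipeline export archive directly with curl.
# Usage: backup_pipelines OUTPUT_ZIP
backup_pipelines() {
  : "${CDAP_ENDPOINT:?set CDAP_ENDPOINT first}"
  curl --fail -sS \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    "${CDAP_ENDPOINT}/v3/export/apps" -o "$1"
}

# Step 3: keep only batch pipelines, minus any names passed after the
# input and output paths.
# Usage: filter_pipeline_list INPUT_JSON OUTPUT_JSON [PIPELINE_TO_SKIP ...]
filter_pipeline_list() {
  local input="$1" output="$2" skip
  shift 2
  # Turn the remaining arguments into a JSON array of names; an empty
  # argument list becomes [].
  skip=$(printf '%s\n' "$@" | jq -R . | jq -s 'map(select(length > 0))')
  jq --argjson skip "$skip" \
    '[.[] | select(.artifact.name == "cdap-data-pipeline")
          | select(.name as $n | $skip | any(. == $n) | not)]' \
    "$input" > "$output"
}
```

For example, with hypothetical names: export CDAP_ENDPOINT=$(cdf_endpoint my-instance us-central1), then backup_pipelines pipelines-backup.zip, and, after saving the list to $PIPELINE_LIST, filter_pipeline_list "$PIPELINE_LIST" batch_only.json legacy-pipeline. You would then pass the trimmed file to the upgrade command with --data @batch_only.json. Filtering with jq keeps the remaining entries exactly as the API returned them, which is safer than editing the JSON by hand.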