Reuse clusters

This page describes how to reuse Dataproc clusters for your pipeline runs in Cloud Data Fusion. For more information, see When to reuse clusters and Run a pipeline against an existing Dataproc cluster.

Before you begin

  • You must have a Cloud Data Fusion instance running version 6.5.0 or later.

Enable cluster reuse

You can enable cluster reuse in a new compute profile, or in a profile that a deployed pipeline already uses.

Enable cluster reuse in a new profile

  1. Go to your instance:

    1. In the Google Cloud console, go to the Cloud Data Fusion page.

    2. To open the instance in the Cloud Data Fusion web interface, click Instances, and then click View instance.


  2. Click System admin > Configuration > System compute profiles.

  3. Click Create new profile.

  4. Choose the Dataproc provisioner.

  5. In the Create a profile for Dataproc window, enter the details about your cluster:

    1. In the Profile label and Profile name fields, enter a name to identify the profile—for example, execution_compute-profile.
    2. In the Description field, describe the purpose of the profile—for example, Profile used for pipeline execution.
    3. In the Max idle time field, enter a value. For more information, see Set max idle time.
    4. Set the Skip cluster delete field to True. For more information, see When to reuse clusters.
    5. Optional: Configure any other fields.
    6. Click Create.
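
The profile settings above can also be written down as a JSON body, which is useful for reviewing or scripting them. The following is a minimal sketch, not an official payload. It assumes the property keys idleTTL (Max idle time, in minutes) and skipDelete (Skip cluster delete) and the provisioner name gcp-dataproc, as used by the open source CDAP Dataproc provisioner. The helper name build_reuse_profile and the 30-minute value are hypothetical. Verify the keys against your instance before relying on them.

```python
import json


def build_reuse_profile(label, description, max_idle_minutes):
    """Return a compute-profile body with cluster reuse settings (illustrative)."""
    if max_idle_minutes <= 0:
        # With Skip cluster delete enabled, the idle timeout is what
        # eventually removes the cluster, so it should be a positive value.
        raise ValueError("max_idle_minutes must be positive")
    return {
        "label": label,
        "description": description,
        "provisioner": {
            "name": "gcp-dataproc",  # assumed provisioner name
            "properties": [
                # Assumed key for the Max idle time field.
                {"name": "idleTTL", "value": str(max_idle_minutes), "isEditable": True},
                # Assumed key for the Skip cluster delete field.
                {"name": "skipDelete", "value": "true", "isEditable": True},
            ],
        },
    }


profile = build_reuse_profile(
    "execution_compute-profile",
    "Profile used for pipeline execution",
    30,  # example value only
)
print(json.dumps(profile, indent=2))
```

The sketch only builds and prints the body. It does not contact a Cloud Data Fusion instance.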

Enable cluster reuse in a deployed pipeline

  1. Go to your instance:

    1. In the Google Cloud console, go to the Cloud Data Fusion page.

    2. To open the instance in the Cloud Data Fusion web interface, click Instances, and then click View instance.

      Go to Instances

  2. Click List.

  3. Click the Deployed tab and click a pipeline name. The deployed pipeline opens on the Studio page in the Cloud Data Fusion web interface.

  4. Click Configure.

  5. In the Compute config window, go to the chosen profile and click Customize.

  6. In the window that opens, enter the following values:

    1. In the Max Idle Time field, enter a value. For more information, see Set max idle time.
    2. Set Skip cluster delete to True. For more information, see When to reuse clusters.
  7. Click Done.
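
Customizing a profile for one pipeline amounts to overriding individual profile properties for that pipeline. As a hedged illustration, the sketch below expresses the two values from the preceding steps as key-value overrides. It assumes that overrides use the prefix system.profile.properties. followed by the property key, and that the keys are idleTTL and skipDelete, as in open source CDAP. The function name reuse_overrides and the 30-minute value are hypothetical.

```python
# Assumed prefix for per-pipeline overrides of compute profile properties.
PROFILE_PROPERTY_PREFIX = "system.profile.properties."


def reuse_overrides(max_idle_minutes):
    """Return profile-property overrides that enable cluster reuse (illustrative)."""
    if max_idle_minutes <= 0:
        raise ValueError("max_idle_minutes must be positive")
    return {
        # Assumed key for the Max Idle Time field, in minutes.
        PROFILE_PROPERTY_PREFIX + "idleTTL": str(max_idle_minutes),
        # Assumed key for the Skip cluster delete field.
        PROFILE_PROPERTY_PREFIX + "skipDelete": "true",
    }


overrides = reuse_overrides(30)  # example value only
for key, value in sorted(overrides.items()):
    print(f"{key}={value}")
```

Keeping the overrides in a plain dictionary makes it easy to compare them with the values shown in the Customize window.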

What's next