This page describes when to use static Dataproc clusters in Cloud Data Fusion. It also describes compatible versions and recommended cluster configurations.
For more information, see Manage a cluster.
When to use static clusters
By default, Cloud Data Fusion creates ephemeral clusters for each pipeline: it creates a cluster at the beginning of the pipeline run, and then deletes it after the pipeline run completes.
In the following scenarios, use a static cluster instead of the default:
- When the time it takes to create a new cluster for every pipeline run is prohibitive for your use case.
- When your organization requires cluster creation to be managed centrally, for example, to enforce certain policies for all Dataproc clusters.
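One way to judge whether cluster creation time is prohibitive is to compare it with how long the pipeline itself runs. The following Python sketch computes the fraction of each run's wall-clock time that an ephemeral cluster spends being created. The timings and the `max_overhead` tolerance are hypothetical inputs you would measure or choose yourself, and the helper names are illustrative only; they are not part of Cloud Data Fusion.

```python
from dataclasses import dataclass


@dataclass
class RunProfile:
    """Hypothetical timings for one pipeline run, in minutes."""
    provision_minutes: float  # time to create an ephemeral cluster before the run
    job_minutes: float        # time the pipeline itself takes once a cluster exists


def ephemeral_overhead(profile: RunProfile) -> float:
    """Fraction of a run's wall-clock time spent waiting for cluster creation.

    This sketch does not count cluster deletion, which happens after the run
    completes. A static cluster already exists, so its per-run creation
    overhead is 0.
    """
    total = profile.provision_minutes + profile.job_minutes
    if total <= 0:
        raise ValueError("a run must take a positive amount of time")
    return profile.provision_minutes / total


def prefer_static_cluster(profile: RunProfile, max_overhead: float) -> bool:
    """True when creation overhead exceeds a tolerance you choose (hypothetical)."""
    return ephemeral_overhead(profile) > max_overhead


if __name__ == "__main__":
    short_job = RunProfile(provision_minutes=5, job_minutes=2)
    long_job = RunProfile(provision_minutes=5, job_minutes=115)
    for name, profile in [("short", short_job), ("long", long_job)]:
        print(name, round(ephemeral_overhead(profile), 3),
              prefer_static_cluster(profile, max_overhead=0.25))
```

With these made-up numbers, the short job spends about 71% of each run waiting for its cluster, while the long job spends about 4%, so only the short job crosses a 25% tolerance.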
For more information, see Running a pipeline against an existing Dataproc cluster.
Version compatibility
Problem: The version of your Cloud Data Fusion environment might not be compatible with the version of your Dataproc cluster.
The following Cloud Data Fusion versions support the corresponding Dataproc versions.
|Cloud Data Fusion version|Dataproc version|
|---|---|
|6.1 to 6.3*|1.3.x|
|6.4+|1.3.x and 2.0.x|
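The table can be encoded as a small lookup, for example to check a static cluster's image version before you run a pipeline against it. The following Python sketch reproduces only the rows above: it treats "6.4+" as every version from 6.4 upward, ignores whatever qualification the asterisk on "6.1 to 6.3" refers to, and knows nothing about Dataproc versions not listed here. The function names are hypothetical.

```python
from __future__ import annotations


def _major_minor(version: str) -> tuple[int, int]:
    """Parse strings such as '6.4.1' or '2.0.49-debian10' into (major, minor).

    Anything after a '-' and any components beyond major.minor are ignored.
    """
    parts = version.split("-")[0].split(".")
    return int(parts[0]), int(parts[1])


def supported_dataproc_lines(cdf_version: str) -> set[str]:
    """Dataproc version lines that the table lists for a Cloud Data Fusion version."""
    cdf = _major_minor(cdf_version)
    if cdf < (6, 1):
        return set()            # not covered by the table
    if cdf <= (6, 3):
        return {"1.3"}          # 6.1 to 6.3 -> 1.3.x
    return {"1.3", "2.0"}       # 6.4+ -> 1.3.x and 2.0.x


def is_compatible(cdf_version: str, dataproc_version: str) -> bool:
    """True if the table lists the Dataproc version line for this Cloud Data Fusion version."""
    major, minor = _major_minor(dataproc_version)
    return f"{major}.{minor}" in supported_dataproc_lines(cdf_version)
```

For example, `is_compatible("6.3", "2.0.1")` is `False` because the table lists only 1.3.x for versions 6.1 to 6.3, while `is_compatible("6.4.1", "2.0.1")` is `True`.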
Recommended: When you create a static cluster for your pipelines, use the following configurations.
|Property|Description|
|---|---|
| |Retains YARN logs.|
| |Enables YARN to check for physical memory limits and kill containers if they go beyond physical memory.|
| |Enables YARN to check for virtual memory limits and kill containers if they go beyond virtual memory limits.|
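The descriptions above do not show the property names. Assuming they correspond to the standard Apache Hadoop YARN settings `yarn.nodemanager.delete.debug-delay-sec`, `yarn.nodemanager.pmem-check-enabled`, and `yarn.nodemanager.vmem-check-enabled` (an assumption, not something this page states), the following Python sketch formats them as Dataproc cluster properties, which use a `yarn:` prefix to target `yarn-site.xml`. The values are illustrative placeholders, not recommendations from this page, and the helper name is hypothetical.

```python
from __future__ import annotations

# ASSUMPTION: the table rows above map to these standard Hadoop YARN properties;
# the property names are not shown on this page.
# ASSUMPTION: the values are illustrative placeholders, not recommendations.
YARN_PROPERTIES: dict[str, str] = {
    # Seconds the NodeManager waits before deleting a finished application's
    # local files and logs (86400 s is one day), which keeps the logs around.
    "yarn.nodemanager.delete.debug-delay-sec": "86400",
    # Whether YARN enforces physical memory limits on containers.
    "yarn.nodemanager.pmem-check-enabled": "false",
    # Whether YARN enforces virtual memory limits on containers.
    "yarn.nodemanager.vmem-check-enabled": "false",
}


def to_dataproc_properties(yarn_props: dict[str, str]) -> str:
    """Render YARN settings as 'yarn:KEY=VALUE' pairs joined by commas.

    Keys are sorted so that the output is deterministic.
    """
    return ",".join(
        f"yarn:{key}={value}" for key, value in sorted(yarn_props.items())
    )


if __name__ == "__main__":
    print(to_dataproc_properties(YARN_PROPERTIES))
```

You could then pass the resulting string to `gcloud dataproc clusters create CLUSTER_NAME --region=REGION --properties=...` when you create the static cluster, after replacing the placeholder values with the ones your organization has chosen.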
- Learn how to run a pipeline against an existing Dataproc cluster.
- Learn how to manage a cluster.