[[["容易理解","easyToUnderstand","thumb-up"],["確實解決了我的問題","solvedMyProblem","thumb-up"],["其他","otherUp","thumb-up"]],[["難以理解","hardToUnderstand","thumb-down"],["資訊或程式碼範例有誤","incorrectInformationOrSampleCode","thumb-down"],["缺少我需要的資訊/範例","missingTheInformationSamplesINeed","thumb-down"],["翻譯問題","translationIssue","thumb-down"],["其他","otherDown","thumb-down"]],["上次更新時間:2025-09-04 (世界標準時間)。"],[[["\u003cp\u003eYou can modify the Dataproc image version used by your Cloud Data Fusion instance at the instance, namespace, or pipeline level.\u003c/p\u003e\n"],["\u003cp\u003eBefore changing the Dataproc image version, it is crucial to stop all real-time pipelines and replication jobs to ensure the changes are applied correctly.\u003c/p\u003e\n"],["\u003cp\u003eThe Dataproc image version can be set through the Cloud Data Fusion web interface, in Compute Configurations, Namespace Preferences, or Pipeline Runtime Arguments.\u003c/p\u003e\n"],["\u003cp\u003eTo ensure batch pipelines and replication jobs succeed with Dataproc image 2.2 or 2.1, verify that the JDBC drivers used by the database plugins are compatible with Java 11.\u003c/p\u003e\n"],["\u003cp\u003eIf you use existing Dataproc clusters with Cloud Data Fusion, you should recreate them with the desired image version, ensuring the cluster name remains consistent for seamless operation.\u003c/p\u003e\n"]]],[],null,["# Change the Dataproc image version in Cloud Data Fusion\n\nThis page describes how to change the Dataproc image version used\nby your Cloud Data Fusion instance. You can change the image at the\ninstance, namespace, or pipeline level.\n\nBefore you begin\n----------------\n\nStop all real-time pipelines and replication jobs in the\nCloud Data Fusion instance. If a real-time pipeline or replication is\nrunning when you change the Dataproc image version, the changes\naren't applied to the pipeline execution.\n\nFor real-time pipelines, if checkpointing is enabled, stopping the\npipelines doesn't cause any data loss. For replication jobs, as long\nas the database logs are available, stopping and starting the\nreplication job doesn't cause data loss.\n**Note:** After the configuration changes are applied, **Batch pipelines** use the following updated configurations on subsequent runs. \n\n### Console\n\n1. Go to the Cloud Data Fusion **Instances** page and open the\n instance where you need to stop a pipeline.\n\n [Go to Instances](https://console.cloud.google.com/data-fusion/locations/-/instances)\n2. Open each real-time pipeline in the Pipeline Studio and click\n **Stop**.\n\n3. 
### REST API

- To retrieve all pipelines, use the following REST API call:

      curl -X GET -H "Authorization: Bearer ${AUTH_TOKEN}" \
      "${CDAP_ENDPOINT}/v3/namespaces/NAMESPACE_ID/apps"

  Replace NAMESPACE_ID with the name of your namespace.

- To stop a real-time pipeline, use the following REST API call:

      curl -X POST -H "Authorization: Bearer ${AUTH_TOKEN}" \
      "${CDAP_ENDPOINT}/v3/namespaces/NAMESPACE_ID/apps/PIPELINE_NAME/spark/DataStreamsSparkStreaming/stop"

  Replace NAMESPACE_ID with the name of your namespace and PIPELINE_NAME with the name of the real-time pipeline.

- To stop a replication job, use the following REST API call:

      curl -X POST -H "Authorization: Bearer ${AUTH_TOKEN}" \
      "${CDAP_ENDPOINT}/v3/namespaces/NAMESPACE_ID/apps/REPLICATION_JOB_NAME/workers/DeltaWorker/stop"

  Replace NAMESPACE_ID with the name of your namespace and REPLICATION_JOB_NAME with the name of the replication job.

  For more information, see [stopping real-time pipelines](/data-fusion/docs/reference/cdap-reference#stop_a_real-time_pipeline) and [stopping replication jobs](/data-fusion/docs/reference/replication-ref#stop-a-replication-job).

Check and override the default version of Dataproc in Cloud Data Fusion
-----------------------------------------------------------------------

1. [Go to the Cloud Data Fusion web interface](/data-fusion/docs/create-data-pipeline#navigate_the_web_interface).

2. Click **System Admin > Configuration > System Preferences**.

3. If a Dataproc image is not specified in System Preferences, or to change the preference, click **Edit System Preferences**.

   1. Enter the following text in the **Key** field:

      `system.profile.properties.imageVersion`

   2. Enter the chosen Dataproc image version in the **Value** field, such as `2.1`.

   3. Click **Save & Close**.

This change affects the entire Cloud Data Fusion instance, including all its namespaces and pipeline runs, unless the image version property is overridden in a namespace, pipeline, or runtime argument in your instance.

Change the Dataproc image version
---------------------------------

The image version can be set in the Cloud Data Fusion web interface in the Compute Configurations, Namespace Preferences, or Pipeline Runtime Arguments.

**Note:** If you haven't overridden the Dataproc image version in Namespace Preferences or Pipeline Runtime Arguments, skip these steps.

### Change the image in Namespace Preferences

If you have overridden the image version in your namespace properties, follow these steps:

1. [Go to the Cloud Data Fusion web interface](/data-fusion/docs/create-data-pipeline#navigate_the_web_interface).

2. Click **System Admin > Configuration > Namespaces**.

3. Open each namespace and click **Preferences**.

   1. Make sure that there is no override with the key `system.profile.properties.imageVersion` that sets an incorrect image version value.

   2. Click **Finish**.
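If you manage namespace preferences programmatically, you can set the same override through the CDAP Preferences REST API. A minimal sketch, assuming `AUTH_TOKEN` and `CDAP_ENDPOINT` are set as shown earlier and `NAMESPACE_ID` is a placeholder; note that the `PUT` call replaces the namespace's existing preferences map, so include any other preferences you want to keep:

    # Set the Dataproc image version preference for one namespace.
    curl -X PUT -H "Authorization: Bearer ${AUTH_TOKEN}" \
        -H "Content-Type: application/json" \
        -d '{"system.profile.properties.imageVersion": "2.1"}' \
        "${CDAP_ENDPOINT}/v3/namespaces/NAMESPACE_ID/preferences"

    # Verify the preferences now in effect for the namespace.
    curl -X GET -H "Authorization: Bearer ${AUTH_TOKEN}" \
        "${CDAP_ENDPOINT}/v3/namespaces/NAMESPACE_ID/preferences"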
### Change the image in System Compute Profiles

1. [Go to the Cloud Data Fusion web interface](/data-fusion/docs/create-data-pipeline#navigate_the_web_interface).

2. Click **System Admin > Configuration**.

3. Click **System Compute Profiles > Create New Profile**.

4. Select the **Dataproc** provisioner.

5. Create the profile for Dataproc. In the **Image Version** field, enter a Dataproc image version.

6. Select this compute profile when you run the pipeline on the **Studio** page: on the pipeline run page, click **Configure > Compute config** and select this profile.

7. Select the Dataproc profile and click **Save**.

   **Note:** For more information about using images 2.2 and 2.1, which run on Java 11, see [Change the Dataproc image to version 2.2 or 2.1](#change-to-dataproc-21).

8. Click **Finish**.

### Change the image in Pipeline Runtime Arguments

If you have overridden the image version with a property in the Runtime Arguments of your pipeline, follow these steps:

1. [Go to the Cloud Data Fusion web interface](/data-fusion/docs/create-data-pipeline#navigate_the_web_interface).

2. Click **Menu > List**.

3. On the **List** page, select the pipeline that you want to update.

   The pipeline opens on the **Studio** page.

4. To expand the **Run** options, click the expander arrow.

   The **Runtime Arguments** window opens.

5. Check that there is no override with the key `system.profile.properties.imageVersion` that sets an incorrect image version as the value.

6. Click **Save**.

Recreate static Dataproc clusters used by Cloud Data Fusion with the chosen image version
-----------------------------------------------------------------------------------------

If you use existing Dataproc clusters with Cloud Data Fusion, follow the [Dataproc guide](/dataproc/docs/guides/recreate-cluster) to recreate the clusters with the chosen Dataproc image version for your Cloud Data Fusion version.

**Important:** Keep the cluster name the same.

**Note:** If any pipelines are running while the cluster is being recreated, those pipelines fail. Subsequent runs should run on the recreated cluster.

Alternatively, you can create a new Dataproc cluster with the chosen Dataproc image version, then delete and recreate the compute profile in Cloud Data Fusion with the same compute profile name and the updated Dataproc cluster name. This way, running batch pipelines can complete execution on the existing cluster, and subsequent pipeline runs take place on the new Dataproc cluster. You can delete the old Dataproc cluster after you confirm that all pipeline runs have completed.

Check that the Dataproc image version is updated
------------------------------------------------

### Console

1. In the Google Cloud console, go to the Dataproc **Clusters** page.

   [Go to Clusters](https://console.cloud.google.com/dataproc/clusters)

2. Open the **Cluster details** page for the new cluster that Cloud Data Fusion created when you specified the new version.

   The **Image version** field has the new value that you specified in Cloud Data Fusion.
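You can also check the image version from the command line. A minimal sketch using the gcloud CLI, where `CLUSTER_NAME` and `REGION` are placeholders for the cluster that Cloud Data Fusion created and its region:

    # Print only the image version of the cluster.
    gcloud dataproc clusters describe CLUSTER_NAME \
        --region=REGION \
        --format="value(config.softwareConfig.imageVersion)"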
### REST API

1. Get the list of clusters with their metadata:

       curl -X GET -H "Authorization: Bearer ${AUTH_TOKEN}" \
       "https://dataproc.googleapis.com/v1/projects/PROJECT_ID/regions/REGION_ID/clusters"

   Replace the following:

   - PROJECT_ID: the ID of the Google Cloud project that contains your clusters
   - REGION_ID: the name of the region where your clusters are located

2. Search for the name of your pipeline (the cluster name).

3. In that JSON object, see the image version in `config` > `softwareConfig` > `imageVersion`.

Change the Dataproc image to version 2.2 or 2.1
-----------------------------------------------

Cloud Data Fusion versions 6.9.1 and later support the Dataproc image 2.1 for Compute Engine, which runs on Java 11. In versions 6.10.0 and later, image 2.1 is the default.

If you change to image 2.2 or 2.1 from an earlier image, then for your batch pipelines and replication jobs to succeed, the JDBC drivers that the database plugins use in those instances must be compatible with Java 11.

Dataproc images 2.2 and 2.1 have the following limitations in Cloud Data Fusion:

- MapReduce jobs aren't supported.
- The JDBC driver versions used by the database plugins in your instance must be updated to versions that support Java 11.

### Memory usage when using Dataproc 2.2 or 2.1

Memory usage might increase for pipelines that use Dataproc 2.2 or 2.1 clusters. If you upgrade your instance to version 6.10 or later, and previous pipelines fail due to memory issues, increase the driver and executor memory to 2048 MB in the `Resources` configuration for the pipeline.

Alternatively, you can override the Dataproc version by setting the `system.profile.properties.imageVersion` runtime argument to `2.0-debian10`.
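For a scripted run, you can pass this runtime argument when starting a batch pipeline through the CDAP REST API. A minimal sketch, assuming `AUTH_TOKEN` and `CDAP_ENDPOINT` are set as shown earlier, `NAMESPACE_ID` and `PIPELINE_NAME` are placeholders, and the pipeline uses the standard `DataPipelineWorkflow` program name:

    # Start one run of a batch pipeline, overriding the Dataproc
    # image version for that run only. The JSON body is the map of
    # runtime arguments applied to this run.
    curl -X POST -H "Authorization: Bearer ${AUTH_TOKEN}" \
        -H "Content-Type: application/json" \
        -d '{"system.profile.properties.imageVersion": "2.0-debian10"}' \
        "${CDAP_ENDPOINT}/v3/namespaces/NAMESPACE_ID/apps/PIPELINE_NAME/workflows/DataPipelineWorkflow/start"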