[[["容易理解","easyToUnderstand","thumb-up"],["確實解決了我的問題","solvedMyProblem","thumb-up"],["其他","otherUp","thumb-up"]],[["難以理解","hardToUnderstand","thumb-down"],["資訊或程式碼範例有誤","incorrectInformationOrSampleCode","thumb-down"],["缺少我需要的資訊/範例","missingTheInformationSamplesINeed","thumb-down"],["翻譯問題","translationIssue","thumb-down"],["其他","otherDown","thumb-down"]],["上次更新時間:2025-09-04 (世界標準時間)。"],[],[],null,["# 2.3.x release versions\n\nImportant changes in 2.3:\n-------------------------\n\n- Version `2.3` is a lightweight image that contains only core components,\n reducing exposure to Common Vulnerabilities and Exposures (CVEs). For higher\n security compliance requirements, use the image version `2.3`or later, when\n creating a Dataproc cluster.\n\n- If you choose to install\n [optional components](/dataproc/docs/concepts/components/overview) when\n creating a Dataproc cluster with `2.3` image, they will be\n downloaded and installed during cluster creation. This might increase the\n cluster startup time. To avoid this delay, you can create a\n [custom image](/dataproc/docs/guides/dataproc-images#generate_a_custom_image)\n with the optional components pre-installed. This is achieved by running\n [`generate_custom_image.py`](https://github.com/GoogleCloudDataproc/custom-images?tab=readme-ov-file#generate-custom-image)\n with the\n [`--optional-components`](/dataproc/docs/guides/dataproc-images#run_the_code)\n flag.\n\n | **Note:** You must specify the optional components that you want to install when you create the cluster. For more information, see [Add optional components](/dataproc/docs/concepts/components/overview#add_optional_components). \n | The following example shows the Google Cloud CLI command for creating a cluster with optional components: \n |\n | ```\n | gcloud dataproc clusters create CLUSTER_NAME\n | --optional-components=COMPONENT_NAME \\\n | ... other flags\n | ```\n\nNotes:\n------\n\n- The following are the optional components in 2.3 images:\n\n - Apache Flink\n - Apache Hive WebHCat\n - Apache Hudi\n - Apache Iceberg\n - Apache Pig\n - Delta Lake\n - Docker\n - JupyterLab Notebook\n - Ranger\n - Solr\n - Zeppelin Notebook\n - Zookeeper\n- `yarn.nodemanager.recovery.enabled` and HDFS Audit Logging\n are enabled by default in 2.3 images.\n\n- micromamba, instead of conda in previous image versions, is installed as part\n of the Python installation.\n\n- Docker and Zeppelin installation issues:\n\n - Installation fails if the cluster has no public internet access. As a workaround, create a cluster that uses a custom image with optional components pre-installed. You can do this by running [`generate_custom_image.py`](https://github.com/GoogleCloudDataproc/custom-images) with the [`--optional-components` flag](/dataproc/docs/guides/dataproc-images#run_the_code).\n - Installation can fail if the cluster is pinned to an older sub-minor image version: Packages are installed on demand from public OSS repositories, and a package might not be available upstream to support the installation. As a workaround, create a cluster that uses a custom image with optional components pre-installed in the custom image. 
Notes:
------

- The following are the optional components in 2.3 images:

  - Apache Flink
  - Apache Hive WebHCat
  - Apache Hudi
  - Apache Iceberg
  - Apache Pig
  - Delta Lake
  - Docker
  - JupyterLab Notebook
  - Ranger
  - Solr
  - Zeppelin Notebook
  - Zookeeper

- `yarn.nodemanager.recovery.enabled` and HDFS audit logging are enabled by
  default in 2.3 images.

- micromamba is installed as part of the Python installation, replacing the
  conda installation used in previous image versions.

- Docker and Zeppelin installation issues:

  - Installation fails if the cluster has no public internet access. As a
    workaround, create a cluster that uses a custom image with the optional
    components pre-installed by running
    [`generate_custom_image.py`](https://github.com/GoogleCloudDataproc/custom-images)
    with the [`--optional-components` flag](/dataproc/docs/guides/dataproc-images#run_the_code).
  - Installation can fail if the cluster is pinned to an older sub-minor image
    version: packages are installed on demand from public OSS repositories, and
    a package might no longer be available upstream to support the
    installation. The workaround is the same: create a cluster that uses a
    custom image with the optional components pre-installed, by running
    [`generate_custom_image.py`](https://github.com/GoogleCloudDataproc/custom-images)
    with the [`--optional-components` flag](/dataproc/docs/guides/dataproc-images#run_the_code).

Image version 2.3 machine learning (ML) components
--------------------------------------------------

The Dataproc `2.3-ml-ubuntu` image extends the 2.3 base image with ML-specific
software. It supports the 2.3 image optional components and other 2.3 features,
and adds the component versions listed in the following sections.

### GPU-specific libraries

For Dataproc jobs that use GPU VMs, the following NVIDIA driver and libraries
are available in the `2.3-ml-ubuntu` image. You can use them to accomplish the
following tasks:

- Accelerate Spark batch workloads with the [NVIDIA Spark RAPIDS library](https://docs.nvidia.com/spark-rapids/index.html)
- Train machine learning workloads
- Run distributed batch inference using Spark

### XGBoost libraries

The following [Maven package versions](https://mvnrepository.com/artifact/ml.dmlc)
are available in the `2.3-ml-ubuntu` image to let you use
[XGBoost](https://www.nvidia.com/en-us/glossary/xgboost/) with Spark in Java or
Scala.

> **Note:** You cannot use distributed Spark XGBoost on a Dataproc job that has
> [autoscaling](/dataproc/docs/concepts/configuring-clusters/autoscaling#enable_autoscaling)
> enabled (the default behavior), because nodes added by elastic scaling cannot
> receive new tasks and remain idle. To use XGBoost with a batch workload, set
> the [`spark.dynamicAllocation.enabled=false`](/dataproc-serverless/docs/concepts/autoscaling#spark_dynamic_allocation_properties)
> property on the Dataproc job to disable dynamic allocation (a sketch appears
> at the end of this page).

### Python libraries

The `2.3-ml-ubuntu` image contains the following libraries, which support
different stages of the ML lifecycle.

`2.3-ml-ubuntu` image Python libraries

### R libraries

The following R library versions are included in the `2.3-ml-ubuntu` image.

`2.3-ml-ubuntu` image R libraries
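As referenced in the XGBoost note above, the following is a minimal sketch of
submitting a Spark job with dynamic allocation disabled so that distributed
XGBoost isn't starved by idle autoscaled nodes. The cluster, region, class, and
jar values are hypothetical placeholders; `spark.dynamicAllocation.enabled` is
a standard Spark property, passed through the gcloud `--properties` flag.

```
# Minimal sketch (placeholder names throughout): submit a Spark XGBoost
# job with dynamic allocation disabled, per the XGBoost note above.
gcloud dataproc jobs submit spark \
    --cluster=CLUSTER_NAME \
    --region=REGION \
    --class=com.example.XGBoostTrainer \
    --jars=gs://BUCKET_NAME/xgboost-job.jar \
    --properties=spark.dynamicAllocation.enabled=false
```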