You can install additional components like Jupyter when you create a Dataproc cluster using the Optional components feature. This page describes the Jupyter component.
The Jupyter component is a Web-based single-user notebook for interactive data analytics and supports the JupyterLab Web UI. The Jupyter Web UI is available on port 8123 on the cluster's first master node.
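Launch notebooks for multiple users. You can create a Dataproc-enabled Vertex AI Workbench instance or install the Dataproc JupyterLab plugin on a VM to serve notebooks to multiple users.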
Configure Jupyter. Jupyter can be configured by providing dataproc:jupyter cluster properties. To reduce the risk of remote code execution over unsecured notebook server APIs, the default dataproc:jupyter.listen.all.interfaces cluster property setting is false, which restricts connections to localhost (127.0.0.1) when the Component Gateway is enabled (Component Gateway activation is required when installing the Jupyter component).
The Jupyter notebook provides a Python kernel to run Spark code, and a PySpark kernel. By default, notebooks are saved in Cloud Storage in the Dataproc staging bucket, which is specified by the user or auto-created when the cluster is created. The location can be changed at cluster creation time using the dataproc:jupyter.notebook.gcs.dir cluster property.
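As a minimal sketch of how these properties can be set, assuming an illustrative cluster name, region, and bucket path (not values taken from this page), both properties could be supplied with the gcloud --properties flag at cluster creation time:

```
# Illustrative values; replace the cluster name, region, and bucket path.
gcloud dataproc clusters create example-cluster \
    --optional-components=JUPYTER \
    --enable-component-gateway \
    --region=us-central1 \
    --properties="dataproc:jupyter.notebook.gcs.dir=gs://example-bucket/notebooks,dataproc:jupyter.listen.all.interfaces=false"
```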
Work with data files. You can use a Jupyter notebook to work with data files that have been uploaded to Cloud Storage. Since the Cloud Storage connector is pre-installed on a Dataproc cluster, you can reference the files directly in your notebook. Here's an example that accesses CSV files in Cloud Storage:
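```
df = spark.read.csv("gs://bucket/path/file.csv")
df.show()
```

See Generic Load and Save Functions for PySpark examples.

Install Jupyter

Install the component when you create a Dataproc cluster. The Jupyter component requires activation of the Dataproc Component Gateway. Note: Only when using image version 1.5, installation of the Jupyter component also requires installation of the Anaconda component.

Console

1. Enable the component.
   - In the Google Cloud console, open the Dataproc Create a cluster page. The Set up cluster panel is selected.
   - In the Components section:
     - Under Optional components, select the Jupyter component.
     - Under Component Gateway, select Enable component gateway (see Viewing and Accessing Component Gateway URLs).

gcloud CLI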
To create a Dataproc cluster that includes the Jupyter component, use the gcloud dataproc clusters create cluster-name command with the --optional-components flag.

Latest default image version example

The following example installs the Jupyter component on a cluster that uses the latest default image version.
[[["Mudah dipahami","easyToUnderstand","thumb-up"],["Memecahkan masalah saya","solvedMyProblem","thumb-up"],["Lainnya","otherUp","thumb-up"]],[["Sulit dipahami","hardToUnderstand","thumb-down"],["Informasi atau kode contoh salah","incorrectInformationOrSampleCode","thumb-down"],["Informasi/contoh yang saya butuhkan tidak ada","missingTheInformationSamplesINeed","thumb-down"],["Masalah terjemahan","translationIssue","thumb-down"],["Lainnya","otherDown","thumb-down"]],["Terakhir diperbarui pada 2025-09-04 UTC."],[[["\u003cp\u003eThe Jupyter component is a single-user, web-based notebook for interactive data analytics, accessible via port 8123 on the cluster's first master node, and it also supports the JupyterLab Web UI.\u003c/p\u003e\n"],["\u003cp\u003eTo enable multi-user notebook access, you can utilize a Dataproc-enabled Vertex AI Workbench instance or install the Dataproc JupyterLab plugin on a VM.\u003c/p\u003e\n"],["\u003cp\u003eJupyter notebooks can be configured using specific cluster properties, and by default, notebooks are saved in Cloud Storage, with the location being customizable at cluster creation.\u003c/p\u003e\n"],["\u003cp\u003eThe Jupyter component can be installed when creating a Dataproc cluster through the Google Cloud console, gcloud CLI, or REST API, but requires the Component Gateway to be enabled.\u003c/p\u003e\n"],["\u003cp\u003eJupyter notebooks support working directly with data files in Cloud Storage, and you can also attach GPUs to master and worker nodes to enhance machine learning tasks within Jupyter.\u003c/p\u003e\n"]]],[],null,["You can install additional components like Jupyter when you create a Dataproc\ncluster using the\n[Optional components](/dataproc/docs/concepts/components/overview#available_optional_components)\nfeature. This page describes the Jupyter component.\n\nThe [Jupyter](http://jupyter.org/) component\nis a Web-based **single-user** notebook for interactive data analytics and supports the\n[JupyterLab](https://jupyterlab.readthedocs.io/en/stable/index.html)\nWeb UI. The Jupyter Web UI is available on port `8123` on the cluster's first master node.\n\n**Launch notebooks for multiple users.** You can create a Dataproc-enabled\n[Vertex AI Workbench instance](/vertex-ai/docs/workbench/instances/create-dataproc-enabled)\nor [install the Dataproc JupyterLab plugin](/dataproc-serverless/docs/quickstarts/jupyterlab-sessions)\non a VM to to serve notebooks to multiple users.\n\n**Configure Jupyter.** Jupyter can be configured by providing `dataproc:jupyter`\n[cluster properties](/dataproc/docs/concepts/configuring-clusters/cluster-properties#service_properties).\nTo reduce the risk of remote code execution over unsecured notebook server\nAPIs, the default `dataproc:jupyter.listen.all.interfaces` cluster property\nsetting is `false`, which restricts connections to `localhost (127.0.0.1)` when\nthe [Component Gateway](/dataproc/docs/concepts/accessing/dataproc-gateways) is\nenabled (Component Gateway activation is required when installing the Jupyter component).\n\nThe Jupyter notebook provides a Python kernel to run [Spark](https://spark.apache.org/) code, and a\nPySpark kernel. By default, notebooks are [saved in Cloud Storage](https://github.com/src-d/jgscm)\nin the Dataproc staging bucket, which is specified by the user or\n[auto-created](/dataproc/docs/guides/create-cluster#auto-created_staging_bucket)\nwhen the cluster is created. 
The location can be changed at cluster creation time using the\n[`dataproc:jupyter.notebook.gcs.dir`](/dataproc/docs/concepts/configuring-clusters/cluster-properties#dataproc-properties) cluster property.\n\n**Work with data files.** You can use a Jupyter notebook to work with data files that have been\n[uploaded to Cloud Storage](/storage/docs/uploading-objects).\nSince the [Cloud Storage connector](/dataproc/docs/concepts/connectors/cloud-storage)\nis pre-installed on a Dataproc cluster, you can reference the\nfiles directly in your notebook. Here's an example that accesses CSV files in\nCloud Storage: \n\n```\ndf = spark.read.csv(\"gs://bucket/path/file.csv\")\ndf.show()\n```\n\nSee\n[Generic Load and Save Functions](https://spark.apache.org/docs/latest/sql-data-sources-load-save-functions.html)\nfor PySpark examples.\n\nInstall Jupyter\n\nInstall the component when you create a Dataproc cluster.\nThe Jupyter component requires activation of the Dataproc\n[Component Gateway](/dataproc/docs/concepts/accessing/dataproc-gateways).\n**Note:** Only when using [image version 1.5](/dataproc/docs/concepts/versioning/dataproc-version-clusters#unsupported-dataproc-image-versions), installation of the Jupyter component also requires installation of the [Anaconda](/dataproc/docs/concepts/components/anaconda) component. \n\nConsole\n\n1. Enable the component.\n - In the Google Cloud console, open the Dataproc [Create a cluster](https://console.cloud.google.com/dataproc/clustersAdd) page. The **Set up cluster** panel is selected.\n - In the **Components** section:\n - Under **Optional components** , select the **Jupyter** component.\n - Under **Component Gateway** , select **Enable component gateway** (see [Viewing and Accessing Component Gateway URLs](/dataproc/docs/concepts/accessing/dataproc-gateways#viewing_and_accessing_component_gateway_urls)).\n\ngcloud CLI\n\nTo create a Dataproc cluster that includes the Jupyter component,\nuse the\n[gcloud dataproc clusters create](/sdk/gcloud/reference/dataproc/clusters/create) \u003cvar translate=\"no\"\u003ecluster-name\u003c/var\u003e command with the `--optional-components` flag.\n\n**Latest default image version example**\n\nThe following example installs the Jupyter\ncomponent on a cluster that uses the latest default image version. \n\n```\ngcloud dataproc clusters create cluster-name \\\n --optional-components=JUPYTER \\\n --region=region \\\n --enable-component-gateway \\\n ... 
other flags\n```\n\nREST API\n\nThe Jupyter component\ncan be installed through the Dataproc API using\n[`SoftwareConfig.Component`](/dataproc/docs/reference/rest/v1/ClusterConfig#Component)\nas part of a\n[`clusters.create`](/dataproc/docs/reference/rest/v1/projects.regions.clusters/create)\nrequest.\n\n- Set the [EndpointConfig.enableHttpPortAccess](/dataproc/docs/reference/rest/v1/ClusterConfig#endpointconfig) property to `true` as part of the `clusters.create` request to enable connecting to the Jupyter notebook Web UI using the [Component Gateway](/dataproc/docs/concepts/accessing/dataproc-gateways).\n\nOpen the Jupyter and JupyterLab UIs\n\nClick the [Google Cloud console Component Gateway links](/dataproc/docs/concepts/accessing/dataproc-gateways#viewing_and_accessing_component_gateway_urls)\nto open in your local browser the Jupyter notebook or JupyterLab UI running on\nthe cluster master node.\n\n**Select \"GCS\" or \"Local Disk\" to create a new Jupyter Notebook in\neither location.**\n\nAttach GPUs to master and worker nodes\n\nYou can [add GPUs](https://cloud.google.com/dataproc/docs/concepts/compute/gpus)\nto your cluster's master and worker nodes when using a Jupyter notebook to:\n\n1. Preprocess data in Spark, then collect a [DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html) onto the master and run [TensorFlow](https://www.tensorflow.org/)\n2. Use Spark to orchestrate TensorFlow runs in parallel\n3. Run [Tensorflow-on-YARN](https://github.com/criteo/tf-yarn)\n4. Use with other machine learning scenarios that use GPUs"]]