You can install additional components like Jupyter when you create a Dataproc cluster using the [Optional components](/dataproc/docs/concepts/components/overview#available_optional_components) feature. This page describes the Jupyter component.
The [Jupyter](http://jupyter.org/) component is a web-based **single-user** notebook for interactive data analytics and supports the [JupyterLab](https://jupyterlab.readthedocs.io/en/stable/index.html) web UI. The Jupyter web UI is available on port `8123` on the cluster's first master node.
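**Launch notebooks for multiple users.** You can create a Dataproc-enabled [Vertex AI Workbench instance](/vertex-ai/docs/workbench/instances/create-dataproc-enabled) or [install the Dataproc JupyterLab plugin](/dataproc-serverless/docs/quickstarts/jupyterlab-sessions) on a VM to serve notebooks to multiple users.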
**Configure Jupyter.** Jupyter can be configured by providing `dataproc:jupyter` [cluster properties](/dataproc/docs/concepts/configuring-clusters/cluster-properties#service_properties). To reduce the risk of remote code execution over unsecured notebook server APIs, the default `dataproc:jupyter.listen.all.interfaces` cluster property setting is `false`, which restricts connections to `localhost` (127.0.0.1) when the [Component Gateway](/dataproc/docs/concepts/accessing/dataproc-gateways) is enabled (Component Gateway activation is required when installing the Jupyter component).
The Jupyter notebook provides a Python kernel to run [Spark](https://spark.apache.org/) code, and a PySpark kernel. By default, notebooks are [saved in Cloud Storage](https://github.com/src-d/jgscm) in the Dataproc staging bucket, which is specified by the user or [auto-created](/dataproc/docs/guides/create-cluster#auto-created_staging_bucket) when the cluster is created. The location can be changed at cluster creation time using the [`dataproc:jupyter.notebook.gcs.dir`](/dataproc/docs/concepts/configuring-clusters/cluster-properties#dataproc-properties) cluster property.
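For example, a cluster-creation command along the following lines sets the notebook location with the `--properties` flag. This is a minimal sketch: *cluster-name*, *region*, and `gs://your-bucket/notebooks` are placeholders, not defaults.

```
gcloud dataproc clusters create cluster-name \
    --optional-components=JUPYTER \
    --enable-component-gateway \
    --region=region \
    --properties="dataproc:jupyter.notebook.gcs.dir=gs://your-bucket/notebooks"
```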
**Work with data files.** You can use a Jupyter notebook to work with data files that have been [uploaded to Cloud Storage](/storage/docs/uploading-objects). Since the [Cloud Storage connector](/dataproc/docs/concepts/connectors/cloud-storage) is pre-installed on a Dataproc cluster, you can reference the files directly in your notebook. Here's an example that accesses CSV files in Cloud Storage:
[[["Facile da capire","easyToUnderstand","thumb-up"],["Il problema è stato risolto","solvedMyProblem","thumb-up"],["Altra","otherUp","thumb-up"]],[["Difficile da capire","hardToUnderstand","thumb-down"],["Informazioni o codice di esempio errati","incorrectInformationOrSampleCode","thumb-down"],["Mancano le informazioni o gli esempi di cui ho bisogno","missingTheInformationSamplesINeed","thumb-down"],["Problema di traduzione","translationIssue","thumb-down"],["Altra","otherDown","thumb-down"]],["Ultimo aggiornamento 2025-09-04 UTC."],[[["\u003cp\u003eThe Jupyter component is a single-user, web-based notebook for interactive data analytics, accessible via port 8123 on the cluster's first master node, and it also supports the JupyterLab Web UI.\u003c/p\u003e\n"],["\u003cp\u003eTo enable multi-user notebook access, you can utilize a Dataproc-enabled Vertex AI Workbench instance or install the Dataproc JupyterLab plugin on a VM.\u003c/p\u003e\n"],["\u003cp\u003eJupyter notebooks can be configured using specific cluster properties, and by default, notebooks are saved in Cloud Storage, with the location being customizable at cluster creation.\u003c/p\u003e\n"],["\u003cp\u003eThe Jupyter component can be installed when creating a Dataproc cluster through the Google Cloud console, gcloud CLI, or REST API, but requires the Component Gateway to be enabled.\u003c/p\u003e\n"],["\u003cp\u003eJupyter notebooks support working directly with data files in Cloud Storage, and you can also attach GPUs to master and worker nodes to enhance machine learning tasks within Jupyter.\u003c/p\u003e\n"]]],[],null,["You can install additional components like Jupyter when you create a Dataproc\ncluster using the\n[Optional components](/dataproc/docs/concepts/components/overview#available_optional_components)\nfeature. This page describes the Jupyter component.\n\nThe [Jupyter](http://jupyter.org/) component\nis a Web-based **single-user** notebook for interactive data analytics and supports the\n[JupyterLab](https://jupyterlab.readthedocs.io/en/stable/index.html)\nWeb UI. The Jupyter Web UI is available on port `8123` on the cluster's first master node.\n\n**Launch notebooks for multiple users.** You can create a Dataproc-enabled\n[Vertex AI Workbench instance](/vertex-ai/docs/workbench/instances/create-dataproc-enabled)\nor [install the Dataproc JupyterLab plugin](/dataproc-serverless/docs/quickstarts/jupyterlab-sessions)\non a VM to to serve notebooks to multiple users.\n\n**Configure Jupyter.** Jupyter can be configured by providing `dataproc:jupyter`\n[cluster properties](/dataproc/docs/concepts/configuring-clusters/cluster-properties#service_properties).\nTo reduce the risk of remote code execution over unsecured notebook server\nAPIs, the default `dataproc:jupyter.listen.all.interfaces` cluster property\nsetting is `false`, which restricts connections to `localhost (127.0.0.1)` when\nthe [Component Gateway](/dataproc/docs/concepts/accessing/dataproc-gateways) is\nenabled (Component Gateway activation is required when installing the Jupyter component).\n\nThe Jupyter notebook provides a Python kernel to run [Spark](https://spark.apache.org/) code, and a\nPySpark kernel. By default, notebooks are [saved in Cloud Storage](https://github.com/src-d/jgscm)\nin the Dataproc staging bucket, which is specified by the user or\n[auto-created](/dataproc/docs/guides/create-cluster#auto-created_staging_bucket)\nwhen the cluster is created. 
```
df = spark.read.csv("gs://bucket/path/file.csv")
df.show()
```

See [Generic Load and Save Functions](https://spark.apache.org/docs/latest/sql-data-sources-load-save-functions.html) for PySpark examples.

Install Jupyter

Install the component when you create a Dataproc cluster. The Jupyter component requires activation of the Dataproc [Component Gateway](/dataproc/docs/concepts/accessing/dataproc-gateways).

**Note:** Only when using [image version 1.5](/dataproc/docs/concepts/versioning/dataproc-version-clusters#unsupported-dataproc-image-versions), installation of the Jupyter component also requires installation of the [Anaconda](/dataproc/docs/concepts/components/anaconda) component.

Console

1. Enable the component.
   - In the Google Cloud console, open the Dataproc [Create a cluster](https://console.cloud.google.com/dataproc/clustersAdd) page. The **Set up cluster** panel is selected.
   - In the **Components** section:
     - Under **Optional components**, select the **Jupyter** component.
     - Under **Component Gateway**, select **Enable component gateway** (see [Viewing and Accessing Component Gateway URLs](/dataproc/docs/concepts/accessing/dataproc-gateways#viewing_and_accessing_component_gateway_urls)).

gcloud CLI

To create a Dataproc cluster that includes the Jupyter component, use the [gcloud dataproc clusters create](/sdk/gcloud/reference/dataproc/clusters/create) *cluster-name* command with the `--optional-components` flag.

**Latest default image version example**

The following example installs the Jupyter component on a cluster that uses the latest default image version.

```
gcloud dataproc clusters create cluster-name \
    --optional-components=JUPYTER \
    --region=region \
    --enable-component-gateway \
    ... other flags
```
REST API

The Jupyter component can be installed through the Dataproc API using [`SoftwareConfig.Component`](/dataproc/docs/reference/rest/v1/ClusterConfig#Component) as part of a [`clusters.create`](/dataproc/docs/reference/rest/v1/projects.regions.clusters/create) request.

- Set the [EndpointConfig.enableHttpPortAccess](/dataproc/docs/reference/rest/v1/ClusterConfig#endpointconfig) property to `true` as part of the `clusters.create` request to enable connecting to the Jupyter notebook web UI using the [Component Gateway](/dataproc/docs/concepts/accessing/dataproc-gateways). A request sketch is shown at the end of this page.

Open the Jupyter and JupyterLab UIs

Click the [Google Cloud console Component Gateway links](/dataproc/docs/concepts/accessing/dataproc-gateways#viewing_and_accessing_component_gateway_urls) to open the Jupyter notebook or JupyterLab UI running on the cluster master node in your local browser.

**Select "GCS" or "Local Disk" to create a new Jupyter Notebook in either location.**

Attach GPUs to master and worker nodes

You can [add GPUs](https://cloud.google.com/dataproc/docs/concepts/compute/gpus) to your cluster's master and worker nodes when using a Jupyter notebook to:

1. Preprocess data in Spark, then collect a [DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html) onto the master and run [TensorFlow](https://www.tensorflow.org/).
2. Use Spark to orchestrate TensorFlow runs in parallel.
3. Run [Tensorflow-on-YARN](https://github.com/criteo/tf-yarn).
4. Use with other machine learning scenarios that use GPUs.
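As a sketch of the REST API option described above, the following `curl` request creates a cluster with the Jupyter component and Component Gateway enabled. It is illustrative only: *project-id*, *region*, and *cluster-name* are placeholders, and the request body shows only the fields discussed on this page, not a complete cluster configuration.

```
# Sketch of a clusters.create request enabling the Jupyter component
# and the Component Gateway. Placeholders: project-id, region, cluster-name.
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  -d '{
        "clusterName": "cluster-name",
        "config": {
          "softwareConfig": {
            "optionalComponents": ["JUPYTER"]
          },
          "endpointConfig": {
            "enableHttpPortAccess": true
          }
        }
      }' \
  "https://dataproc.googleapis.com/v1/projects/project-id/regions/region/clusters"
```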