Cloud Dataproc Jupyter Component

You can install additional components when you create a Cloud Dataproc cluster using the Optional Components feature. This page describes the Jupyter component.

The Jupyter component is a Web-based notebook for interactive data analytics. The Jupyter Web UI is available on port 8123 on the cluster's first master node.

The notebook provides a Python kernel for running Spark code, as well as a PySpark kernel. By default, notebooks are saved in Cloud Storage in the Cloud Dataproc staging bucket, which is either specified by the user or auto-created when the cluster is created. To save notebooks to a different location, set the dataproc:jupyter.notebook.gcs.dir property when you create the cluster.
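A minimal sketch of setting that property at cluster creation time with gcloud's --properties flag, which takes values in prefix:property=value form. The cluster name and the gs://my-bucket/notebooks path are hypothetical placeholders, and the assumption is that the property takes a gs:// path. The block only prints the command so it can be reviewed first.

```shell
# Hypothetical placeholders: substitute your own cluster name and bucket path.
CLUSTER_NAME="cluster-name"
NOTEBOOK_DIR="gs://my-bucket/notebooks"

# Cluster property in gcloud's prefix:property=value form (assumed gs:// path value).
PROPERTY="dataproc:jupyter.notebook.gcs.dir=${NOTEBOOK_DIR}"

# Print the create command for review; remove "echo" to actually create the cluster.
echo gcloud dataproc clusters create "${CLUSTER_NAME}" \
    --optional-components=ANACONDA,JUPYTER \
    --image-version=1.3 \
    --properties="${PROPERTY}"
```

Printing the command first makes it easy to confirm the property string before any billable resources are created.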

Install Jupyter and Anaconda

Install the component when you create a Cloud Dataproc cluster. Components can be added to clusters created with Cloud Dataproc version 1.3 and later. The Jupyter component requires that the Anaconda component also be installed, as shown in the gcloud command-line tool example below.

See Supported Cloud Dataproc versions for the component version included in each Cloud Dataproc image release.

gcloud command

To create a Cloud Dataproc cluster that includes the Jupyter component, use the gcloud dataproc clusters create cluster-name command with the --optional-components flag (using image version 1.3 or later).

gcloud dataproc clusters create cluster-name \
    --optional-components=ANACONDA,JUPYTER \
    --image-version=1.3 \
    --enable-component-gateway \
    ... other flags

REST API

The Jupyter and Anaconda components can be specified through the Cloud Dataproc API using SoftwareConfig.Component as part of a clusters.create request.
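A minimal sketch of a clusters.create request body, assuming the v1 REST API's JSON field names: softwareConfig.optionalComponents lists the components, and endpointConfig.enableHttpPortAccess turns on the Component Gateway. PROJECT_ID, REGION, cluster-name, and the file name create-cluster.json are hypothetical placeholders. The block only writes the body to a file; the commented-out curl call shows one way to send it and is not executed.

```shell
# Write a clusters.create request body (placeholder values; replace before use).
cat > create-cluster.json <<'EOF'
{
  "projectId": "PROJECT_ID",
  "clusterName": "cluster-name",
  "config": {
    "softwareConfig": {
      "imageVersion": "1.3",
      "optionalComponents": ["ANACONDA", "JUPYTER"]
    },
    "endpointConfig": {
      "enableHttpPortAccess": true
    }
  }
}
EOF

# To send it (requires credentials and network access), something like:
# curl -X POST \
#   -H "Authorization: Bearer $(gcloud auth print-access-token)" \
#   -H "Content-Type: application/json" \
#   -d @create-cluster.json \
#   "https://dataproc.googleapis.com/v1/projects/PROJECT_ID/regions/REGION/clusters"
```

Keeping the body in a file separates the cluster definition from the transport, so the same JSON can be reviewed, versioned, or sent with any HTTP client.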

Console

  1. Enable the component.
    • In the GCP Console, open the Cloud Dataproc Create a cluster page. Click "Advanced options" at the bottom of the page to view the Optional Components section.

    • Click "Select component" to open the Optional components selection panel. Select "Anaconda", "Jupyter Notebook", and any other optional components to install on your cluster.

  2. Enable the Component Gateway (requires image version 1.3.29 or higher) for easy access to the Jupyter notebook and other component web interfaces from the Google Cloud Platform Console (see Viewing and Accessing Component Gateway URLs).
    • Check the Component Gateway checkbox on the Create a cluster form.

Open the Jupyter and JupyterLab UIs

See Viewing and Accessing Component Gateway URLs for how to use the Component Gateway links in the GCP Console to open the Jupyter notebook and JupyterLab UIs, which run on the cluster's master node, in your local browser.
