You can install additional components when you create a Dataproc cluster using the Optional Components feature. This page describes the Jupyter component.
The Jupyter notebook provides a Python kernel to run Spark code, and a PySpark kernel. By default, notebooks are saved in Cloud Storage in the Dataproc staging bucket, which is either specified by the user or auto-created when the cluster is created. The location can be changed at cluster creation time via the
Install Jupyter and Anaconda
Install the component when you create a Dataproc cluster. Components can be added to clusters created with Dataproc version 1.3 and later. The Jupyter component also requires the Anaconda component to be installed, as shown in the gcloud command-line tool example below.
See Supported Dataproc versions for the component version included in each Dataproc image release.
gcloud command: To create a Dataproc cluster that includes the Jupyter component, use the gcloud dataproc clusters create cluster-name command with the --optional-components flag:

gcloud dataproc clusters create cluster-name \
    --optional-components=ANACONDA,JUPYTER \
    --region=region \
    --enable-component-gateway \
    ... other flags
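The command above can be wrapped in a small shell function so that the cluster name and region are passed as arguments. This is a minimal sketch, not part of the Dataproc tooling: the function name create_jupyter_cluster is hypothetical, and any other flags your cluster needs are simply appended after the region argument.

```shell
# Hypothetical helper: runs the gcloud command shown above for a given
# cluster name and region. Any further flags are passed through unchanged.
# Usage: create_jupyter_cluster CLUSTER_NAME REGION [EXTRA_FLAGS...]
create_jupyter_cluster() {
  cluster_name=$1
  region=$2
  shift 2

  # Anaconda is listed together with Jupyter because the Jupyter component
  # requires it; --enable-component-gateway exposes the notebook UIs
  # through the Cloud Console.
  gcloud dataproc clusters create "$cluster_name" \
    --optional-components=ANACONDA,JUPYTER \
    --region="$region" \
    --enable-component-gateway \
    "$@"
}
```

For example, create_jupyter_cluster my-cluster us-central1 --num-workers=2 would request a cluster named my-cluster with two workers; the name, region, and worker count are placeholder values.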
REST API: The Jupyter and Anaconda components can be specified through the Dataproc API using SoftwareConfig.Component as part of a clusters.create request.
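A clusters.create request that enables both components might look like the following sketch. It assumes the v1 endpoint (https://dataproc.googleapis.com/v1/projects/PROJECT_ID/regions/REGION/clusters), the optionalComponents list under config.softwareConfig, and config.endpointConfig.enableHttpPortAccess as the REST counterpart of --enable-component-gateway. The helper names jupyter_cluster_body and create_cluster_rest are hypothetical, and authentication uses the access token of the active gcloud account.

```shell
# Hypothetical helper: prints a JSON body for a Dataproc clusters.create
# request that installs the Anaconda and Jupyter optional components and
# enables the Component Gateway. Arguments are inserted without JSON
# escaping, so pass plain project IDs and cluster names only.
# Usage: jupyter_cluster_body PROJECT_ID CLUSTER_NAME
jupyter_cluster_body() {
  cat <<EOF
{
  "projectId": "$1",
  "clusterName": "$2",
  "config": {
    "softwareConfig": {
      "optionalComponents": ["ANACONDA", "JUPYTER"]
    },
    "endpointConfig": {
      "enableHttpPortAccess": true
    }
  }
}
EOF
}

# Hypothetical helper: POSTs the body above (read from stdin via --data @-)
# to the v1 clusters.create endpoint for the given project and region.
# Usage: create_cluster_rest PROJECT_ID REGION CLUSTER_NAME
create_cluster_rest() {
  jupyter_cluster_body "$1" "$3" |
    curl -sS -X POST \
      -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      -H "Content-Type: application/json" \
      --data @- \
      "https://dataproc.googleapis.com/v1/projects/$1/regions/$2/clusters"
}
```

Keeping the body in its own function makes it easy to inspect (jupyter_cluster_body my-project my-cluster) before sending anything; my-project and my-cluster are placeholders.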
Console:
- Enable the component.
  - In the Cloud Console, open the Dataproc Create a cluster page. Click "Advanced options" at the bottom of the page to view the Optional Components section.
  - Click "Select component" to open the Optional components selection panel. Select "Anaconda", "Jupyter Notebook", and any other optional components to install on your cluster.
- Enable the Component Gateway (requires image version 1.3.29 or higher) to provide easy access to the Jupyter notebook and other component web interfaces from the Google Cloud Console (see Viewing and Accessing Component Gateway URLs).
  - Check the Component Gateway checkbox on the Create a cluster form.
Open the Jupyter and JupyterLab UIs
Click the Component Gateway links in the Cloud Console to open the Jupyter notebook and JupyterLab UIs, which run on your cluster's master node, in your local browser.
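The same links can probably also be read from the command line. This sketch assumes that the Component Gateway URLs are reported in the cluster's config.endpointConfig.httpPorts field, as returned by gcloud dataproc clusters describe, and that this field is populated only when the Component Gateway is enabled; the helper name gateway_urls is hypothetical.

```shell
# Hypothetical helper: prints the Component Gateway URL map for a cluster,
# assuming it is stored in config.endpointConfig.httpPorts.
# Usage: gateway_urls CLUSTER_NAME REGION
gateway_urls() {
  gcloud dataproc clusters describe "$1" \
    --region="$2" \
    --format='yaml(config.endpointConfig.httpPorts)'
}
```

For example, gateway_urls my-cluster us-central1 prints the URL map as YAML; my-cluster and us-central1 are placeholders.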
Attaching GPUs to Master and/or Worker Nodes
You can add GPUs to your cluster's master and worker nodes when using a Jupyter notebook to: