Install and run a Jupyter notebook on a Cloud Dataproc cluster

Before you begin

If you haven't already done so, create a Google Cloud Platform project and a Cloud Storage bucket.

Set up your project

  1. Sign in to your Google Account.

    If you don't already have one, sign up for a new account.

  2. Select or create a GCP project.

    Go to the Project selector page

  3. Make sure that billing is enabled for your Google Cloud Platform project.

    Learn how to enable billing

  4. Enable the Cloud Dataproc and Compute Engine APIs.

    Enable the APIs

  5. Install and initialize the Cloud SDK.
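
If you prefer to do the setup from the command line, the Cloud SDK can handle steps 3 and 4 as well. A minimal sketch, assuming project-id is the same placeholder used throughout this page and billing-account-id is a hypothetical placeholder for your billing account ID:

    # Initialize the SDK and authenticate (opens a browser sign-in flow).
    gcloud init

    # Link a billing account to the project
    # (billing-account-id is a hypothetical placeholder).
    gcloud beta billing projects link project-id \
        --billing-account billing-account-id

    # Enable the Cloud Dataproc and Compute Engine APIs.
    gcloud services enable dataproc.googleapis.com compute.googleapis.com \
        --project project-id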

Create a Cloud Storage bucket in your project

  1. In the GCP Console, go to the Cloud Storage Browser page.

  2. Click Create bucket.
  3. In the Create bucket dialog, specify a globally unique bucket name and, optionally, a default storage class and a location for the bucket.
  4. Click Create.
  5. Your notebooks will be stored in Cloud Storage under gs://bucket-name/notebooks/jupyter.
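
You can also create the bucket from the command line rather than the console. A minimal sketch using gsutil, with the same bucket-name and project-id placeholders (us-central1 is just an example location):

    # Create the bucket; -p sets the owning project, -l the location.
    gsutil mb -p project-id -l us-central1 gs://bucket-name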

Create a cluster and install the Jupyter component

gcloud Command

  1. Run the following gcloud beta dataproc clusters create command locally in a terminal window or in Cloud Shell to:

    1. create your cluster and install the Jupyter and Anaconda components on the cluster's master node
    2. enable the Component Gateway

    Insert your values for cluster-name, bucket-name, and project-id in the commands below. For bucket-name, specify the name of the bucket you created in Create a Cloud Storage bucket in your project (the bucket name only, without a gs:// prefix). Your notebooks will be stored in Cloud Storage under gs://bucket-name/notebooks/jupyter.

    Linux/macOS

    gcloud beta dataproc clusters create cluster-name \
        --optional-components=ANACONDA,JUPYTER \
        --image-version=1.3 \
        --enable-component-gateway \
        --bucket bucket-name \
        --project project-id
    

    Windows

    gcloud beta dataproc clusters create cluster-name ^
        --optional-components=ANACONDA,JUPYTER ^
        --image-version=1.3 ^
        --enable-component-gateway ^
        --bucket bucket-name ^
        --project project-id
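
    After the command completes, you can optionally verify that the cluster is running and that the Jupyter and Anaconda components were installed. A sketch, assuming the same cluster-name placeholder (add --region if your gcloud configuration does not set a default):

    gcloud beta dataproc clusters describe cluster-name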
    

Console

  1. Go to the Cloud Dataproc Clusters page in the GCP Console.
  2. Click Create cluster to open the Create a cluster page.
  3. Enter the name of your cluster in the Name field.
  4. Select a region and zone for the cluster from the Region and Zone drop-down menus (see Available regions & zones). The default is the global region, a special multi-region namespace that can deploy instances into any Compute Engine zone worldwide. You can also specify a distinct region and select "No preference" for the zone to let Cloud Dataproc pick a zone within the selected region for your cluster (see Cloud Dataproc Auto Zone Placement). These selections correspond to the --region and --zone flags of the gcloud command (see the sketch after this list).
  5. Check the Component gateway checkbox.
  6. Expand the Advanced options panel.

  7. In the Cloud Storage staging bucket field, enter the name of the bucket you created in Create a Cloud Storage bucket in your project (the bucket name only, without a gs:// prefix). Your notebooks will be stored in Cloud Storage under gs://bucket-name/notebooks/jupyter.
  8. Click "Select component" to open the Optional components selection panel.
  9. Select the "Anaconda" and "Jupyter Notebook" components.
  10. You can use the provided defaults for the other options.

  11. Click Create to create the cluster and install the components and component gateway on the cluster's master node.
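
The console's Region and Zone selections correspond to the --region and --zone flags of the gcloud command shown earlier. A sketch of the equivalent command, with us-central1 and us-central1-a as hypothetical example values:

    gcloud beta dataproc clusters create cluster-name \
        --region us-central1 \
        --zone us-central1-a \
        --optional-components=ANACONDA,JUPYTER \
        --image-version=1.3 \
        --enable-component-gateway \
        --bucket bucket-name \
        --project project-id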

Open the Jupyter notebook in your local browser

  1. Navigate to the Cloud Dataproc Clusters page in the GCP Console, then select your cluster to open the Cluster details page. Click the Web Interfaces tab to display a list of Component Gateway links to the web interfaces of the default and optional components installed on the cluster.

  2. Click the Jupyter link. The Jupyter notebook web UI opens in your local browser.
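
If you prefer the command line, the Component Gateway URLs are also exposed on the cluster resource. A sketch, assuming the config.endpointConfig.httpPorts field of the beta API and the same cluster-name placeholder:

    # Print the map of component names to web-interface URLs, including Jupyter.
    gcloud beta dataproc clusters describe cluster-name \
        --format='value(config.endpointConfig.httpPorts)'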
