Install and run a Jupyter notebook on a Dataproc cluster

Before you begin

If you haven't already done so, create a Google Cloud Platform project and a Cloud Storage bucket.

Set up your project

  1. Sign in to your Google Account.

    If you don't already have one, sign up for a new account.

  2. In the Cloud Console, on the project selector page, select or create a Google Cloud project.

    Go to the project selector page

  3. Make sure that billing is enabled for your Google Cloud project. Learn how to confirm billing is enabled for your project.

  4. Enable the Dataproc and Compute Engine APIs.

    Enable the APIs

  5. Install and initialize the Cloud SDK.
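The API-enablement step above can also be done from the command line. The sketch below only prints the gcloud invocation (the service IDs are the standard ones for Dataproc and Compute Engine), so it is safe to run anywhere; copy the printed command into Cloud Shell or an authenticated local terminal to actually enable the APIs.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Standard service IDs for the two APIs this tutorial requires.
SERVICES="dataproc.googleapis.com compute.googleapis.com"

# Print the command rather than running it, so this sketch does not
# need an authenticated gcloud session; run the output yourself.
echo "gcloud services enable ${SERVICES}"
```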

Create a Cloud Storage bucket in your project

  1. In the Cloud Console, go to the Cloud Storage Browser page.

    Go to the Cloud Storage Browser page

  2. Click Create bucket.
  3. In the Create bucket dialog, specify a globally unique Name for your bucket; you can accept the defaults for the other attributes, such as location and storage class.
  4. Click Create.
  5. Your notebooks will be stored in Cloud Storage under gs://bucket-name/notebooks/jupyter.
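Once the bucket exists, the notebook location follows a fixed layout. The sketch below builds that path for a hypothetical bucket name; the commented gsutil line shows how you could create the same bucket from the command line instead of the console.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical bucket name -- substitute the bucket you created.
BUCKET_NAME="my-dataproc-notebooks"

# Equivalent command-line bucket creation (run in an authenticated shell):
#   gsutil mb gs://${BUCKET_NAME}

# Jupyter notebooks on the cluster are saved under this fixed prefix:
NOTEBOOK_DIR="gs://${BUCKET_NAME}/notebooks/jupyter"
echo "${NOTEBOOK_DIR}"
```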

Create a cluster and install the Jupyter component

gcloud Command

  1. Run the following gcloud beta dataproc clusters create command locally in a terminal window or in Cloud Shell to:

    1. create your cluster and install the Jupyter and Anaconda components on the cluster's master node
    2. enable the Component Gateway

    Insert your values for cluster-name, bucket-name, and project-id in the command below. For bucket-name, specify only the name of the bucket you created in Create a Cloud Storage bucket in your project. Your notebooks will be stored in Cloud Storage under gs://bucket-name/notebooks/jupyter.

    Linux/macOS

    gcloud beta dataproc clusters create cluster-name \
        --optional-components=ANACONDA,JUPYTER \
        --image-version=1.3 \
        --enable-component-gateway \
        --bucket bucket-name \
        --project project-id
    

    Windows

    gcloud beta dataproc clusters create cluster-name ^
        --optional-components=ANACONDA,JUPYTER ^
        --image-version=1.3 ^
        --enable-component-gateway ^
        --bucket bucket-name ^
        --project project-id
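After the create command returns, you can confirm that the cluster is running and that the optional components were installed. The sketch below assembles a describe command for hypothetical cluster-name and region values and prints it rather than executing it, so it runs without an authenticated gcloud session; the --region flag is needed when your gcloud config has no default region set.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical values -- replace with the name and region you used.
CLUSTER_NAME="my-cluster"
REGION="us-central1"

# Build the verification command; its output (when you run it) includes
# the cluster status and the installed optional components.
DESCRIBE_CMD="gcloud dataproc clusters describe ${CLUSTER_NAME} --region=${REGION}"
echo "${DESCRIBE_CMD}"
```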
    

Console

  1. Go to the Dataproc Clusters page in the Cloud Console.
  2. Click Create cluster to open the Create a cluster page.
  3. Enter the name of your cluster in the Name field.
  4. Select a region and zone for the cluster from the Region and Zone drop-down menus (see Available regions and zones). If you select a specific region, you can choose "No preference" for the zone to let Dataproc pick a zone within that region for your cluster (see Dataproc Auto Zone Placement). Alternatively, you can select the global region, a special multi-region namespace that can deploy instances into any Compute Engine zone worldwide; when you select the global region, you must also select a zone.
  5. Check the Component gateway checkbox.
  6. Expand the Advanced options panel.

  7. Enter the name of the bucket you created in Create a Cloud Storage bucket in your project in the Cloud Storage staging bucket field (only specify the name of the bucket). Your notebooks will be stored in Cloud Storage under gs://bucket-name/notebooks/jupyter.
  8. Click "Select component" to open the Optional components selection panel.
  9. Select the "Anaconda" and "Jupyter Notebook" components.
  10. You can use the provided defaults for the other options.

  11. Click Create to create the cluster and install the components and component gateway on the cluster's master node.

Open the Jupyter notebook in your local browser

  1. Navigate to the Dataproc Clusters form on Google Cloud Console, then select your cluster to open the Cluster details form. Click the Web Interfaces tab to display a list of Component Gateway links to the web interfaces of default and optional components installed on the cluster.

  2. Click the Jupyter link. The Jupyter notebook web UI opens in your local browser.
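Notebooks you create and save in the Jupyter web UI are written back to the staging bucket, so you can also inspect them from the command line. A sketch with a hypothetical bucket name, printing the listing command rather than executing it:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical bucket -- use the staging bucket from cluster creation.
BUCKET_NAME="my-dataproc-notebooks"

# Run the printed command in an authenticated shell to list saved notebooks.
LIST_CMD="gsutil ls gs://${BUCKET_NAME}/notebooks/jupyter/"
echo "${LIST_CMD}"
```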
