Before you begin
If you haven't already done so, create a Google Cloud Platform project and a Cloud Storage bucket.
Set up your project
- Sign in to your Google Account. If you don't already have one, sign up for a new account.
- In the Cloud Console, on the project selector page, select or create a Google Cloud project.
- Make sure that billing is enabled for your Google Cloud project. Learn how to confirm billing is enabled for your project.
- Enable the Dataproc and Compute Engine APIs (a gcloud sketch follows this list).
- Install and initialize the Cloud SDK.
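If you prefer the command line, you can enable both APIs with gcloud once the Cloud SDK is initialized. A minimal sketch; project-id is a placeholder for your project ID:

gcloud services enable dataproc.googleapis.com compute.googleapis.com --project=project-id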
Create a Cloud Storage bucket in your project
- In the Cloud Console, go to the Cloud Storage Browser page.
- Click Create bucket.
- In the Create bucket dialog, specify the following attributes:
- A unique bucket name, subject to the bucket name requirements.
- A storage class.
- A location where bucket data will be stored.
- Click Create. Your notebooks will be stored in Cloud Storage under gs://bucket-name/notebooks/jupyter.
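You can also create the bucket from the command line instead of the Console. A minimal sketch; the STANDARD storage class and US location are illustrative choices, and bucket-name is a placeholder:

gsutil mb -c STANDARD -l US gs://bucket-name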
Create a cluster and install the Jupyter component
gcloud Command
Run the following gcloud beta dataproc clusters create command locally in a terminal window or in Cloud Shell to:
- create your cluster and install the Jupyter and Anaconda components on the cluster's master node
- enable the Component Gateway

Insert your values for cluster-name, bucket-name, and project-id in the command below. For bucket-name, specify the name of the bucket you created in Create a Cloud Storage bucket in your project (only specify the name of the bucket). Your notebooks will be stored in Cloud Storage under gs://bucket-name/notebooks/jupyter.

Linux/macOS
gcloud beta dataproc clusters create cluster-name \
    --optional-components=ANACONDA,JUPYTER \
    --image-version=1.3 \
    --enable-component-gateway \
    --bucket bucket-name \
    --project project-id
Windows
gcloud beta dataproc clusters create cluster-name ^
    --optional-components=ANACONDA,JUPYTER ^
    --image-version=1.3 ^
    --enable-component-gateway ^
    --bucket bucket-name ^
    --project project-id
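After the command completes, you can verify that the cluster is running and that the optional components were installed. A minimal sketch; add --region if your cluster is not in the default (global) region:

gcloud dataproc clusters describe cluster-name --project project-id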
Console
- Go to the Dataproc Clusters page in the Cloud Console.
- Click Create cluster to open the Create a cluster page.
- Enter the name of your cluster in the Name field.
- Select a region and zone for the cluster from the Region and Zone drop-down menus (see Available regions and zones). You can specify a distinct region and select "No preference" for the zone to let Dataproc pick a zone within the selected region for your cluster (see Dataproc Auto Zone Placement). You can instead select a global region, which is a special multi-region namespace that is capable of deploying instances into all Compute Engine zones globally (when selecting a global region, you must also select a zone). The equivalent gcloud flags are sketched after these steps.
- Check the Component gateway checkbox.
- Expand the Advanced options panel.
- Enter the name of the bucket you created in Create a Cloud Storage bucket in your project in the Cloud Storage staging bucket field (only specify the name of the bucket). Your notebooks will be stored in Cloud Storage under gs://bucket-name/notebooks/jupyter.
- Click "Select component" to open the Optional components selection panel.
- Select the "Anaconda" and "Jupyter Notebook" components.
- You can use the provided defaults for the other options.
- Click Create to create the cluster and install the components and component gateway on the cluster's master node.
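If you create the cluster with gcloud rather than the Console, the region and zone choices above map to flags. A minimal sketch; us-central1 is an illustrative region, and omitting --zone lets Dataproc Auto Zone Placement pick a zone within it:

gcloud beta dataproc clusters create cluster-name \
    --region us-central1 \
    --optional-components=ANACONDA,JUPYTER \
    --enable-component-gateway \
    --bucket bucket-name \
    --project project-id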
Open the Jupyter notebook in your local browser
Navigate to the Dataproc Clusters page in the Google Cloud Console, then select your cluster to open the Cluster details page. Click the Web Interfaces tab to display a list of Component Gateway links to the web interfaces of default and optional components installed on the cluster.
Click the Jupyter link. The Jupyter notebook web UI opens in your local browser.
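Notebooks you save in the Jupyter UI are written to the cluster's staging bucket, so you can list them from the command line. A minimal sketch; bucket-name is a placeholder for the bucket you created earlier:

gsutil ls gs://bucket-name/notebooks/jupyter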