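Objectives
This tutorial shows how to install the Dataproc Jupyter component on a new cluster. It then shows how to use the Dataproc Component Gateway to connect from your local browser to the Jupyter notebook UI running on the cluster.
Costs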
In this document, you use the following billable components of Google Cloud:
- Dataproc
- Compute Engine
- Cloud Storage
Before you begin
If you haven't already done so, create a Google Cloud project and a Cloud Storage bucket.
Set up your project
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
- In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
- Make sure that billing is enabled for your Google Cloud project.
- Enable the Dataproc, Compute Engine, and Cloud Storage APIs.
- Install the Google Cloud CLI.
- To initialize the gcloud CLI, run the following command:
gcloud init
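The project setup steps above can also be done from the command line after `gcloud init`. The following is a minimal sketch, not an official script. `setup_project` is a hypothetical helper and is not part of gcloud. `PROJECT_ID` is a placeholder for your own project ID, and the project is assumed to already exist.

```shell
# Hypothetical helper (not part of gcloud): command-line equivalents of the
# project setup steps. Usage: setup_project PROJECT_ID
setup_project() {
  project_id="$1"
  # Make this project the default for later gcloud commands
  gcloud config set project "${project_id}"
  # Print the project's billing info; check that billingEnabled is true
  gcloud billing projects describe "${project_id}"
  # Enable the Dataproc, Compute Engine, and Cloud Storage APIs
  gcloud services enable \
    dataproc.googleapis.com \
    compute.googleapis.com \
    storage.googleapis.com
}
```

For example, `setup_project my-project` selects `my-project`, shows its billing status, and enables the three APIs.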
Create a Cloud Storage bucket in your project to store any notebooks that you create in this tutorial.
- In the Google Cloud console, go to the Cloud Storage Buckets page.
- Click Create bucket.
- On the Create a bucket page, enter your bucket information. To go to the next step, click Continue.
  - For Name your bucket, enter a name that meets the bucket naming requirements.
  - For Choose where to store your data, do the following:
    - Select a Location type option.
    - Select a Location option.
  - For Choose a default storage class for your data, select a storage class.
  - For Choose how to control access to objects, select an Access control option.
  - For Advanced settings (optional), specify an encryption method, a retention policy, or bucket labels.
- Click Create.
Your notebooks are stored in Cloud Storage under gs://bucket-name/notebooks/jupyter.
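If you prefer the command line, you can also create the bucket with `gcloud storage buckets create`. This minimal sketch wraps the command in `create_notebook_bucket`, a hypothetical helper. The bucket name and location (for example, `us-central1`) are placeholders that you choose. The other console options, such as storage class and access control, are left at their defaults here.

```shell
# Hypothetical helper: creates the bucket that will hold the notebooks.
# Usage: create_notebook_bucket BUCKET_NAME LOCATION
create_notebook_bucket() {
  bucket_name="$1"
  location="$2"
  # The name must be globally unique and meet the bucket naming requirements
  gcloud storage buckets create "gs://${bucket_name}" \
    --location="${location}"
}
```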
Create a cluster and install the Jupyter component
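You can create a cluster with the Jupyter component and Component Gateway enabled by using `gcloud dataproc clusters create`. The following is a minimal sketch under stated assumptions. `create_jupyter_cluster` is a hypothetical helper, and the cluster name, region, and bucket name are placeholders. The bucket from Before you begin is passed as the cluster's staging bucket with `--bucket`, on the assumption that this is what places notebooks under gs://bucket-name/notebooks/jupyter. Dataproc saves Jupyter notebooks to the staging bucket.

```shell
# Hypothetical helper: creates a Dataproc cluster with the Jupyter component.
# Usage: create_jupyter_cluster CLUSTER_NAME REGION BUCKET_NAME
create_jupyter_cluster() {
  cluster_name="$1"
  region="$2"
  bucket_name="$3"
  # --optional-components=JUPYTER installs the Jupyter component;
  # --enable-component-gateway exposes its web UI through Component Gateway;
  # --bucket sets the staging bucket (assumed here to hold the notebooks)
  gcloud dataproc clusters create "${cluster_name}" \
    --region="${region}" \
    --optional-components=JUPYTER \
    --enable-component-gateway \
    --bucket="${bucket_name}"
}
```

For example, `create_jupyter_cluster cluster-name us-central1 bucket-name`. Component Gateway is enabled as part of cluster creation, which is why the flag is set here rather than afterward.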
Open the Jupyter and JupyterLab UIs
In the Google Cloud console, click the Component Gateway links to open the Jupyter notebook or JupyterLab UI running on your cluster's master node.
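As an alternative to the console, you can read the Component Gateway URLs from the cluster's configuration. The following is a minimal sketch. It assumes that the URLs appear in the cluster's `config.endpointConfig.httpPorts` field, a map from web UI name to URL. `show_gateway_urls` is a hypothetical helper, and the cluster name and region are placeholders.

```shell
# Hypothetical helper: prints the Component Gateway URLs for a cluster.
# Usage: show_gateway_urls CLUSTER_NAME REGION
show_gateway_urls() {
  cluster_name="$1"
  region="$2"
  # Project only the endpoint map (for example, the Jupyter and JupyterLab URLs)
  gcloud dataproc clusters describe "${cluster_name}" \
    --region="${region}" \
    --format="json(config.endpointConfig.httpPorts)"
}
```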
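The top-level directory shown by your Jupyter instance is a virtual directory that lets you browse either your Cloud Storage bucket or the local file system. To choose a location, click the GCS link for Cloud Storage or the Local Disk link for the local file system of the cluster's master node.
- Click the GCS link. The Jupyter notebook web UI displays the notebooks stored in your Cloud Storage bucket, including any notebooks that you create in this tutorial.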
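The notebooks shown under the GCS link can also be listed from the command line. The following is a minimal sketch. `list_notebooks` is a hypothetical helper, and the bucket name is a placeholder for the bucket that you created in Before you begin.

```shell
# Hypothetical helper: lists the notebooks saved by the Jupyter component.
# Usage: list_notebooks BUCKET_NAME
list_notebooks() {
  bucket_name="$1"
  # Notebooks are stored under the notebooks/jupyter prefix of the bucket
  gcloud storage ls "gs://${bucket_name}/notebooks/jupyter/"
}
```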
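Clean up
After you finish the tutorial, you can clean up the resources that you created so that they stop using quota and incurring charges. The following sections describe how to delete or turn off these resources.
Delete the project
The easiest way to avoid charges is to delete the project that you created for the tutorial.
To delete the project: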
- In the Google Cloud console, go to the Manage resources page.
- In the project list, select the project that you want to delete, and then click Delete.
- In the dialog, type the project ID, and then click Shut down to delete the project.
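You can also delete the project with the gcloud CLI. The following is a minimal sketch. `delete_project` is a hypothetical helper, and `PROJECT_ID` is a placeholder. gcloud asks for confirmation before it shuts down the project.

```shell
# Hypothetical helper: shuts down (deletes) a project.
# Usage: delete_project PROJECT_ID
delete_project() {
  project_id="$1"
  # Shutting down the project stops all resources in it, including the
  # cluster and the bucket created in this tutorial
  gcloud projects delete "${project_id}"
}
```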
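Delete the cluster
- To delete your cluster, run the following command. Replace cluster-name with your cluster's name, and set REGION to the region where the cluster runs:
gcloud dataproc clusters delete cluster-name \
    --region=${REGION}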
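Delete the bucket
- To delete the Cloud Storage bucket that you created in step 2 of Before you begin, including the notebooks stored in it, run the following command. Set BUCKET_NAME to your bucket's name: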
gcloud storage rm gs://${BUCKET_NAME} --recursive