Migrate Dataplex Explore to BigQuery Studio

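Dataplex is discontinuing support for Explore. This document outlines the steps to migrate Dataplex Explore resources to BigQuery Studio. You can migrate your Spark SQL and JupyterLab notebook content to BigQuery Studio, a unified data exploration platform.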

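Deprecated features

Spark SQL workbench
Notebook IDE
All content resources
All environments
All sessions
The task feature used to schedule notebooks or Spark SQL scripts created in Explore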

If you have questions or need clarification, contact the Explore team at dataplex-explore-support@google.com.

Before you begin

Enable the BigQuery and BigQuery Studio APIs.

Notebook content

If the notebooks you use in Explore run serverless on a JupyterLab instance, you get the same experience in BigQuery Studio after the migration.

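BigQuery Studio offers a notebook interface powered by Colab Enterprise, which provides several advantages over JupyterLab notebooks. You can still write, save, and run your notebooks in a serverless way in BigQuery Studio. You also benefit from the Colab Enterprise integrated cloud environment, including powerful GPUs and TPUs, real-time collaboration, sharing and access control through Google Drive, automatic saving, pre-installed libraries, free usage subject to quotas, built-in widgets and extensions, and integration with other Google services such as BigQuery and Cloud Storage.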

Spark SQL content

Dataplex Discovery registers discovered tables in BigQuery and Dataproc Metastore. Depending on where your tables are registered, use one of the following migration options.

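Tables are registered in both Dataproc Metastore and BigQuery: If your Spark SQL script interacts with Dataplex-discovered tables through Dataproc Metastore, you can query those tables directly from BigQuery.
Tables are registered only in Dataproc Metastore: If your Spark SQL script interacts with tables that don't exist in BigQuery, you need to set up BigQuery Studio integration with Dataproc Metastore. Dataproc Metastore provides two types of endpoints: Thrift and gRPC. For more information about how to find the endpoint protocol, see Find your endpoint URI value. Then, set up the BigQuery Studio integration by following the steps in the sections below.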
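
The two endpoint types can be told apart by the scheme of the endpoint URI: thrift:// for Thrift and https:// for gRPC. The following is a minimal sketch of that check, not part of any Google Cloud library. The function name `endpoint_protocol` and the sample URIs in the usage note are hypothetical and used only for illustration.

```python
def endpoint_protocol(endpoint_uri: str) -> str:
    """Classify a Dataproc Metastore endpoint URI by its scheme.

    Hypothetical helper for illustration only.
    Returns "thrift" for thrift:// URIs and "grpc" for https:// URIs.
    """
    if endpoint_uri.startswith("thrift://"):
        return "thrift"
    if endpoint_uri.startswith("https://"):
        return "grpc"
    raise ValueError(f"Unrecognized endpoint URI scheme: {endpoint_uri!r}")
```

For example, `endpoint_protocol("thrift://10.0.0.5:9083")` would classify a made-up Thrift endpoint, and a URI with any other scheme raises `ValueError` so that a mistyped value is noticed early.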

Connect to a Thrift-based Dataproc Metastore

A Thrift-based endpoint starts with thrift://. To connect to a Thrift-based Dataproc Metastore, pass the Thrift endpoint URI in the SparkSession configuration, as shown in the following example:

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("Dataproc Metastore Connection")
    .config(
        "spark.hadoop.hive.metastore.uris",
        "thrift://IP_ADDRESS:9083",
    )
    .enableHiveSupport()
    .getOrCreate()
)

Connect to a gRPC-based endpoint

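A gRPC-based endpoint starts with https://. Spark can't connect directly to non-Thrift endpoints. Instead, you must run a proxy service that converts requests from Thrift to gRPC. To connect to a gRPC-based Dataproc Metastore service, follow these steps in your BigQuery Studio notebook: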

Download the latest version of the Hive Metastore (HMS) proxy JAR file to the notebook runtime by running the following command in the notebook:

# Download the latest HMS Proxy jar file.
!gcloud storage cp gs://metastore-init-actions/metastore-grpc-proxy/hms-proxy-3.1.2-v0.0.46.jar .

Start the HMS proxy.

%%bash
# Metastore URI, including the port number but without the "https://" prefix.
METASTORE_URI=METASTORE_URI
# HMS Proxy JAR path.
JAR_PATH=JAR_PATH
# DPMS Supported Hive Version.
HIVE_VERSION=3.1.2

# Start the HMS Proxy.
java -jar ${JAR_PATH} --conf proxy.mode=thrift proxy.uri=${METASTORE_URI} thrift.listening.port=9083 hive.version=${HIVE_VERSION} google.credentials.applicationdefault.enabled=true proxy.grpc.ssl.upstream.enabled=true > /tmp/hms.logs 2>&1 &
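
Because the command above starts the proxy in the background and redirects its output to /tmp/hms.logs, a failed start is easy to miss. The following is a minimal sketch, using only the Python standard library, of one way to wait until something accepts TCP connections on the proxy port before creating the Spark session. The helper name `wait_for_port` and its default timeout are assumptions for illustration and are not part of the HMS proxy.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 30.0) -> bool:
    """Poll host:port until a TCP connection succeeds.

    Hypothetical helper for illustration only.
    Returns True as soon as a connection is accepted, or False if no
    connection succeeds before timeout_s seconds have elapsed.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            # Open and immediately close a connection; success means a
            # process is listening on the port.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            # Not listening yet (or unreachable); retry after a short pause.
            time.sleep(0.5)
    return False
```

In a notebook cell, `wait_for_port("localhost", 9083)` would check the listening port used in the command above. If it returns False, inspecting /tmp/hms.logs is a reasonable next step. Note that this only confirms that a process is listening, not that the proxy can reach the upstream metastore.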

Connect the Spark session to the local HMS proxy.

from pyspark.sql import SparkSession

spark = (
  SparkSession.builder.appName("Dataproc Metastore Connection")
  .config(
      "spark.hadoop.hive.metastore.uris",
      "thrift://localhost:9083",
  )
  .enableHiveSupport()
  .getOrCreate()
)
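
Once a session exists, listing the databases is a quick way to confirm that Spark can actually reach the metastore. The following is a small sketch that assumes `spark` is a SparkSession created as in the preceding example. The function name `list_metastore_databases` is hypothetical.

```python
def list_metastore_databases(spark) -> list:
    """Return the names of the databases visible to the given SparkSession.

    Hypothetical helper for illustration only. It runs SHOW DATABASES and
    reads the first column of each row, so it does not depend on the
    name of that column.
    """
    rows = spark.sql("SHOW DATABASES").collect()
    return [row[0] for row in rows]
```

Calling `list_metastore_databases(spark)` in the notebook should return the databases registered in Dataproc Metastore. An error or an unexpectedly short list suggests that the metastore URI or the proxy needs another look.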

Session resources

A session resource refers to a user-specific active session. Migrating session resources isn't supported.

Environment resources

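An environment provides serverless compute resources for your Spark SQL queries and notebooks to run within a lake. Because BigQuery Studio provides a serverless environment for running SQL queries and notebooks, migrating environment resources isn't supported.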

Schedule a task with content resources

You can schedule queries in BigQuery Studio.