使用 Gemini 模型和 ML.GENERATE_TEXT 函数生成文本

本教程介绍了如何创建基于 gemini-2.0-flash 模型的远程模型，然后介绍了如何将该模型与 ML.GENERATE_TEXT 函数搭配使用，以便从 bigquery-public-data.imdb.reviews 公共表中提取关键字并对电影评价执行情感分析：

所需的角色

如需运行本教程，您需要拥有以下 Identity and Access Management (IAM) 角色：

创建和使用 BigQuery 数据集、连接和模型：BigQuery Admin (roles/bigquery.admin)。
向连接的服务账号授予权限：Project IAM Admin (roles/resourcemanager.projectIamAdmin)。

这些预定义角色包含执行本文档中的任务所需的权限。如需查看所需的确切权限，请展开所需权限部分：

所需权限

创建数据集：bigquery.datasets.create
创建、委托和使用连接：bigquery.connections.*
设置默认连接：bigquery.config.*
设置服务账号权限：resourcemanager.projects.getIamPolicy 和 resourcemanager.projects.setIamPolicy
创建模型并运行推断：
- bigquery.jobs.create
- bigquery.models.create
- bigquery.models.getData
- bigquery.models.updateData
- bigquery.models.updateMetadata

您也可以使用自定义角色或其他预定义角色来获取这些权限。

费用

在本文档中，您将使用 Google Cloud的以下收费组件：

BigQuery ML: You incur costs for the data that you process in BigQuery.
Vertex AI: You incur costs for calls to the Vertex AI service that's represented by the remote model.

如需根据您的预计使用量来估算费用，请使用价格计算器。

新 Google Cloud 用户可能有资格申请免费试用。

如需详细了解 BigQuery 价格，请参阅 BigQuery 文档中的 BigQuery 价格。

如需详细了解 Vertex AI 价格，请参阅 Vertex AI 价格页面。

准备工作

In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
Roles required to select or create a project
- Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
- Create a project: To create a project, you need the Project Creator (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.
Note: If you don't plan to keep the resources that you create in this procedure, create a project instead of selecting an existing project. After you finish these steps, you can delete the project, removing all resources associated with the project.

Go to project selector
Verify that billing is enabled for your Google Cloud project.
Enable the BigQuery, BigQuery Connection, and Vertex AI APIs.
Roles required to enable APIs
To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.
Enable the APIs

创建数据集

创建 BigQuery 数据集以存储机器学习模型。

控制台

在 Google Cloud 控制台中，前往 BigQuery 页面。

转到 BigQuery 页面
在探索器窗格中，点击您的项目名称。
点击 查看操作 > 创建数据集
在 创建数据集 页面上，执行以下操作：
- 在数据集 ID 部分，输入 bqml_tutorial。
- 在位置类型部分，选择多区域，然后选择 US (multiple regions in United States)（美国[美国的多个区域]）。
- 保持其余默认设置不变，然后点击创建数据集。

bq

如需创建新数据集，请使用带有 --location 标志的 bq mk 命令。如需查看完整的潜在参数列表，请参阅 bq mk --dataset 命令参考文档。

创建一个名为 bqml_tutorial 的数据集，并将数据位置设置为 US，说明为 BigQuery ML tutorial dataset：
```
bq --location=US mk -d \
 --description "BigQuery ML tutorial dataset." \
 bqml_tutorial
```
该命令使用的不是 --dataset 标志，而是 -d 快捷方式。如果省略 -d 和 --dataset，该命令会默认创建一个数据集。
确认已创建数据集：
```
bq ls
```

API

使用已定义的数据集资源调用 datasets.insert 方法。

{
  "datasetReference": {
     "datasetId": "bqml_tutorial"
  }
}

BigQuery DataFrame

在尝试此示例之前，请按照《BigQuery 快速入门：使用 BigQuery DataFrames》中的 BigQuery DataFrames 设置说明进行操作。如需了解详情，请参阅 BigQuery DataFrames 参考文档。

如需向 BigQuery 进行身份验证，请设置应用默认凭证。如需了解详情，请参阅为本地开发环境设置 ADC。

import google.cloud.bigquery

bqclient = google.cloud.bigquery.Client()
bqclient.create_dataset("bqml_tutorial", exists_ok=True)

创建连接

创建 Cloud 资源连接并获取连接的服务账号。在与上一步中创建的数据集相同的位置创建连接。

如果您已配置默认连接，或者您具有 BigQuery Admin 角色，则可以跳过此步骤。

创建一个 Cloud 资源连接供远程模型使用，并获取连接的服务账号。在上一步中创建的数据集所在的位置创建连接。

从下列选项中选择一项：

控制台

前往 BigQuery 页面。

转到 BigQuery
在探索器窗格中，点击 添加数据：

系统随即会打开添加数据对话框。
在过滤条件窗格中的数据源类型部分，选择企业应用。

或者，在搜索数据源字段中，您可以输入 Vertex AI。
在精选数据源部分中，点击 Vertex AI。
点击 Vertex AI 模型：BigQuery 联合解决方案卡片。
在连接类型列表中，选择 Vertex AI 远程模型、远程函数和 BigLake（Cloud 资源）。
在连接 ID 字段中，输入连接的名称。
点击创建连接。
点击转到连接。
在连接信息窗格中，复制服务账号 ID 以在后续步骤中使用。

bq

在命令行环境中，创建连接：
```
bq mk --connection --location=REGION --project_id=PROJECT_ID \
    --connection_type=CLOUD_RESOURCE CONNECTION_ID
```
--project_id 参数会替换默认项目。

请替换以下内容：
- REGION：您的连接区域
- PROJECT_ID：您的 Google Cloud 项目 ID
- CONNECTION_ID：您的连接的 ID
当您创建连接资源时，BigQuery 会创建一个唯一的系统服务账号，并将其与该连接相关联。

问题排查：如果您收到以下连接错误，请更新 Google Cloud SDK：
```
Flags parsing error: flag --connection_type=CLOUD_RESOURCE: value should be one of...
```

检索并复制服务账号 ID 以在后续步骤中使用：

bq show --connection PROJECT_ID.REGION.CONNECTION_ID

输出类似于以下内容：

name                          properties
1234.REGION.CONNECTION_ID     {"serviceAccountId": "connection-1234-9u56h9@gcp-sa-bigquery-condel.iam.gserviceaccount.com"}

Terraform

使用 google_bigquery_connection 资源。

如需向 BigQuery 进行身份验证，请设置应用默认凭据。如需了解详情，请参阅为客户端库设置身份验证。

以下示例在 US 区域中创建一个名为 my_cloud_resource_connection 的 Cloud 资源连接：


# This queries the provider for project information.
data "google_project" "default" {}

# This creates a cloud resource connection in the US region named my_cloud_resource_connection.
# Note: The cloud resource nested object has only one output field - serviceAccountId.
resource "google_bigquery_connection" "default" {
  connection_id = "my_cloud_resource_connection"
  project       = data.google_project.default.project_id
  location      = "US"
  cloud_resource {}
}

如需在 Google Cloud 项目中应用 Terraform 配置，请完成以下部分中的步骤。

准备 Cloud Shell

启动 Cloud Shell。
设置要应用 Terraform 配置的默认 Google Cloud 项目。

您只需为每个项目运行一次以下命令，即可在任何目录中运行它。
```
export GOOGLE_CLOUD_PROJECT=PROJECT_ID
```
如果您在 Terraform 配置文件中设置显式值，则环境变量会被替换。

准备目录

每个 Terraform 配置文件都必须有自己的目录（也称为“根模块”）。

在 Cloud Shell 中，创建一个目录，并在该目录中创建一个新文件。文件名必须具有 .tf 扩展名，例如 main.tf。在本教程中，该文件称为 main.tf。
```
mkdir DIRECTORY && cd DIRECTORY && touch main.tf
```
如果您按照教程进行操作，可以在每个部分或步骤中复制示例代码。

将示例代码复制到新创建的 main.tf 中。

（可选）从 GitHub 中复制代码。如果端到端解决方案包含 Terraform 代码段，则建议这样做。
查看和修改要应用到您的环境的示例参数。
保存更改。
初始化 Terraform。您只需为每个目录执行一次此操作。
```
terraform init
```
（可选）如需使用最新的 Google 提供程序版本，请添加 -upgrade 选项：
```
terraform init -upgrade
```

应用更改

查看配置并验证 Terraform 将创建或更新的资源是否符合您的预期：
```
terraform plan
```
根据需要更正配置。
通过运行以下命令并在提示符处输入 yes 来应用 Terraform 配置：
```
terraform apply
```
等待 Terraform 显示“应用完成！”消息。
打开您的 Google Cloud 项目以查看结果。在 Google Cloud 控制台的界面中找到资源，以确保 Terraform 已创建或更新它们。

向连接的服务账号授予权限

向连接的服务账号授予 Vertex AI User 角色。您必须在您在准备工作部分创建或选择的项目中授予此角色。在其他项目中授予此角色会导致错误 bqcx-1234567890-xxxx@gcp-sa-bigquery-condel.iam.gserviceaccount.com does not have the permission to access resource。

如需授予该角色，请按以下步骤操作：

前往 IAM 和管理页面。

转到“IAM 和管理”
点击 授予访问权限。
在新的主账号字段中，输入您之前复制的服务账号 ID。
在选择角色字段中，选择 Vertex AI，然后选择 Vertex AI User 角色。
点击保存。

创建远程模型

创建一个代表托管式 Vertex AI 模型的远程模型：

在 Google Cloud 控制台中，前往 BigQuery 页面。

转到 BigQuery
在查询编辑器中，运行以下语句：

CREATE OR REPLACE MODEL `bqml_tutorial.gemini_model`
  REMOTE WITH CONNECTION `LOCATION.CONNECTION_ID`
  OPTIONS (ENDPOINT = 'gemini-2.0-flash');

替换以下内容：

LOCATION：连接位置
CONNECTION_ID：BigQuery 连接的 ID
当您在 Google Cloud 控制台中查看连接详细信息时，这是连接 ID 中显示的完全限定连接 ID 的最后一部分中的值，例如 projects/myproject/locations/connection_location/connections/myconnection

查询需要几秒钟才能完成，之后模型 gemini_model 会显示在探索器窗格的 bqml_tutorial 数据集中。由于查询使用 CREATE MODEL 语句来创建模型，因此没有查询结果。

执行关键字提取

使用远程模型和 ML.GENERATE_TEXT 函数对 IMDB 电影评价执行关键字提取：

在 Google Cloud 控制台中，前往 BigQuery 页面。

转到 BigQuery

在查询编辑器中，输入以下语句，对五项电影评论执行关键字提取：

SELECT
  ml_generate_text_result['candidates'][0]['content'] AS generated_text,
  * EXCEPT (ml_generate_text_result)
FROM
  ML.GENERATE_TEXT(
    MODEL `bqml_tutorial.gemini_model`,
    (
      SELECT
        CONCAT('Extract the key words from the text below: ', review) AS prompt,
        *
      FROM
        `bigquery-public-data.imdb.reviews`
      LIMIT 5
    ),
    STRUCT(
      0.2 AS temperature,
      100 AS max_output_tokens));

输出类似于以下内容，为清楚起见，省略了非生成的列：

+----------------------------------------+-------------------------+----------------------------+-----+
| generated_text                         | ml_generate_text_status | prompt                     | ... |
+----------------------------------------+-------------------------+----------------------------+-----+
| {"parts":[{"text":"## Key words:\n\n*  |                         | Extract the key words from |     |
| **Negative sentiment:** \"terribly     |                         | the text below: I had to   |     |
| bad acting\", \"dumb story\", \"not    |                         | see this on the British    |     |
| even a kid would enjoy this\",         |                         | Airways plane. It was      |     |
| \"something to switch off\"\n*         |                         | terribly bad acting and    |     |
| **Context:** \"British Airways plane\" |                         | a dumb story. Not even     |     |
| \n* **Genre:** \"movie\" (implied)...  |                         | a kid would enjoy this...  |     |
+----------------------------------------+-------------------------+----------------------------+-----+
| {"parts":[{"text":"## Key words:\n\n*  |                         | Extract the key words from |     |
| **Movie:** The Real Howard Spitz\n*    |                         | the text below: This is    |     |
| **Genre:** Family movie\n*             |                         | a family movie that was    |     |
| **Broadcast:** ITV station, 1.00 am\n* |                         | broadcast on my local      |     |
| **Director:** Vadim Jean\n*            |                         | ITV station at 1.00 am a   |     |
| **Main character:** Howard Spitz,      |                         | couple of nights ago.      |     |
| a children's author who hates...       |                         | This might be a strange... |     |
+----------------------------------------+-------------------------+----------------------------+-----+

结果包括以下列：

generated_text：生成的文本。
ml_generate_text_status：相应行的 API 响应状态。如果操作成功，则此值为空。
prompt：用于情感分析的提示。
bigquery-public-data.imdb.reviews 表中的所有列。

可选：与上一步中手动解析函数返回的 JSON 不同，您可以使用 flatten_json_output 参数在单独的列中返回生成的文本和安全属性。

在查询编辑器中，运行以下语句：

SELECT
  *
FROM
  ML.GENERATE_TEXT(
    MODEL `bqml_tutorial.gemini_model`,
    (
      SELECT
        CONCAT('Extract the key words from the text below: ', review) AS prompt,
        *
      FROM
        `bigquery-public-data.imdb.reviews`
      LIMIT 5
    ),
    STRUCT(
      0.2 AS temperature,
      100 AS max_output_tokens,
      TRUE AS flatten_json_output));

输出类似于以下内容，为清楚起见，省略了非生成的列：

+----------------------------------------+----------------------------------------------+-------------------------+----------------------------+-----+
| ml_generate_text_llm_result            | ml_generate_text_rai_result                  | ml_generate_text_status | prompt                     | ... |
+----------------------------------------+----------------------------------------------+-------------------------+----------------------------+-----+
| ## Keywords:                           |                                              |                         | Extract the key words from |     |
|                                        |                                              |                         | the text below: I had to   |     |
| * **Negative sentiment:**              |                                              |                         | see this on the British    |     |
| "terribly bad acting", "dumb           |                                              |                         | Airways plane. It was      |     |
| story", "not even a kid would          |                                              |                         | terribly bad acting and    |     |
| enjoy this", "switch off"              |                                              |                         | a dumb story. Not even     |     |
| * **Context:** "British                |                                              |                         | a kid would enjoy this...  |     |
+----------------------------------------+----------------------------------------------+-------------------------+----------------------------+-----+
| ## Key words:                          |                                              |                         | Extract the key words from |     |
|                                        |                                              |                         | the text below: This is    |     |
| * **Movie:** The Real Howard Spitz     |                                              |                         | a family movie that was    |     |
| * **Genre:** Family movie              |                                              |                         | broadcast on my local      |     |
| * **Broadcast:** ITV, 1.00             |                                              |                         | ITV station at 1.00 am a   |     |
| am                                     |                                              |                         | couple of nights ago.      |     |
| - ...                                  |                                              |                         | This might be a strange... |     |
+----------------------------------------+----------------------------------------------+-------------------------+----------------------------+-----+

结果包括以下列：

ml_generate_text_llm_result：生成的文本。
ml_generate_text_rai_result：安全属性，以及关于内容是否因某个屏蔽类别而被屏蔽的信息。如需详细了解安全属性，请参阅配置安全过滤器。
ml_generate_text_status：相应行的 API 响应状态。如果操作成功，则此值为空。
prompt：用于提取关键字的提示。
bigquery-public-data.imdb.reviews 表中的所有列。

执行情感分析

使用远程模型和 ML.GENERATE_TEXT 函数对 IMDB 电影评论执行情感分析：

在 Google Cloud 控制台中，前往 BigQuery 页面。

转到 BigQuery

在查询编辑器中，运行以下语句，对五项电影评价执行情感分析：

SELECT
  ml_generate_text_result['candidates'][0]['content'] AS generated_text,
  * EXCEPT (ml_generate_text_result)
FROM
  ML.GENERATE_TEXT(
    MODEL `bqml_tutorial.gemini_model`,
    (
      SELECT
        CONCAT(
          'perform sentiment analysis on the following text, return one the following categories: positive, negative: ',
          review) AS prompt,
        *
      FROM
        `bigquery-public-data.imdb.reviews`
      LIMIT 5
    ),
    STRUCT(
      0.2 AS temperature,
      100 AS max_output_tokens));

输出类似于以下内容，为清楚起见，省略了非生成的列：

+--------------------------------------------+-------------------------+----------------------------+-----+
| generated_text                             | ml_generate_text_status | prompt                     | ... |
+--------------------------------------------+-------------------------+----------------------------+-----+
| {"parts":[{"text":"## Sentiment Analysis:  |                         | perform sentiment analysis |     |
| Negative \n\nThis text expresses a         |                         | on the following text,     |     |
| strongly negative sentiment towards the    |                         | return one the following   |     |
| movie. Here's why:\n\n* **Negative         |                         | negative: I  had to see    |     |
| like \"terribly,\" \"dumb,\" and           |                         | this on the British        |     |
| \"not even\" to describe the acting...     |                         | Airways plane. It was...   |     |
+--------------------------------------------+-------------------------+----------------------------+-----+
| {"parts":[{"text":"## Sentiment Analysis:  |                         | perform sentiment analysis |     |
| Negative \n\nThis review expresses a       |                         | on the following text,     |     |
| predominantly negative sentiment towards   |                         | return one the following   |     |
| the movie \"The Real Howard Spitz.\"       |                         | categories: positive,      |     |
| Here's why:\n\n* **Criticism of the film's |                         | negative: This is a family |     |
| premise:** The reviewer finds it strange   |                         | movie that was broadcast   |     |
| that a film about a children's author...   |                         | on my local ITV station... |     |
+--------------------------------------------+-------------------------+----------------------------+-----+

结果包含执行关键字提取中记录的列。

清理

In the Google Cloud console, go to the Manage resources page.
Go to Manage resources
In the project list, select the project that you want to delete, and then click Delete.
In the dialog, type the project ID, and then click Shut down to delete the project.