Generate text by using a Gemini model and the ML.GENERATE_TEXT function

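This tutorial shows you how to create a remote model that's based on the gemini-2.0-flash model. It then shows you how to use that model with the ML.GENERATE_TEXT function to extract keywords from, and perform sentiment analysis on, movie reviews in the bigquery-public-data.imdb.reviews public table.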

Required roles

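To run this tutorial, you need the following Identity and Access Management (IAM) roles:

  • Create and use BigQuery datasets, connections, and models: BigQuery Admin (roles/bigquery.admin).
  • Grant permissions to the connection's service account: Project IAM Admin (roles/resourcemanager.projectIamAdmin).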

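These predefined roles contain the permissions required to perform the tasks in this document. The following section lists the exact permissions that are required:

Required permissions

  • Create a dataset: bigquery.datasets.create
  • Create, delegate, and use a connection: bigquery.connections.*
  • Set the default connection: bigquery.config.*
  • Set service account permissions: resourcemanager.projects.getIamPolicy and resourcemanager.projects.setIamPolicy
  • Create a model and run inference: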
    • bigquery.jobs.create
    • bigquery.models.create
    • bigquery.models.getData
    • bigquery.models.updateData
    • bigquery.models.updateMetadata

You might also be able to get these permissions with custom roles or other predefined roles.

Costs

In this document, you use the following billable components of Google Cloud:

  • BigQuery ML: You incur costs for the data that you process in BigQuery.
  • Vertex AI: You incur costs for calls to the Vertex AI service that's represented by the remote model.

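To generate a cost estimate based on your projected usage, use the pricing calculator.

New Google Cloud users might be eligible for a free trial.

For more information about BigQuery pricing, see BigQuery pricing in the BigQuery documentation.

For more information about Vertex AI pricing, see the Vertex AI pricing page.

Before you begin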

  1. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

  2. Make sure that billing is enabled for your Google Cloud project.

  3. Enable the BigQuery, BigQuery Connection, and Vertex AI APIs.

Create a dataset

Create a BigQuery dataset to store your ML model.

Console

  1. In the Google Cloud console, go to the BigQuery page.

  2. 探索器窗格中,点击您的项目名称。

  3. 点击 查看操作 > 创建数据集

  4. 创建数据集 页面上,执行以下操作:

    • 数据集 ID 部分,输入 bqml_tutorial

    • 位置类型部分,选择多区域,然后选择 US (multiple regions in United States)(美国[美国的多个区域])。

    • 保持其余默认设置不变,然后点击创建数据集

bq

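To create a new dataset, use the bq mk command with the --location flag. For a full list of possible parameters, see the bq mk --dataset command reference.

  1. Create a dataset named bqml_tutorial with the data location set to US and a description of BigQuery ML tutorial dataset: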

    bq --location=US mk -d \
     --description "BigQuery ML tutorial dataset." \
     bqml_tutorial

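    The command uses the -d shortcut instead of the --dataset flag. If you omit both -d and --dataset, the command creates a dataset by default.

  2. Confirm that the dataset was created: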

    bq ls

API

Call the datasets.insert method with a defined dataset resource:

{
  "datasetReference": {
     "datasetId": "bqml_tutorial"
  }
}
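
The JSON above sets only the datasetId. As a minimal sketch, the following Python code builds a fuller datasets.insert request that also sets the projectId, plus the US location and description used in the bq example. It uses only the standard library, assembles the URL and body, and sends nothing. An actual call also needs an OAuth 2.0 access token. The function name build_dataset_insert_request and the project ID my-project are hypothetical placeholders.

```python
import json
from typing import Tuple


def build_dataset_insert_request(project_id: str) -> Tuple[str, str]:
    """Build the URL and JSON body for a datasets.insert call.

    Hypothetical helper: it only assembles the request and sends nothing.
    A real call also needs an OAuth 2.0 access token in the
    Authorization header.
    """
    url = (
        "https://bigquery.googleapis.com/bigquery/v2/projects/"
        f"{project_id}/datasets"
    )
    body = {
        "datasetReference": {
            "projectId": project_id,
            "datasetId": "bqml_tutorial",
        },
        # Same US multi-region and description as in the bq example.
        "location": "US",
        "description": "BigQuery ML tutorial dataset.",
    }
    return url, json.dumps(body, indent=2)


if __name__ == "__main__":
    # "my-project" is a placeholder; substitute your own project ID.
    request_url, request_body = build_dataset_insert_request("my-project")
    print("POST", request_url)
    print(request_body)
```

The request sets location explicitly, so the dataset lands in the same multi-region as in the console and bq paths.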

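BigQuery DataFrames

Before trying this sample, follow the BigQuery DataFrames setup instructions in the BigQuery quickstart using BigQuery DataFrames. For more information, see the BigQuery DataFrames reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up ADC for a local development environment.

import google.cloud.bigquery

# Create a client that authenticates with Application Default Credentials.
bqclient = google.cloud.bigquery.Client()
# Create the bqml_tutorial dataset; exists_ok=True avoids an error if it already exists.
bqclient.create_dataset("bqml_tutorial", exists_ok=True)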

Create the remote model

Create a remote model that represents a hosted Vertex AI model:

  1. In the Google Cloud console, go to the BigQuery page.

  2. In the query editor, run the following statement:

CREATE OR REPLACE MODEL `bqml_tutorial.gemini_model`
  REMOTE WITH CONNECTION DEFAULT
  OPTIONS (ENDPOINT = 'gemini-2.0-flash');

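The query takes several seconds to complete. After it finishes, the gemini_model model appears in the bqml_tutorial dataset in the Explorer pane. The query uses a CREATE MODEL statement to create a model, so it returns no query results.

Perform keyword extraction

Perform keyword extraction on IMDB movie reviews by using the remote model and the ML.GENERATE_TEXT function:

  1. In the Google Cloud console, go to the BigQuery page.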

  2. In the query editor, run the following statement to perform keyword extraction on five movie reviews:

    SELECT
      ml_generate_text_result['candidates'][0]['content'] AS generated_text,
      * EXCEPT (ml_generate_text_result)
    FROM
      ML.GENERATE_TEXT(
        MODEL `bqml_tutorial.gemini_model`,
        (
          SELECT
            CONCAT('Extract the key words from the text below: ', review) AS prompt,
            *
          FROM
            `bigquery-public-data.imdb.reviews`
          LIMIT 5
        ),
        STRUCT(
          0.2 AS temperature,
          100 AS max_output_tokens));

    The output is similar to the following, with non-generated columns omitted for clarity:

    +----------------------------------------+-------------------------+----------------------------+-----+
    | generated_text                         | ml_generate_text_status | prompt                     | ... |
    +----------------------------------------+-------------------------+----------------------------+-----+
    | {"parts":[{"text":"## Key words:\n\n*  |                         | Extract the key words from |     |
    | **Negative sentiment:** \"terribly     |                         | the text below: I had to   |     |
    | bad acting\", \"dumb story\", \"not    |                         | see this on the British    |     |
    | even a kid would enjoy this\",         |                         | Airways plane. It was      |     |
    | \"something to switch off\"\n*         |                         | terribly bad acting and    |     |
    | **Context:** \"British Airways plane\" |                         | a dumb story. Not even     |     |
    | \n* **Genre:** \"movie\" (implied)...  |                         | a kid would enjoy this...  |     |
    +----------------------------------------+-------------------------+----------------------------+-----+
    | {"parts":[{"text":"## Key words:\n\n*  |                         | Extract the key words from |     |
    | **Movie:** The Real Howard Spitz\n*    |                         | the text below: This is    |     |
    | **Genre:** Family movie\n*             |                         | a family movie that was    |     |
    | **Broadcast:** ITV station, 1.00 am\n* |                         | broadcast on my local      |     |
    | **Director:** Vadim Jean\n*            |                         | ITV station at 1.00 am a   |     |
    | **Main character:** Howard Spitz,      |                         | couple of nights ago.      |     |
    | a children's author who hates...       |                         | This might be a strange... |     |
    +----------------------------------------+-------------------------+----------------------------+-----+
    

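    The results include the following columns:

    • generated_text: the generated text.
    • ml_generate_text_status: the API response status for the corresponding row. If the operation was successful, this value is empty.
    • prompt: the prompt that is used for keyword extraction.
    • All of the columns from the bigquery-public-data.imdb.reviews table.

  3. Optional: Instead of manually parsing the JSON that the function returns, as in the previous step, use the flatten_json_output argument to return the generated text and the safety attributes in separate columns.

    In the query editor, run the following statement: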

    SELECT
      *
    FROM
      ML.GENERATE_TEXT(
        MODEL `bqml_tutorial.gemini_model`,
        (
          SELECT
            CONCAT('Extract the key words from the text below: ', review) AS prompt,
            *
          FROM
            `bigquery-public-data.imdb.reviews`
          LIMIT 5
        ),
        STRUCT(
          0.2 AS temperature,
          100 AS max_output_tokens,
          TRUE AS flatten_json_output));

    The output is similar to the following, with non-generated columns omitted for clarity:

    +----------------------------------------+----------------------------------------------+-------------------------+----------------------------+-----+
    | ml_generate_text_llm_result            | ml_generate_text_rai_result                  | ml_generate_text_status | prompt                     | ... |
    +----------------------------------------+----------------------------------------------+-------------------------+----------------------------+-----+
    | ## Keywords:                           |                                              |                         | Extract the key words from |     |
    |                                        |                                              |                         | the text below: I had to   |     |
    | * **Negative sentiment:**              |                                              |                         | see this on the British    |     |
    | "terribly bad acting", "dumb           |                                              |                         | Airways plane. It was      |     |
    | story", "not even a kid would          |                                              |                         | terribly bad acting and    |     |
    | enjoy this", "switch off"              |                                              |                         | a dumb story. Not even     |     |
    | * **Context:** "British                |                                              |                         | a kid would enjoy this...  |     |
    +----------------------------------------+----------------------------------------------+-------------------------+----------------------------+-----+
    | ## Key words:                          |                                              |                         | Extract the key words from |     |
    |                                        |                                              |                         | the text below: This is    |     |
    | * **Movie:** The Real Howard Spitz     |                                              |                         | a family movie that was    |     |
    | * **Genre:** Family movie              |                                              |                         | broadcast on my local      |     |
    | * **Broadcast:** ITV, 1.00             |                                              |                         | ITV station at 1.00 am a   |     |
    | am                                     |                                              |                         | couple of nights ago.      |     |
    | - ...                                  |                                              |                         | This might be a strange... |     |
    +----------------------------------------+----------------------------------------------+-------------------------+----------------------------+-----+
    

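    The results include the following columns:

    • ml_generate_text_llm_result: the generated text.
    • ml_generate_text_rai_result: the safety attributes, along with information about whether the content was blocked because of one of the blocking categories. For more information about safety attributes, see Configure safety filters.
    • ml_generate_text_status: the API response status for the corresponding row. If the operation was successful, this value is empty.
    • prompt: the prompt that is used for keyword extraction.
    • All of the columns from the bigquery-public-data.imdb.reviews table.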
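
Without flatten_json_output, the generated_text column from step 2 holds the model's content object, and the text itself is nested inside it. Suppose you have each generated_text value as a JSON string and parse it client-side. A minimal Python sketch of that manual parsing could look like the following. It assumes the {"parts": [{"text": ...}]} shape shown in the step 2 sample output. The function name extract_generated_text is hypothetical.

```python
import json


def extract_generated_text(content_json: str) -> str:
    """Return the generated text from a Gemini `content` JSON string.

    Assumes the shape shown in the step 2 sample output:
    {"parts": [{"text": "..."}, ...]}. The text of all parts is joined.
    """
    content = json.loads(content_json)
    # Skip any part that has no "text" field.
    return "".join(part.get("text", "") for part in content.get("parts", []))


if __name__ == "__main__":
    # Shortened version of the second row of the step 2 sample output.
    sample = '{"parts":[{"text":"## Key words:\\n\\n* **Movie:** The Real Howard Spitz"}]}'
    print(extract_generated_text(sample))
```

Joining all parts is a design choice for this sketch. The sample output shows only a single part per row.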

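Perform sentiment analysis

Perform sentiment analysis on IMDB movie reviews by using the remote model and the ML.GENERATE_TEXT function:

  1. In the Google Cloud console, go to the BigQuery page.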

  2. In the query editor, run the following statement to perform sentiment analysis on five movie reviews:

    SELECT
      ml_generate_text_result['candidates'][0]['content'] AS generated_text,
      * EXCEPT (ml_generate_text_result)
    FROM
      ML.GENERATE_TEXT(
        MODEL `bqml_tutorial.gemini_model`,
        (
          SELECT
            CONCAT(
              'perform sentiment analysis on the following text, return one of the following categories: positive, negative: ',
              review) AS prompt,
            *
          FROM
            `bigquery-public-data.imdb.reviews`
          LIMIT 5
        ),
        STRUCT(
          0.2 AS temperature,
          100 AS max_output_tokens));

    The output is similar to the following, with non-generated columns omitted for clarity:

    +--------------------------------------------+-------------------------+----------------------------+-----+
    | generated_text                             | ml_generate_text_status | prompt                     | ... |
    +--------------------------------------------+-------------------------+----------------------------+-----+
    | {"parts":[{"text":"## Sentiment Analysis:  |                         | perform sentiment analysis |     |
    | Negative \n\nThis text expresses a         |                         | on the following text,     |     |
    | strongly negative sentiment towards the    |                         | return one of the following|     |
    | movie. Here's why:\n\n* **Negative         |                         | categories: positive,      |     |
    | like \"terribly,\" \"dumb,\" and           |                         | negative: I had to see     |     |
    | \"not even\" to describe the acting...     |                         | this on the British        |     |
    |                                            |                         | Airways plane. It was...   |     |
    +--------------------------------------------+-------------------------+----------------------------+-----+
    | {"parts":[{"text":"## Sentiment Analysis:  |                         | perform sentiment analysis |     |
    | Negative \n\nThis review expresses a       |                         | on the following text,     |     |
    | predominantly negative sentiment towards   |                         | return one of the following|     |
    | the movie \"The Real Howard Spitz.\"       |                         | categories: positive,      |     |
    | Here's why:\n\n* **Criticism of the film's |                         | negative: This is a family |     |
    | premise:** The reviewer finds it strange   |                         | movie that was broadcast   |     |
    | that a film about a children's author...   |                         | on my local ITV station... |     |
    +--------------------------------------------+-------------------------+----------------------------+-----+
    

    The results contain the same columns that are described in Perform keyword extraction.
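
The prompt asks for one of two categories. However, the sample output shows that the model replies with a heading and an explanation rather than a single word. Here is a minimal Python sketch that reduces such a reply to a label. It is a client-side heuristic, not part of ML.GENERATE_TEXT. It assumes that the first occurrence of positive or negative in the generated text is the verdict, as in the sample output. The function name sentiment_label is hypothetical.

```python
import re
from typing import Optional

# The two categories named in the prompt, matched as whole words.
_LABEL_PATTERN = re.compile(r"\b(positive|negative)\b", re.IGNORECASE)


def sentiment_label(generated_text: str) -> Optional[str]:
    """Return the first category name found in the model output, or None.

    Heuristic: assumes the model states its verdict before it mentions
    the other category, as in the sample output.
    """
    match = _LABEL_PATTERN.search(generated_text)
    return match.group(1).lower() if match else None


if __name__ == "__main__":
    sample = "## Sentiment Analysis: Negative \n\nThis text expresses a strongly negative sentiment"
    print(sentiment_label(sample))
```

The heuristic returns the wrong label if the reply mentions the other category first (for example, "not positive"). Stating the required output format more strictly in the prompt might therefore be the more robust option.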

Clean up

  1. In the Google Cloud console, go to the Manage resources page.

  2. In the project list, select the project that you want to delete, and then click Delete.
  3. In the dialog, type the project ID, and then click Shut down to delete the project.