使用自然语言问题生成 SQL 查询

本页面介绍了如何使用 AlloyDB AI 自然语言设置、配置和生成 SQL 语句。借助自然语言，您可以使用自然语言创建面向用户的生成式 AI 应用来查询数据库。

如需启用 alloydb_ai_nl 扩展程序（即 AlloyDB for PostgreSQL 自然语言支持 API），请执行以下概要步骤：

安装 alloydb_ai_nl 扩展程序。
为应用定义自然语言配置。
注册架构。
添加上下文。
添加查询模板。
定义概念类型并创建值索引。
使用自然语言界面生成 SQL 语句。

准备工作

请求访问 AlloyDB AI 自然语言，并等到收到启用确认后再按照此页面上的说明操作。
了解如何连接到 AlloyDB 数据库并运行 PostgreSQL 命令。如需了解详情，请参阅连接概览。
使用最终用户想要访问的数据和架构填充数据库。

创建集群并启用 Vertex AI 集成

创建 AlloyDB 集群和实例。您可以使用 AlloyDB 实例来创建应用数据库和架构。
启用 Vertex AI 集成。如需了解详情，请参阅与 Vertex AI 集成。

所需的角色

如需安装 alloydb_ai_nl 扩展程序并向其他用户授予访问权限，您必须在所使用的 Google Cloud 项目中具有以下 Identity and Access Management (IAM) 角色：

roles/alloydb.admin：AlloyDB Admin 预定义 IAM 角色

如需了解详情，请参阅使用内置身份验证管理 PostgreSQL 用户。

准备环境

为了准备生成自然语言查询，您必须安装所需的扩展程序、创建配置并注册架构。

安装 `alloydb_nl_ai` 扩展程序

alloydb_ai_nl 扩展程序使用 google_ml_integration 扩展程序，该扩展程序与大语言模型 (LLM)（包括 Vertex AI 上的 Gemini 模型）互动。

如需安装 alloydb_ai_nl 扩展程序，请连接到数据库并运行以下命令：

CREATE EXTENSION alloydb_ai_nl cascade;

创建自然语言配置并注册架构

AlloyDB AI 自然语言使用 nl_config 将应用关联到特定架构、查询模板和模型端点。 nl_config 是一种配置，用于将应用关联到架构、模板和其他上下文。大型应用还可以针对应用的不同部分使用不同的配置，只要您在从应用的该部分发送问题时指定正确的配置即可。您可以注册整个架构，也可以注册特定的架构对象，例如表、视图和列。

如需创建自然语言配置，请使用以下示例：

SELECT
  alloydb_ai_nl.g_create_configuration(
    'my_app_config'        -- configuration_id
  );

gemini-2.0-flash:generateContent 是模型端点。

如需为指定的配置注册架构，请使用以下示例：

SELECT
  alloydb_ai_nl.g_manage_configuration(
    operation => 'register_schema',
    configuration_id_in => 'my_app_config',
    schema_names_in => '{my_schema}'
  );

添加上下文

上下文包含您可以用来回答最终用户问题的任何类型的信息。上下文包括架构结构和关系、列的摘要和说明、列值及其语义，以及特定于应用或网域的业务逻辑的规则或语句。

为应用特定的规则添加常规上下文

常规上下文项包括应用特定的规则、业务逻辑语句，或未与特定架构对象关联的任何应用和网域特定的术语。

如需为应用特定的规则和应用或网域特定的术语添加常规上下文，请按照以下步骤操作：

如需为指定的配置添加常规上下文项，请使用以下示例：

SELECT
  alloydb_ai_nl.g_manage_configuration(
    'add_general_context',
    'my_app_config',
    general_context_in => '{"If the user asks for a good seat, assume that means a window or aisle seat."}'
  );

上述语句可帮助 AlloyDB AI 自然语言为用户的自然语言问题提供更优质的响应。

如需查看指定配置的常规上下文，请运行以下语句：
```
SELECT alloydb_ai_nl.list_general_context(nl_config TEXT);
```

生成并查看架构上下文

架构上下文描述了架构对象，包括表、视图、物化视图和列。此上下文存储为每个架构对象的 COMMENT。

如需为架构对象生成上下文，请调用以下 API。为获得最佳效果，请确保数据库表包含代表性的数据。

-- For all schema objects (tables, views, materialized views and columns)
-- within the scope of a provided nl_config.
SELECT
  alloydb_ai_nl.generate_schema_context(
    'my_app_config' -- nl_config
  );

通过运行以下语句，查看生成的架构上下文：
```
SELECT schema_object, object_context
FROM alloydb_ai_nl.generated_schema_context_view;
```
生成的架构上下文存储在上述视图中。

可选：如需更新生成的架构上下文，请运行以下语句：

SELECT
  alloydb_ai_nl.update_generated_relation_context(
    'my_schema.my_table',
    'This table contains archival records, if you need latest records use records_new table.'
  );

SELECT
  alloydb_ai_nl.update_generated_column_context(
    'my_schema.my_table.column1',
    'The seat_class column takes single letters like "E" for economy, "P" for premium economy, "B" for business and "F" for First.'
  );

应用上下文。当您应用上下文时，它会立即生效，并从 generated_schema_context_view 中删除。运行以下命令：

-- For all schema objects (tables, views, materialized views and columns)
-- within the scope of nl_config.
SELECT
  alloydb_ai_nl.apply_generated_schema_context(
    'my_app_config' --nl_config
  );

可选：验证生成的上下文。借助以下 API，您可以检查架构上下文，在生成 SQL 语句时会使用这些上下文：

-- For table, view or materialized view.
SELECT
  alloydb_ai_nl.get_relation_context(
    'my_schema.my_table'
  );

-- For column.
SELECT
  alloydb_ai_nl.get_column_context(
    'my_schema.my_table.column1'
  );

可选：如需手动设置架构上下文，请运行以下语句：

-- For table, view or materialized view.
SELECT
  alloydb_ai_nl.set_relation_context(
    'my_schema.my_table',
    'One-to-many mapping from product to categories'
  );

-- For column.
SELECT
  alloydb_ai_nl.set_column_context(
    'my_schema.my_table.column1',
    'This column provides additional tagged info for the product in  Json format, e.g., additional color or size information of the product - tags: { "color": "red", "size": "XL"}'
  );

创建查询模板

如需提高使用 LLM 构建的生成式 AI 应用的质量，您可以添加模板。查询模板是一组精选的代表性或常见的自然语言问题，其中包含相应的 SQL 查询以及说明，以便为自然语言到 SQL (NL2SQL) 生成提供声明性依据。模板主要由应用指定，但模板也可以由 alloydb_ai_nl 扩展程序根据常用 SQL 查询自动生成。每个模板都必须与 nl_config 关联。

alloydb_ai_nl 扩展程序使用 template_store 在生成 SQL 语句的过程中动态合并相关 SQL 模板，以回答用户的问题。template_store 会识别与所问自然语言问题意图类似的模板，识别相应的参数化 SQL 语句，并通过使用自然语言问题中的值实例化参数来合成 SQL 语句。不过，如果没有与用户提出的问题具有相同意图的模板，alloydb_ai_nl 会使用每个相关的模板和上下文来编写 SQL 语句。

如需添加模板，您需要指定问题（使用名为 intent 的参数）和 SQL 查询。

如需将模板添加到模板存储区，请运行以下语句：

SELECT
  alloydb_ai_nl.add_template(
    nl_config => 'my_app_config',
    intent => 'How many accounts associated with loans are located in the Prague region?',
    sql => 'SELECT COUNT(T1.account_id)
            FROM bird_dev_financial.account AS T1
            INNER JOIN bird_dev_financial.loan AS T2
              ON T1.account_id = T2.account_id
            INNER JOIN bird_dev_financial.district AS T3
              ON T1.district_id = T3.district_id
            WHERE T3."A3" = ''Prague''',
    check_intent => TRUE
  );

当 check_intent 为 TRUE 时，alloydb_ai_nl 会执行语义检查，以确认所提供的 intent 与传入的 SQL 语句匹配。如果 intent 与 SQL 语句不匹配，系统不会添加模板。

自动生成模板

在表中添加了代表性数据集后，我们建议您运行与最终用户可能提出的常见问题相对应的 SQL 查询。请务必确保查询具有良好的查询计划，并且查询性能良好。

运行查询后，AlloyDB AI 自然语言可以根据查询历史记录自动生成模板。您可以调用以下 API 来生成模板。您需要先查看并应用生成的模板，然后才能使其生效。

模板自动生成基于查询日志 google_db_advisor_workload_statements 中最常用的查询。查询会根据以下条件进行过滤：

SELECT 语句
可执行文件：EXPLAIN 命令成功处理查询。
无重复：查询之前未用于生成模板。
所有引用的表和视图都在 nl_config 的范围内。

如需自动生成、查看和应用模板，请按照以下步骤操作：

请求 AlloyDB 根据您的查询历史记录生成模板：
```
SELECT
  alloydb_ai_nl.generate_templates(
    'my_app_config',
);
```
使用提供的视图 alloydb_ai_nl.generated_templates_view 查看 generated_templates。

以下输出显示了生成的模板数量：
```
-[ RECORD 1 ]------+--
generate_templates | 1
```

使用 generated_templates_view 视图查看生成的模板。

SELECT *
FROM alloydb_ai_nl.generated_templates_view;

以下是返回的输出示例：

-[ RECORD 1 ]----------------------------------------------------------------
id          | 1
config      | my_app_config
type        | Template
manifest    | How many clients have a birth year of a given number?
nl          | How many clients have a birth year of 1997?
sql         | select count(*) from public.client as T where
             to_char(T.birth_date::timestamp, 'YYYY') = '1997';
intent      | How many clients have a birth year of 1997?
psql        | select count(*) from public.client as T where
             to_char(T.birth_date::timestamp, 'YYYY') = $1;
pintent     | How many clients have a birth year of $1?
comment     |
explanation |
weight      | 1

返回输出中的 manifest 是一个通用模板，或对问题类型或可执行的操作的宽泛描述。pintent 是 intent 的参数化版本，它通过将特定值 (1997) 替换为占位符 ($1) 来泛化 intent。

如需更新生成的模板，请运行以下示例语句：

SELECT alloydb_ai_nl.update_generated_template(
  id => 1,
  manifest => 'How many clients are born in a given year?',
  nl => 'How many clients are born in 1997?',
  intent => 'How many clients are born in 1997?',
  pintent => 'How many clients are born in $1?'

);

应用模板。您应用的模板会立即添加到模板存储区中，并从审核视图中删除。

-- For all templates generated under the nl config.
SELECT
  alloydb_ai_nl.apply_generated_templates('my_app_config');

为自然语言配置安全性

如需为 AlloyDB AI 自然语言配置安全性，请参阅使用参数化安全视图管理数据应用安全性。

定义概念类型和值索引

您可以定义概念类型和值索引，以更深入地了解所提出的问题。概念类型是实体的类别或类，用于标识字词和短语的语义，而不仅仅是其字面形式。

例如，即使一个国家/地区名称是大写（例如 USA），而另一个国家/地区名称是小写（例如 usa），这两个国家/地区名称也可能相同。在此示例中，国家/地区名称是概念类型。概念类型的其他示例包括人员姓名、城市名称和日期。

值索引是基于与每列关联的概念类型的、对属于自然语言配置 nl_config 的列中的值进行的索引。借助值索引，可以高效地匹配所提问题的值短语和数据库中的值。

如需定义概念类型和值索引，请使用提供的示例按照以下步骤操作。这些示例将列与概念类型关联，创建和刷新值索引，并使用同义词集执行值搜索。

如需将列与概念类型关联，请运行以下查询：

SELECT
  alloydb_ai_nl.associate_concept_type(
    column_names_in => 'my_schema.country.country_name',
    concept_type_in => 'country_name',
    nl_config_id_in => 'my_app_config'
  );

如需根据属于自然语言配置且与概念类型关联的所有列创建值索引，请运行以下语句：
```
SELECT
  alloydb_ai_nl.create_value_index(
    nl_config_id_in => 'my_app_config'
  );
```

将概念类型关联到新列时，刷新值索引以反映更改。使用以下语句：

SELECT
  alloydb_ai_nl.refresh_value_index(
    nl_config_id_in => 'my_app_config'
  );

如需让 AlloyDB AI 自然语言匹配某个值的同义词，请运行以下示例语句：
```
SELECT
  alloydb_ai_nl.insert_synonym_set(
    ARRAY [
      'USA',
      'US',
      'United States',
      'United States of America'
    ]
  );
```
虽然表中的数据可能会使用特定值（例如，如果 United States 用于标识国家/地区），但您可以定义一个同义词集，其中包含 United States 的所有同义词。如果自然语言问题中出现任何同义词，AlloyDB AI 自然语言会将同义词与表中的值匹配。

在给定一系列值短语的情况下，使用以下语句执行值搜索以查找正确的数据库值：

SELECT
  alloydb_ai_nl.get_concept_and_value(
    value_phrases_in => ARRAY['United States'],
    nl_config_id_in  => 'my_app_config'
  );

例如，如果用户使用以下 get_sql 查询提出“美国的人口是多少？”之类的问题，AlloyDB AI 自然语言会将 get_concept_and_value 函数与值短语 United States 结合使用，以对值索引执行模糊搜索。模糊搜索是一种搜索技术，即使搜索查询与相应数据不完全匹配，也能找到匹配项。

自然语言会找到一个与搜索查询相近的结果（值 USA），并使用该结果生成 SQL 查询。

SELECT
  alloydb_ai_nl.get_sql(
    nl_config_id    => 'my_app_config',
    nl_question     => 'What is the population of the United States?',
    additional_info => json_build_object('enrich_nl_question', TRUE)
  ) ->> 'sql';

下表列出了 AlloyDB AI 自然语言定义的内置概念类型：

概念名称	说明
`generic_entity_name`	单个字符串类型列可用于通用实体名称。例如： SELECT alloydb_ai_nl.associate_concept_type('public.item.item_name', 'generic_entity_name')
`country_name`、`city_name`、`region_name`	国家/地区、城市和区域的名称。用法与 `generic_entity_name` 概念类型完全相同。
`full_person_name`	联系人姓名，包括名字、姓氏和中间名。最多可以使用三个字符串类型列来表示人员全名。将名称列与 `full_person_name` 关联时，可以跳过任意列。例如： SELECT alloydb_ai_nl.associate_concept_type('public.person.last_name,public.person.first_name,public.person.middle_name','full_person_name')
`ssn`	包含社会保障号的单个字符串列。例如： SELECT alloydb_ai_nl.associate_concept_type('public.person.ssn','ssn')
`date`	日期或时间戳。例如： SELECT alloydb_ai_nl.associate_concept_type('public.person.date_col','date')

通过自然语言输入生成 SQL 语句

您可以使用 AlloyDB AI 自然语言，通过自然语言输入生成 SQL 语句。当您运行生成的 SQL 语句时，它会从数据库中提供您需要的数据，来回答自然语言问题。

如需使用自然语言通过 alloydb_ai_nl.get_sql 函数从数据库中获取结果，请使用以下示例：

SELECT
  alloydb_ai_nl.get_sql(
    'my_app_config', -- nl_config
    'What is the sum that client number 4''s account has following transaction 851?' -- nl question
  );

系统会返回以下 JSON 输出：

{
  "sql": "SELECT T3.balance FROM public.client AS T1 INNER JOIN public.account AS T2 ON T1.district_id = T2.district_id INNER JOIN public.trans AS T3 ON T2.account_id = T3.account_id WHERE T1.client_id = 4 AND T3.trans_id = 851",
  "prompt": "",
  "retries": 0,
  "error_msg": "",
  "nl_question": "What is the sum that client number 4's account has following transaction 851?"
}

可选：如需将生成的 SQL 查询提取为文本字符串，请添加 ->>'sql'：
```
SELECT
  alloydb_ai_nl.get_sql(
    'my_app_config', -- nl_config
    'What is the sum that client number 4''s account has following transaction 851?' -- nl question
  ) ->> 'sql';
```
->> 运算符用于将 JSON 值提取为文本。alloydb_ai_nl.get_sql 函数会返回一个 JSON 对象，该对象是用于检索与键 sql 关联的值的语句的一部分。此值是生成的 SQL 查询。

测试并优化

如需改进自动生成的查询，请修改或添加更完善的上下文、查询模板和值索引，然后迭代，直到获得所需的结果。

后续步骤

了解 AlloyDB AI 自然语言应用场景和关键功能。
使用 AlloyDB AI 自然语言生成 SQL。
了解如何使用 AlloyDB AI 自然语言在 Google Agentspace 中搜索存储在 AlloyDB 中的关系型数据（预览版）。