
Improve search and RAG quality with the ranking API

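As part of your retrieval-augmented generation (RAG) experience in Vertex AI Search, you can rank a set of documents based on a query.

The ranking API takes a list of documents as input and reranks them by how relevant they are to the query. Compared with embeddings, which consider only the semantic similarity between a document and a query, the ranking API can give a precise score for how well a document answers a given query. After an initial set of candidate documents has been retrieved, the ranking API can be used to improve the quality of the search results.

The ranking API is stateless, so you don't need to index documents before calling it. You only pass in the query and the documents. This makes the API well suited to reranking documents that come from Vector Search and other search solutions.

This page explains how to use the ranking API to rank a set of documents based on a query.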

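Use cases

The primary use of the ranking API is to improve the quality of search results.

However, the ranking API is useful in any scenario where you need to find the content that is most relevant to a user's query. For example, the ranking API can help you with the following tasks:

Finding the right content to give to an LLM for grounding
Improving the relevance of an existing search experience
Identifying the relevant sections of a document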

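The following flow outlines how you might use the ranking API to improve the quality of results for chunked documents:

Use the Document AI Layout Parser API to split a set of documents into chunks.
Use an embeddings API to create embeddings for each chunk.
Load the embeddings into Vector Search or another search solution.
Query your search index and retrieve the most relevant chunks.
Rerank the relevant chunks by using the ranking API.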
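
The last two steps of this flow (retrieve, then rerank) can be sketched in Python as follows. This is a minimal illustration, not the API itself: `rerank_chunks` and `overlap_ranker` are hypothetical names, and `overlap_ranker` is a crude word-overlap scorer that only stands in for a real call to the ranking API so that the sketch runs offline.

```python
def rerank_chunks(query, chunks, rank_fn, top_n=None):
    """Rerank retrieved chunks with an injected ranking function.

    rank_fn(query, records) is assumed to return a list of dicts that each
    carry an "id" and a "score", mirroring the shape of a ranking response.
    """
    # Use each chunk's position in the list as its record ID.
    records = [{"id": str(i), "content": text} for i, text in enumerate(chunks)]
    ranked = sorted(rank_fn(query, records), key=lambda r: r["score"], reverse=True)
    if top_n is not None:
        ranked = ranked[:top_n]
    # Map the ranked IDs back to the original chunk text.
    return [(chunks[int(r["id"])], r["score"]) for r in ranked]


def overlap_ranker(query, records):
    """Placeholder scorer (NOT the ranking API): share of query words found in the content."""
    words = set(query.lower().split())
    scored = []
    for record in records:
        content_words = set(record["content"].lower().split())
        score = len(words & content_words) / (len(words) or 1)
        scored.append({"id": record["id"], "score": score})
    return scored


# Chunks as they might come back from a search index, in retrieval order.
chunks = [
    "the gemini constellation is visible in winter",
    "gemini is a large language model created by google",
]
result = rerank_chunks("google gemini model", chunks, overlap_ranker)
for text, score in result:
    print(f"{score:.2f}  {text}")
```

Passing the ranker in as a function keeps the retrieval code independent of how the ranking call is made, so the placeholder can later be swapped for a function that calls the ranking API.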

Input data

The ranking API requires the following inputs:

The query that you are ranking the records against.

For example:
```
"query": "Why is the sky blue?"
```

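A set of records that are relevant to the query. The records are provided as an array of objects. Each record can include a unique ID, a title, and the content of the document. Each record should include a title, content, or both. The maximum number of tokens supported per record depends on the model version that is used. For example, models up to version 003 support 512 tokens, whereas version 004 supports 1024 tokens. If the combined length of the title and content exceeds the model's token limit, the extra content is truncated. Each request can include up to 200 records.

For example, an array of records looks something like the following. In practice, the array would contain many more records and the content would be much longer: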

"records": [
   {
       "id": "1",
       "title": "The Color of the Sky: A Poem",
       "content": "A canvas stretched across the day,\nWhere sunlight learns to dance and play.\nBlue, a hue of scattered light,\nA gentle whisper, soft and bright."
   },
   {
       "id": "2",
       "title": "The Science of a Blue Sky",
       "content": "The sky appears blue due to a phenomenon called Rayleigh scattering. Sunlight is comprised of all the colors of the rainbow. Blue light has shorter wavelengths than other colors, and is thus scattered more easily."
   }
]

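Optional: The maximum number of records that you want the ranking API to return. By default, all records are returned; however, you can use the topN field to return fewer records. All records are ranked regardless of the value that is set.

For example, the following returns the top 10 ranked records: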
```
"topN": 10,
```
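Optional: A setting that specifies whether you want the API to return only the ID of each record or to also return the record title and content. By default, the full record is returned. The main reason to set this is to reduce the size of the response payload.

For example, setting it to true returns only the record ID, not the title or content: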
```
"ignoreRecordDetailsInResponse": true,
```
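Optional: The model name. This specifies the model used to rank the documents. If no model is specified, semantic-ranker-default@latest is used, which automatically points to the latest available model. To point to a specific model, specify one of the model names listed in Supported models, for example semantic-ranker-512-003.

In the following example, model is set to semantic-ranker-default@latest. This means that the ranking API always uses the latest available model.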
```
"model": "semantic-ranker-default@latest"
```
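
Putting the inputs above together, the following Python sketch assembles a request body and checks the constraints described in this section (at most 200 records, and a title, content, or both on every record). The helper name `build_rank_request_body` and its `ids_only` parameter are illustrative assumptions and not part of any client library; the JSON field names are the ones shown above.

```python
import json

MAX_RECORDS_PER_REQUEST = 200  # per-request limit stated in this section


def build_rank_request_body(query, records, top_n=None, ids_only=False,
                            model="semantic-ranker-default@latest"):
    """Build the JSON body for a rank request (hypothetical helper)."""
    if len(records) > MAX_RECORDS_PER_REQUEST:
        raise ValueError(
            f"{len(records)} records given; a request can include up to "
            f"{MAX_RECORDS_PER_REQUEST}"
        )
    for record in records:
        # Each record should include a title, content, or both.
        if not record.get("title") and not record.get("content"):
            raise ValueError(f"record {record.get('id')!r} needs a title, content, or both")

    body = {"model": model, "query": query, "records": list(records)}
    if top_n is not None:
        body["topN"] = top_n  # optional: return fewer records
    if ids_only:
        body["ignoreRecordDetailsInResponse"] = True  # optional: smaller payload
    return body


body = build_rank_request_body(
    "Why is the sky blue?",
    [{"id": "2", "title": "The Science of a Blue Sky"}],
    top_n=10,
    ids_only=True,
)
print(json.dumps(body, indent=2))
```

Token limits are not checked here, because counting tokens depends on the model; content beyond the limit is truncated by the API as described above.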

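Output data

The ranking API returns a ranked list of records with the following outputs:

Score: A float value between 0 and 1 that indicates the relevance of the record.
ID: The unique ID of the record.
If requested, the full object: ID, title, and content.

For example: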

{
    "records": [
        {
            "id": "2",
            "score": 0.98,
            "title": "The Science of a Blue Sky",
            "content": "The sky appears blue due to a phenomenon called Rayleigh scattering. Sunlight is comprised of all the colors of the rainbow. Blue light has shorter wavelengths than other colors, and is thus scattered more easily."
        },
        {
            "id": "1",
            "score": 0.64,
            "title": "The Color of the Sky: A Poem",
            "content": "A canvas stretched across the day,\nWhere sunlight learns to dance and play.\nBlue, a hue of scattered light,\nA gentle whisper, soft and bright."
        }
    ]
}
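
When only IDs and scores are returned, the scores can be joined back to records that you hold locally. The sketch below assumes the ID-only response has the same shape as the example above minus the title and content fields; `join_scores` and the `min_score` cutoff of 0.7 are hypothetical choices for illustration, not recommended values.

```python
def join_scores(local_records, response, min_score=0.0):
    """Attach scores from a rank response to local records, keeping response order."""
    by_id = {record["id"]: record for record in local_records}
    joined = []
    for ranked in response["records"]:
        if ranked["score"] < min_score:
            continue  # drop records below the chosen relevance cutoff
        joined.append({**by_id[ranked["id"]], "score": ranked["score"]})
    return joined


local_records = [
    {"id": "1", "title": "The Color of the Sky: A Poem"},
    {"id": "2", "title": "The Science of a Blue Sky"},
]
# Assumed shape of a response when only record IDs are requested.
id_only_response = {"records": [{"id": "2", "score": 0.98}, {"id": "1", "score": 0.64}]}

grounding = join_scores(local_records, id_only_response, min_score=0.7)
print(grounding)
```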

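Rank (or rerank) a set of records according to a query

Typically, you provide the ranking API with a query and a set of records that are relevant to that query and that have already been ranked by some other method, such as keyword search or vector search. You then use the ranking API to improve the quality of the ranking and to determine a score that indicates how relevant each record is to the query.

Get the query and the resulting records. Make sure that each record has an ID and a title, content, or both.

The maximum number of tokens supported per record depends on the model version. Models up to version 003, such as semantic-ranker-512-003, support 512 tokens per record. Starting with version 004, this limit increases to 1024 tokens. If the combined length of the title and content exceeds the model's token limit, the extra content is truncated.

Call the rankingConfigs.rank method using the following code: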

REST

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-H "X-Goog-User-Project: PROJECT_ID" \
"https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/rankingConfigs/default_ranking_config:rank" \
-d '{
"model": "semantic-ranker-default@latest",
"query": "QUERY",
"records": [
    {
        "id": "RECORD_ID_1",
        "title": "TITLE_1",
        "content": "CONTENT_1"
    },
    {
        "id": "RECORD_ID_2",
        "title": "TITLE_2",
        "content": "CONTENT_2"
    },
    {
        "id": "RECORD_ID_3",
        "title": "TITLE_3",
        "content": "CONTENT_3"
    }
]
}'

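Replace the following:

PROJECT_ID: the ID of your Google Cloud project.
QUERY: the query against which the records are ranked and scored.
RECORD_ID_n: a unique string that identifies the record.
TITLE_n: the title of the record.
CONTENT_n: the content of the record.

For general information about this method, see rankingConfigs.rank.

Example curl command and response: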

    curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    -H "X-Goog-User-Project: my-project-123" \
    "https://discoveryengine.googleapis.com/v1/projects/my-project-123/locations/global/rankingConfigs/default_ranking_config:rank" \
    -d '{
        "model": "semantic-ranker-default@latest",
        "query": "what is Google gemini?",
        "records": [
            {
                "id": "1",
                "title": "Gemini",
                "content": "The Gemini zodiac symbol often depicts two figures standing side-by-side."
            },
            {
                "id": "2",
                "title": "Gemini",
                "content": "Gemini is a cutting edge large language model created by Google."
            },
            {
                "id": "3",
                "title": "Gemini Constellation",
                "content": "Gemini is a constellation that can be seen in the night sky."
            }
        ]
    }'

{
    "records": [
        {
            "id": "2",
            "title": "Gemini",
            "content": "Gemini is a cutting edge large language model created by Google.",
            "score": 0.97
        },
        {
            "id": "3",
            "title": "Gemini Constellation",
            "content": "Gemini is a constellation that can be seen in the night sky.",
            "score": 0.18
        },
        {
            "id": "1",
            "title": "Gemini",
            "content": "The Gemini zodiac symbol often depicts two figures standing side-by-side.",
            "score": 0.05
        }
    ]
}

Python

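For more information, see the Vertex AI Search Python API reference documentation.

To authenticate to Vertex AI Search, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.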

from google.cloud import discoveryengine_v1 as discoveryengine

# TODO(developer): Set this variable to your project ID before running the sample.
project_id = "YOUR_PROJECT_ID"

client = discoveryengine.RankServiceClient()

# The full resource name of the ranking config.
# Format: projects/{project_id}/locations/{location}/rankingConfigs/default_ranking_config
ranking_config = client.ranking_config_path(
    project=project_id,
    location="global",
    ranking_config="default_ranking_config",
)
request = discoveryengine.RankRequest(
    ranking_config=ranking_config,
    model="semantic-ranker-default@latest",
    top_n=10,
    query="What is Google Gemini?",
    records=[
        discoveryengine.RankingRecord(
            id="1",
            title="Gemini",
            content="The Gemini zodiac symbol often depicts two figures standing side-by-side.",
        ),
        discoveryengine.RankingRecord(
            id="2",
            title="Gemini",
            content="Gemini is a cutting edge large language model created by Google.",
        ),
        discoveryengine.RankingRecord(
            id="3",
            title="Gemini Constellation",
            content="Gemini is a constellation that can be seen in the night sky.",
        ),
    ],
)

response = client.rank(request=request)

# Handle the response
print(response)
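
Rather than printing the whole response, you may want only the IDs and scores. The following sketch assumes the response object exposes a `records` sequence whose items have `id` and `score` attributes, matching the fields in the REST response shown earlier. `ranked_pairs` is a hypothetical helper, and `fake_response` is a stand-in built with `SimpleNamespace` so that the snippet runs without credentials.

```python
from types import SimpleNamespace


def ranked_pairs(response):
    """Return (id, score) tuples in the order the response lists them."""
    return [(record.id, record.score) for record in response.records]


# Stand-in for the object returned by client.rank(); not a real API response.
fake_response = SimpleNamespace(
    records=[
        SimpleNamespace(id="2", score=0.97),
        SimpleNamespace(id="3", score=0.18),
        SimpleNamespace(id="1", score=0.05),
    ]
)

for record_id, score in ranked_pairs(fake_response):
    print(f"{record_id}\t{score:.2f}")
```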

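Supported models

The following models are available.

Model name	Latest model (`semantic-ranker-default@latest`)	Input	Context window	Release date	Discontinuation date
`semantic-ranker-default-004`	Yes	Text (25 languages)	1024	April 9, 2025	To be determined
`semantic-ranker-fast-004`	No	Text (25 languages)	1024	April 9, 2025	To be determined
`semantic-ranker-default-003`	No	Text (25 languages)	512	September 10, 2024	To be determined
`semantic-ranker-default-002`	No	Text (English only)	512	June 3, 2024	To be determined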

What's next

Learn how to use the ranking method with other RAG APIs to generate grounded answers from unstructured data.