
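Parse and chunk documents

This page describes how to use Vertex AI Search to parse and chunk your documents.

You can configure parsing or chunking settings in order to:

Specify how Vertex AI Search parses content. You can specify how unstructured content is parsed when you upload it to Vertex AI Search. Vertex AI Search provides a digital parser, an OCR parser for PDFs, and a layout parser. You can also bring your own parsed documents. The layout parser is recommended when your documents have rich content and structural elements, such as sections, paragraphs, tables, images, and lists, that you want to extract for search and answer generation.

See Improve content detection with parsing.
Use Vertex AI Search for retrieval-augmented generation (RAG). Improve the output of LLMs with relevant data that you have uploaded to your Vertex AI Search app. To do this, you turn on document chunking, which indexes your data as chunks to improve relevance and reduce the computational load on LLMs. You also turn on the layout parser, which detects document elements such as headings and lists, to improve how documents are chunked.

For information about chunking for RAG and how to return chunks in search requests, see Chunk documents for RAG.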

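Parse documents

You can control content parsing in the following ways:

Specify the parser type. You can specify the type of parsing to apply depending on the file type:
- Digital parser. The digital parser is on by default for all file types unless a different parser type is specified. It processes ingested documents if no other default parser is specified for the data store, or if the specified parser doesn't support the file type of an ingested document.
- OCR parsing for PDFs. If you plan to upload scanned PDFs or PDFs with text inside images, you can turn on the OCR parser to improve PDF indexing. See the OCR parser for PDFs section of this document.
- Layout parser. Turn on the layout parser for HTML, PDF, or DOCX files if you plan to use Vertex AI Search for RAG. For information about this parser and how to turn it on, see Chunk documents for RAG.
Bring your own parsed document. (Preview with allowlist) If you have already parsed your unstructured documents, you can import that pre-parsed content into Vertex AI Search. See Bring your own parsed document.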

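Parser availability comparison

The following table lists the availability of each parser by document file type and shows which elements each parser can detect and parse.

File type	Digital parser	OCR parser	Layout parser
HTML	Detects paragraph elements	N/A	Detects paragraph, table, image, list, title, and heading elements
PDF	Detects paragraph elements (digital text)	Detects paragraph elements	Detects paragraph, table, image, title, and heading elements
DOCX (Preview)	Detects paragraph elements	N/A	Detects paragraph, table, image, list, title, and heading elements
PPTX (Preview)	Detects paragraph elements	N/A	Detects paragraph, table, image, list, title, and heading elements
TXT	Detects paragraph elements	N/A	N/A
XLSX (Preview)	Detects paragraph elements	N/A	Detects paragraph, table, title, and heading elements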

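Digital parser

The digital parser extracts machine-readable text from documents. It detects text blocks, but not document elements such as tables, lists, and headings.

The digital parser is used as the default if you don't specify a different parser as the default when you create the data store, or if the specified parser doesn't support the file type being uploaded.

OCR parser for PDFs

If you have non-searchable PDFs (scanned PDFs or PDFs with text inside images, such as infographics), Google recommends turning on optical character recognition (OCR) processing when you create the data store. This lets Vertex AI Search extract paragraph elements.

If you have searchable PDFs or other digital formats that are mostly composed of machine-readable text, you typically don't need the OCR parser. However, if you have PDFs that contain both non-searchable text (such as scanned text or infographics) and machine-readable text, you can set the useNativeText field to true when you specify the OCR parser. In this case, machine-readable text is merged with the OCR parsing output to improve text extraction quality.

OCR processing is available for custom search apps with unstructured data stores.

The OCR processor can parse at most the first 500 pages of a PDF file. Pages beyond the 500-page limit are not processed.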

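Layout parser

Layout parsing lets Vertex AI Search detect layouts for PDF and HTML files. Support for DOCX files is in Preview. Vertex AI Search can then identify content elements such as text blocks, tables, and lists, as well as structural elements such as titles and headings, and use them to define the organization and hierarchy of a document.

You can turn on layout parsing for all file types or specify the file types to turn it on for. The layout parser detects content elements (such as paragraphs, tables, and lists) and structural elements (such as titles, headings, headers, and footnotes).

The layout parser is available only when you use document chunking for RAG. When document chunking is turned on, Vertex AI Search breaks documents into chunks at ingestion time and can return documents as chunks. Detecting the document layout enables content-aware chunking and improves search and answer generation related to document elements. For more information about chunking documents for RAG, see Chunk documents for RAG.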

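Image annotation (Preview feature)

If image annotation is turned on, when an image is detected in a source document, a description (annotation) of the image and the image itself are assigned to a chunk. The annotation determines whether the chunk should be returned in a search result. If an answer is generated, the annotation can be a source for the answer.

The layout parser can detect the following image types: BMP, GIF, JPEG, PNG, and TIFF.

Table annotation

If table annotation is turned on, when a table is detected in a source document, a description (annotation) of the table and the table itself are assigned to a chunk. The annotation determines whether the chunk should be returned in a search result. If an answer is generated, the annotation can be a source for the answer.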

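Exclude HTML content

When you use the layout parser for HTML documents, you can exclude specific parts of the HTML content from processing. To improve data quality for search apps and RAG applications, you can exclude boilerplate or specific sections, such as navigation menus, headers, footers, or sidebars.

The layoutParsingConfig provides the following fields for this purpose:

excludeHtmlElements: A list of HTML tags to exclude. Content inside these tags is excluded.
excludeHtmlClasses: A list of HTML class attributes to exclude. HTML elements that have these class attributes, along with their content, are excluded.
excludeHtmlIds: A list of HTML element ID attributes to exclude. HTML elements that have these ID attributes, along with their content, are excluded.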

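Specify a default parser

By including the documentProcessingConfig object when you create a data store, you can specify a default parser for that data store. If you don't include documentProcessingConfig.defaultParsingConfig, the digital parser is used. The digital parser is also used if the specified parser isn't available for a file type.

REST

To specify a default parser:

When you create a search data store using the API, include documentProcessingConfig.defaultParsingConfig in the data store creation request. You can specify the OCR parser, the layout parser, or the digital parser: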
- To specify the OCR parser for PDFs:
```
"documentProcessingConfig": {
  "defaultParsingConfig": {
    "ocrParsingConfig": {
      "useNativeText": NATIVE_TEXT_BOOLEAN
    }
  }
}
```
  - NATIVE_TEXT_BOOLEAN: Optional. Set this only if you are ingesting PDFs. If set to true, machine-readable text processing is turned on for the OCR parser. The default is false.
- To specify the layout parser:
```
"documentProcessingConfig": {
  "defaultParsingConfig": {
    "layoutParsingConfig": {}
  }
}
```
- To specify the digital parser:

  Note: You typically don't need to specify the digital parser as the defaultParsingConfig. The digital parser is used by default if no other parser is explicitly specified.
```
"documentProcessingConfig": {
  "defaultParsingConfig": { "digitalParsingConfig": {} }
}
```
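
As a minimal sketch of the three REST fragments above, the following Python helper assembles the documentProcessingConfig portion of a data store creation request. The function name default_parsing_config and the short parser names ("digital", "ocr", "layout") are hypothetical conveniences, not part of the API; only the JSON field names come from this page.

```python
import json

# Hypothetical short names mapped to the config keys used in the REST snippets.
PARSER_KEYS = {
    "digital": "digitalParsingConfig",
    "ocr": "ocrParsingConfig",
    "layout": "layoutParsingConfig",
}


def default_parsing_config(parser, use_native_text=None):
    """Return a dict with a documentProcessingConfig that sets the default parser."""
    if parser not in PARSER_KEYS:
        raise ValueError(f"unknown parser: {parser!r}")
    parser_config = {}
    if use_native_text is not None:
        # useNativeText is described on this page only for the OCR parser.
        if parser != "ocr":
            raise ValueError("useNativeText applies only to the OCR parser")
        parser_config["useNativeText"] = bool(use_native_text)
    return {
        "documentProcessingConfig": {
            "defaultParsingConfig": {PARSER_KEYS[parser]: parser_config}
        }
    }


if __name__ == "__main__":
    # Print the fragment for an OCR default parser that also uses native text.
    print(json.dumps(default_parsing_config("ocr", use_native_text=True), indent=2))
```

The returned dictionary can be merged with fields such as displayName and contentConfig before it is serialized as the request body.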

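Console

When you create a search data store through the console, you can specify the default parser.

Example

The following example specifies during data store creation that the OCR parser is the default parser. Because the OCR parser applies only to PDF files, all ingested PDF files are processed by the OCR parser, and all other file types are processed by the digital parser.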

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-H "X-Goog-User-Project: exampleproject" \
"https://discoveryengine.googleapis.com/v1/projects/exampleproject/locations/global/collections/default_collection/dataStores?dataStoreId=datastore123" \
-d '{
  "displayName": "exampledatastore",
  "industryVertical": "GENERIC",
  "solutionTypes": ["SOLUTION_TYPE_SEARCH"],
  "contentConfig": "CONTENT_REQUIRED",
  "documentProcessingConfig": {
    "defaultParsingConfig": {
      "ocrParsingConfig": {
        "useNativeText": false
      }
    }
  }
}'

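Specify parser overrides for file types

You can specify that a particular file type (PDF, HTML, or DOCX) is parsed by a different parser than the default parser. To do this, include the documentProcessingConfig field in your data store creation request and specify the overriding parser. If you don't specify a default parser, the digital parser is the default.

REST

To specify a file-type-specific parser override:

When you create a search data store using the API, include documentProcessingConfig.parsingConfigOverrides in the data store creation request.

You can specify a parser for pdf, html, or docx: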
```
"documentProcessingConfig": {
  "parsingConfigOverrides": {
    "FILE_TYPE": { PARSING_CONFIG }
  }
}
```
Replace the following:
- FILE_TYPE: Accepted values are pdf, html, and docx.
- PARSING_CONFIG: The configuration for the parser that you want to apply to the file type. You can specify the OCR parser, the layout parser, or the digital parser:
  - To specify the OCR parser for PDFs:
```
"ocrParsingConfig": {
  "useNativeText": NATIVE_TEXT_BOOLEAN
}
```
    - NATIVE_TEXT_BOOLEAN: Optional. Set this only if you are ingesting PDFs. If set to true, machine-readable text processing is turned on for the OCR parser. The default is false.
  - To specify the layout parser:
```
"layoutParsingConfig": {}
```
  - To specify the digital parser:
```
"digitalParsingConfig": {}
```

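Console

When you create a search data store through the console, you can specify parser overrides for specific file types.

Example

The following example specifies during data store creation that PDF files are processed by the OCR parser and HTML files are processed by the layout parser. In this case, all files other than PDF and HTML files are processed by the digital parser.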

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-H "X-Goog-User-Project: exampleproject" \
"https://discoveryengine.googleapis.com/v1/projects/exampleproject/locations/global/collections/default_collection/dataStores?dataStoreId=datastore123" \
-d '{
  "displayName": "exampledatastore",
  "industryVertical": "GENERIC",
  "solutionTypes": ["SOLUTION_TYPE_SEARCH"],
  "contentConfig": "CONTENT_REQUIRED",
  "documentProcessingConfig": {
    "parsingConfigOverrides": {
      "pdf": {
        "ocrParsingConfig": {
          "useNativeText": false
        }
      },
      "html": {
         "layoutParsingConfig": {}
      }
    }
  }
}'
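
The override structure can also be built programmatically. The sketch below is illustrative only: parsing_overrides is a hypothetical helper, and its checks encode just the two constraints stated on this page (accepted file types are pdf, html, and docx; the OCR parser applies only to PDFs).

```python
import json

# File types this page lists as accepted values for FILE_TYPE.
SUPPORTED_FILE_TYPES = {"pdf", "html", "docx"}


def parsing_overrides(overrides):
    """Build documentProcessingConfig.parsingConfigOverrides.

    `overrides` maps a file type to a parser config dict,
    for example {"html": {"layoutParsingConfig": {}}}.
    """
    unknown = set(overrides) - SUPPORTED_FILE_TYPES
    if unknown:
        raise ValueError(f"unsupported file types: {sorted(unknown)}")
    for file_type, config in overrides.items():
        # The OCR parser is documented for PDF files only.
        if "ocrParsingConfig" in config and file_type != "pdf":
            raise ValueError(f"OCR parser cannot be applied to {file_type!r}")
    return {"documentProcessingConfig": {"parsingConfigOverrides": dict(overrides)}}


if __name__ == "__main__":
    body = parsing_overrides({
        "pdf": {"ocrParsingConfig": {"useNativeText": False}},
        "html": {"layoutParsingConfig": {}},
    })
    print(json.dumps(body, indent=2))
```

Validating locally like this only catches mistakes early; the service remains the authority on which combinations are accepted.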

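Edit document parsing for existing data stores

If you already have a data store, you can change the default parser and add file format exceptions. However, the updated parser settings apply only to new documents imported into the data store. Documents that are already in the data store are not reparsed with the new settings.

To change the document parsing settings for a data store:

In the Google Cloud console, go to the AI Applications page.
In the navigation menu, click Data Stores.
In the Name column, click the data store that you want to edit.
On the Processing config tab, edit the Document parsing settings.

Document chunking settings can't be changed. If the data store doesn't have document chunking turned on, you can't choose the layout parser.
Click Submit.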

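Configure the layout parser to exclude HTML content

You can configure the layout parser to exclude HTML content by specifying excludeHtmlElements, excludeHtmlClasses, or excludeHtmlIds in documentProcessingConfig.defaultParsingConfig.layoutParsingConfig.

REST

To exclude certain HTML content from being processed by the layout parser:

When you create a search data store using the API, include documentProcessingConfig.defaultParsingConfig.layoutParsingConfig in the data store creation request.

To exclude specific HTML tag types, use: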

"documentProcessingConfig": {
  "defaultParsingConfig": {
   "layoutParsingConfig": {
    "excludeHtmlElements": ["HTML_TAG_1","HTML_TAG_2","HTML_TAG_N"]
   }
  }
 }

Replace the HTML_TAG variables with tag names, for example nav and footer.

To exclude specific HTML element class attributes, use:

"documentProcessingConfig": {
  "defaultParsingConfig": {
   "layoutParsingConfig": {
    "excludeHtmlClasses": ["HTML_CLASS_1","HTML_CLASS_2","HTML_CLASS_N"]
   }
  }
 }

Replace the HTML_CLASS variables with class attributes, for example overlay and screenreader.

To exclude specific HTML element ID attributes, use:

"documentProcessingConfig": {
  "defaultParsingConfig": {
   "layoutParsingConfig": {
    "excludeHtmlIds": ["HTML_ID_1","HTML_ID_2","HTML_ID_N"]
   }
  }
 }

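Replace the HTML_ID variables with ID attributes, for example cookie-banner.

Example

This example specifies that when HTML files are processed by the layout parser, the parser skips the following:

The HTML element tags header, footer, nav, and aside
Elements with the HTML class attributes overlays and screenreader
Any element with the ID attribute cookie-banner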

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-H "X-Goog-User-Project: exampleproject" \
"https://discoveryengine.googleapis.com/v1/projects/exampleproject/locations/global/collections/default_collection/dataStores?dataStoreId=datastore123" \
-d '{
  "displayName": "exampledatastore",
  "industryVertical": "GENERIC",
  "solutionTypes": ["SOLUTION_TYPE_SEARCH"],
  "contentConfig": "CONTENT_REQUIRED",
  "documentProcessingConfig": {
    "defaultParsingConfig": {
      "layoutParsingConfig": {
       "excludeHtmlElements": ["header", "footer", "nav", "aside"],
       "excludeHtmlClasses": ["overlays", "screenreader"],
       "excludeHtmlIds": ["cookie-banner"]
      }
    }
  }
}'
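
Before ingesting HTML, it can help to preview roughly which text the three exclusion lists would drop. The sketch below is a local approximation built on Python's standard html.parser module; it is not the layout parser's actual algorithm, and the names ExclusionPreview and preview_kept_text are hypothetical. It assumes well-formed HTML in which every non-void element has a closing tag.

```python
from html.parser import HTMLParser

# Void elements never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ExclusionPreview(HTMLParser):
    """Collects text that sits outside excluded tags, classes, and IDs."""

    def __init__(self, tags=(), classes=(), ids=()):
        super().__init__()
        self.tags, self.classes, self.ids = set(tags), set(classes), set(ids)
        self.skip_depth = 0  # > 0 while inside an excluded element
        self.kept = []

    def _excluded(self, tag, attrs):
        attr = dict(attrs)
        element_classes = set((attr.get("class") or "").split())
        return (tag in self.tags
                or bool(element_classes & self.classes)
                or attr.get("id") in self.ids)

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        # Once inside an excluded element, every nested element deepens the skip.
        if self.skip_depth or self._excluded(tag, attrs):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.kept.append(data.strip())


def preview_kept_text(html, tags=(), classes=(), ids=()):
    """Return the text fragments that survive the given exclusions."""
    parser = ExclusionPreview(tags, classes, ids)
    parser.feed(html)
    parser.close()
    return parser.kept
```

For example, calling preview_kept_text with the same lists as the request above (header, footer, nav, aside; overlays, screenreader; cookie-banner) returns only the text outside those elements, which gives a quick sanity check on the exclusion lists before creating the data store.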

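Get parsed documents in JSON

You can get a parsed document in JSON format by calling the getProcessedDocument method and specifying PARSED_DOCUMENT as the processed document type. Getting parsed documents in JSON can be helpful if you need to upload the parsed document elsewhere or if you decide to re-import parsed documents to AI Applications using the bring your own parsed document feature.

REST

To get parsed documents in JSON, follow this step:

Call the getProcessedDocument method: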

curl -X GET \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    "https://discoveryengine.googleapis.com/v1alpha/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents/DOCUMENT_ID:getProcessedDocument?processed_document_type=PARSED_DOCUMENT"

Replace the following:

PROJECT_ID: The ID of your project.
DATA_STORE_ID: The ID of your data store.
DOCUMENT_ID: The ID of the document to get.

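Bring your own parsed document

You can import pre-parsed, unstructured documents into Vertex AI Search data stores. For example, instead of importing a raw PDF document, you can parse the PDF yourself and import the parsing result. This lets you import documents in a structured way, so that search and answer generation have information about the document layout and elements.

A parsed unstructured document is represented by JSON that describes the document as a sequence of text, table, and list blocks. You import JSON files that contain parsed unstructured document data in the same way that you import other types of unstructured documents, such as PDFs. When this feature is turned on, any uploaded JSON file that is identified by the application/json MIME type or a .JSON extension is treated as a parsed document.

To turn on this feature and learn how to use it, contact your Google account team.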

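Chunk documents for RAG

By default, Vertex AI Search is optimized for document retrieval, where your search app returns a document, such as a PDF or web page, with each search result.

Note: Document chunking is available for custom search apps with unstructured data stores.

Vertex AI Search can instead be optimized for RAG, where your search app is used primarily to augment LLM output with your custom data. When document chunking is turned on, Vertex AI Search breaks your documents into chunks. In search results, your search app can return relevant chunks of data instead of full documents. Using chunked data for RAG increases the relevance of LLM answers and reduces the computational load on LLMs.

To use Vertex AI Search for RAG:

Turn on document chunking when you create your data store.

Alternatively, upload your own chunks (Preview with allowlist) if you have already chunked your own documents.
Retrieve and view chunks, as described in the List chunks from a document, Get chunks in JSON from a processed document, and Get specific chunks sections.
Return chunks in search requests.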

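Limitations

The following limitations apply to chunking:

Document chunking can't be turned on or off after the data store is created.
You can search for documents instead of chunks in a data store that has document chunking turned on. However, such data stores aren't optimized for returning documents: documents are returned by aggregating chunks into documents.
When document chunking is turned on, search summaries and follow-up search are supported in Public preview, but not in GA.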

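Document chunking options

This section describes the options that you specify to turn on document chunking.

During data store creation, turn on the following options so that Vertex AI Search can index your documents as chunks.

Layout-aware document chunking. To turn this option on, include the documentProcessingConfig field in your data store creation request and specify ChunkingConfig.LayoutBasedChunkingConfig.

When layout-aware document chunking is turned on, Vertex AI Search detects a document's layout and takes it into account during chunking. This improves semantic coherence and reduces noise in the content when it's used for retrieval and LLM generation. All text in a chunk comes from the same layout entity, such as a heading, subheading, or list.
Layout parsing. To turn this option on, specify ParsingConfig.LayoutParsingConfig during data store creation.

The layout parser detects layouts for PDF, HTML, and DOCX files. It identifies elements such as text blocks, tables, lists, titles, and headings, and uses them to define the organization and hierarchy of a document.

For more information about layout parsing, see Layout parser.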

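Turn on document chunking

You turn on document chunking by including the documentProcessingConfig object in your data store creation request and turning on both layout-aware document chunking and layout parsing.

REST

To turn on document chunking:

When you create a search data store using the API, include the documentProcessingConfig.chunkingConfig object in the data store creation request.
```
"documentProcessingConfig": {
  "chunkingConfig": {
    "layoutBasedChunkingConfig": {
      "chunkSize": CHUNK_SIZE_LIMIT,
      "includeAncestorHeadings": HEADINGS_BOOLEAN
    }
  },
  "defaultParsingConfig": {
    "layoutParsingConfig": {}
  }
}
```
Replace the following:
- CHUNK_SIZE_LIMIT: Optional. The token size limit for each chunk. The default value is 500. Supported values are 100-500 (inclusive).
- HEADINGS_BOOLEAN: Optional. Determines whether headings are included in each chunk. The default value is false. Appending titles and headings at all levels to chunks from the middle of a document can help prevent context loss during chunk retrieval and ranking.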
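
The following Python sketch assembles the same fragment and checks the chunk size against the 100-500 range stated above before any request is sent. The helper name chunking_config is hypothetical; the field names and defaults are the ones listed on this page.

```python
import json

MIN_CHUNK_SIZE = 100  # lower bound stated for chunkSize
MAX_CHUNK_SIZE = 500  # upper bound and default stated for chunkSize


def chunking_config(chunk_size=MAX_CHUNK_SIZE, include_ancestor_headings=False):
    """Return a documentProcessingConfig that turns on layout-based chunking."""
    if not MIN_CHUNK_SIZE <= chunk_size <= MAX_CHUNK_SIZE:
        raise ValueError(
            f"chunkSize must be between {MIN_CHUNK_SIZE} and {MAX_CHUNK_SIZE}"
        )
    return {
        "documentProcessingConfig": {
            "chunkingConfig": {
                "layoutBasedChunkingConfig": {
                    "chunkSize": chunk_size,
                    "includeAncestorHeadings": bool(include_ancestor_headings),
                }
            },
            # Layout parsing is turned on together with layout-aware chunking.
            "defaultParsingConfig": {"layoutParsingConfig": {}},
        }
    }


if __name__ == "__main__":
    print(json.dumps(chunking_config(300, include_ancestor_headings=True), indent=2))
```

Because chunking can't be changed after the data store is created, checking these values up front avoids having to re-create the data store over a typo.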

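Console

When you create a search data store through the console, you can turn on document chunking.

Bring your own chunks (Preview with allowlist)

If you have already chunked your own documents, you can upload them to Vertex AI Search instead of turning on the document chunking options.

Bringing your own chunks is a Preview feature that requires allowlisting. To use this feature, contact your Google account team.

List chunks from a document

To list all chunks of a specific document, call the Chunks.list method.

REST

To list the chunks of a document, follow this step:

Call the Chunks.list method: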

curl -X GET \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    "https://discoveryengine.googleapis.com/v1alpha/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents/DOCUMENT_ID/chunks"

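Replace the following:

PROJECT_ID: The ID of your project.
DATA_STORE_ID: The ID of your data store.
DOCUMENT_ID: The ID of the document to list chunks from.

Get chunks in JSON from a processed document

You can get all the chunks of a specific document in JSON format by calling the getProcessedDocument method. Getting chunks in JSON can be helpful if you need to upload the chunks elsewhere or if you decide to re-import chunks to AI Applications using the bring your own chunks feature.

REST

To get the JSON chunks of a document, follow this step:

Call the getProcessedDocument method: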

curl -X GET \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    "https://discoveryengine.googleapis.com/v1alpha/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents/DOCUMENT_ID:getProcessedDocument?processed_document_type=CHUNKED_DOCUMENT"

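Replace the following:

PROJECT_ID: The ID of your project.
DATA_STORE_ID: The ID of your data store.
DOCUMENT_ID: The ID of the document to get chunks from.

Get specific chunks

To get a specific chunk, call the Chunks.get method.

REST

To get a specific chunk, follow this step:

Call the Chunks.get method: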

curl -X GET \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    "https://discoveryengine.googleapis.com/v1alpha/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents/DOCUMENT_ID/chunks/CHUNK_ID"

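Replace the following:

PROJECT_ID: The ID of your project.
DATA_STORE_ID: The ID of your data store.
DOCUMENT_ID: The ID of the document that the chunk came from.
CHUNK_ID: The ID of the chunk to return.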
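
The three read-only calls in this part of the page (Chunks.list, Chunks.get, and getProcessedDocument) share one document path. As an illustrative sketch, the hypothetical helpers below build those URLs from the placeholders; the URL layout is copied from the curl commands on this page, and nothing is sent over the network.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://discoveryengine.googleapis.com/v1alpha"
PROCESSED_TYPES = {"PARSED_DOCUMENT", "CHUNKED_DOCUMENT"}  # types used on this page


def document_url(project_id, data_store_id, document_id):
    """URL of a document in branch 0 of a data store in the default collection."""
    return (
        f"{BASE_URL}/projects/{quote(project_id, safe='')}"
        "/locations/global/collections/default_collection"
        f"/dataStores/{quote(data_store_id, safe='')}"
        f"/branches/0/documents/{quote(document_id, safe='')}"
    )


def list_chunks_url(project_id, data_store_id, document_id):
    """URL for Chunks.list."""
    return document_url(project_id, data_store_id, document_id) + "/chunks"


def get_chunk_url(project_id, data_store_id, document_id, chunk_id):
    """URL for Chunks.get."""
    return list_chunks_url(project_id, data_store_id, document_id) + "/" + quote(chunk_id, safe="")


def processed_document_url(project_id, data_store_id, document_id, processed_type):
    """URL for getProcessedDocument with the given processed document type."""
    if processed_type not in PROCESSED_TYPES:
        raise ValueError(f"unexpected processed document type: {processed_type!r}")
    query = urlencode({"processed_document_type": processed_type})
    return document_url(project_id, data_store_id, document_id) + ":getProcessedDocument?" + query
```

Each URL is meant to be used with a GET request and a bearer token, exactly as in the curl commands.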

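Return chunks in a search request

After you confirm that your data has been chunked correctly, Vertex AI Search can return chunked data in its search results.

The response returns chunks that are relevant to the search query. You can also choose to return adjacent chunks that appear before and after the relevant chunk in the source document. Adjacent chunks can add context and improve accuracy.

REST

To get chunked data:

When you make a search request, set ContentSearchSpec.SearchResultMode to CHUNKS.
```
"contentSearchSpec": {
  "searchResultMode": "RESULT_MODE",
  "chunkSpec": {
    "numPreviousChunks": NUMBER_OF_PREVIOUS_CHUNKS,
    "numNextChunks": NUMBER_OF_NEXT_CHUNKS
  }
}
```
- RESULT_MODE: Determines whether search results are returned as full documents or as chunks. To get chunks, the data store must have document chunking turned on. Accepted values are DOCUMENTS and CHUNKS. If document chunking is turned on for the data store, the default value is CHUNKS.
- NUMBER_OF_PREVIOUS_CHUNKS: The number of chunks to return that immediately precede the relevant chunk. The maximum allowed value is 5.
- NUMBER_OF_NEXT_CHUNKS: The number of chunks to return that immediately follow the relevant chunk. The maximum allowed value is 5.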

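Example

The following example search request sets SearchResultMode to CHUNKS, requests one previous chunk and one next chunk, and uses pageSize to limit the number of results to a single relevant chunk.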

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-H "X-Goog-User-Project: exampleproject" \
"https://discoveryengine.googleapis.com/v1/projects/exampleproject/locations/global/collections/default_collection/dataStores/datastore123/servingConfigs/default_search:search" \
-d '{
  "query": "animal",
  "pageSize": 1,
  "contentSearchSpec": {
    "searchResultMode": "CHUNKS",
    "chunkSpec": {
           "numPreviousChunks": 1,
           "numNextChunks": 1
       }
  }
}'
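
The request body can also be generated in code. The sketch below uses a hypothetical helper, chunk_search_request, and enforces the stated maximum of 5 adjacent chunks in each direction; the field names are those shown in the request above.

```python
import json

MAX_ADJACENT_CHUNKS = 5  # stated maximum for numPreviousChunks and numNextChunks


def chunk_search_request(query, page_size=10, previous_chunks=0, next_chunks=0):
    """Build a search request body that returns chunks instead of documents."""
    for name, value in (("numPreviousChunks", previous_chunks),
                        ("numNextChunks", next_chunks)):
        if not 0 <= value <= MAX_ADJACENT_CHUNKS:
            raise ValueError(f"{name} must be between 0 and {MAX_ADJACENT_CHUNKS}")
    return {
        "query": query,
        "pageSize": page_size,
        "contentSearchSpec": {
            "searchResultMode": "CHUNKS",
            "chunkSpec": {
                "numPreviousChunks": previous_chunks,
                "numNextChunks": next_chunks,
            },
        },
    }


if __name__ == "__main__":
    # Same body as the curl example: one result with one neighbor on each side.
    print(json.dumps(chunk_search_request("animal", 1, 1, 1), indent=2))
```

The default page_size of 10 is an arbitrary choice for this sketch, not a documented default.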

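The following example shows the response that is returned for the example query. The response contains the relevant chunk, the previous and next chunks, the metadata of the original document, and the span of source document pages that each chunk came from.

Response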

{
  "results": [
    {
      "chunk": {
        "name": "projects/961309680810/locations/global/collections/default_collection/dataStores/allie-pdf-adjacent-chunks_1711394998841/branches/0/documents/0d8619f429d7f20b3575b14cd0ad0813/chunks/c17",
        "id": "c17",
        "content": "\n# ESS10: Stakeholder Engagement and Information Disclosure\nReaders should also refer to ESS10 and its guidance notes, plus the template available for a stakeholder engagement plan. More detail on stakeholder engagement in projects with risks related to animal health is contained in section 4 below. The type of stakeholders (men and women) that can be engaged by the Borrower as part of the project's environmental and social assessment and project design and implementation are diverse and vary based on the type of intervention. The stakeholders can include: Pastoralists, farmers, herders, women's groups, women farmers, community members, fishermen, youths, etc. Cooperatives members, farmer groups, women's livestock associations, water user associations, community councils, slaughterhouse workers, traders, etc. Veterinarians, para-veterinary professionals, animal health workers, community animal health workers, faculties and students in veterinary colleges, etc. 8 \n# 4. Good Practice in Animal Health Risk Assessment and Management\n\n# Approach\nRisk assessment provides the transparent, adequate and objective evaluation needed by interested parties to make decisions on health-related risks associated with project activities involving live animals. As the ESF requires, it is conducted throughout the project cycle, to provide or indicate likelihood and impact of a given hazard, identify factors that shape the risk, and find proportionate and appropriate management options. 
The level of risk may be reduced by mitigation measures, such as infrastructure (e.g., diagnostic laboratories, border control posts, quarantine stations), codes of practice (e.g., good animal husbandry practices, on-farm biosecurity, quarantine, vaccination), policies and regulations (e.g., rules for importing live animals, ban on growth hormones and promotors, feed standards, distance required between farms, vaccination), institutional capacity (e.g., veterinary services, surveillance and monitoring), changes in individual behavior (e.g., hygiene, hand washing, care for animals). Annex 2 provides examples of mitigation practices. This list is not an exhaustive one but a compendium of most practiced interventions and activities. The cited measures should take into account social, economic, as well as cultural, gender and occupational aspects, and other factors that may affect the acceptability of mitigation practices by project beneficiaries and other stakeholders. Risk assessment is reviewed and updated through the project cycle (for example to take into account increased trade and travel connectivity between rural and urban settings and how this may affect risks of disease occurrence and/or outbreak). Projects monitor changes in risks (likelihood and impact) b               by using data, triggers or indicators. ",
        "documentMetadata": {
          "uri": "gs://table_eval_set/pdf/worldbank/AnimalHealthGoodPracticeNote.pdf",
          "title": "AnimalHealthGoodPracticeNote"
        },
        "pageSpan": {
          "pageStart": 14,
          "pageEnd": 15
        },
        "chunkMetadata": {
          "previousChunks": [
            {
              "name": "projects/961309680810/locations/global/collections/default_collection/dataStores/allie-pdf-adjacent-chunks_1711394998841/branches/0/documents/0d8619f429d7f20b3575b14cd0ad0813/chunks/c16",
              "id": "c16",
              "content": "\n# ESS6: Biodiversity Conservation and Sustainable Management of Living Natural Resources\nThe risks associated with livestock interventions under ESS6 include animal welfare (in relation to housing, transport, and slaughter); diffusion of pathogens from domestic animals to wildlife, with risks for endemic species and biodiversity (e.g., sheep and goat plague in Mongolia affecting the saiga, an endemic species of wild antelope); the introduction of new breeds with potential risk of introducing exotic or new diseases; and the release of new species that are not endemic with competitive advantage, potentially putting endemic species at risk of extinction. Animal welfare relates to how an animal is coping with the conditions in which it lives. An animal is in a good state of welfare if it is healthy, comfortable, well nourished, safe, able to express innate behavior, 7 Good Practice Note - Animal Health and related risks and is not suffering from unpleasant states such as pain, fear or distress. Good animal welfare requires appropriate animal care, disease prevention and veterinary treatment; appropriate shelter, management and nutrition; humane handling, slaughter or culling. The OIE provides standards for animal welfare on farms, during transport and at the time of slaughter, for their welfare and for purposes of disease control, in its Terrestrial and Aquatic Codes. The 2014 IFC Good Practice Note: Improving Animal Welfare in Livestock Operations is another example of practical guidance provided to development practitioners for implementation in investments and operations. Pastoralists rely heavily on livestock as a source of food, income and social status. 
Emergency projects to restock the herds of pastoralists affected by drought, disease or other natural disaster should pay particular attention to animal welfare (in terms of transport, access to water, feed, and animal health) to avoid potential disease transmission and ensure humane treatment of animals. Restocking also entails assessing the assets of pastoralists and their ability to maintain livestock in good conditions (access to pasture and water, social relationship, technical knowledge, etc.). Pastoralist communities also need to be engaged by the project to determine the type of animals and breed and the minimum herd size to be considered for restocking. \n# Box 5. Safeguarding the welfare of animals and related risks in project activities\nIn Haiti, the RESEPAG project (Relaunching Agriculture: Strengthening Agriculture Public Services) financed housing for goats and provided technical recommendations for improving their welfare, which is critical to avoid the respiratory infections, including pneumonia, that are serious diseases for goats. To prevent these diseases, requires optimal sanitation and air quality in herd housing. This involves ensuring that buildings have adequate ventilation and dust levels are reduced to minimize the opportunity for infection. Good nutrition, water and minerals are also needed to support the goats' immune function. The project paid particular attention to: (i) housing design to ensure good ventilation; (ii) locating housing close to water sources and away from human habitation and noisy areas; (iii) providing mineral blocks for micronutrients; (iv) ensuring availability of drinking water and clean food troughs. ",
              "documentMetadata": {
                "uri": "gs://table_eval_set/pdf/worldbank/AnimalHealthGoodPracticeNote.pdf",
                "title": "AnimalHealthGoodPracticeNote"
              },
              "pageSpan": {
                "pageStart": 13,
                "pageEnd": 14
              }
            }
          ],
          "nextChunks": [
            {
              "name": "projects/961309680810/locations/global/collections/default_collection/dataStores/allie-pdf-adjacent-chunks_1711394998841/branches/0/documents/0d8619f429d7f20b3575b14cd0ad0813/chunks/c18",
              "id": "c18",
              "content": "\n# Scoping of risks\nEarly scoping of risks related to animal health informs decisions to initiate more comprehensive risk assessment according to the type of livestock interventions and activities. It can be based on the following considerations: • • • • Type of livestock interventions supported by the project (such as expansion of feed resources, improvement of animal genetics, construction/upgrading and management of post-farm-gate facilities, etc. – see also Annex 2); Geographic scope and scale of the livestock interventions; Human and animal populations that are likely to be affected (farmers, women, children, domestic animals, wildlife, etc.); and Changes in the project or project context (such as emerging disease outbreak, extreme weather or climatic conditions) that would require a re-assessment of risk levels, mitigation measures and their likely effect on risk reduction. Scenario planning can also help to identify project-specific vulnerabilities, country-wide or locally, and help shape pragmatic analyses that address single or multiple hazards. In this process, some populations may be identified as having disproportionate exposure or vulnerability to certain risks because of occupation, gender, age, cultural or religious affiliation, socio-economic or health status. For example, women and children may be the main caretakers of livestock in the case of 9 Good Practice Note - Animal Health and related risks household farming, which puts them into close contact with animals and animal products. In farms and slaughterhouses, workers and veterinarians are particularly exposed, as they may be in direct contact with sick animals (see Box 2 for an illustration). Fragility, conflict, and violence (FCV) can exacerbate risk, in terms of likelihood and impact. 
Migrants new to a geographic area may be immunologically naïve to endemic zoonotic diseases or they may inadvertently introduce exotic diseases; and refugees or internally displaced populations may have high population density with limited infrastructure, leaving them vulnerable to disease exposure. Factors such as lack of access to sanitation, hygiene, housing, and health and veterinary services may also affect disease prevalence, contributing to perpetuation of poverty in some populations. Risk assessment should identify populations at risk and prioritize vulnerable populations and circumstances where risks may be increased. It should be noted that activities that seem minor can still have major consequences. See Box 6 for an example illustrating how such small interventions in a project may have large-scale consequences. It highlights the need for risk assessment, even for simple livestock interventions and activities, and how this can help during the project cycle (from concept to implementation). ",
              "documentMetadata": {
                "uri": "gs://table_eval_set/pdf/worldbank/AnimalHealthGoodPracticeNote.pdf",
                "title": "AnimalHealthGoodPracticeNote"
              },
              "pageSpan": {
                "pageStart": 15,
                "pageEnd": 16
              }
            }
          ]
        }
      }
    }
  ],
  "totalSize": 61,
  "attributionToken": "jwHwjgoMCICPjbAGEISp2J0BEiQ2NjAzMmZhYS0wMDAwLTJjYzEtYWQxYS1hYzNlYjE0Mzc2MTQiB0dFTkVSSUMqUMLwnhXb7Ygtq8SKLa3Eii3d7Ygtj_enIqOAlyLm7Ygtt7eMLduPmiKN96cijr6dFcXL8xfdj5oi9-yILdSynRWCspoi-eyILYCymiLk7Ygt",
  "nextPageToken": "ANxYzNzQTMiV2MjFWLhFDZh1SMjNmMtADMwATL5EmZyMDM2YDJaMQv3yagQYAsciPgIwgExEgC",
  "guidedSearchResult": {},
  "summary": {}
}
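
For RAG, the relevant chunk and its neighbors are usually stitched back together before they are passed to an LLM. The sketch below shows one way to do that for a response shaped like the one above. The function names are hypothetical, and the code assumes that previousChunks and nextChunks are listed in document order, which this page does not state.

```python
def ordered_chunks(result):
    """Return [previous..., relevant, next...] for one search result."""
    chunk = result["chunk"]
    metadata = chunk.get("chunkMetadata", {})
    # Assumption: adjacent chunks arrive in document order.
    return [*metadata.get("previousChunks", []), chunk, *metadata.get("nextChunks", [])]


def build_context(response):
    """Join the chunk contents of every result into one context string each."""
    contexts = []
    for result in response.get("results", []):
        chunks = ordered_chunks(result)
        contexts.append({
            "chunkIds": [c["id"] for c in chunks],
            "pageStart": chunks[0].get("pageSpan", {}).get("pageStart"),
            "pageEnd": chunks[-1].get("pageSpan", {}).get("pageEnd"),
            "text": "\n".join(c.get("content", "") for c in chunks),
        })
    return contexts
```

Applied to the response above, build_context yields a single entry whose chunkIds are c16, c17, and c18 and whose page range runs from 13 to 16.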

What's next

Create a search data store
