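This page describes how to use Vertex AI Search to parse and chunk your documents.
You can configure parsing or chunking settings in order to:
- Specify how Vertex AI Search parses content. You can specify how unstructured content is parsed when you upload it to Vertex AI Search. Vertex AI Search provides a digital parser, an optical character recognition (OCR) parser for PDF files, and a layout parser. You can also bring your own parsed documents. If you have rich content and want structural elements such as sections, paragraphs, tables, and lists extracted for search and answer generation, consider the layout parser.
  See Improve content detection with parsing.
- Use Vertex AI Search for retrieval-augmented generation (RAG). Improve LLM output with relevant data that you've uploaded to your Vertex AI Search app. To do this, you turn on document chunking, which indexes your data as chunks to improve relevance and reduce computational load. You also turn on the layout parser, which detects document elements such as headings and lists, to improve how documents are chunked.
  For information about chunking for RAG and how to return chunks in search requests, see Chunk documents for RAG.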
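Parse documents
You can control content parsing in the following ways:
- Specify the parser type. You can specify the type of parsing to apply, depending on the file type:
  - Digital parser. The digital parser is on by default for all file types unless a different parser type is specified. It processes ingested documents if no other default parser is specified for the data store, or if the specified parser doesn't support the file type of an ingested document.
  - OCR parsing for PDF files. If you plan to upload scanned PDF files or PDF files with text inside images, you can turn on the OCR parser to improve PDF indexing. See the OCR parser for PDF files section of this document.
  - Layout parser. Turn on the layout parser for HTML, PDF, or DOCX files if you plan to use Vertex AI Search for RAG. See Chunk documents for RAG for information about this parser and how to turn it on.
- Bring your own parsed document. (Preview with allowlist) If you've already parsed your unstructured documents, you can import the pre-parsed content into Vertex AI Search. See Bring your own parsed document.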
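Parser availability comparison
The following table lists the availability of each parser by document file type and shows which elements each parser can detect and parse.
File type | Digital parser | OCR parser | Layout parser |
---|---|---|---|
HTML | Detects paragraph elements | N/A | Detects paragraph, table, list, title, and heading elements |
PDF | Detects paragraph elements (digital text) | Detects paragraph elements | Detects paragraph, table, title, and heading elements |
DOCX (Preview) | Detects paragraph elements | N/A | Detects paragraph, table, list, title, and heading elements |
PPTX (Preview) | Detects paragraph elements | N/A | Detects paragraph, table, list, title, and heading elements |
TXT | Detects paragraph elements | N/A | Detects paragraph, table, title, and heading elements |
XLSX (Preview) | Detects paragraph elements | N/A | Detects paragraph, table, title, and heading elements |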
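Digital parser
The digital parser extracts machine-readable text from documents. It detects text blocks, but not document elements such as tables, lists, and headings.
The digital parser is used as the default if you don't specify a different default parser when you create the data store, or if the specified parser doesn't support a file type that is being uploaded.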
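OCR parser for PDF files
If you have non-searchable PDF files (scanned PDF files or PDF files with text inside images, such as infographics), Google recommends turning on optical character recognition (OCR) processing when you create the data store. This lets Vertex AI Search extract paragraph elements.
If you have searchable PDF files or other digital formats that consist mostly of machine-readable text, you typically don't need the OCR parser. However, if your PDF files contain both non-searchable text (such as scanned text or infographics) and machine-readable text, you can set the useNativeText field to true when you specify the OCR parser. In that case, the machine-readable text is merged with the OCR parsing output to improve text extraction quality.
OCR processing is available for generic search apps with unstructured data stores.
The OCR processor can parse at most 500 pages per PDF file. For longer PDF files, the OCR processor parses the first 500 pages and the default parser parses the remaining pages.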
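The 500-page rule can be illustrated with a short Python sketch. It only models the behavior described above. The function name `split_pages_by_parser` and the returned dictionary shape are hypothetical and are not part of any Vertex AI Search API.

```python
# From the text above: the OCR processor parses at most 500 pages per PDF file.
OCR_PAGE_LIMIT = 500


def split_pages_by_parser(total_pages, ocr_limit=OCR_PAGE_LIMIT):
    """Return the 1-based, inclusive page ranges handled by each parser.

    Illustrative helper only; it models the documented limit and does not
    call Vertex AI Search.
    """
    if total_pages < 1:
        raise ValueError("total_pages must be at least 1")
    ocr_last_page = min(total_pages, ocr_limit)
    ranges = {"ocr": (1, ocr_last_page), "default": None}
    if total_pages > ocr_limit:
        # Pages past the limit fall through to the default parser.
        ranges["default"] = (ocr_limit + 1, total_pages)
    return ranges


if __name__ == "__main__":
    print(split_pages_by_parser(120))
    print(split_pages_by_parser(750))
```

A check like this can help you decide whether to split a long scanned PDF before ingestion, so that no pages are left to a parser that cannot read text inside images.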
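Layout parser
Layout parsing lets Vertex AI Search detect the layout of PDF and HTML files. Support for DOCX files is in Preview. Vertex AI Search can then identify content elements such as text blocks, tables, and lists, as well as structural elements such as titles and headings, and use them to define the organization and hierarchy of a document.
You can turn on layout parsing for all file types or specify the file types to turn it on for. The layout parser detects content elements such as paragraphs, tables, and lists, and structural elements such as titles, headings, headers, and footnotes.
The layout parser is available only when you use document chunking for RAG. When document chunking is turned on, Vertex AI Search breaks documents into chunks at ingestion time and can return documents as chunks. Detecting document layout enables content-aware chunking and improves search and answer generation related to document elements. For more information about chunking documents for RAG, see Chunk documents for RAG.
The layout parser supports a maximum PDF file size of 40 MB.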
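Specify a default parser
By including the documentProcessingConfig object when you create a data store, you can specify a default parser for that data store. If you don't include documentProcessingConfig.defaultParsingConfig, the digital parser is used. The digital parser is also used if the specified parser isn't available for a file type.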
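REST
To specify a default parser:
When you create a search data store using the API, include documentProcessingConfig.defaultParsingConfig in the data store creation request. You can specify the OCR parser, the layout parser, or the digital parser:
To specify the OCR parser for PDF files:
"documentProcessingConfig": {
  "defaultParsingConfig": {
    "ocrParsingConfig": {
      "useNativeText": "NATIVE_TEXT_BOOLEAN"
    }
  }
}
NATIVE_TEXT_BOOLEAN: Optional. Set only if you are ingesting PDF files. If set to true, turns on machine-readable text processing for the OCR parser. The default is false.
To specify the layout parser:
"documentProcessingConfig": {
  "defaultParsingConfig": {
    "layoutParsingConfig": {}
  }
}
To specify the digital parser:
"documentProcessingConfig": {
  "defaultParsingConfig": {
    "digitalParsingConfig": {}
  }
}
Example
The following example specifies, during data store creation, that the OCR parser is the default parser. Because the OCR parser applies only to PDF files, all ingested PDF files are processed by the OCR parser, and all other file types are processed by the digital parser.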
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-H "X-Goog-User-Project: exampleproject" \
"https://discoveryengine.googleapis.com/v1alpha/projects/exampleproject/locations/global/collections/default_collection/dataStores?dataStoreId=datastore123" \
-d '{
  "displayName": "exampledatastore",
  "industryVertical": "GENERIC",
  "solutionTypes": ["SOLUTION_TYPE_SEARCH"],
  "contentConfig": "CONTENT_REQUIRED",
  "documentProcessingConfig": {
    "defaultParsingConfig": {
      "ocrParsingConfig": {
        "useNativeText": false
      }
    }
  }
}'
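If you prefer to assemble the request body programmatically, the following Python sketch builds the same JSON as the curl example. The helper names `default_parsing_config` and `data_store_body` and the short parser labels (`"digital"`, `"ocr"`, `"layout"`) are hypothetical conveniences for this illustration. Only the JSON field names come from the snippets above, and the sketch does not send the request.

```python
import json

# JSON keys taken from the configuration snippets above.
PARSER_KEYS = {
    "digital": "digitalParsingConfig",
    "ocr": "ocrParsingConfig",
    "layout": "layoutParsingConfig",
}


def default_parsing_config(parser, use_native_text=None):
    """Build the documentProcessingConfig fragment for a default parser."""
    if parser not in PARSER_KEYS:
        raise ValueError(f"unknown parser: {parser!r}")
    if use_native_text is not None and parser != "ocr":
        raise ValueError("useNativeText applies only to the OCR parser")
    parser_config = {}
    if use_native_text is not None:
        parser_config["useNativeText"] = bool(use_native_text)
    return {
        "documentProcessingConfig": {
            "defaultParsingConfig": {PARSER_KEYS[parser]: parser_config}
        }
    }


def data_store_body(display_name, parser, use_native_text=None):
    """Build a data store creation body with the fields used in the example."""
    body = {
        "displayName": display_name,
        "industryVertical": "GENERIC",
        "solutionTypes": ["SOLUTION_TYPE_SEARCH"],
        "contentConfig": "CONTENT_REQUIRED",
    }
    body.update(default_parsing_config(parser, use_native_text))
    return body


if __name__ == "__main__":
    # Prints the request body; pass it to your HTTP client of choice.
    print(json.dumps(data_store_body("exampledatastore", "ocr", False), indent=2))
```

Building the body from a dictionary and serializing it with `json.dumps` avoids hand-written JSON mistakes such as trailing commas.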
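Specify parser overrides for file types
You can specify that a particular file type (PDF, HTML, or DOCX) is parsed by a different parser than the default parser. To do so, include the documentProcessingConfig field in your data store creation request and specify the overriding parser. If you don't specify a default parser, the digital parser is the default.
REST
To specify a file-type-specific parser override:
When you create a search data store using the API, include documentProcessingConfig.parsingConfigOverrides in the data store creation request. You can specify a parser for pdf, html, or docx:
"documentProcessingConfig": {
  "parsingConfigOverrides": {
    "FILE_TYPE": { PARSING_CONFIG }
  }
}
Replace the following:
FILE_TYPE: The accepted values are pdf, html, and docx.
PARSING_CONFIG: The configuration of the parser to apply to the file type. You can specify the OCR parser, the layout parser, or the digital parser:
To specify the OCR parser for PDF files:
"ocrParsingConfig": { "useNativeText": "NATIVE_TEXT_BOOLEAN" }
NATIVE_TEXT_BOOLEAN: Optional. Set only if you are ingesting PDF files. If set to true, turns on machine-readable text processing for the OCR parser. The default is false.
To specify the layout parser:
"layoutParsingConfig": {}
To specify the digital parser:
"digitalParsingConfig": {}
Example
The following example specifies, during data store creation, that PDF files are processed by the OCR parser and HTML files are processed by the layout parser. In this case, any files other than PDF and HTML files are processed by the digital parser.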
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-H "X-Goog-User-Project: exampleproject" \
"https://discoveryengine.googleapis.com/v1alpha/projects/exampleproject/locations/global/collections/default_collection/dataStores?dataStoreId=datastore123" \
-d '{
  "displayName": "exampledatastore",
  "industryVertical": "GENERIC",
  "solutionTypes": ["SOLUTION_TYPE_SEARCH"],
  "contentConfig": "CONTENT_REQUIRED",
  "documentProcessingConfig": {
    "parsingConfigOverrides": {
      "pdf": {
        "ocrParsingConfig": {
          "useNativeText": false
        }
      },
      "html": {
        "layoutParsingConfig": {}
      }
    }
  }
}'
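To reason about which parser ends up handling a given file type, the following Python sketch models the rules described on this page: a file-type override takes precedence over the default parser, and the digital parser is used when nothing is specified or when the specified parser is not available for the file type. This is an illustrative model, not the service's implementation. The function `resolve_parser` is hypothetical, and the supported-file-type sets are an assumption read from the prose on this page (OCR for PDF only; layout for PDF, HTML, and DOCX).

```python
# Maps the JSON config keys used on this page to short parser labels.
KEY_TO_PARSER = {
    "digitalParsingConfig": "digital",
    "ocrParsingConfig": "ocr",
    "layoutParsingConfig": "layout",
}

# Assumption: file types taken from the prose on this page.
SUPPORTED_FILE_TYPES = {
    "ocr": {"pdf"},
    "layout": {"pdf", "html", "docx"},
}


def resolve_parser(file_type, processing_config):
    """Return the label of the parser expected to handle file_type."""
    file_type = file_type.lower()
    overrides = processing_config.get("parsingConfigOverrides", {})
    # An override for the file type wins over the default parser.
    chosen = (
        overrides.get(file_type)
        or processing_config.get("defaultParsingConfig")
        or {}
    )
    for key in chosen:
        parser = KEY_TO_PARSER.get(key)
        if parser in SUPPORTED_FILE_TYPES and file_type in SUPPORTED_FILE_TYPES[parser]:
            return parser
    # Nothing specified, or the specified parser is not available.
    return "digital"


if __name__ == "__main__":
    config = {
        "parsingConfigOverrides": {
            "pdf": {"ocrParsingConfig": {"useNativeText": False}},
            "html": {"layoutParsingConfig": {}},
        }
    }
    for file_type in ("pdf", "html", "docx"):
        print(file_type, resolve_parser(file_type, config))
```

With the override configuration from the example above, the sketch resolves PDF files to the OCR parser, HTML files to the layout parser, and everything else to the digital parser, matching the description.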
Get parsed documents in JSON
You can get a parsed document in JSON format by calling the getProcessedDocument method and specifying PARSED_DOCUMENT as the processed document type. Getting parsed documents in JSON is helpful if you need to upload the parsed document elsewhere, or if you decide to re-import parsed documents into Vertex AI Agent Builder using the bring your own parsed document feature.
REST
To get a parsed document in JSON, follow this step:
Call the getProcessedDocument method:
curl -X GET \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://discoveryengine.googleapis.com/v1alpha/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents/DOCUMENT_ID:getProcessedDocument?processed_document_type=PARSED_DOCUMENT"
Replace the following:
PROJECT_ID: The ID of your project.
DATA_STORE_ID: The ID of your data store.
DOCUMENT_ID: The ID of the document to get.
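When you script this call, it helps to build the URL from its parts instead of editing the string by hand. The following Python sketch reproduces the URL pattern shown in the curl command. The function name `processed_document_url` is hypothetical, and the sketch only builds the URL; it does not authenticate or send the request.

```python
from urllib.parse import quote, urlencode

# Base URL taken from the curl command above.
API_ROOT = "https://discoveryengine.googleapis.com/v1alpha"


def processed_document_url(project_id, data_store_id, document_id):
    """Build the getProcessedDocument URL for a parsed document."""
    name = "/".join([
        "projects", quote(project_id, safe=""),
        "locations", "global",
        "collections", "default_collection",
        "dataStores", quote(data_store_id, safe=""),
        "branches", "0",
        "documents", quote(document_id, safe=""),
    ])
    query = urlencode({"processed_document_type": "PARSED_DOCUMENT"})
    return f"{API_ROOT}/{name}:getProcessedDocument?{query}"


if __name__ == "__main__":
    print(processed_document_url("exampleproject", "datastore123", "doc1"))
```

Percent-encoding each ID with `quote` keeps the path valid even if an ID contains characters that are not URL safe.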
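Bring your own parsed document
You can import pre-parsed, unstructured documents into Vertex AI Search data stores. For example, instead of importing a raw PDF file, you can parse the PDF yourself and import the parsing result. This lets you import your documents in a structured way, so that search and answer generation have information about the document layout and elements.
A parsed, unstructured document is represented by JSON that describes the document as a sequence of text, table, and list blocks. You import JSON files containing your parsed document data in the same way that you import other types of unstructured documents, such as PDF files. When this feature is turned on, any uploaded JSON file that is identified by the application/json MIME type or a .JSON extension is treated as a parsed document.
To turn on this feature and to learn how to use it, contact your Google account team.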
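Chunk documents for RAG
By default, Vertex AI Search is optimized for document retrieval, where your search app returns a document, such as a PDF or web page, with each search result.
Document chunking is available for generic search apps with unstructured data stores.
Vertex AI Search can instead be optimized for RAG, where your search app is used primarily to augment LLM output with your custom data. When document chunking is turned on, Vertex AI Search breaks your documents into chunks. In search results, your search app can return relevant chunks of data instead of full documents. Using chunked data for RAG increases the relevance of LLM answers and reduces the computational load for LLMs.
To use Vertex AI Search for RAG, turn on document chunking when you create your data store (or bring your own chunks), and then return chunks in your search requests. The following sections describe these steps.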
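Limitations
The following limitations apply to chunking:
- Document chunking can't be turned on or off after the data store is created.
- You can make search requests for documents instead of chunks against a data store that has document chunking turned on. However, data stores with document chunking turned on aren't optimized for returning documents. Documents are returned by aggregating chunks into documents.
- When document chunking is turned on, search summaries and search with follow-ups are supported in Public Preview, but not in GA.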
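Document chunking options
This section describes the options that you need to specify to turn on document chunking.
When you create the data store, turn on the following options so that Vertex AI Search can index your documents as chunks.
- Layout-aware document chunking. To turn on this option, include the documentProcessingConfig field in your data store creation request and specify ChunkingConfig.LayoutBasedChunkingConfig.
  When layout-aware document chunking is turned on, Vertex AI Search detects a document's layout and takes it into account during chunking. This improves the semantic coherence of the content and reduces noise when the content is used for retrieval and LLM generation. All text in a chunk comes from the same layout entity, such as a heading, subheading, or list.
- Layout parsing. To turn on this option, specify ParsingConfig.LayoutParsingConfig during data store creation.
  The layout parser detects the layout of PDF, HTML, and DOCX files. It identifies elements such as text blocks, tables, lists, titles, and headings, and uses them to define the organization and hierarchy of a document.
  For more information about layout parsing, see Layout parsing.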
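Turn on document chunking
You can turn on document chunking by including the documentProcessingConfig object in your data store creation request and turning on layout-aware document chunking and layout parsing.
REST
To turn on document chunking:
When you create a search data store using the API, include the documentProcessingConfig.chunkingConfig object in the data store creation request.
"documentProcessingConfig": {
  "chunkingConfig": {
    "layoutBasedChunkingConfig": {
      "chunkSize": CHUNK_SIZE_LIMIT,
      "includeAncestorHeadings": HEADINGS_BOOLEAN
    }
  },
  "defaultParsingConfig": {
    "layoutParsingConfig": {}
  }
}
Replace the following:
CHUNK_SIZE_LIMIT: Optional. The token size limit for each chunk. The default value is 500. Supported values are 100 through 500 (inclusive).
HEADINGS_BOOLEAN: Optional. Determines whether headings are included in each chunk. The default value is false. Appending the title and headings at all levels to chunks from the middle of a document can help prevent context loss in chunk retrieval and ranking.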
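The following Python sketch builds this configuration and checks the chunk size against the documented range before you send the request. The function name `chunking_processing_config` is hypothetical. The field names, the default of 500, and the 100 to 500 range come from the text above.

```python
import json

# Documented range and default for chunkSize (inclusive bounds).
MIN_CHUNK_SIZE = 100
MAX_CHUNK_SIZE = 500


def chunking_processing_config(chunk_size=500, include_ancestor_headings=False):
    """Build the documentProcessingConfig fragment that turns on chunking."""
    if isinstance(chunk_size, bool) or not isinstance(chunk_size, int):
        raise TypeError("chunk_size must be an integer")
    if not MIN_CHUNK_SIZE <= chunk_size <= MAX_CHUNK_SIZE:
        raise ValueError(
            f"chunk_size must be between {MIN_CHUNK_SIZE} and {MAX_CHUNK_SIZE}"
        )
    return {
        "documentProcessingConfig": {
            "chunkingConfig": {
                "layoutBasedChunkingConfig": {
                    "chunkSize": chunk_size,
                    "includeAncestorHeadings": bool(include_ancestor_headings),
                }
            },
            # Layout parsing is turned on together with layout-aware chunking.
            "defaultParsingConfig": {"layoutParsingConfig": {}},
        }
    }


if __name__ == "__main__":
    print(json.dumps(chunking_processing_config(300, True), indent=2))
```

Because document chunking can't be changed after the data store is created, validating these values locally before the creation request is a cheap safeguard.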
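Bring your own chunks (Preview with allowlist)
If you've already chunked your own documents, you can upload those chunks to Vertex AI Search instead of turning on the document chunking options.
Bring your own chunks is a Preview feature with an allowlist. To use this feature, contact your Google account team.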
List a document's chunks
To list all chunks for a specific document, call the Chunks.list method.
REST
To list a document's chunks, follow this step:
Call the Chunks.list method:
curl -X GET \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://discoveryengine.googleapis.com/v1alpha/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents/DOCUMENT_ID/chunks"
Replace the following:
PROJECT_ID: The ID of your project.
DATA_STORE_ID: The ID of your data store.
DOCUMENT_ID: The ID of the document to list chunks from.
Get chunks in JSON from a processed document
You can get all chunks from a specific document in JSON format by calling the getProcessedDocument method. Getting chunks in JSON is helpful if you need to upload the chunks elsewhere, or if you decide to re-import the chunks into Vertex AI Agent Builder.
REST
To get JSON chunks for a document, follow this step:
Call the getProcessedDocument method:
curl -X GET \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://discoveryengine.googleapis.com/v1alpha/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents/DOCUMENT_ID/chunks:getProcessedDocument?processed_document_type=CHUNKED_DOCUMENT"
Replace the following:
PROJECT_ID: The ID of your project.
DATA_STORE_ID: The ID of your data store.
DOCUMENT_ID: The ID of the document to get chunks from.
Get a specific chunk
To get a specific chunk, call the Chunks.get method.
REST
To get a specific chunk, follow this step:
Call the Chunks.get method:
curl -X GET \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://discoveryengine.googleapis.com/v1alpha/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents/DOCUMENT_ID/chunks/CHUNK_ID"
Replace the following:
PROJECT_ID: The ID of your project.
DATA_STORE_ID: The ID of your data store.
DOCUMENT_ID: The ID of the document that the chunk came from.
CHUNK_ID: The ID of the chunk to return.
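The chunk `name` values in the sample response at the end of this page follow the same path as the Chunks.get URL. As a rough sketch, the Python below splits such a name into its IDs and builds the Chunks.get URL from those IDs. The helper names `parse_chunk_name` and `chunk_url` are hypothetical, and the regular expression is an assumption based only on the path shape shown on this page.

```python
import re

# Base URL taken from the curl command above.
API_ROOT = "https://discoveryengine.googleapis.com/v1alpha"

# Assumption: chunk names follow the path shape shown on this page.
CHUNK_NAME = re.compile(
    r"^projects/(?P<project>[^/]+)"
    r"/locations/(?P<location>[^/]+)"
    r"/collections/(?P<collection>[^/]+)"
    r"/dataStores/(?P<data_store>[^/]+)"
    r"/branches/(?P<branch>[^/]+)"
    r"/documents/(?P<document>[^/]+)"
    r"/chunks/(?P<chunk>[^/]+)$"
)


def parse_chunk_name(name):
    """Split a chunk resource name into a dictionary of its IDs."""
    match = CHUNK_NAME.match(name)
    if match is None:
        raise ValueError(f"not a chunk resource name: {name!r}")
    return match.groupdict()


def chunk_url(project_id, data_store_id, document_id, chunk_id):
    """Build the Chunks.get URL in the form used by the curl command."""
    return (
        f"{API_ROOT}/projects/{project_id}/locations/global"
        f"/collections/default_collection/dataStores/{data_store_id}"
        f"/branches/0/documents/{document_id}/chunks/{chunk_id}"
    )


if __name__ == "__main__":
    parts = parse_chunk_name(
        "projects/p1/locations/global/collections/default_collection"
        "/dataStores/ds1/branches/0/documents/d1/chunks/c17"
    )
    print(parts)
    print(chunk_url(parts["project"], parts["data_store"], parts["document"], parts["chunk"]))
```

This is useful when a search result gives you a chunk name and you want to fetch that chunk, or its document, in a follow-up request.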
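Return chunks in a search request
After you've confirmed that your data has been chunked correctly, Vertex AI Search can return chunked data in its search results.
The response returns a chunk that is relevant to the search query. In addition, you can choose to return adjacent chunks that appear before and after the relevant chunk in the source document. Adjacent chunks can add context and improve accuracy.
REST
To get chunked data:
When you make a search request, specify ContentSearchSpec.SearchResultMode as chunks.
"contentSearchSpec": {
  "searchResultMode": "RESULT_MODE",
  "chunkSpec": {
    "numPreviousChunks": NUMBER_OF_PREVIOUS_CHUNKS,
    "numNextChunks": NUMBER_OF_NEXT_CHUNKS
  }
}
RESULT_MODE: Determines whether search results are returned as full documents or as chunks. To get chunks, the data store must have document chunking turned on. Accepted values are documents and chunks. If document chunking is turned on for your data store, the default value is chunks.
NUMBER_OF_PREVIOUS_CHUNKS: The number of chunks to return that immediately precede the relevant chunk. The maximum value allowed is 5.
NUMBER_OF_NEXT_CHUNKS: The number of chunks to return that immediately follow the relevant chunk. The maximum value allowed is 5.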
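As a minimal sketch, the following Python helper builds the contentSearchSpec object and rejects adjacent-chunk counts above the documented maximum of 5. The function name `content_search_spec` is hypothetical. Writing the mode in upper case is an assumption that follows the request example on this page, which uses "CHUNKS".

```python
import json

# Documented maximum for numPreviousChunks and numNextChunks.
MAX_ADJACENT_CHUNKS = 5
RESULT_MODES = ("DOCUMENTS", "CHUNKS")


def content_search_spec(result_mode="CHUNKS", num_previous_chunks=0, num_next_chunks=0):
    """Build the contentSearchSpec fragment of a search request."""
    mode = result_mode.upper()
    if mode not in RESULT_MODES:
        raise ValueError(f"result_mode must be one of {RESULT_MODES}")
    counts = {
        "numPreviousChunks": num_previous_chunks,
        "numNextChunks": num_next_chunks,
    }
    for field, value in counts.items():
        if not 0 <= value <= MAX_ADJACENT_CHUNKS:
            raise ValueError(f"{field} must be between 0 and {MAX_ADJACENT_CHUNKS}")
    spec = {"searchResultMode": mode}
    if mode == "CHUNKS":
        # Adjacent chunks only make sense when results are chunks.
        spec["chunkSpec"] = counts
    return {"contentSearchSpec": spec}


if __name__ == "__main__":
    body = {"query": "animal", "pageSize": 1}
    body.update(content_search_spec("chunks", 1, 1))
    print(json.dumps(body, indent=2))
```

Requesting one or two adjacent chunks is often enough to give an LLM the surrounding context without inflating the prompt.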
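Example
The following search request example sets SearchResultMode to chunks, requests one previous chunk and one next chunk, and uses pageSize to limit the results to a single relevant chunk.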
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-H "X-Goog-User-Project: exampleproject" \
"https://discoveryengine.googleapis.com/v1alpha/projects/exampleproject/locations/global/collections/default_collection/dataStores/datastore123/servingConfigs/default_search:search" \
-d '{
  "query": "animal",
  "pageSize": 1,
  "contentSearchSpec": {
    "searchResultMode": "CHUNKS",
    "chunkSpec": {
      "numPreviousChunks": 1,
      "numNextChunks": 1
    }
  }
}'
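The following example shows the response returned for the example query. The response contains the relevant chunk, the previous and next chunks, the metadata of the original document, and the span of document pages from which each chunk was derived.
Response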
{ "results": [ { "chunk": { "name": "projects/961309680810/locations/global/collections/default_collection/dataStores/allie-pdf-adjacent-chunks_1711394998841/branches/0/documents/0d8619f429d7f20b3575b14cd0ad0813/chunks/c17", "id": "c17", "content": "\n# ESS10: Stakeholder Engagement and Information Disclosure\nReaders should also refer to ESS10 and its guidance notes, plus the template available for a stakeholder engagement plan. More detail on stakeholder engagement in projects with risks related to animal health is contained in section 4 below. The type of stakeholders (men and women) that can be engaged by the Borrower as part of the project's environmental and social assessment and project design and implementation are diverse and vary based on the type of intervention. The stakeholders can include: Pastoralists, farmers, herders, women's groups, women farmers, community members, fishermen, youths, etc. Cooperatives members, farmer groups, women's livestock associations, water user associations, community councils, slaughterhouse workers, traders, etc. Veterinarians, para-veterinary professionals, animal health workers, community animal health workers, faculties and students in veterinary colleges, etc. 8 \n# 4. Good Practice in Animal Health Risk Assessment and Management\n\n# Approach\nRisk assessment provides the transparent, adequate and objective evaluation needed by interested parties to make decisions on health-related risks associated with project activities involving live animals. As the ESF requires, it is conducted throughout the project cycle, to provide or indicate likelihood and impact of a given hazard, identify factors that shape the risk, and find proportionate and appropriate management options. 
The level of risk may be reduced by mitigation measures, such as infrastructure (e.g., diagnostic laboratories, border control posts, quarantine stations), codes of practice (e.g., good animal husbandry practices, on-farm biosecurity, quarantine, vaccination), policies and regulations (e.g., rules for importing live animals, ban on growth hormones and promotors, feed standards, distance required between farms, vaccination), institutional capacity (e.g., veterinary services, surveillance and monitoring), changes in individual behavior (e.g., hygiene, hand washing, care for animals). Annex 2 provides examples of mitigation practices. This list is not an exhaustive one but a compendium of most practiced interventions and activities. The cited measures should take into account social, economic, as well as cultural, gender and occupational aspects, and other factors that may affect the acceptability of mitigation practices by project beneficiaries and other stakeholders. Risk assessment is reviewed and updated through the project cycle (for example to take into account increased trade and travel connectivity between rural and urban settings and how this may affect risks of disease occurrence and/or outbreak). Projects monitor changes in risks (likelihood and impact) b by using data, triggers or indicators. 
", "documentMetadata": { "uri": "gs://table_eval_set/pdf/worldbank/AnimalHealthGoodPracticeNote.pdf", "title": "AnimalHealthGoodPracticeNote" }, "pageSpan": { "pageStart": 14, "pageEnd": 15 }, "chunkMetadata": { "previousChunks": [ { "name": "projects/961309680810/locations/global/collections/default_collection/dataStores/allie-pdf-adjacent-chunks_1711394998841/branches/0/documents/0d8619f429d7f20b3575b14cd0ad0813/chunks/c16", "id": "c16", "content": "\n# ESS6: Biodiversity Conservation and Sustainable Management of Living Natural Resources\nThe risks associated with livestock interventions under ESS6 include animal welfare (in relation to housing, transport, and slaughter); diffusion of pathogens from domestic animals to wildlife, with risks for endemic species and biodiversity (e.g., sheep and goat plague in Mongolia affecting the saiga, an endemic species of wild antelope); the introduction of new breeds with potential risk of introducing exotic or new diseases; and the release of new species that are not endemic with competitive advantage, potentially putting endemic species at risk of extinction. Animal welfare relates to how an animal is coping with the conditions in which it lives. An animal is in a good state of welfare if it is healthy, comfortable, well nourished, safe, able to express innate behavior, 7 Good Practice Note - Animal Health and related risks and is not suffering from unpleasant states such as pain, fear or distress. Good animal welfare requires appropriate animal care, disease prevention and veterinary treatment; appropriate shelter, management and nutrition; humane handling, slaughter or culling. The OIE provides standards for animal welfare on farms, during transport and at the time of slaughter, for their welfare and for purposes of disease control, in its Terrestrial and Aquatic Codes. 
The 2014 IFC Good Practice Note: Improving Animal Welfare in Livestock Operations is another example of practical guidance provided to development practitioners for implementation in investments and operations. Pastoralists rely heavily on livestock as a source of food, income and social status. Emergency projects to restock the herds of pastoralists affected by drought, disease or other natural disaster should pay particular attention to animal welfare (in terms of transport, access to water, feed, and animal health) to avoid potential disease transmission and ensure humane treatment of animals. Restocking also entails assessing the assets of pastoralists and their ability to maintain livestock in good conditions (access to pasture and water, social relationship, technical knowledge, etc.). Pastoralist communities also need to be engaged by the project to determine the type of animals and breed and the minimum herd size to be considered for restocking. \n# Box 5. Safeguarding the welfare of animals and related risks in project activities\nIn Haiti, the RESEPAG project (Relaunching Agriculture: Strengthening Agriculture Public Services) financed housing for goats and provided technical recommendations for improving their welfare, which is critical to avoid the respiratory infections, including pneumonia, that are serious diseases for goats. To prevent these diseases, requires optimal sanitation and air quality in herd housing. This involves ensuring that buildings have adequate ventilation and dust levels are reduced to minimize the opportunity for infection. Good nutrition, water and minerals are also needed to support the goats' immune function. The project paid particular attention to: (i) housing design to ensure good ventilation; (ii) locating housing close to water sources and away from human habitation and noisy areas; (iii) providing mineral blocks for micronutrients; (iv) ensuring availability of drinking water and clean food troughs. 
", "documentMetadata": { "uri": "gs://table_eval_set/pdf/worldbank/AnimalHealthGoodPracticeNote.pdf", "title": "AnimalHealthGoodPracticeNote" }, "pageSpan": { "pageStart": 13, "pageEnd": 14 } } ], "nextChunks": [ { "name": "projects/961309680810/locations/global/collections/default_collection/dataStores/allie-pdf-adjacent-chunks_1711394998841/branches/0/documents/0d8619f429d7f20b3575b14cd0ad0813/chunks/c18", "id": "c18", "content": "\n# Scoping of risks\nEarly scoping of risks related to animal health informs decisions to initiate more comprehensive risk assessment according to the type of livestock interventions and activities. It can be based on the following considerations: • • • • Type of livestock interventions supported by the project (such as expansion of feed resources, improvement of animal genetics, construction/upgrading and management of post-farm-gate facilities, etc. – see also Annex 2); Geographic scope and scale of the livestock interventions; Human and animal populations that are likely to be affected (farmers, women, children, domestic animals, wildlife, etc.); and Changes in the project or project context (such as emerging disease outbreak, extreme weather or climatic conditions) that would require a re-assessment of risk levels, mitigation measures and their likely effect on risk reduction. Scenario planning can also help to identify project-specific vulnerabilities, country-wide or locally, and help shape pragmatic analyses that address single or multiple hazards. In this process, some populations may be identified as having disproportionate exposure or vulnerability to certain risks because of occupation, gender, age, cultural or religious affiliation, socio-economic or health status. For example, women and children may be the main caretakers of livestock in the case of 9 Good Practice Note - Animal Health and related risks household farming, which puts them into close contact with animals and animal products. 
In farms and slaughterhouses, workers and veterinarians are particularly exposed, as they may be in direct contact with sick animals (see Box 2 for an illustration). Fragility, conflict, and violence (FCV) can exacerbate risk, in terms of likelihood and impact. Migrants new to a geographic area may be immunologically naïve to endemic zoonotic diseases or they may inadvertently introduce exotic diseases; and refugees or internally displaced populations may have high population density with limited infrastructure, leaving them vulnerable to disease exposure. Factors such as lack of access to sanitation, hygiene, housing, and health and veterinary services may also affect disease prevalence, contributing to perpetuation of poverty in some populations. Risk assessment should identify populations at risk and prioritize vulnerable populations and circumstances where risks may be increased. It should be noted that activities that seem minor can still have major consequences. See Box 6 for an example illustrating how such small interventions in a project may have large-scale consequences. It highlights the need for risk assessment, even for simple livestock interventions and activities, and how this can help during the project cycle (from concept to implementation). ", "documentMetadata": { "uri": "gs://table_eval_set/pdf/worldbank/AnimalHealthGoodPracticeNote.pdf", "title": "AnimalHealthGoodPracticeNote" }, "pageSpan": { "pageStart": 15, "pageEnd": 16 } } ] } } } ], "totalSize": 61, "attributionToken": "jwHwjgoMCICPjbAGEISp2J0BEiQ2NjAzMmZhYS0wMDAwLTJjYzEtYWQxYS1hYzNlYjE0Mzc2MTQiB0dFTkVSSUMqUMLwnhXb7Ygtq8SKLa3Eii3d7Ygtj_enIqOAlyLm7Ygtt7eMLduPmiKN96cijr6dFcXL8xfdj5oi9-yILdSynRWCspoi-eyILYCymiLk7Ygt", "nextPageToken": "ANxYzNzQTMiV2MjFWLhFDZh1SMjNmMtADMwATL5EmZyMDM2YDJaMQv3yagQYAsciPgIwgExEgC", "guidedSearchResult": {}, "summary": {} }
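For RAG, you typically pass the relevant chunk together with its adjacent chunks to an LLM as context. The following Python sketch joins them into one string, based on the response fields shown above (`chunk.content`, `chunkMetadata.previousChunks`, and `chunkMetadata.nextChunks`). The function names are hypothetical. The sketch also assumes that `previousChunks` and `nextChunks` are listed in document order, which this page does not state.

```python
def chunk_with_context(result):
    """Join previous chunks, the relevant chunk, and next chunks into one string.

    Assumption: previousChunks and nextChunks are listed in document order.
    """
    chunk = result["chunk"]
    metadata = chunk.get("chunkMetadata", {})
    parts = [c["content"] for c in metadata.get("previousChunks", [])]
    parts.append(chunk["content"])
    parts.extend(c["content"] for c in metadata.get("nextChunks", []))
    # Strip the leading and trailing whitespace that chunk content can carry.
    return "\n\n".join(part.strip() for part in parts)


def contexts_from_response(response):
    """Return one context string per search result in a parsed JSON response."""
    return [chunk_with_context(result) for result in response.get("results", [])]


if __name__ == "__main__":
    # Tiny hand-made response with the same shape as the example above.
    sample_response = {
        "results": [
            {
                "chunk": {
                    "id": "c17",
                    "content": "\n# Section B\nRelevant text. ",
                    "chunkMetadata": {
                        "previousChunks": [{"id": "c16", "content": "Text before."}],
                        "nextChunks": [{"id": "c18", "content": "Text after."}],
                    },
                }
            }
        ]
    }
    print(contexts_from_response(sample_response)[0])
```

Results that carry no adjacent chunks are handled as well, because the missing metadata fields default to empty lists.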