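If you have a media search app, you can use metadata to filter search queries. This page explains how to use metadata fields to restrict a search to a specific set of documents.
Before you begin
Make sure that you have created a media app and a data store, and that you have ingested data. For more information, see Create a media data store and Create a media app.
Sample documents
Review the following sample media documents. You can refer back to them as you read this page.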
{"id":"172851","schemaId":"default_schema","jsonData":"{\"title\":\"Avatar: Creating the World of Pandora (2010)\",\"categories\":[\"Documentary\"],\"uri\":\"http://mytestdomain.movie/content/172851\",\"available_time\":\"2023-01-01T00:00:00Z\",\"media_type\":\"movie\"}"}
{"id":"243308","schemaId":"default_schema","jsonData":"{\"title\":\"Capturing Avatar (2010)\",\"categories\":[\"Documentary\"],\"uri\":\"http://mytestdomain.movie/content/243308\",\"available_time\":\"2023-01-01T00:00:00Z\",\"media_type\":\"movie\"}"}
{"id":"280218","schemaId":"default_schema","jsonData":"{\"title\":\"Avatar: The Way of Water (2022)\",\"categories\":[\"Action\",\"Adventure\",\"Sci-Fi\"],\"uri\":\"http://mytestdomain.movie/content/280218\",\"available_time\":\"2023-01-01T00:00:00Z\",\"media_type\":\"movie\"}"}
{"id":"72998","schemaId":"default_schema","jsonData":"{\"title\":\"Avatar (2009)\",\"categories\":[\"Action\",\"Adventure\",\"Sci-Fi\",\"IMAX\"],\"uri\":\"http://mytestdomain.movie/content/72998\",\"available_time\":\"2023-01-01T00:00:00Z\",\"media_type\":\"movie\"}"}
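Each sample record stores its metadata in jsonData as a JSON string nested inside the outer JSON object, so reading a field such as categories takes two decoding passes. The following is a minimal Python sketch of that, using two of the records above. The names SAMPLE_LINES and decode_record are illustrative and are not part of any API.

```python
import json

# Two of the sample records from this page, one JSON object per line.
SAMPLE_LINES = [
    r'{"id":"172851","schemaId":"default_schema","jsonData":"{\"title\":\"Avatar: Creating the World of Pandora (2010)\",\"categories\":[\"Documentary\"],\"uri\":\"http://mytestdomain.movie/content/172851\",\"available_time\":\"2023-01-01T00:00:00Z\",\"media_type\":\"movie\"}"}',
    r'{"id":"72998","schemaId":"default_schema","jsonData":"{\"title\":\"Avatar (2009)\",\"categories\":[\"Action\",\"Adventure\",\"Sci-Fi\",\"IMAX\"],\"uri\":\"http://mytestdomain.movie/content/72998\",\"available_time\":\"2023-01-01T00:00:00Z\",\"media_type\":\"movie\"}"}',
]


def decode_record(line: str) -> dict:
    """Parse one record and expand its string-encoded jsonData payload."""
    record = json.loads(line)
    # jsonData is itself a JSON document serialized as a string.
    record["jsonData"] = json.loads(record["jsonData"])
    return record


records = [decode_record(line) for line in SAMPLE_LINES]
for record in records:
    metadata = record["jsonData"]
    print(record["id"], metadata["title"], metadata["categories"])
```

The fields that come out of the second pass (title, categories, available_time, media_type) are the ones that a filter expression can refer to, provided that they are indexable.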
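Filter expression syntax
Make sure that you understand the filter expression syntax that you use to define a search filter. The syntax can be summarized in the following extended Backus-Naur form (EBNF):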
  # A single expression or multiple expressions that are joined by "AND" or "OR".
  filter = expression, { " AND " | "OR", expression };

  # Expressions can be prefixed with "-" or "NOT" to express a negation.
  expression = [ "-" | "NOT " ],
    # A parenthetical expression.
    | "(", expression, ")"
    # A simple expression applying to a text field.
    # Function "ANY" returns true if the field contains any of the literals.
    ( text_field, ":", "ANY", "(", literal, { ",", literal }, ")"
    # A simple expression applying to a numerical field. Function "IN" returns true
    # if a field value is within the range. By default, lower_bound is inclusive and
    # upper_bound is exclusive.
    | numerical_field, ":", "IN", "(", lower_bound, ",", upper_bound, ")"
    # A simple expression that applies to a numerical field and compares with a double value.
    | numerical_field, comparison, double );
    # Datetime field
    | datetime_field, comparison, literal_iso_8601_datetime_format);

  # A lower_bound is either a double or "*", which represents negative infinity.
  # Explicitly specify inclusive bound with the character 'i' or exclusive bound
  # with the character 'e'.
  lower_bound = ( double, [ "e" | "i" ] ) | "*";

  # An upper_bound is either a double or "*", which represents infinity.
  # Explicitly specify inclusive bound with the character 'i' or exclusive bound
  # with the character 'e'.
  upper_bound = ( double, [ "e" | "i" ] ) | "*";

  # Supported comparison operators.
  comparison = "<=" | "<" | ">=" | ">" | "=";

  # A literal is any double quoted string. You must escape backslash (\) and
  # quote (") characters.
  literal = double quoted string;

  text_field = text field - for example, category;

  numerical_field = numerical field - for example, score;

  datetime_field = field of datetime data type - for example available_time;

  literal_iso_8601_datetime_format = either a double quoted string representing ISO 8601 datetime or a numerical field representing microseconds from unix epoch.
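Building filter strings by hand makes it easy to miss the escaping rule for literals or the i and e bound suffixes. The following Python sketch shows one way to assemble expressions that follow the grammar above. The helper names (quote_literal, any_of, in_range, all_of) are hypothetical and are not part of any client library, and the spaces after ":" and "," are an assumption based on the example filter shown later on this page.

```python
from typing import Optional


def quote_literal(value: str) -> str:
    """Return a double-quoted literal with backslashes and quotes escaped."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def any_of(field: str, *values: str) -> str:
    """Text-field expression: true if the field contains any of the literals."""
    if not values:
        raise ValueError("ANY requires at least one literal")
    literals = ", ".join(quote_literal(value) for value in values)
    return f"{field}: ANY({literals})"


def in_range(field: str,
             lower: Optional[float] = None,
             upper: Optional[float] = None,
             lower_inclusive: bool = True,
             upper_inclusive: bool = False) -> str:
    """Numerical-field expression; None stands for an unbounded side ("*")."""
    def bound(value: Optional[float], inclusive: bool) -> str:
        if value is None:
            return "*"
        # Always write the suffix so the bound type is explicit.
        return f"{float(value)}{'i' if inclusive else 'e'}"

    return f"{field}: IN({bound(lower, lower_inclusive)}, {bound(upper, upper_inclusive)})"


def all_of(*expressions: str) -> str:
    """Join expressions with AND, parenthesizing each one."""
    return " AND ".join(f"({expression})" for expression in expressions)


print(any_of("categories", "Documentary"))
print(all_of(any_of("categories", "Action", "Sci-Fi"),
             in_range("score", lower=4.5)))
```

The score field here comes from the grammar's own example of a numerical field; the sample documents on this page don't define it. Writing the bound suffix explicitly avoids relying on the defaults (inclusive lower bound, exclusive upper bound).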
Filter media search
To filter media search results by metadata, follow these steps:
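1. Find your data store ID. If you already have your data store ID, skip to the next step.
   - In the Google Cloud console, go to the Agent Builder page and, in the navigation menu, click Data Stores.
   - Click the name of your data store.
   - On the Data page for your data store, get the data store ID.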
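2. Determine which document field or fields to filter by. For example, for the sample documents above, you can use the categories field as a filter. You can use only indexable fields in filter expressions. To determine whether a field is indexable, do the following:
   - In the Google Cloud console, go to the Agent Builder page and, in the navigation menu, click Data Stores.
   - In the Name column, click the name of your data store.
   - Click the Schema tab to view the schema for the data store, and then check the field's Indexable setting:
     - Selected: the field can already be used to filter search results. Skip step 3.
     - Not selected: follow step 3 to enable indexing for the field.
     - Unavailable: the field can't be indexed.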
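3. To make a field (for example, the categories field) filterable, do the following:
   - In the Google Cloud console, go to the Agent Builder page and, in the navigation menu, click Apps.
   - Click your media search app.
   - In the navigation menu, click Data.
   - Click the Schema tab. This tab shows the current field settings.
   - Click Edit.
   - If it isn't already selected, select the Indexable checkbox in the categories row, and then click Save.
   - Wait six hours for the schema edit to take effect. After six hours, continue to the next step.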
4. Get search results.
curl -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/servingConfigs/default_search:search" \
  -d '{
    "query": "QUERY",
    "filter": "FILTER"
  }'
Replace the following:
- PROJECT_ID: the ID of your project.
- DATA_STORE_ID: the ID of your data store.
- QUERY: the query text to search for.
- FILTER: a text field that holds the filter expression used to filter the search results.
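For example, suppose that you want to search the movies in the sample documents above, and you want the results to include only movies that (1) contain the word "Avatar" and (2) are in the "Documentary" category. To do this, include the following in your call: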
"query": "avatar", "filter": "categories: ANY(\"Documentary\")"
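Because the filter contains double quotes, it has to be escaped when it is embedded in the JSON request body, which is why the example above writes \"Documentary\". The following Python sketch builds the same request with the standard library so that json.dumps handles the escaping. The function names are hypothetical, the project and data store IDs are supplied by the caller, and run_search assumes that the gcloud CLI is installed and authenticated; it is defined here but not called.

```python
import json
import subprocess
import urllib.request

API_ROOT = "https://discoveryengine.googleapis.com/v1beta"


def search_url(project_id: str, data_store_id: str) -> str:
    """Build the search endpoint URL used by the curl command on this page."""
    return (
        f"{API_ROOT}/projects/{project_id}/locations/global"
        f"/collections/default_collection/dataStores/{data_store_id}"
        "/servingConfigs/default_search:search"
    )


def search_body(query: str, filter_expression: str) -> bytes:
    """Serialize the request body; json.dumps escapes the quotes in the filter."""
    return json.dumps({"query": query, "filter": filter_expression}).encode("utf-8")


def run_search(project_id: str, data_store_id: str,
               query: str, filter_expression: str) -> dict:
    """Send the search request. Assumes an authenticated gcloud CLI on PATH."""
    token = subprocess.run(
        ["gcloud", "auth", "print-access-token"],
        check=True, capture_output=True, text=True,
    ).stdout.strip()
    request = urllib.request.Request(
        search_url(project_id, data_store_id),
        data=search_body(query, filter_expression),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Build (but do not send) the body for the Avatar documentary example.
body = search_body("avatar", 'categories: ANY("Documentary")')
print(body.decode("utf-8"))
```

Keeping URL and body construction separate from the network call makes the escaping easy to inspect before any request is sent.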
For more information, see the search method.
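If you run a search like the preceding one, you can expect a response similar to the following. Notice that the response contains only the Avatar documentaries.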
{
  "results": [
    {
      "id": "243308",
      "document": {
        "name": "projects/431678329718/locations/global/collections/default_collection/dataStores/rdds3_1698205785399/branches/0/documents/243308",
        "id": "243308",
        "structData": {
          "categories": [
            "Documentary"
          ],
          "title": "Capturing Avatar (2010)",
          "uri": "http://mytestdomain.movie/content/243308",
          "media_type": "movie"
        }
      }
    },
    {
      "id": "172851",
      "document": {
        "name": "projects/431678329718/locations/global/collections/default_collection/dataStores/rdds3_1698205785399/branches/0/documents/172851",
        "id": "172851",
        "structData": {
          "categories": [
            "Documentary"
          ],
          "uri": "http://mytestdomain.movie/content/172851",
          "media_type": "movie",
          "title": "Avatar: Creating the World of Pandora (2010)"
        }
      }
    }
  ],
  "totalSize": 2,
  "attributionToken": "XfBcCgwIvIzJqwYQ2_qNxwMSJDY1NzEzNmY1LTAwMDAtMmFhMy05YWU3LTE0MjIzYmIwOGVkMiIFTUVESUEqII6-nRXFy_MXnIaOIsLwnhXUsp0VpovvF6OAlyKiho4i",
  "guidedSearchResult": {},
  "summary": {}
}
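To confirm that a filter behaved as intended, it can help to check the returned documents programmatically. The following Python sketch reads an abbreviated copy of the response above (only the fields it uses are kept) and verifies that every result is in the Documentary category. The names summarize and all_in_category are illustrative helpers, not part of the API.

```python
import json

# Abbreviated copy of the sample response; unused fields are omitted.
RESPONSE_TEXT = """
{
  "results": [
    {"id": "243308", "document": {"id": "243308", "structData": {
      "categories": ["Documentary"],
      "title": "Capturing Avatar (2010)"}}},
    {"id": "172851", "document": {"id": "172851", "structData": {
      "categories": ["Documentary"],
      "title": "Avatar: Creating the World of Pandora (2010)"}}}
  ],
  "totalSize": 2
}
"""


def summarize(response: dict) -> list:
    """Return (id, title, categories) for each search result."""
    rows = []
    for result in response.get("results", []):
        data = result["document"]["structData"]
        rows.append((result["id"], data["title"], data.get("categories", [])))
    return rows


def all_in_category(response: dict, category: str) -> bool:
    """True if every returned document lists the given category."""
    return all(category in categories
               for _, _, categories in summarize(response))


response = json.loads(RESPONSE_TEXT)
for doc_id, title, categories in summarize(response):
    print(doc_id, title, categories)
print(all_in_category(response, "Documentary"))
```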