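If you have a search app that uses structured data, or unstructured data with metadata, you can use that metadata to filter your search queries. This page explains how to use metadata fields to restrict a search to a specific set of documents.
Before you begin
Make sure that you have created an app and ingested structured data, or unstructured data with metadata. For more information, see "Create a search app".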
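Example metadata
Review the following example metadata for four PDF files (document_1.pdf, document_2.pdf, document_3.pdf, and document_4.pdf). This metadata is stored in a JSON file in a Cloud Storage bucket, together with the PDF files. You can refer back to this example as you read this page.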
{"id": "1", "structData": {"title": "Policy on accepting corrected claims", "category": ["persona_A"]}, "content": {"mimeType": "application/pdf", "uri": "gs://bucketname_87654321/data/document_1.pdf"}}
{"id": "2", "structData": {"title": "Claims documentation and reporting guidelines for commercial members", "category": ["persona_A", "persona_B"]}, "content": {"mimeType": "application/pdf", "uri": "gs://bucketname_87654321/data/document_2.pdf"}}
{"id": "3", "structData": {"title": "Claims guidelines for bundled services and supplies for commercial members", "category": ["persona_B", "persona_C"]}, "content": {"mimeType": "application/pdf", "uri": "gs://bucketname_87654321/data/document_3.pdf"}}
{"id": "4", "structData": {"title": "Advantage claims submission guidelines", "category": ["persona_A", "persona_C"]}, "content": {"mimeType": "application/pdf", "uri": "gs://bucketname_87654321/data/document_4.pdf"}}
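To see how the category values map onto documents, the following minimal Python sketch parses the four metadata records above and groups document IDs by category. It assumes the metadata file holds one JSON object per line, as shown; the helper name `index_by_category` is hypothetical and is not part of any Google Cloud API.

```python
import json
from collections import defaultdict

# The four metadata records from this page, one JSON object per line.
METADATA_LINES = [
    '{"id": "1", "structData": {"title": "Policy on accepting corrected claims", "category": ["persona_A"]}, "content": {"mimeType": "application/pdf", "uri": "gs://bucketname_87654321/data/document_1.pdf"}}',
    '{"id": "2", "structData": {"title": "Claims documentation and reporting guidelines for commercial members", "category": ["persona_A", "persona_B"]}, "content": {"mimeType": "application/pdf", "uri": "gs://bucketname_87654321/data/document_2.pdf"}}',
    '{"id": "3", "structData": {"title": "Claims guidelines for bundled services and supplies for commercial members", "category": ["persona_B", "persona_C"]}, "content": {"mimeType": "application/pdf", "uri": "gs://bucketname_87654321/data/document_3.pdf"}}',
    '{"id": "4", "structData": {"title": "Advantage claims submission guidelines", "category": ["persona_A", "persona_C"]}, "content": {"mimeType": "application/pdf", "uri": "gs://bucketname_87654321/data/document_4.pdf"}}',
]


def index_by_category(lines):
    """Map each category value to the sorted list of document IDs that carry it."""
    index = defaultdict(list)
    for line in lines:
        record = json.loads(line)
        # A document can belong to several categories, so "category" is a list.
        for category in record["structData"].get("category", []):
            index[category].append(record["id"])
    return {category: sorted(ids) for category, ids in index.items()}


if __name__ == "__main__":
    for category, ids in sorted(index_by_category(METADATA_LINES).items()):
        print(category, ids)
```

A filter on persona_A would therefore be expected to narrow a search to documents 1, 2, and 4.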
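Filter expression syntax
Make sure that you understand the filter expression syntax that you use to define a search filter. The syntax can be summarized by the following extended Backus-Naur form (EBNF):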
    # A single expression or multiple expressions that are joined by "AND" or "OR".
    filter = expression, { " AND " | "OR", expression };
    # Expressions can be prefixed with "-" or "NOT" to express a negation.
    expression = [ "-" | "NOT " ],
      # A parenthetical expression.
      | "(", expression, ")"
      # A simple expression applying to a text field.
      # Function "ANY" returns true if the field exactly matches any of the literals.
      ( text_field, ":", "ANY", "(", literal, { ",", literal }, ")"
      # A simple expression applying to a numerical field. Function "IN" returns true
      # if a field value is within the range. By default, lower_bound is inclusive and
      # upper_bound is exclusive.
      | numerical_field, ":", "IN", "(", lower_bound, ",", upper_bound, ")"
      # A simple expression that applies to a numerical field and compares with a double value.
      | numerical_field, comparison, double
      # An expression that applies to a geolocation field with text/street/postal address.
      | geolocation_field, ":", "GEO_DISTANCE(", literal, ",", distance_in_meters, ")"
      # An expression that applies to a geolocation field with latitude and longitude.
      | geolocation_field, ":", "GEO_DISTANCE(", latitude_double, ",", longitude_double, ",", distance_in_meters, ")"
      # Datetime field
      | datetime_field, comparison, literal_iso_8601_datetime_format);
    # A lower_bound is either a double or "*", which represents negative infinity.
    # Explicitly specify inclusive bound with the character 'i' or exclusive bound
    # with the character 'e'.
    lower_bound = ( double, [ "e" | "i" ] ) | "*";
    # An upper_bound is either a double or "*", which represents infinity.
    # Explicitly specify inclusive bound with the character 'i' or exclusive bound
    # with the character 'e'.
    upper_bound = ( double, [ "e" | "i" ] ) | "*";
    # Supported comparison operators.
    comparison = "<=" | "<" | ">=" | ">" | "=";
    # A literal is any double quoted string. You must escape backslash (\) and
    # quote (") characters.
    literal = double quoted string;
    text_field = text field - for example, category;
    numerical_field = numerical field - for example, score;
    geolocation_field = field of geolocation data type - for example home_address, location;
    datetime_field = field of datetime data type - for example creation_date, expires_on;
    literal_iso_8601_datetime_format = either a double quoted string representing ISO 8601 datetime or a numerical field representing microseconds from unix epoch.
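The escaping and bound rules in the grammar are easy to get wrong when a filter string is assembled by hand. The following Python sketch shows one way to build `ANY` and `IN` expressions. The helper names (`quote_literal`, `any_of`, `in_range`) are hypothetical and only illustrate the syntax; they are not part of any client library.

```python
def quote_literal(value):
    """Wrap a string in double quotes, escaping backslash and quote characters."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def any_of(field, *values):
    """Build `field: ANY("v1", "v2", ...)` for a text field."""
    if not values:
        raise ValueError("ANY needs at least one literal")
    literals = ", ".join(quote_literal(value) for value in values)
    return f"{field}: ANY({literals})"


def in_range(field, lower=None, upper=None,
             lower_inclusive=True, upper_inclusive=False):
    """Build `field: IN(lower, upper)` for a numerical field.

    None maps to "*" (no bound). Bounds are always marked explicitly with
    'i' (inclusive) or 'e' (exclusive); the keyword defaults mirror the
    grammar's defaults of inclusive lower and exclusive upper.
    """
    def bound(value, inclusive):
        if value is None:
            return "*"
        return f"{float(value)}{'i' if inclusive else 'e'}"

    return f"{field}: IN({bound(lower, lower_inclusive)}, {bound(upper, upper_inclusive)})"


if __name__ == "__main__":
    print(any_of("category", "persona_A"))
    print(in_range("score", upper=100))
```

Marking every bound explicitly is a design choice in this sketch: it avoids relying on the default inclusivity when someone later edits the numbers.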
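Search with a metadata filter
To search with a metadata filter, follow these steps:
1. Decide which metadata field to use for filtering your search queries. For example, for the metadata in the "Example metadata" section, you could use the category field as a search filter. Your users could then filter by persona_A, persona_B, or persona_C to restrict their search to the documents associated with the persona that they are interested in.
2. Make the metadata field indexable:
   1. In the Google Cloud console, go to the AI Applications page and, in the navigation menu, click Apps.
   2. Click your search app.
   3. In the navigation menu, click Data.
   4. Click the Schema tab. This tab shows the current field settings.
   5. Click Edit.
   6. Find the field that you want to make indexable and select its Indexable checkbox.
   7. Click Save. For more information, see "Configure field settings".
3. Find your data store ID. If you already have your data store ID, skip to the next step.
   1. In the Google Cloud console, go to the AI Applications page and, in the navigation menu, click Data Stores.
   2. Click the name of your data store.
   3. On the Data page for your data store, get the data store ID.
4. Get search results.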
    curl -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      -H "Content-Type: application/json" \
      "https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/servingConfigs/default_search:search" \
      -d '{
        "query": "QUERY",
        "filter": "FILTER"
      }'
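Replace the following:
- PROJECT_ID: the ID of your project.
- DATA_STORE_ID: the ID of your data store.
- QUERY: the query text to search for.
- FILTER: Optional. A text field that lets you filter on a specified set of fields by using the filter expression syntax. The default value is an empty string, which means that no filter is applied.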
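For example, suppose that you imported the four PDF files and the metadata from the "Example metadata" section. You want to search for documents that contain the word "claims", and you want to query only the documents whose category value is persona_A. To do this, include the following statements in your call:

    "query": "claims",
    "filter": "category: ANY(\"persona_A\")"

For more information, see the REST tab of "Get search results for an app with structured or unstructured data".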
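The same request can be issued from a script. The following Python sketch mirrors the curl command using only the standard library. It assumes that you pass in an access token obtained separately (for example, from `gcloud auth print-access-token`, as in the curl command); the function names `build_search_request` and `search` are hypothetical.

```python
import json
import urllib.request

API_ROOT = "https://discoveryengine.googleapis.com/v1beta"


def build_search_request(project_id, data_store_id, query, filter_expr=""):
    """Return the (url, body) pair for a search call on the default serving config."""
    url = (
        f"{API_ROOT}/projects/{project_id}/locations/global"
        f"/collections/default_collection/dataStores/{data_store_id}"
        "/servingConfigs/default_search:search"
    )
    body = {"query": query}
    if filter_expr:
        # An empty filter means "no filter", so the key is only sent when set.
        body["filter"] = filter_expr
    return url, body


def search(project_id, data_store_id, query, filter_expr, access_token):
    """Send the search request and return the parsed JSON response."""
    url, body = build_search_request(project_id, data_store_id, query, filter_expr)
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    url, body = build_search_request(
        "PROJECT_ID", "DATA_STORE_ID", "claims", 'category: ANY("persona_A")'
    )
    print(url)
    # json.dumps escapes the inner double quotes, as in the curl example.
    print(json.dumps(body))
```

Building the body as a dictionary and serializing it with `json.dumps` avoids escaping the quotes inside the filter by hand.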
If you run a search like the one in the preceding procedure, you can expect a response similar to the following. Notice that the response includes the three documents that have a category value of persona_A.

    {
      "results": [
        {
          "id": "2",
          "document": {
            "name": "projects/abcdefg/locations/global/collections/default_collection/dataStores/search_store_id/branches/0/documents/2",
            "id": "2",
            "structData": {
              "title": "Claims documentation and reporting guidelines for commercial members",
              "category": ["persona_A", "persona_B"]
            },
            "derivedStructData": {
              "link": "gs://bucketname_87654321/data/document_2.pdf",
              "extractive_answers": [
                {"pageNumber": "1", "content": "lorem ipsum"}
              ]
            }
          }
        },
        {
          "id": "1",
          "document": {
            "name": "projects/abcdefg/locations/global/collections/default_collection/dataStores/search_store_id/branches/0/documents/1",
            "id": "1",
            "structData": {
              "title": "Policy on accepting corrected claims",
              "category": ["persona_A"]
            },
            "derivedStructData": {
              "extractive_answers": [
                {"pageNumber": "2", "content": "lorem ipsum"}
              ],
              "link": "gs://bucketname_87654321/data/document_1.pdf"
            }
          }
        },
        {
          "id": "4",
          "document": {
            "name": "projects/abcdefg/locations/global/collections/default_collection/dataStores/search_store_id/branches/0/documents/4",
            "id": "4",
            "structData": {
              "title": "Advantage claims submission guidelines",
              "category": ["persona_A", "persona_C"]
            },
            "derivedStructData": {
              "extractive_answers": [
                {"pageNumber": "47", "content": "lorem ipsum"}
              ],
              "link": "gs://bucketname_87654321/data/document_4.pdf"
            }
          }
        }
      ],
      "totalSize": 330,
      "attributionToken": "UvBRCgsI26PxpQYQs7vQZRIkNjRiYWY1MTItMDAwMC0yZWIwLTg3MTAtMTQyMjNiYzYzMWEyIgdHRU5FUklDKhSOvp0VpovvF8XL8xfC8J4V1LKdFQ",
      "guidedSearchResult": {},
      "summary": {}
    }
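A quick way to confirm that a filter behaved as intended is to pull the ID, title, and category out of each result. The following Python sketch does this for an abbreviated copy of the example response (only the fields it reads are kept). The function name `summarize_results` is hypothetical, and the field names are taken from the example response rather than from a schema reference.

```python
import json

# Abbreviated copy of the example response; only the fields read below are kept.
SAMPLE_RESPONSE = json.loads("""
{
  "results": [
    {"id": "2", "document": {"id": "2",
      "structData": {"title": "Claims documentation and reporting guidelines for commercial members",
                     "category": ["persona_A", "persona_B"]},
      "derivedStructData": {"link": "gs://bucketname_87654321/data/document_2.pdf"}}},
    {"id": "1", "document": {"id": "1",
      "structData": {"title": "Policy on accepting corrected claims",
                     "category": ["persona_A"]},
      "derivedStructData": {"link": "gs://bucketname_87654321/data/document_1.pdf"}}},
    {"id": "4", "document": {"id": "4",
      "structData": {"title": "Advantage claims submission guidelines",
                     "category": ["persona_A", "persona_C"]},
      "derivedStructData": {"link": "gs://bucketname_87654321/data/document_4.pdf"}}}
  ]
}
""")


def summarize_results(response):
    """Return one small dict per search result, tolerating missing fields."""
    summaries = []
    for result in response.get("results", []):
        document = result.get("document", {})
        struct_data = document.get("structData", {})
        derived = document.get("derivedStructData", {})
        summaries.append({
            "id": document.get("id", result.get("id")),
            "title": struct_data.get("title"),
            "category": struct_data.get("category", []),
            "link": derived.get("link"),
        })
    return summaries


if __name__ == "__main__":
    for item in summarize_results(SAMPLE_RESPONSE):
        # Every returned document should carry the filtered category.
        assert "persona_A" in item["category"]
        print(item["id"], item["title"], item["link"])
```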
Example filter expressions
The following table provides examples of filter expressions.
Filter | Only returns results for documents where: |
---|---|
category: ANY("persona_A") | The text field category is persona_A |
score: IN(*, 100.0e) | The numerical field score is greater than negative infinity and less than 100.0 |
non-smoking = "true" | The boolean non-smoking is true |
pet-friendly = "false" | The boolean pet-friendly is false |
manufactured_date = "2023" | manufactured_date is any time in 2023 |
manufactured_date >= "2024-04-16" | manufactured_date is on or after April 16, 2024 |
manufactured_date < "2024-04-16T12:00:00-07:00" | manufactured_date is before 12:00 noon Pacific Daylight Time on April 16, 2024 |
office.location:GEO_DISTANCE("1600 Amphitheater Pkwy, Mountain View, CA, 94043", 500) | The geolocation field office.location is within 500 meters of 1600 Amphitheater Pkwy |
NOT office.location:GEO_DISTANCE("Palo Alto, CA", 1000) | The geolocation field office.location is not within a 1 km radius of Palo Alto, CA |
office.location:GEO_DISTANCE(34.1829, -121.293, 500) | The geolocation field office.location is within a 500 m radius of latitude 34.1829 and longitude -121.293 |
category: ANY("persona_A") AND score: IN(*, 100.0e) | category is persona_A, and score is less than 100 |
office.location:GEO_DISTANCE("Mountain View, CA", 500) OR office.location:GEO_DISTANCE("Palo Alto, CA", 500) | office.location is within 500 m of either Mountain View or Palo Alto |
(price<175 AND pet-friendly = "true") OR (price<125 AND pet-friendly = "false") | price is less than 175 and pets are allowed, or price is less than 125 and pets are not allowed |
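The compound rows in the table can be assembled from simpler expressions. The following Python sketch joins expression strings with AND, OR, and NOT, and adds parentheses explicitly. The helper names are hypothetical, and because this page does not state how AND and OR rank against each other, the sketch assumes nothing about precedence and groups mixed expressions with parentheses.

```python
def all_of(*expressions):
    """Join expressions with AND."""
    return " AND ".join(expressions)


def either(*expressions):
    """Join expressions with OR."""
    return " OR ".join(expressions)


def negate(expression):
    """Prefix an expression with NOT."""
    return f"NOT {expression}"


def group(expression):
    """Wrap an expression in parentheses so that it is evaluated as a unit."""
    return f"({expression})"


if __name__ == "__main__":
    # Reproduces the last row of the table.
    print(either(
        group(all_of("price<175", 'pet-friendly = "true"')),
        group(all_of("price<125", 'pet-friendly = "false"')),
    ))
    # Reproduces the NOT row of the table.
    print(negate('office.location:GEO_DISTANCE("Palo Alto, CA", 1000)'))
```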
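What's next
- To understand how filtering affects search quality, evaluate the quality of your search results. For more information, see "Evaluate search quality".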