Filter generic search for structured or unstructured data

If you have a search app that uses structured data, or unstructured data with metadata, you can use the metadata to filter your search queries. This page explains how to use metadata fields to restrict your search to a specific set of documents.

Before you begin

Make sure you have created an app and ingested structured data, or unstructured data with metadata. For more information, see Create a search app.

Metadata example

Review this example of metadata for four PDF files (document_1.pdf, document_2.pdf, document_3.pdf, and document_4.pdf). This metadata would be in a JSON file in a Cloud Storage bucket, along with the PDF files. You can refer back to this example as you read through this page.

{"id": "1", "structData": {"title": "Policy on accepting corrected claims", "category": ["persona_A"]}, "content": {"mimeType": "application/pdf", "uri": "gs://bucketname_87654321/data/document_1.pdf"}}
{"id": "2", "structData": {"title": "Claims documentation and reporting guidelines for commercial members", "category": ["persona_A", "persona_B"]}, "content": {"mimeType": "application/pdf", "uri": "gs://bucketname_87654321/data/document_2.pdf"}}
{"id": "3", "structData": {"title": "Claims guidelines for bundled services and supplies for commercial members", "category": ["persona_B", "persona_C"]}, "content": {"mimeType": "application/pdf", "uri": "gs://bucketname_87654321/data/document_3.pdf"}}
{"id": "4", "structData": {"title": "Advantage claims submission guidelines", "category": ["persona_A", "persona_C"]}, "content": {"mimeType": "application/pdf", "uri": "gs://bucketname_87654321/data/document_4.pdf"}}
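
The metadata above has one JSON object per line, each pairing a document ID and its structData with the Cloud Storage URI of the PDF. As a minimal sketch of how such a file could be generated, the following Python builds the same four records. The helper names make_record and to_json_lines are hypothetical and are not part of any Google Cloud library.

```python
import json

# Bucket path taken from the example above.
BUCKET_PREFIX = "gs://bucketname_87654321/data"


def make_record(doc_id, title, categories, file_name):
    """Build one metadata record in the shape used by the example (hypothetical helper)."""
    return {
        "id": doc_id,
        "structData": {"title": title, "category": list(categories)},
        "content": {
            "mimeType": "application/pdf",
            "uri": f"{BUCKET_PREFIX}/{file_name}",
        },
    }


def to_json_lines(records):
    """Serialize records as one JSON object per line."""
    return "\n".join(json.dumps(record) for record in records) + "\n"


records = [
    make_record("1", "Policy on accepting corrected claims",
                ["persona_A"], "document_1.pdf"),
    make_record("2", "Claims documentation and reporting guidelines for commercial members",
                ["persona_A", "persona_B"], "document_2.pdf"),
    make_record("3", "Claims guidelines for bundled services and supplies for commercial members",
                ["persona_B", "persona_C"], "document_3.pdf"),
    make_record("4", "Advantage claims submission guidelines",
                ["persona_A", "persona_C"], "document_4.pdf"),
]

metadata_jsonl = to_json_lines(records)
print(metadata_jsonl, end="")
```

The resulting string can be written to a local file and uploaded to the bucket alongside the PDF files.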

Filter expression syntax

Make sure you understand the syntax of the filter expression that you'll use to define your search filter. The filter expression syntax can be summarized by the following Extended Backus–Naur form:

  # A single expression or multiple expressions that are joined by "AND" or "OR".
  filter = expression, { ( " AND " | " OR " ), expression };
  # Expressions can be prefixed with "-" or "NOT" to express a negation.
  expression = [ "-" | "NOT " ],
    # A parenthetical expression.
    ( "(", expression, ")"
    # A simple expression applying to a text field.
    # Function "ANY" returns true if the field exactly matches any of the literals.
    | text_field, ":", "ANY", "(", literal, { ",", literal }, ")"
    # A simple expression applying to a numerical field. Function "IN" returns true
    # if a field value is within the range. By default, lower_bound is inclusive and
    # upper_bound is exclusive.
    | numerical_field, ":", "IN", "(", lower_bound, ",", upper_bound, ")"
    # A simple expression that applies to a numerical field and compares with a double value.
    | numerical_field, comparison, double );
  # A lower_bound is either a double or "*", which represents negative infinity.
  # Explicitly specify inclusive bound with the character 'i' or exclusive bound
  # with the character 'e'.
  lower_bound = ( double, [ "e" | "i" ] ) | "*";
  # An upper_bound is either a double or "*", which represents infinity.
  # Explicitly specify inclusive bound with the character 'i' or exclusive bound
  # with the character 'e'.
  upper_bound = ( double, [ "e" | "i" ] ) | "*";
  # Supported comparison operators.
  comparison = "<=" | "<" | ">=" | ">" | "=";
  # A literal is any double quoted string. You must escape backslash (\) and
  # quote (") characters.
  literal = double quoted string;
  text_field = text field - for example, category;
  numerical_field = numerical field - for example, score;
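
Writing filter strings by hand makes it easy to miss the escaping rule for literals. The following Python is a minimal sketch of helpers that assemble ANY and IN expressions according to the grammar above. The function names (quote_literal, any_of, in_range, join_and) are hypothetical, and the spacing after ":" and "," mirrors the examples on this page rather than a documented requirement.

```python
def quote_literal(value):
    """Return value as a double-quoted literal, escaping backslash and quote."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def any_of(field, *values):
    """Build an ANY expression for a text field, such as category."""
    if not values:
        raise ValueError("ANY needs at least one literal")
    literals = ", ".join(quote_literal(v) for v in values)
    return f"{field}: ANY({literals})"


def in_range(field, lower=None, upper=None, lower_mode="", upper_mode=""):
    """Build an IN expression for a numerical field, such as score.

    None becomes "*" (unbounded). A mode of "i" or "e" marks the bound as
    explicitly inclusive or exclusive; "" keeps the default behavior.
    """
    def bound(value, mode):
        if value is None:
            return "*"
        if mode not in ("", "i", "e"):
            raise ValueError("mode must be '', 'i', or 'e'")
        return f"{float(value)}{mode}"

    return f"{field}: IN({bound(lower, lower_mode)}, {bound(upper, upper_mode)})"


def join_and(*expressions):
    """Join expressions with AND."""
    return " AND ".join(expressions)


print(any_of("category", "persona_A", "persona_B"))
print(in_range("score", None, 100, upper_mode="e"))
```

Here score is only the placeholder numerical field named in the grammar; the metadata example on this page does not define it.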

Search using a metadata filter

To search using a metadata filter, follow these steps:

  1. Determine the metadata field to use for filtering your search queries. For example, for the metadata in Metadata example, you could use the category field as a search filter. Your users could filter on persona_A, persona_B, or persona_C, so their search is restricted to the documents associated with the persona that they're interested in.

  2. Make the metadata field indexable:

    1. In the Google Cloud console, go to the Agent Builder page and in the navigation menu, click Apps.

      Go to the Apps page

    2. Click your search app.

    3. In the navigation menu, click Data.

    4. Click the Schema tab. This tab shows current field settings.

    5. Click Edit.

    6. Select the Indexable checkbox for the field that you want to make indexable.

    7. Click Save. For more information, see Configure field settings.

  3. Find your data store ID. If you already have your data store ID, skip to the next step.

    1. In the Google Cloud console, go to the Agent Builder page and in the navigation menu, click Data stores.

      Go to the Data stores page

    2. Click the name of your data store.

    3. On the Data page for your data store, get the data store ID.

  4. Get search results.

    curl -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    "https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/servingConfigs/default_search:search" \
    -d '{
    "query": "QUERY",
    "filter": "FILTER"
    }'
    
    • PROJECT_ID: The ID of your project.
    • DATA_STORE_ID: The ID of your data store.
    • QUERY: The query text to search.
    • FILTER: Optional. A text field that lets you filter on a specified set of fields, using filter expression syntax. The default value is an empty string, which means no filter is applied.

    For example, say that you imported the four PDF files with metadata from Metadata example. You want to search for documents that contain the word "claims", and only query documents with a category value of persona_A. You would do that by including the following statements with your call:

    "query": "claims",
    "filter": "category: ANY(\"persona_A\")"
    

    For more information, see the REST tab at Get search results for an app with structured or unstructured data.

    If you perform a search like the one in the preceding procedure, you can expect to get a response similar to the following. Notice that the response includes the three documents that have a category value of persona_A.

    {
    "results": [
    {
      "id": "2",
      "document": {
        "name": "projects/abcdefg/locations/global/collections/default_collection/dataStores/enterprisesearch_store_id/branches/0/documents/2",
        "id": "2",
        "structData": {
          "title": "Claims documentation and reporting guidelines for commercial members",
          "category": [
            "persona_A",
            "persona_B"
          ]
        },
        "derivedStructData": {
          "link": "gs://bucketname_87654321/data/document_2.pdf",
          "extractive_answers": [
            {
              "pageNumber": "1",
              "content": "lorem ipsum"
            }
          ]
        }
      }
    },
    {
      "id": "1",
      "document": {
        "name": "projects/abcdefg/locations/global/collections/default_collection/dataStores/enterprisesearch_store_id/branches/0/documents/1",
        "id": "1",
        "structData": {
          "title": "Policy on accepting corrected claims",
          "category": [
            "persona_A"
          ]
        },
        "derivedStructData": {
          "extractive_answers": [
            {
              "pageNumber": "2",
              "content": "lorem ipsum"
            }
          ],
          "link": "gs://bucketname_87654321/data/document_1.pdf"
        }
      }
    },
    {
      "id": "4",
      "document": {
        "name": "projects/abcdefg/locations/global/collections/default_collection/dataStores/enterprisesearch_store_id/branches/0/documents/4",
        "id": "4",
        "structData": {
          "title": "Advantage claims submission guidelines",
          "category": [
            "persona_A",
            "persona_C"
          ]
        },
        "derivedStructData": {
          "extractive_answers": [
            {
              "pageNumber": "47",
              "content": "lorem ipsum"
            }
          ],
          "link": "gs://bucketname_87654321/data/document_4.pdf"
        }
      }
    }
    ],
    "totalSize": 330,
    "attributionToken": "UvBRCgsI26PxpQYQs7vQZRIkNjRiYWY1MTItMDAwMC0yZWIwLTg3MTAtMTQyMjNiYzYzMWEyIgdHRU5FUklDKhSOvp0VpovvF8XL8xfC8J4V1LKdFQ",
    "guidedSearchResult": {},
    "summary": {}
    }
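
The request and response from step 4 can also be handled programmatically. The following Python is a minimal sketch that builds the URL and JSON body for the search call and pulls the ID, title, category, and link out of a response shaped like the example above. The function names build_search_request and summarize_results are hypothetical. The sketch does not send the request or handle authentication, and it assumes that the response fields match the example response.

```python
import json

# Endpoint root taken from the curl command in step 4.
API_ROOT = "https://discoveryengine.googleapis.com/v1beta"


def build_search_request(project_id, data_store_id, query, filter_expr=""):
    """Return (url, body) for the search call; the filter is omitted when empty."""
    url = (
        f"{API_ROOT}/projects/{project_id}/locations/global"
        f"/collections/default_collection/dataStores/{data_store_id}"
        "/servingConfigs/default_search:search"
    )
    body = {"query": query}
    if filter_expr:
        body["filter"] = filter_expr
    # json.dumps escapes the double quotes inside the filter expression.
    return url, json.dumps(body)


def summarize_results(response):
    """Extract id, title, category, and link from each result in a response dict."""
    rows = []
    for result in response.get("results", []):
        document = result.get("document", {})
        struct_data = document.get("structData", {})
        derived = document.get("derivedStructData", {})
        rows.append({
            "id": document.get("id"),
            "title": struct_data.get("title"),
            "category": struct_data.get("category", []),
            "link": derived.get("link"),
        })
    return rows


url, body = build_search_request(
    "PROJECT_ID", "DATA_STORE_ID", "claims", 'category: ANY("persona_A")'
)
print(url)
print(body)
```

Building the body with json.dumps avoids escaping the quotes in the filter by hand, which is what the backslashes in the curl example do.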