Filter media search

If you have a media search app, you can use metadata to filter your search queries. This page explains how use metadata fields to restrict your search to a specific set of documents.

Before you begin

Make sure you have created a media app and data store and ingested data. For more information, see Create a media data store and Create a media app.

Example documents

Review these example media documents. You can refer back to them as you read through this page.

{"id":"172851","schemaId":"default_schema","jsonData":"{\"title\":\"Avatar: Creating the World of Pandora (2010)\",\"categories\":[\"Documentary\"],\"uri\":\"http://mytestdomain.movie/content/172851\",\"available_time\":\"2023-01-01T00:00:00Z\",\"media_type\":\"movie\"}"}
{"id":"243308","schemaId":"default_schema","jsonData":"{\"title\":\"Capturing Avatar (2010)\",\"categories\":[\"Documentary\"],\"uri\":\"http://mytestdomain.movie/content/243308\",\"available_time\":\"2023-01-01T00:00:00Z\",\"media_type\":\"movie\"}"}
{"id":"280218","schemaId":"default_schema","jsonData":"{\"title\":\"Avatar: The Way of Water (2022)\",\"categories\":[\"Action\",\"Adventure\",\"Sci-Fi\"],\"uri\":\"http://mytestdomain.movie/content/280218\",\"available_time\":\"2023-01-01T00:00:00Z\",\"media_type\":\"movie\"}"}
{"id":"72998","schemaId":"default_schema","jsonData":"{\"title\":\"Avatar (2009)\",\"categories\":[\"Action\",\"Adventure\",\"Sci-Fi\",\"IMAX\"],\"uri\":\"http://mytestdomain.movie/content/72998\",\"available_time\":\"2023-01-01T00:00:00Z\",\"media_type\":\"movie\"}"}

Filter expression syntax

Make sure you understand the filter expression syntax that you'll use to define your search filter. The filter expression syntax can be summarized by the following Extended Backus–Naur form:

  # A single expression or multiple expressions that are joined by "AND" or "OR".
  filter = expression, { " AND " | "OR", expression };
  # Expressions can be prefixed with "-" or "NOT" to express a negation.
  expression = [ "-" | "NOT " ],
    # A parenthetical expression.
    | "(", expression, ")"
    # A simple expression applying to a text field.
    # Function "ANY" returns true if the field contains any of the literals.
    ( text_field, ":", "ANY", "(", literal, { ",", literal }, ")"
    # A simple expression applying to a numerical field. Function "IN" returns true
    # if a field value is within the range. By default, lower_bound is inclusive and
    # upper_bound is exclusive.
    | numerical_field, ":", "IN", "(", lower_bound, ",", upper_bound, ")"
    # A simple expression that applies to a numerical field and compares with a double value.
    | numerical_field, comparison, double );
  # A lower_bound is either a double or "*", which represents negative infinity.
  # Explicitly specify inclusive bound with the character 'i' or exclusive bound
  # with the character 'e'.
  lower_bound = ( double, [ "e" | "i" ] ) | "*";
  # An upper_bound is either a double or "*", which represents infinity.
  # Explicitly specify inclusive bound with the character 'i' or exclusive bound
  # with the character 'e'.
  upper_bound = ( double, [ "e" | "i" ] ) | "*";
  # Supported comparison operators.
  comparison = "<=" | "<" | ">=" | ">" | "=";
  # A literal is any double quoted string. You must escape backslash (\) and
  # quote (") characters.
  literal = double quoted string;
  text_field = text field - for example, category;
  numerical_field = numerical field - for example, score;

To filter media search using metadata, follow these steps:

  1. Find your data store ID. If you already have your data store ID, skip to the next step.

    1. In the Google Cloud console, go to the Agent Builder page and in the navigation menu, click Data Stores.

      Go to the Data Stores page

    2. Click the name of your data store.

    3. On the Data page for your data store, get the data store ID.

  2. Determine the document field or fields that you want to filter on. For example, for the documents in Before you begin, you could use the categories field as a filter.

    You can only use indexable fields in filter expressions. To determine if a field is indexable, do the following:

    1. In the Google Cloud console, go to the Agent Builder page and in the navigation menu, click Data Stores.

      Go to the Data Stores page

    2. Click the name of your data store.

    3. In the Name column, click the data store.

    4. Click the Schema tab to view the schema for your data store. If Indexable for the field is:

      • Selected , then that field is ready to be filtered on for search; skip step 3.

      • Not selected , then follow step 3 to enable the field for indexing.

      • Not available , then the field can't be indexed.

  3. To make a field, such as the categories field, filterable, do the following:

    1. In the Google Cloud console, go to the Agent Builder page, and in the navigation menu, click Apps.

      Go to the Apps page

    2. Click your media search app.

    3. In the navigation menu, click Data.

    4. Click the Schema tab. This tab shows current field settings.

    5. Click Edit.

    6. If it's not already selected, select the Indexable checkbox in the categories row, and then click Save.

    7. Wait six hours to allow time for your schema edit to propagate. After six hours, you can proceed to the following step.

  4. Get search results.

    curl -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    "https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/servingConfigs/default_search:search" \
    -d '{
    "query": "QUERY",
    "filter": "FILTER"
    }'
    
    • PROJECT_ID: The ID of your project.
    • DATA_STORE_ID: The ID your data store.
    • QUERY: The query text to search.
    • FILTER: A text field for filtering your search using a filter expression.

    For example, suppose you want to search through the movies in the Before you begin section, and you want search results only for movies that: (1) Contain the word "avatar", and (2) are in the "Documentary" category. You would do that by including the following statements with your call:

    "query": "avatar",
    "filter": "categories: ANY(\"Documentary\")"
    

    For more information, see the search method.

    Click for an example response.

    If you perform a search like the one in the preceding procedure, you can expect to get a response similar to the following. Notice that the response includes only the Avatar documentaries.

    {
      "results": [
        {
          "id": "243308",
          "document": {
            "name": "projects/431678329718/locations/global/collections/default_collection/dataStores/rdds3_1698205785399/branches/0/documents/243308",
            "id": "243308",
            "structData": {
              "categories": [
                "Documentary"
              ],
              "title": "Capturing Avatar (2010)",
              "uri": "http://mytestdomain.movie/content/243308",
              "media_type": "movie"
            }
          }
        },
        {
          "id": "172851",
          "document": {
            "name": "projects/431678329718/locations/global/collections/default_collection/dataStores/rdds3_1698205785399/branches/0/documents/172851",
            "id": "172851",
            "structData": {
              "categories": [
                "Documentary"
              ],
              "uri": "http://mytestdomain.movie/content/172851",
              "media_type": "movie",
              "title": "Avatar: Creating the World of Pandora (2010)"
            }
          }
        }
      ],
      "totalSize": 2,
      "attributionToken": "XfBcCgwIvIzJqwYQ2_qNxwMSJDY1NzEzNmY1LTAwMDAtMmFhMy05YWU3LTE0MjIzYmIwOGVkMiIFTUVESUEqII6-nRXFy_MXnIaOIsLwnhXUsp0VpovvF6OAlyKiho4i",
      "guidedSearchResult": {},
      "summary": {}
    }