Add metadata for advanced website indexing

If advanced website indexing is enabled in your data store, you can add metadata to the schema to enrich your indexing.

Example use case

Suppose you have a large number of web pages that are relevant to various departments in your organization. You can use meta tags to label the pages that are relevant for each department. You can then use the indexed tags as filters in your queries. This lets you restrict search results to web pages containing a label that matches any of the specified departments.

This process can be summarized as follows:

  1. Add the following meta tags to a subset of your web pages:
    • Relevant to engineering and IT departments: <meta name="department" content="eng, infotech">
    • Relevant to finance and HR departments: <meta name="department" content="finance, human resources">
  2. Recrawl the updated pages.
  3. Add department to your data store schema as an indexable array, as described in the Add metadata to the data store schema section.
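
Before recrawling, it can help to confirm locally that a page carries the meta tags you expect. The following is a minimal sketch that uses only the Python standard library. The MetaTagCollector class and extract_meta_values function are hypothetical helpers, not part of Vertex AI Search, and trimming whitespace around each comma-separated value is an assumption about how the values are treated.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collects <meta name=... content=...> tags from an HTML document."""

    def __init__(self):
        super().__init__()
        self.tags = {}  # maps meta tag name -> list of values

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        name = attr_map.get("name")
        content = attr_map.get("content")
        if name is None or content is None:
            return
        # Assumption: values are comma-separated and surrounding
        # whitespace is not significant.
        values = [v.strip() for v in content.split(",") if v.strip()]
        self.tags.setdefault(name, []).extend(values)


def extract_meta_values(html, name):
    """Returns the list of values found for the given meta tag name."""
    collector = MetaTagCollector()
    collector.feed(html)
    return collector.tags.get(name, [])


page = (
    '<html><head>'
    '<meta name="department" content="finance, human resources">'
    '</head></html>'
)
print(extract_meta_values(page, "department"))
```

For the sample page, this prints the two values finance and human resources as separate list items, which mirrors how a single content string maps to an array field.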

After you update your schema, your data store is automatically reindexed. After the reindexing is complete, you can use the department field in a filter expression to filter or reorder search results. For example, when users from the finance department issue queries, you can make the search results more relevant to them by filtering on department set to finance.
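
As an illustration of the filtering step, the sketch below builds a filter string for one or more departments. It assumes the field: ANY("value") filter expression form; confirm the exact syntax against the filter documentation for your data store before relying on it. The build_any_filter function is a hypothetical helper.

```python
def build_any_filter(field, values):
    """Builds a filter string of the assumed form: field: ANY("v1", "v2")."""
    if not values:
        raise ValueError("at least one value is required")
    for value in values:
        # Keep the sketch simple: reject values that would need escaping.
        if '"' in value or "\\" in value:
            raise ValueError(f"unsupported character in value: {value!r}")
    quoted = ", ".join(f'"{value}"' for value in values)
    return f"{field}: ANY({quoted})"


# Restrict results to pages labeled for the finance department.
print(build_any_filter("department", ["finance"]))

# Match pages labeled for either engineering or IT.
print(build_any_filter("department", ["eng", "infotech"]))
```

The resulting string would be passed as the filter of a search request. How the request is sent is outside the scope of this sketch.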

Before you begin

Before you update the data store's schema, do the following:

  • Turn on advanced website indexing for the data store. For more information, see Turn on advanced website indexing.
  • Understand that after you add meta tags to your web pages, you must recrawl the pages. This might take several hours.
  • Understand that after you add metadata and update the data store schema, the website in your data store is reindexed automatically. Reindexing is a long-running operation that might take several hours.
  • Ensure that you don't use any excluded or unsupported meta tags.

Add metadata to the data store schema

To add metadata to the data store schema:

  1. Add meta tags to all the pages in your website that you want to enrich with metadata indexing.

    Each meta tag must have its name attribute set to the field you want to index and its content attribute set to a string of one or more comma-separated values.

    Vertex AI Search supports all meta tags with names that match the pattern [a-zA-Z0-9][a-zA-Z0-9-_]*. Ensure that you don't use any excluded or unsupported meta tags.

  2. Recrawl the updated web pages.

  3. View the schema definition for your data store using the REST API.

  4. Update the data store schema using the REST API by adding a META_TAG_NAME field with its type set to array. For more information, see About providing your own schema as a JSON object. The following is an example of a schema update for a website:

    {
      "type": "object",
      "properties": {
        "META_TAG_NAME": {
          "type": "array",
          "items": {
            "type": "string",
            "searchable": true,
            "retrievable": true,
            "indexable": true
          }
        }
      },
      "$schema": "https://json-schema.org/draft/2020-12/schema"
    }
    

    Replace META_TAG_NAME with the exact value of the meta tag's name attribute.

    After you update the website schema, the website is reindexed automatically. This is a long-running operation that can take multiple hours.
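
The schema body in step 4 can also be generated programmatically. The following sketch checks a meta tag name against the supported pattern and then builds the same JSON structure shown in the example. The build_meta_tag_schema function is a hypothetical helper. It only produces the JSON body and doesn't call the REST API or check for excluded or unsupported meta tags.

```python
import json
import re

# Equivalent to the documented pattern [a-zA-Z0-9][a-zA-Z0-9-_]*,
# with the hyphen moved to the end of the character class for clarity.
META_TAG_NAME_PATTERN = re.compile(r"[a-zA-Z0-9][a-zA-Z0-9_-]*")


def build_meta_tag_schema(meta_tag_name):
    """Returns a schema dict that indexes meta_tag_name as an array of strings."""
    if not META_TAG_NAME_PATTERN.fullmatch(meta_tag_name):
        raise ValueError(f"unsupported meta tag name: {meta_tag_name!r}")
    return {
        "type": "object",
        "properties": {
            meta_tag_name: {
                "type": "array",
                "items": {
                    "type": "string",
                    "searchable": True,
                    "retrievable": True,
                    "indexable": True,
                },
            }
        },
        "$schema": "https://json-schema.org/draft/2020-12/schema",
    }


schema = build_meta_tag_schema("department")
print(json.dumps(schema, indent=2))
```

Names such as "-department" or "my department" raise a ValueError because they don't match the pattern.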

What's next

Use the indexed metadata for the following: