If advanced website indexing is enabled in your data store, you can use the following types of structured data to enrich your indexing:
- Predefined, Google-inferred page dates
- Custom structured data attributes
  - Metadata using meta tags
  - PageMaps
This page introduces both these types of structured data for your web pages and describes how to add custom structured attributes to your data store schema.
About predefined, Google-inferred page dates
When crawling through the web pages in your website data store, Google infers page data using the properties that apply to your content. Vertex AI Search adds these inferred page data properties to your schema. This inferred data includes the following predefined date properties:
- datePublished: the date and time when the page was first published
- dateModified: the date and time when the page was most recently modified
These properties are indexed automatically. You can directly use these date properties to enrich your search without adding them to your schema. You can include these predefined date properties in your search requests, such as in filter expressions and boost specifications. For more information, see Example use case using a Google-inferred page date.
About custom structured data attributes
You can add structured data attributes as meta tags and PageMaps to your web pages and use these to enrich your indexing. To use custom structured attributes for indexing, you must update your schema.
Example use case for meta tags
Suppose you have a large number of web pages that are relevant to various departments in your organization. You can use meta tags to label the pages that are relevant for each department. You can then use the indexed tags as filters in your queries. This lets you restrict search results to web pages containing a label that matches any of the specified departments.
This process can be summarized as follows:
- Add the following meta tags to a subset of your web pages:
  - Relevant to engineering and IT departments:
    <meta name="department" content="eng, infotech">
  - Relevant to finance and HR departments:
    <meta name="department" content="finance, human resources">
- Recrawl the updated pages.
- Add department to your data store schema as an indexable array as described in the Add custom structured data attributes to the data store schema section.
After updating your schema, your data store is automatically reindexed.
After the reindexing is complete, you can use the department filter in a filter expression to reorder or filter search results. For example, when users from the finance department issue queries, the search results can be made more relevant for them with the department filter set to finance.
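Before you recrawl, it can help to check which values a page would contribute for an attribute. The following Python sketch uses only the standard library to collect the values of meta tags with a given name. The class and function names are hypothetical, and trimming whitespace around each comma-separated value is an assumption made for illustration, not documented indexing behavior.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collects the values of <meta> tags whose name attribute matches."""

    def __init__(self, name):
        super().__init__()
        self.name = name
        self.values = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        if attr_map.get("name") != self.name:
            return
        content = attr_map.get("content") or ""
        # Assumption: split on commas and trim surrounding whitespace.
        self.values.extend(v.strip() for v in content.split(",") if v.strip())


def meta_values(html, name):
    """Return the list of values found in meta tags with the given name."""
    collector = MetaTagCollector(name)
    collector.feed(html)
    return collector.values


page = (
    "<html><head>"
    '<meta name="department" content="finance, human resources">'
    "</head></html>"
)
print(meta_values(page, "department"))
```

Running the sketch on a page before and after you edit it is a quick way to confirm that the tag is present and spelled consistently across departments.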
Example use case for PageMaps
Suppose you have several web pages that contain food recipes. You can add PageMap data to each page's HTML content. You can then use the indexed PageMap attribute names as filters in your queries.
This process can be summarized as follows:
Add PageMap data similar to the following to your web pages:
<html>
  <head>
    ...
    <!--
    <PageMap>
      <DataObject type="document">
        <Attribute name="title">Baked potatoes</Attribute>
        <Attribute name="author">Dana A.</Attribute>
        <Attribute name="description">Homestyle baked potatoes in oven. This recipe uses Russet potatoes.</Attribute>
        <Attribute name="rating">4.9</Attribute>
        <Attribute name="last_update">2015-01-01</Attribute>
      </DataObject>
    </PageMap>
    -->
  </head>
  ...
</html>
Recrawl the updated pages.
Add rating to your data store schema as an indexable array as described in the Add custom structured data attributes to the data store schema section.
After updating your schema, your data store is automatically reindexed.
After the reindexing is complete, you can use the rating attribute in a filter expression to reorder or filter search results. For example, when users search for recipes, boost the search results that are top-rated by using rating as a custom numerical attribute.
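To see which Attribute names and values a page exposes, you can parse the PageMap comment locally. The following Python sketch is one possible way to do that with the standard library. The function name and the regular expression are hypothetical, and the sketch assumes that the PageMap block sits in a single HTML comment and is well-formed XML.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: the PageMap block is wrapped in one HTML comment.
PAGEMAP_RE = re.compile(r"<!--\s*(<PageMap>.*?</PageMap>)\s*-->", re.DOTALL)


def pagemap_attributes(html):
    """Return {attribute name: [values]} for PageMap data found in comments."""
    result = {}
    for match in PAGEMAP_RE.finditer(html):
        root = ET.fromstring(match.group(1))
        for attribute in root.iter("Attribute"):
            name = attribute.get("name")
            if name is None:
                continue
            value = (attribute.text or "").strip()
            result.setdefault(name, []).append(value)
    return result


page = """<html><head>
<!--
<PageMap>
  <DataObject type="document">
    <Attribute name="title">Baked potatoes</Attribute>
    <Attribute name="rating">4.9</Attribute>
    <Attribute name="last_update">2015-01-01</Attribute>
  </DataObject>
</PageMap>
-->
</head></html>"""

print(pagemap_attributes(page))
```

The values come back as strings. The data type that you declare in the schema (for example, number for rating) determines how they are treated after indexing.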
Before you begin
Before you update the data store schema, do the following:
- Turn on advanced website indexing for the data store. For more information, see Turn on advanced website indexing.
- Understand how structured data works.
- Understand how to use PageMaps. Review the list of recognized DataObjects that can be added to PageMap data.
- Understand how to use meta tags. Ensure that you don't use any excluded or unsupported meta tags.
- Ensure that the attribute that needs to be indexed doesn't have any of the following values:
  - datePublished
  - dateModified
  - siteSearch
- Understand that after you add structured data to your web pages, you must recrawl the pages. This might take several hours.
- Understand that after you add structured data attributes to the data store schema, the web pages in your data store are reindexed automatically. Reindexing is a long-running operation that might take several hours.
Add custom structured data attributes to the data store schema
To add custom structured data attributes to the data store schema:
Add meta tags or PageMap data to all the pages in your website that you want to enrich with structured data indexing:
- For meta tags:
  - Each meta tag must have its name attribute set to the field you want to index and its content attribute to a string comprising one or more comma-separated values.
  - Vertex AI Search supports all meta tags with names that match the pattern [a-zA-Z0-9][a-zA-Z0-9-_]*. Ensure that you don't use any excluded or unsupported meta tags.
- For PageMaps:
  - PageMap data must consist of recognized DataObjects that contain Attribute names that you want to index.
  - The Attribute names within the DataObjects must be set to the field you want to index.
Recrawl the updated web pages.
View the schema definition for your data store over REST API.
Update the data store schema over REST API. For more information, see About providing your own schema as a JSON object.
- Add the custom attribute and set its type to array.
- Add the data type of the custom attribute's value.
- Specify the source where the custom attribute can be found: meta tag, PageMap, or both. If the field is absent or left empty, the values from both the data sources are merged in an array.
The following is an example of a schema update for a website:
{
  "type": "object",
  "properties": {
    "CUSTOM_ATTRIBUTE": {
      "type": "array",
      "items": {
        "type": "DATA_TYPE",
        "searchable": true,
        "retrievable": true,
        "indexable": true,
        "siteSearchStructuredDataSources": ["STRUCTURED_DATA_SOURCE_1", "STRUCTURED_DATA_SOURCE_2"]
      }
    }
  },
  "$schema": "https://json-schema.org/draft/2020-12/schema"
}
Replace the following:
- CUSTOM_ATTRIBUTE: the value of the name attribute. For example:
  - For a meta tag defined as <meta name="department" content="eng, infotech">, use department
  - For a PageMap Attribute defined as <Attribute name="rating">4.9</Attribute>, use rating
- DATA_TYPE: the data type of the name attribute. Must be either string, number, or datetime. For example:
  - For a meta tag defined as <meta name="department" content="eng, infotech">, use string
  - For a PageMap Attribute defined as <Attribute name="rating">4.9</Attribute>, use number
  - For a PageMap Attribute defined as <Attribute name="last_published">2015-01-01</Attribute>, use datetime
  For more information, see FieldType.
- STRUCTURED_DATA_SOURCE_N: an array consisting of one or both of the following structured data sources where the CUSTOM_ATTRIBUTE attribute can be found:
  - Use METATAGS if your attribute can be found in a meta tag
  - Use PAGEMAP if your attribute can be found in a PageMap Attribute
  If the siteSearchStructuredDataSources field is absent or its value is an empty array, both the data sources are merged in the array.
After you update the website schema, the website is reindexed automatically. This is a long-running operation that can take multiple hours.
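If you generate the schema update programmatically, a small helper can catch mistakes before you send the request. The following Python sketch builds the JSON object shown in the example on this page. The function name build_schema is hypothetical. Applying the meta tag name pattern to every attribute name, including PageMap Attribute names, is an assumption made for illustration. The sketch only builds the request body and doesn't call the REST API.

```python
import json
import re

# Same pattern as given for meta tag names; the hyphen is escaped for clarity.
NAME_PATTERN = re.compile(r"[a-zA-Z0-9][a-zA-Z0-9\-_]*")
DATA_TYPES = {"string", "number", "datetime"}
SOURCES = {"METATAGS", "PAGEMAP"}


def build_schema(attribute, data_type, sources=()):
    """Build a schema object for one custom structured data attribute."""
    if not NAME_PATTERN.fullmatch(attribute):
        raise ValueError(f"invalid attribute name: {attribute!r}")
    if data_type not in DATA_TYPES:
        raise ValueError(f"unsupported data type: {data_type!r}")
    unknown = set(sources) - SOURCES
    if unknown:
        raise ValueError(f"unknown structured data sources: {sorted(unknown)}")

    items = {
        "type": data_type,
        "searchable": True,
        "retrievable": True,
        "indexable": True,
    }
    # Omitting the field means values from both sources are merged.
    if sources:
        items["siteSearchStructuredDataSources"] = list(sources)

    return {
        "type": "object",
        "properties": {attribute: {"type": "array", "items": items}},
        "$schema": "https://json-schema.org/draft/2020-12/schema",
    }


print(json.dumps(build_schema("department", "string", ["METATAGS"]), indent=2))
```

Validating locally is useful because a schema update triggers a reindexing operation that can take multiple hours.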
What's next
Use the indexed metadata for the following:
- Serving controls, such as boost, bury, and filter
- Surfacing as facets in search results
- Filter search results
- Boost search results
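As an illustration of filtering on an indexed attribute, the following Python sketch builds a filter expression string for a text attribute such as department. The helper name any_filter is hypothetical, and the ANY(...) form is an assumption about the filter expression syntax that you should confirm against the filter documentation before you rely on it. The sketch rejects values that contain double quotes rather than guessing at an escaping rule.

```python
def any_filter(field, values):
    """Build a filter string matching pages labeled with any of the values.

    Assumption: text attributes are filtered as  field: ANY("a", "b").
    """
    if not values:
        raise ValueError("at least one value is required")
    for value in values:
        if '"' in value:
            raise ValueError(f"value must not contain double quotes: {value!r}")
    quoted = ", ".join(f'"{value}"' for value in values)
    return f"{field}: ANY({quoted})"


print(any_filter("department", ["finance", "human resources"]))
```

You would pass the resulting string as the filter in a search request for users from the matching departments.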