About apps and data stores

This page describes Vertex AI Search apps and data stores.

With Vertex AI Search, you create a search or recommendations app and connect it to a data store. A Google Cloud project can contain multiple apps.

Relationship between apps and data stores

The relationship between apps and data stores depends on the type of app:

Custom search apps have a many-to-many relationship with data stores. When multiple data stores are connected to a single custom search app, this is referred to as blended search. For information about limitations of connecting a search app to more than one data store, see About blended search.
A custom recommendations app has a one-to-one connection with its data store.
A media app has a many-to-one relationship with its data store. An app can only connect to one data store, whereas a given data store can be connected to several apps. For example, a media search app and a media recommendations app can share a data store.
A healthcare search app has a many-to-one relationship with its data store. An app can only connect to one data store, whereas a given data store can be connected to several apps. For example, a patient-facing app and a provider-facing app can connect to the same data store.

For a batch data import of healthcare data, data is imported into a data store that's within an app. For streaming data import (Preview) of healthcare data, data is imported into an entity, which is a type of data store that's within a data connector. A data connector is also a type of data store that's within an app.

After a data store is connected to an app, it can't be disconnected.
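
The cardinality rules above can be summarized as a small lookup table. The following is a minimal sketch for illustration only: the app-type keys, the `Cardinality` class, and the `can_connect` helper are hypothetical names and aren't part of any Vertex AI Search API. It ignores the blended search limitations described later on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cardinality:
    multiple_stores_per_app: bool  # can one app connect to several data stores?
    multiple_apps_per_store: bool  # can one data store serve several apps?


# Hypothetical keys; the values restate the relationships described in this section.
RELATIONSHIPS = {
    "custom_search": Cardinality(True, True),             # many-to-many
    "custom_recommendations": Cardinality(False, False),  # one-to-one
    "media": Cardinality(False, True),                    # many apps to one data store
    "healthcare_search": Cardinality(False, True),        # many apps to one data store
}


def can_connect(app_type: str, stores_on_app: int, apps_on_store: int) -> bool:
    """Return True if one more app-to-data-store connection fits the rules."""
    rule = RELATIONSHIPS[app_type]
    if stores_on_app > 0 and not rule.multiple_stores_per_app:
        return False
    if apps_on_store > 0 and not rule.multiple_apps_per_store:
        return False
    return True
```

For example, `can_connect("media", 0, 1)` models a new media recommendations app that shares a data store already used by a media search app.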

Method of app creation and data ingestion

How you create an app and ingest data depends on the type of data you have:

For website data, you can use either the Google Cloud console or the API. To use a website data store created with the API, you must attach it to an app with Enterprise features enabled in the Google Cloud console.
For structured or unstructured data, you can use either the Google Cloud console or the API.
For healthcare data, you can use either the Google Cloud console or the API.

Documents

Each data store has one or more data records, called documents. What a document represents varies depending on the type of data in the data store:

Website. A document is a web page.
Structured data. A document is a row in a table or a JSON record that follows a particular schema. You can provide this schema yourself or you can let Vertex AI Search derive the schema from the ingested data.
Structured data for media. A document is a row in a table or a JSON record that follows a schema that is specific to media. The documents are records pertaining to media content, such as videos, news articles, music files, and podcasts. A document contains information that describes the media item, at minimum: title, URI to the content location, categories, duration, and available date.
Unstructured data. A document is a file in HTML, PDF with embedded text, or TXT format. PPTX and DOCX formats are available in Preview.
Healthcare FHIR data. A document is a supported FHIR R4 resource. For a list of FHIR R4 resources that Vertex AI Search supports, see Healthcare FHIR R4 data schema reference.
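
To make the minimum media fields listed above concrete, the following sketch checks a JSON-like record for them. The key names (`title`, `uri`, `categories`, `duration`, `available_date`) and the sample values are assumptions chosen for illustration; the actual field names and formats are defined by the media schema, which this page doesn't spell out.

```python
from typing import List

# Hypothetical key names for the five minimum fields named in this section.
REQUIRED_MEDIA_FIELDS = ("title", "uri", "categories", "duration", "available_date")


def missing_media_fields(document: dict) -> List[str]:
    """Return the required fields that are absent or empty in the document."""
    return [field for field in REQUIRED_MEDIA_FIELDS if not document.get(field)]


# Sample record with made-up values; it deliberately omits the available date.
doc = {
    "title": "Example Film",
    "uri": "https://example.com/watch/123",
    "categories": ["drama"],
    "duration": "5400s",
}

print(missing_media_fields(doc))  # prints ['available_date']
```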

Data stores and apps

In Vertex AI Search, there are various kinds of data stores. A data store can contain only one type of data.

Website data
Structured data
Structured content (media)
Unstructured data
Healthcare FHIR data

Website data

A data store with website data uses data indexed from public websites. You can provide a set of URL patterns that you want to include in your data store. The web pages that fit the URL patterns are called included web pages. You can then set up search over data crawled from the included web pages.

For example, you can provide URL patterns such as example.com/faq/* and example.com/events/* and enable search over the data crawled from the web pages that fit these patterns. This data includes text, images tagged with metadata, and other structured data such as meta tags, PageMap attributes, and schema.org data.

You can also provide URL patterns for portions of websites that you want excluded, for example, example.com/events/members-only/* or example.com/events/past-*. Excluded URLs take priority over included ones.
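
The precedence rule (excluded patterns win over included ones) can be sketched as follows. This is only an illustration that uses Python's `fnmatch` wildcard matching as a stand-in; it isn't the matcher that Vertex AI Search uses, and the `is_included` helper is a hypothetical name.

```python
from fnmatch import fnmatchcase

# Patterns taken from the examples in this section.
INCLUDED = ["example.com/faq/*", "example.com/events/*"]
EXCLUDED = ["example.com/events/members-only/*", "example.com/events/past-*"]


def is_included(url: str) -> bool:
    """Check exclusions first so that they take priority over inclusions."""
    if any(fnmatchcase(url, pattern) for pattern in EXCLUDED):
        return False
    return any(fnmatchcase(url, pattern) for pattern in INCLUDED)


for url in (
    "example.com/faq/billing",
    "example.com/events/summer-fair",
    "example.com/events/members-only/gala",
    "example.com/events/past-2023",
):
    print(url, is_included(url))
```

The first two URLs match only an included pattern, so they are kept. The last two also match an excluded pattern, so they are dropped even though they fit example.com/events/*.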

There are two types of website data stores:

Basic website search:
- Provides search capabilities over the existing Google Search index for the included websites.
- Doesn't require domain verification.
Advanced website indexing:
- Provides advanced search capabilities over an index that's generated based on either of the following:
  - The Vertex AI Search app owners can control which web pages are indexed by submitting sitemaps and maintaining them. For more information, see Index and refresh web pages using sitemaps. This process keeps the index fresh without manual intervention.
  - The Vertex AI Search app owners can perform an initial indexing that mirrors the Google Search index and then expand the index's coverage by recrawling the websites whenever necessary, keeping it fresh. For more information, see Refresh web pages. The advanced capabilities of advanced website indexing are listed in Advanced website indexing.
- Requires Vertex AI Search data store owners to verify the domains to which the included websites belong. For more information, see Verify website domains.
- Provides the capability to add structured data to the data store schema. A website contains unstructured data, but you can add structured data in the form of meta tags, PageMap attributes, and schema.org data to your web pages. You can then use this structured data to edit the data store schema as explained in Use structured data for advanced website indexing.

What's next

For website search:

To understand the indexing prerequisites, see how to prepare data for website search.
Create a data store using website content.
Create a search app.

Structured data

A data store with structured data enables semantic search or recommendations over structured data. You can import data from BigQuery or Cloud Storage. You can also manually upload structured JSON data through the API.

For example, you can enable search or recommendations over a product catalog for your ecommerce experience or a directory of doctors for provider search or recommendations.

Vertex AI Search auto-detects the schema from the data that you import. Optionally, you can provide a schema for your data. Providing a schema for your data typically improves the quality of results.
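
The following sketch shows the kind of ambiguity that an explicit schema can remove. It uses a deliberately naive type-inference routine over sample JSON records. It is not how Vertex AI Search detects schemas, and the `infer_schema` helper and the sample catalog are hypothetical.

```python
def infer_schema(records):
    """Map each field name to the sorted Python type names seen across records."""
    seen = {}
    for record in records:
        for key, value in record.items():
            seen.setdefault(key, set()).add(type(value).__name__)
    return {key: sorted(types) for key, types in seen.items()}


# Made-up product catalog records.
catalog = [
    {"id": "p1", "name": "Desk lamp", "price": 24.5},
    {"id": "p2", "name": "Bookshelf", "price": 80, "in_stock": True},
]

print(infer_schema(catalog))
```

In this sample, `price` appears as both a float and an integer, and `in_stock` appears in only one record. A schema that you provide can state the intended type of each field up front instead of leaving it to inference.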

What's next

For custom search:

Prepare structured data for ingestion.
Create a search data store using one of these methods:
Create a search app.

For custom recommendations:

Structured data for media

Media apps can only be connected to media data stores. Media data stores are structured data stores with a Google-defined schema or with your own custom schema that contains a specific set of five media-related fields. For more information about the schema, see About media documents and data stores.

For example, you can enable recommendations by creating a media recommendations app for a movie catalog or a news site so that your users receive suitable, personalized suggestions.

In addition to media documents, media data stores also contain the user event information that allows Vertex AI Search to customize recommendations and search for your users. User events are required for media apps. For information about user events, see Record real-time user events.

What's next

Unstructured data

An unstructured data store enables semantic search over data such as documents and images.

Unstructured data stores support documents in HTML, PDF with embedded text, and TXT format. PPTX and DOCX formats are available in Preview.
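
Before uploading files, you might pre-screen them by extension against the formats listed above. The following is a minimal sketch; the `format_status` helper and its return labels are hypothetical. An extension check can't tell whether a PDF has embedded text, so it is only a first filter.

```python
from pathlib import PurePosixPath

# Formats named in this section.
GA_EXTENSIONS = {".html", ".pdf", ".txt"}
PREVIEW_EXTENSIONS = {".pptx", ".docx"}


def format_status(path: str) -> str:
    """Classify a file path as 'supported', 'preview', or 'unsupported'."""
    extension = PurePosixPath(path).suffix.lower()
    if extension in GA_EXTENSIONS:
        return "supported"
    if extension in PREVIEW_EXTENSIONS:
        return "preview"
    return "unsupported"


for path in ("gs://my-bucket/reports/q1.PDF", "slides.pptx", "archive.zip"):
    print(path, format_status(path))
```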

Search provides results in the form of 10 URLs and summarized answers for natural language queries. Documents must be uploaded to a Cloud Storage bucket with appropriate access permissions. For example, a financial institution can enable search over its private corpus of financial research publications, or a biotech company can enable search or recommendations over its private repository of medical research.

What's next

For search:

Prepare unstructured data for ingestion.
Create a search data store using one of these methods:
Create a search data store for your unstructured data.
Create a search app.

Healthcare FHIR data

A healthcare search app uses FHIR R4 data imported from a Cloud Healthcare API FHIR store. For a list of FHIR R4 resources that Vertex AI Search supports, see Healthcare FHIR R4 data schema reference. A FHIR R4 store must meet certain requirements before it can be used as a data source for a Vertex AI Search data store. For more information, see how to prepare healthcare FHIR data for ingestion.

What's next

About blended search

You can create a blended search app, in which multiple data stores are connected to a single custom search app. This feature lets you use one app to search across multiple sources and types of data.

To make a blended search app, select multiple data stores when creating a new custom search app. If you don't select multiple data stores during creation, then you can't add additional data stores later.

When getting search results, you can either search across all data stores, or filter for results from a single data store.
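
The two options can be sketched as JSON request bodies. The field names `query`, `pageSize`, `dataStoreSpecs`, and `dataStore` come from the lists of allowed fields later in this section. The helper names, the project and data store IDs, and the resource-name layout are assumptions for illustration. The sketch also assumes that omitting `dataStoreSpecs` searches all connected data stores.

```python
import json


def data_store_name(project: str, location: str, data_store_id: str) -> str:
    # Assumed resource-name layout; verify it against the API reference.
    return (
        f"projects/{project}/locations/{location}"
        f"/collections/default_collection/dataStores/{data_store_id}"
    )


def build_search_body(query: str, page_size: int = 10, data_stores=None) -> dict:
    """Build a search request body, optionally restricted to given data stores."""
    body = {"query": query, "pageSize": page_size}
    if data_stores:
        body["dataStoreSpecs"] = [{"dataStore": name} for name in data_stores]
    return body


# Search across every data store connected to the app.
all_stores = build_search_body("return policy")

# Restrict results to a single data store (hypothetical IDs).
faq_only = build_search_body(
    "return policy",
    data_stores=[data_store_name("my-project", "global", "faq-store")],
)

print(json.dumps(faq_only, indent=2))
```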

The following limitations apply:

Adding and removing data stores:
- To turn on blended search for an app, you must connect at least two data stores to it during app creation.
- You can add or remove data stores from a blended search app, but the app can't have fewer than two data stores connected to it at any time.
- If you connect a single data store to a search app during app creation, then you can't add other data stores to the app or remove that data store later.
Website data stores need to have advanced website indexing turned on in order to be used for blended search. For more information, see Advanced website indexing.
Data stores that contain unstructured data imported using BigQuery are not supported.
Blended search allows the following fields in search requests:
- boostSpec
- contentSearchSpec
- dataStoreSpecs
- facetSpecs
- filter
- languageCode
- offset
- oneBoxPageSize
- orderBy
- query
- pageSize
- pageToken
- relevanceScoreSpec
- relevanceThreshold
- session
- sessionSpec
- spellCorrectionSpec
- userInfo
- userPseudoId
Blended search allows the following fields in DataStoreSpec:
- dataStore
- boostSpec: If there are boost specs specified for both SearchRequest and dataStoreSpecs, both boost specs are applied to search results.
- filter: If there are filters specified for both SearchRequest and dataStoreSpecs, both filters are applied to search results.
Create, Read, Update, and Delete (CRUD) operations on serving configs are supported for blended apps. Only the following fields can be added or updated in a serving config:
- boostControlIds
- displayName
- filterControlIds
- genericConfig:
  - contentSearchSpec
- name
- solutionType
- synonymsControlIds
CRUD operations on the following controls are supported for blended search apps:
- boostAction
- synonymAction
- filterAction
There is a limit of 50 data stores per search app.
If one data store uses a CMEK configuration, all other data stores must also use the same CMEK configuration.
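
Several of the limitations above lend themselves to a pre-flight check before you create an app or send a request. The following is a minimal sketch. The `DataStore` class, its attribute names, and both helper functions are hypothetical and only restate the rules listed in this section; they don't call any Vertex AI Search API.

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_DATA_STORES = 50

# Top-level search request fields that this section lists as allowed.
ALLOWED_REQUEST_FIELDS = {
    "boostSpec", "contentSearchSpec", "dataStoreSpecs", "facetSpecs", "filter",
    "languageCode", "offset", "oneBoxPageSize", "orderBy", "query", "pageSize",
    "pageToken", "relevanceScoreSpec", "relevanceThreshold", "session",
    "sessionSpec", "spellCorrectionSpec", "userInfo", "userPseudoId",
}


@dataclass
class DataStore:
    store_id: str
    kind: str  # hypothetical labels: "website", "structured", "unstructured"
    advanced_website_indexing: bool = False
    imported_from_bigquery: bool = False
    cmek_key: Optional[str] = None  # None means no CMEK configuration


def blended_search_problems(stores: List[DataStore]) -> List[str]:
    """Return human-readable rule violations; an empty list means none found."""
    problems = []
    if len(stores) < 2:
        problems.append("blended search needs at least two data stores")
    if len(stores) > MAX_DATA_STORES:
        problems.append(f"a search app can have at most {MAX_DATA_STORES} data stores")
    for store in stores:
        if store.kind == "website" and not store.advanced_website_indexing:
            problems.append(f"{store.store_id}: advanced website indexing is required")
        if store.kind == "unstructured" and store.imported_from_bigquery:
            problems.append(f"{store.store_id}: unstructured BigQuery imports are unsupported")
    if len({store.cmek_key for store in stores}) > 1:
        problems.append("all data stores must use the same CMEK configuration")
    return problems


def unsupported_request_fields(body: dict) -> List[str]:
    """Return top-level request fields that aren't in the allowed list."""
    return sorted(set(body) - ALLOWED_REQUEST_FIELDS)
```

For example, passing a basic website data store together with a structured data store to `blended_search_problems` reports that advanced website indexing is required.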