
Overview of Vertex AI Vector Search

Vector Search is based on vector search technology developed by Google Research. With Vector Search, you can use the same infrastructure that underpins Google products such as Google Search, YouTube, and Play.

Introduction

Vector Search can search billions of items to find the ones that are semantically similar or semantically related. A vector similarity-matching service has many use cases, such as implementing recommendation engines, search engines, chatbots, and text classification.

One possible use case for Vector Search is an online retailer with an inventory of hundreds of thousands of clothing items. In this scenario, the retailer could use the multimodal embedding API to create embeddings of these items, and then use Vector Search to match text queries to the most semantically similar product images. For example, a shopper could search for "yellow summer dress", and Vector Search would return the most similar items for display. Vector Search can search at scale, with high queries per second (QPS), high recall, low latency, and cost efficiency.
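To make the retailer example concrete, the following minimal sketch performs exact (brute-force) semantic matching: it scores every item against a query embedding with cosine similarity and keeps the best matches. The item IDs and the three-dimensional embeddings are hypothetical values made up for illustration; real embeddings would come from an embedding model and have many more dimensions. Vector Search answers the same kind of question approximately, so that it does not have to score every item in a very large catalog.

```python
import math

# Hypothetical toy embeddings for three catalog items. Real embeddings would
# come from an embedding model (for example, the multimodal embedding API).
catalog = {
    "yellow-sundress": [0.9, 0.8, 0.1],
    "navy-wool-coat": [0.1, 0.2, 0.9],
    "lemon-print-skirt": [0.8, 0.6, 0.2],
}


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query: list[float], items: dict[str, list[float]], k: int) -> list[str]:
    """Exact nearest neighbors: score every item and keep the k most similar."""
    ranked = sorted(
        items,
        key=lambda item_id: cosine_similarity(query, items[item_id]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical embedding of the text query "yellow summer dress".
query_embedding = [0.85, 0.75, 0.15]
print(top_k(query_embedding, catalog, k=2))  # ['yellow-sundress', 'lemon-print-skirt']
```

Because this sketch scans the whole catalog for every query, its cost grows linearly with the number of items, which is the cost that an approximate nearest neighbor service is designed to avoid.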

The use of embeddings is not limited to words or text. You can generate semantic embeddings for many kinds of data, including images, audio, video, and user preferences. To generate multimodal embeddings with Vertex AI, see Get multimodal embeddings.

How to use Vector Search for semantic matching

Semantic matching can be simplified into a few steps. First, you generate embedding representations of your items (this happens outside of Vector Search). Second, you upload your embeddings to Google Cloud and link your data to Vector Search. Finally, you create an index from your embeddings and query it to get recommendations or results.

Generate an embedding

Generate embeddings for your dataset. This involves preprocessing the data into vectors that can be searched efficiently for approximate nearest neighbors (ANN). You can do this outside of Vertex AI, or you can use Generative AI on Vertex AI, which can create both text and multimodal embeddings.
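One common preprocessing step for similarity search is to scale each embedding to unit length (L2 normalization). For unit vectors, the dot product equals the cosine similarity of the original vectors, so neighbors can be ranked with the cheaper dot product. The following minimal sketch is illustrative only; it is an assumption about typical preprocessing, not a stated requirement of Vector Search, so check Configure index parameters for the distance measures your index supports.

```python
import math


def l2_normalize(vector: list[float]) -> list[float]:
    """Scale a vector to unit length (an L2 norm of 1.0)."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def dot(a: list[float], b: list[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity computed directly from unnormalized vectors."""
    return dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))


a, b = [3.0, 4.0], [1.0, 2.0]
print(l2_normalize(a))  # [0.6, 0.8]
# For unit vectors, the dot product equals the cosine similarity of the originals.
print(math.isclose(dot(l2_normalize(a), l2_normalize(b)), cosine(a, b)))  # True
```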

Add your embedding to Cloud Storage

Upload your embeddings to Cloud Storage so that the Vector Search service can read them.
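Before uploading, you need your embeddings in a file format that Vector Search can read. The following minimal sketch writes hypothetical item embeddings as JSON Lines, one JSON object per line. The "id" and "embedding" field names and the .json extension reflect our understanding of the JSON input format; confirm them against Input data format and structure before you upload real data.

```python
import json
from pathlib import Path

# Hypothetical records. The field names "id" and "embedding" are assumed from
# the Vector Search JSON input format; verify them in the input format docs.
records = [
    {"id": "yellow-sundress", "embedding": [0.9, 0.8, 0.1]},
    {"id": "navy-wool-coat", "embedding": [0.1, 0.2, 0.9]},
]


def write_jsonl(path: Path, rows: list[dict]) -> None:
    """Write one JSON object per line (JSON Lines)."""
    with path.open("w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")


out = Path("embeddings.json")
write_jsonl(out, records)
```

You could then copy the file to a folder in your bucket, for example with `gcloud storage cp embeddings.json gs://BUCKET/embeddings/`, where BUCKET is a placeholder for your bucket name.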

Upload to Vector Search

Connect your embeddings to Vector Search to perform nearest neighbor search. You create an index from your embeddings and deploy the index to an index endpoint, which you then query. The query returns the approximate nearest neighbors. To create an index, see Manage indexes. To deploy your index to an endpoint, see Deploy and manage index endpoints.
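Queries return approximate rather than exact nearest neighbors because an index avoids comparing the query with every vector. The following toy index illustrates the idea with simple IVF-style partitioning: vectors are grouped around a few centroids, and a query scans only the partitions closest to it. This is not the algorithm Vector Search uses; the class name, the `probes` parameter, and the random data are hypothetical and exist only to show the speed-versus-accuracy trade-off.

```python
import random


def squared_distance(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance (ranks neighbors the same as distance)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class PartitionedIndex:
    """Toy approximate nearest neighbor (ANN) index using IVF-style partitions."""

    def __init__(self, vectors: dict[str, list[float]], num_partitions: int, seed: int = 0):
        rng = random.Random(seed)
        # Simplification: use randomly sampled data points as centroids
        # (real systems usually learn centroids, for example with k-means).
        sample_ids = rng.sample(sorted(vectors), num_partitions)
        self.centroids = [vectors[i] for i in sample_ids]
        self.partitions: list[dict[str, list[float]]] = [{} for _ in self.centroids]
        # Assign every vector to the partition of its closest centroid.
        for item_id, vec in vectors.items():
            nearest = min(
                range(len(self.centroids)),
                key=lambda c: squared_distance(vec, self.centroids[c]),
            )
            self.partitions[nearest][item_id] = vec

    def query(self, q: list[float], k: int, probes: int) -> list[str]:
        """Scan only the `probes` partitions whose centroids are closest to q."""
        order = sorted(
            range(len(self.centroids)),
            key=lambda c: squared_distance(q, self.centroids[c]),
        )
        candidates: dict[str, list[float]] = {}
        for c in order[:probes]:
            candidates.update(self.partitions[c])
        return sorted(candidates, key=lambda i: squared_distance(q, candidates[i]))[:k]


def exact_search(vectors: dict[str, list[float]], q: list[float], k: int) -> list[str]:
    """Brute-force search over every vector, for comparison."""
    return sorted(vectors, key=lambda i: squared_distance(q, vectors[i]))[:k]


rng = random.Random(42)
data = {f"item-{n}": [rng.random(), rng.random()] for n in range(1000)}
index = PartitionedIndex(data, num_partitions=20)
query = [0.5, 0.5]
print(index.query(query, k=5, probes=3))  # scans about 3/20 of the data
print(exact_search(data, query, k=5))
```

Searching more partitions (a larger `probes`) raises accuracy at the cost of more work; with `probes` equal to the number of partitions, the toy index degenerates into exact search.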

Evaluate the results

After you have the approximate nearest neighbor results, you can evaluate them to see how well they meet your needs. If the results are not accurate enough, you can adjust the parameters of the algorithm, or enable scaling to support more queries per second. You make these changes by updating the configuration file for your index. To learn more, see Configure index parameters.
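A common way to evaluate approximate results is to compare them with ground truth neighbors computed by exact search, and report recall. The following minimal sketch computes recall for one query and averages it over several queries; the query IDs and document IDs are hypothetical, and it assumes that the returned list and the ground truth list have the same length k. It reproduces the worked example from the Recall definition in the terminology list.

```python
def recall(returned: list[str], ground_truth: list[str]) -> float:
    """Fraction of the ground truth neighbors that appear in the returned results."""
    truth = set(ground_truth)
    return len(truth.intersection(returned)) / len(truth)


def mean_recall(results: dict[str, list[str]], ground_truth: dict[str, list[str]]) -> float:
    """Average recall over a set of evaluation queries, keyed by query ID."""
    return sum(recall(results[q], ground_truth[q]) for q in ground_truth) / len(ground_truth)


# Worked example: a query for 20 neighbors returns 19 of the true neighbors.
true_neighbors = [f"doc-{i}" for i in range(20)]
approximate_results = true_neighbors[:19] + ["doc-99"]
print(recall(approximate_results, true_neighbors))  # 0.95
```

Multiply by 100 to express recall as a percentage. If mean recall over a representative set of queries is too low for your use case, that is a signal to revisit your index parameters.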

Vector Search terminology

This list contains some important terminology that you'll need to understand to use Vector Search:

Vector: A vector is a list of float values that has magnitude and direction. It can be used to represent any kind of data, such as numbers, points in space, and directions.
Embedding: An embedding is a type of vector that's used to represent data in a way that captures its semantic meaning. Embeddings are typically created using machine learning techniques, and they are often used in natural language processing (NLP) and other machine learning applications.
Index: A collection of vectors deployed together for similarity search. Vectors can be added to or removed from an index. Similarity search queries are issued to a specific index and search the vectors in that index.
Ground truth: Data known to be correct in the real world, against which the output of a machine learning system is checked for accuracy; for example, a ground truth dataset. For nearest neighbor search, the ground truth neighbors of a query are its true nearest neighbors.
Recall: The percentage of nearest neighbors returned by the index that are true nearest neighbors. For example, if a query for 20 nearest neighbors returns 19 of the ground truth nearest neighbors, the recall is 19 / 20 × 100 = 95%.
Restrict: Functionality that limits searches to a subset of the index by using Boolean rules. Restrict is also referred to as "filtering". With Vector Search, you can use numeric filtering and text attribute filtering.
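The idea behind restricts can be sketched as a Boolean filter applied to candidates before ranking. The following minimal example combines a text-attribute rule (allow or deny by color) with a numeric rule (a maximum price). The item IDs, the "color" and "price" attributes, and the function names are hypothetical; this mirrors the concept only and is not the Vector Search restricts API. For simplicity, it filters first and then ranks the remaining items by brute-force dot product.

```python
from typing import Optional

# Hypothetical catalog items, each with an embedding, a text attribute
# ("color"), and a numeric attribute ("price") to filter on.
items = {
    "yellow-sundress": {"embedding": [0.9, 0.8, 0.1], "color": "yellow", "price": 40},
    "lemon-print-skirt": {"embedding": [0.8, 0.6, 0.2], "color": "yellow", "price": 25},
    "navy-wool-coat": {"embedding": [0.1, 0.2, 0.9], "color": "navy", "price": 120},
}


def dot(a: list[float], b: list[float]) -> float:
    """Dot product used as the similarity score."""
    return sum(x * y for x, y in zip(a, b))


def passes(attrs: dict, allow: Optional[set], deny: Optional[set], max_price: Optional[float]) -> bool:
    """Boolean rules: allow/deny on the text attribute, upper bound on the number."""
    if allow is not None and attrs["color"] not in allow:
        return False
    if deny is not None and attrs["color"] in deny:
        return False
    if max_price is not None and attrs["price"] > max_price:
        return False
    return True


def filtered_search(query: list[float], k: int, allow: Optional[set] = None,
                    deny: Optional[set] = None, max_price: Optional[float] = None) -> list[str]:
    """Keep only items that pass the filter, then rank them by similarity."""
    candidates = [i for i, attrs in items.items() if passes(attrs, allow, deny, max_price)]
    return sorted(candidates, key=lambda i: dot(query, items[i]["embedding"]), reverse=True)[:k]


query = [0.85, 0.75, 0.15]
print(filtered_search(query, k=2, deny={"yellow"}))  # ['navy-wool-coat']
print(filtered_search(query, k=2, max_price=30))  # ['lemon-print-skirt']
```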

What's next

Get started in under an hour with the Vector Search quickstart
Review prerequisites and embeddings in Before you begin
Learn how to configure Input data format and structure
See other Vector Search notebook tutorials in the Tutorials overview
Learn about exporting embeddings from Spanner to Vector Search