Overview of Vertex Matching Engine

Vertex Matching Engine provides vector similarity search, so that you can run efficient, accurate searches over large amounts of data. ML models transform data inputs, such as text and images, into embeddings, which are high-dimensional vectors. You can bring your own pre-trained embeddings or use built-in models to train a custom embedding without writing training code. Vertex Matching Engine offers the Two-Tower built-in algorithm, a Google-developed, supervised approach to matching pairs of relevant items (such as user profiles, search queries, text documents, or images).

With a trained embedding, Matching Engine lets you query your data for both exact matches and semantically similar matches, that is, embeddings in your data that are similar to the one you query. Given a vector, Matching Engine helps you find the most similar vectors in a large corpus within milliseconds.
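To make "find the most similar vectors" concrete, here is a minimal sketch of exact similarity search done by brute force. It is not how Matching Engine works internally; it only illustrates the problem being solved. The item ids, the two-dimensional vectors, and the choice of cosine similarity as the measure are assumptions made for this example, and all vectors are assumed to be non-zero.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_similar(query, corpus, k):
    """Return the ids of the k corpus vectors most similar to the query.

    corpus maps a hypothetical item id to its embedding vector.
    Every vector is scored, so the cost grows linearly with corpus size.
    """
    scored = (
        (cosine_similarity(query, vector), item_id)
        for item_id, vector in corpus.items()
    )
    return [item_id for _, item_id in heapq.nlargest(k, scored)]


# Toy corpus: doc_a matches the query exactly, doc_b is merely similar.
corpus = {
    "doc_a": [1.0, 0.0],
    "doc_b": [0.9, 0.1],
    "doc_c": [0.0, 1.0],
}
nearest = top_k_similar([1.0, 0.0], corpus, k=2)
print(nearest)
```

Because this approach scores every vector on every query, it becomes impractical for very large corpora, which is the motivation for approximate methods.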

Vector similarity search solutions are also known as k-Nearest Neighbor (kNN), Approximate Nearest Neighbor (ANN), and Embedding Vector Matching. Matching Engine uses a new type of vector quantization developed by Google Research and described in "Accelerating Large-Scale Inference with Anisotropic Vector Quantization". To learn more about how this works, see the blog post about ScaNN (Scalable Nearest Neighbors).

Vector similarity search is a fundamental part of many applications that entail computing and using semantic embeddings.

For example, it is used in:

  • The candidate generation phase of recommendation engines or ad-targeting engines
  • One-shot or few-shot image classification or image search
  • NLP applications that perform semantic search on text embeddings (which may be produced by models such as BERT)

Improved scale and recall, at lower cost

Matching Engine delivers similarity search at scale, with high QPS, low latency, high recall, and cost efficiency.

  • Scales to billions of embedding vectors
  • Serves results with 50th percentile latencies as low as 5 ms, even when the QPS is in the hundreds of thousands
  • Delivers industry-leading recall. Recall measures the percentage of actual nearest neighbors returned for each vector search call
  • In most cases, uses less CPU and memory than other known alternatives

Valuable capabilities that simplify real-world architectures

Key user journeys

  • Create and deploy an index from a user-provided set of embedding vectors
  • Update a live index with a user-provided set of embedding vectors
  • Query an index online, with low latency, to get the nearest neighbors of a query embedding vector

Useful terminology

Index: A collection of vectors deployed together for similarity search. Vectors can be added to an index or removed from an index. Similarity search queries are issued to a specific index, and will search over the vectors in that index.
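As an illustration of the index concept, the following toy class keeps vectors in memory, supports adding and removing them, and answers nearest-neighbor queries over whatever it currently holds. The class name `ToyIndex`, its method names, and the use of squared Euclidean distance are all hypothetical choices for this sketch and do not reflect the Matching Engine API.

```python
class ToyIndex:
    """In-memory stand-in for an index: a set of vectors searched together."""

    def __init__(self):
        self._vectors = {}  # hypothetical item id -> embedding vector

    def upsert(self, item_id, vector):
        """Add a vector, or overwrite the one stored under item_id."""
        self._vectors[item_id] = list(vector)

    def remove(self, item_id):
        """Remove a vector; unknown ids are ignored."""
        self._vectors.pop(item_id, None)

    def find_neighbors(self, query, k):
        """Return up to k ids, closest first, by squared Euclidean distance."""
        def distance(item_id):
            stored = self._vectors[item_id]
            return sum((q - v) ** 2 for q, v in zip(query, stored))

        return sorted(self._vectors, key=distance)[:k]


index = ToyIndex()
index.upsert("a", [0.0, 0.0])
index.upsert("b", [1.0, 1.0])
index.upsert("c", [5.0, 5.0])

before = index.find_neighbors([0.9, 0.9], k=2)  # "b" is closest, then "a"
index.remove("b")
after = index.find_neighbors([0.9, 0.9], k=2)   # only "a" and "c" remain
print(before, after)
```

A query only ever sees the vectors present in the index at the time it runs, which is why removing "b" changes the result of the same query.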

Recall: The percentage of true nearest neighbors returned by the index. For example, if a query for 20 nearest neighbors returns 19 of the "ground truth" nearest neighbors, the recall is (19 / 20) × 100 = 95%.
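The recall calculation can be sketched in a few lines. The function name and the integer ids below are made up for this example; the ground-truth list would normally come from an exact (brute-force) search over the same data.

```python
def recall_percent(returned_ids, ground_truth_ids):
    """Percentage of the ground-truth neighbors present in the returned set."""
    truth = set(ground_truth_ids)
    if not truth:
        raise ValueError("ground_truth_ids must not be empty")
    hits = len(truth & set(returned_ids))
    return 100.0 * hits / len(truth)


# 20 true neighbors; the index returned 19 of them plus one wrong id.
ground_truth = list(range(20))
returned = list(range(19)) + [999]
recall = recall_percent(returned, ground_truth)
print(recall)
```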

Restricts: Functionality that limits ("restricts") a search to a subset of the index by using Boolean rules.
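One way to picture restricts is as a Boolean filter that decides which items are eligible before neighbors are ranked. The rule shape used here (a namespace with optional allow and deny token lists) and all item names are assumptions for illustration only; they are not taken from the service's request format.

```python
def passes_restricts(item_tags, restricts):
    """Return True if an item satisfies every rule.

    item_tags: maps a namespace to the set of tokens attached to the item.
    restricts: list of dicts with "namespace" and optional "allow" / "deny".
    An item passes a rule when it carries at least one allowed token
    (if any are listed) and none of the denied tokens.
    """
    for rule in restricts:
        tokens = item_tags.get(rule["namespace"], set())
        allow = set(rule.get("allow", []))
        deny = set(rule.get("deny", []))
        if allow and not (tokens & allow):
            return False
        if tokens & deny:
            return False
    return True


# Hypothetical catalog items tagged by color and size.
items = {
    "shirt_1": {"color": {"red"}, "size": {"m"}},
    "shirt_2": {"color": {"blue"}, "size": {"m"}},
    "shirt_3": {"color": {"red"}, "size": {"xl"}},
}
# Rule set: color must be red AND size must not be xl.
restricts = [
    {"namespace": "color", "allow": ["red"]},
    {"namespace": "size", "deny": ["xl"]},
]
eligible = sorted(
    item_id for item_id, tags in items.items()
    if passes_restricts(tags, restricts)
)
print(eligible)
```

Only the eligible items would then be considered when ranking nearest neighbors for the query.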
