Work with vector embeddings

This page shows you how to use AlloyDB as a large language model (LLM) tool and query engine using these techniques:

  • Generate and store vector embeddings based on an LLM.

  • Index and query embeddings using the pgvector extension.

For more information about using ML models with AlloyDB, see Build generative AI applications.

AlloyDB lets you use an LLM hosted by Vertex AI to translate a text string into an embedding, which is the model's representation of the given text's semantic meaning as a numeric vector. For more information about Vertex AI support for text embeddings, see Text embeddings.

AlloyDB implements embeddings as arrays of real values. This lets you use generated embeddings as inputs for pgvector extension functions.

Before you begin

Some requirements differ depending on whether you want to use AlloyDB to generate embeddings, or whether you only need to work with embeddings stored in your database from another source.

Requirements for generating embeddings

To let AlloyDB generate embeddings, make sure you meet the following requirements:

Regional restrictions

To generate embeddings with AlloyDB, your cluster must reside in the region us-central1. This is required because the Vertex AI model that AlloyDB can use for embeddings, textembedding-gecko, is located in that region.

To work around this regional restriction, you can instead invoke predictions, post-processing the JSON output into vectors.

Required database extensions

To work with embeddings, you need the google_ml_integration extension, version 1.2 or later, installed on your AlloyDB database.

Optionally, if you want to use pgvector functions and operators with your embeddings, then you also need the vector extension, version 0.5.0.google-1 or later. This is a version of pgvector that Google has extended with optimizations specific to AlloyDB.

Both of these extensions are included with AlloyDB. You can install them on any database in your cluster. For more information about installing and managing extensions, see Configure database extensions.

Set up model access

Before you can generate embeddings from an AlloyDB database, you must configure AlloyDB to work with a text embedding model.

To work with the cloud-based textembeddings-gecko model, you need to integrate your database with with Vertex AI.

Grant database users access to generate embeddings

Grant permission for database users to execute the embedding function to run predictions:

  1. Connect a psql client to the cluster's primary instance, as described in Connect a psql client to an instance.

  2. At the psql command prompt, connect to the database and grant permissions:

    \c DB_NAME
    
    GRANT EXECUTE ON FUNCTION embedding TO USER_NAME;
    

    Replace the following:

    • DB_NAME: the name of the database on which the permissions should be granted

    • USER_NAME: the name of the user for whom the permissions should be granted

Embedding generation

AlloyDB provides a function that lets you translate text into a vector embedding. You can then store that embedding in your database as vector data, and optionally use pgvector functions to base queries on it.

Generate an embedding

To generate an embedding using AlloyDB, use the embedding() function provided by the google_ml_integration extension:

SELECT embedding( 'MODEL_IDVERSION_TAG', 'TEXT');

Replace the following:

  • TEXT: the text to translate into a vector embedding.

The following example uses version 001 of the textembedding-gecko LLM to generate an embedding based on a provided literal string:

SELECT embedding( 'textembedding-gecko@001', 'AlloyDB is a managed, cloud-hosted SQL database service.');

Store a generated embedding

The return value of the embedding() function is an array of real values. To store this value in a table, add a real[] column:

ALTER TABLE TABLE ADD COLUMN EMBEDDING_COLUMN real[DIMENSIONS];

Replace the following:

  • TABLE: the table name

  • EMBEDDING_COLUMN: the name of the new embedding column

  • DIMENSIONS: the number of dimensions that the model supports.

    If you are using the Vertex AI Model Garden or the AlloyDB Omni Technology Preview, then specify one of the following values for DIMENSIONS:

    If you are using one of the textembedding-gecko models with Vertex AI, specify 768.

Optionally, if you have the pgvector extension installed, then you can instead store embeddings as vector values:

ALTER TABLE TABLE ADD COLUMN EMBEDDING_COLUMN vector(DIMENSIONS);

After you create a column to store embeddings, you can populate it based on the values already stored in another column in the same table:

UPDATE TABLE SET EMBEDDING_COLUMN = embedding('MODEL_IDVERSION_TAG', SOURCE_TEXT_COLUMN);

Replace the following:

  • TABLE: the table name

  • EMBEDDING_COLUMN: the name of the embedding column

  • SOURCE_TEXT_COLUMN: the name of the column storing the text to translate into embeddings

The previous command works for both real[] and vector embedding columns. If your embedding column is of type vector, then AlloyDB implicitly casts the return value of embedding() from a real array to a vector value.

The following example uses version 001 of the textembedding-gecko LLM to populate the column messages.message_vector with embeddings based on the content of the messages.message column:

UPDATE messages SET message_vector = embedding( 'textembedding-gecko@001', message);

Query and index embeddings using pgvector

The pgvector PostgreSQL extension lets you use vector-specific operators and functions when you store, index and query text embeddings in your database. AlloyDB has its own optimizations for working with pgvector, letting you create indexes that can significantly speed up certain queries that involve embeddings.

Create an optimized nearest-neighbor index

Stock pgvector supports approximate nearest-neighbor searching through indexing. AlloyDB adds to this support with a scalar quantization feature that you can specify when you create an index. When enabled, scalar quantization can significantly speed up queries that have larger dimensional vectors, and lets you store vectors with up to 8,000 dimensions.

To enable scalar quantization on a pgvector-based index, specify ivf as the index method, and SQ8 as the quantizer:

CREATE INDEX ON TABLE
  USING ivf (EMBEDDING_COLUMN DISTANCE_FUNCTION)
  WITH (lists = LIST_COUNT, quantizer = 'SQ8');

Replace the following:

  • TABLE: the table to add the index to

  • EMBEDDING_COLUMN: a column that stores vector data

  • DISTANCE_FUNCTION: the distance function to use with this index. Choose one of the following:

    • L2 distance: vector_l2_ops

    • Inner product: vector_ip_ops

    • Cosine distance: vector_cosine_ops

  • LIST_COUNT: the number of lists to use with this index

To create this index on an embedding column that uses the real[] data type instead of vector, cast the column into the vector data type:

CREATE INDEX ON TABLE
  USING ivf ((CAST(EMBEDDING_COLUMN AS vector(DIMENSIONS)))'}} DISTANCE_FUNCTION)
  WITH (lists = LIST_COUNT, quantizer = 'SQ8');

Replace DIMENSIONS with the dimensional width of the embedding column.

The next section demonstrates an example of this kind of index.

Make a nearest-neighbor query with given text

After you have stored and indexed embeddings in your database, the full range of pgvector query functionality becomes available to you.

To find the nearest semantic neighbors to a given piece of text, you can use the embedding() function to translate the text into a vector. In the same query, you apply this vector to the pgvector nearest-neighbor operator, <->, to find the database rows with the most semantically similar embeddings.

Because embedding() returns a real array, you must explicitly cast the embedding() call to vector in order to use these values with pgvector operators.

  SELECT RESULT_COLUMNS FROM TABLE
    ORDER BY EMBEDDING_COLUMN
    <-> embedding('MODEL_IDVERSION_TAG', 'TEXT')::vector
    LIMIT ROW_COUNT

Replace the following:

  • RESULT_COLUMNS: the columns to display from semantically similar rows.

  • TABLE: the table containing the embedding to compare the text to.

  • EMBEDDING_COLUMN: the column containing the stored embeddings.

  • TEXT: the text you want to find the nearest stored semantic neighbors of.

  • ROW_COUNT: the number of rows to return.

    Specify 1 if you want only the single best match.

To run this query with a stored embedding column that uses the real[] data type instead of vector, cast the column into the vector data type as well:

  SELECT RESULT_COLUMNS::vector FROM TABLE
    ORDER BY EMBEDDING_COLUMN
    <-> embedding('MODEL_IDVERSION_TAG', 'TEXT')::vector
    LIMIT ROW_COUNT

Use model version tags to avoid errors

Google strongly recommends that you always use a stable version of your chosen embeddings model. For most models, this means explicitly setting a version tag.

Calling the embedding() function without specifying the version tag of the model is syntactically valid, but it is also error-prone.

If you omit the version tag when using a model in the Vertex AI Model Garden, then Vertex AI uses the latest version of the model. This might not be the latest stable version. For more information about available Vertex AI model versions, see Model versions.

A given Vertex AI model version always return the same embedding() response to given text input. If you don't specify model versions in your calls to embedding(), then a new published model version can abruptly change the returned vector for a given input, causing errors or other unexpected behavior in your applications.

To avoid these problems, always specify the model version.

What's next