This page shows you how to use AlloyDB as a large language model (LLM) tool and query engine using these techniques:
- Generate and store vector embeddings based on an LLM.
- Index and query embeddings using the pgvector extension.
For more information about using ML models with AlloyDB, see Build generative AI applications.
AlloyDB lets you use an LLM hosted by Vertex AI to translate a text string into an embedding, which is the model's representation of the given text's semantic meaning as a numeric vector. For more information about Vertex AI support for text embeddings, see Text embeddings.
AlloyDB implements embeddings as arrays of real values. This lets you use generated embeddings as inputs for pgvector extension functions.
Before you begin
Some requirements differ depending on whether you want to use AlloyDB to generate embeddings, or whether you only need to work with embeddings stored in your database from another source.
Requirements for generating embeddings
To let AlloyDB generate embeddings, make sure you meet the following requirements:
Regional restrictions
You can generate embeddings in regions where Generative AI on Vertex AI is available. For a list of regions, see Generative AI on Vertex AI locations.
For AlloyDB, ensure that both the AlloyDB cluster and the Vertex AI model you are querying are in the same region.
Required database extensions
To work with embeddings, you need the google_ml_integration extension, version 1.2 or later, installed on your AlloyDB database.
Optionally, if you want to use pgvector functions and operators with your embeddings, then you also need the vector extension, version 0.5.0.google-1 or later. This is a version of pgvector that Google has extended with optimizations specific to AlloyDB.
Both of these extensions are included with AlloyDB. You can install them on any database in your cluster. For more information about installing and managing extensions, see Configure database extensions.
Set up model access
Before you can generate embeddings from an AlloyDB database, you must configure AlloyDB to work with a text embedding model.
To work with the cloud-based textembedding-gecko model, you need to integrate your database with Vertex AI.
Grant database users access to generate embeddings
Grant permission for database users to execute the embedding function to run predictions:
Connect a psql client to the cluster's primary instance, as described in Connect a psql client to an instance.
At the psql command prompt, connect to the database and grant permissions:
\c DB_NAME
GRANT EXECUTE ON FUNCTION embedding TO USER_NAME;
Replace the following:
- DB_NAME: the name of the database on which the permissions should be granted
- USER_NAME: the name of the user for whom the permissions should be granted
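For example, with a hypothetical database named vector_demo and a hypothetical database user named app_user, the psql session might look like the following:

```sql
\c vector_demo
GRANT EXECUTE ON FUNCTION embedding TO app_user;
```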
Embedding generation
AlloyDB provides a function that lets you translate text into a vector embedding. You can then store that embedding in your database as vector data, and optionally use pgvector functions to base queries on it.
Generate an embedding
To generate an embedding using AlloyDB, use the embedding() function provided by the google_ml_integration extension:
SELECT embedding( 'MODEL_IDVERSION_TAG', 'TEXT');
Replace the following:
- MODEL_ID: the ID of the model to query. If you are using the Vertex AI Model Garden, then specify textembedding-gecko or textembedding-gecko-multilingual here. These are the cloud-based models that AlloyDB can use for text embeddings. For more information, see Text embeddings.
- Optional: VERSION_TAG: the version tag of the model to query. Prepend the tag with @. If you are using one of the textembedding-gecko models with Vertex AI, then specify one of the version tags listed in Model versions. Google strongly recommends always specifying the version tag. Not doing so risks unexpected results.
- TEXT: the text to translate into a vector embedding.
The following example uses version 001 of the textembedding-gecko LLM to generate an embedding based on a provided literal string:
SELECT embedding( 'textembedding-gecko@001', 'AlloyDB is a managed, cloud-hosted SQL database service.');
Store a generated embedding
The return value of the embedding()
function is an array of real
values.
To store this value in a table, add a real[]
column:
ALTER TABLE TABLE ADD COLUMN EMBEDDING_COLUMN real[DIMENSIONS];
Replace the following:
- TABLE: the table name
- EMBEDDING_COLUMN: the name of the new embedding column
- DIMENSIONS: the number of dimensions that the model supports. If you are using one of the textembedding-gecko models with Vertex AI, specify 768.
Optionally, if you have the pgvector extension installed, then you can instead store embeddings as vector values:
ALTER TABLE TABLE ADD COLUMN EMBEDDING_COLUMN vector(DIMENSIONS);
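For example, assuming a hypothetical messages table and one of the 768-dimension textembedding-gecko models, you might add a column like the following:

```sql
-- vector(768) matches the dimension count of the textembedding-gecko models
ALTER TABLE messages ADD COLUMN message_vector vector(768);
```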
After you create a column to store embeddings, you can populate it based on the values already stored in another column in the same table:
UPDATE TABLE SET EMBEDDING_COLUMN = embedding('MODEL_IDVERSION_TAG', SOURCE_TEXT_COLUMN);
Replace the following:
- TABLE: the table name
- EMBEDDING_COLUMN: the name of the embedding column
- MODEL_ID: the ID of the model to query. If you are using the Vertex AI Model Garden, then specify textembedding-gecko or textembedding-gecko-multilingual here. These are the cloud-based models that AlloyDB can use for text embeddings. For more information, see Text embeddings.
- Optional: VERSION_TAG: the version tag of the model to query. Prepend the tag with @. If you are using one of the textembedding-gecko models with Vertex AI, then specify one of the version tags listed in Model versions. Google strongly recommends always specifying the version tag. Not doing so risks unexpected results.
- SOURCE_TEXT_COLUMN: the name of the column storing the text to translate into embeddings
The previous command works for both real[] and vector embedding columns. If your embedding column is of type vector, then AlloyDB implicitly casts the return value of embedding() from a real array to a vector value.
The following example uses version 001 of the textembedding-gecko LLM to populate the column messages.message_vector with embeddings based on the content of the messages.message column:
UPDATE messages SET message_vector = embedding( 'textembedding-gecko@001', message);
Query and index embeddings using pgvector
The pgvector PostgreSQL extension lets you use vector-specific operators and functions when you store, index, and query text embeddings in your database. AlloyDB has its own optimizations for working with pgvector, letting you create indexes that can significantly speed up certain queries that involve embeddings.
Create an optimized nearest-neighbor index
Stock pgvector supports approximate nearest-neighbor searching through indexing. AlloyDB adds to this support with a scalar quantization feature that you can specify when you create an index. When enabled, scalar quantization can significantly speed up queries that have larger dimensional vectors, and lets you store vectors with up to 8,000 dimensions.
To enable scalar quantization on a pgvector-based index, specify ivf as the index method, and SQ8 as the quantizer:
CREATE INDEX ON TABLE
USING ivf (EMBEDDING_COLUMN DISTANCE_FUNCTION)
WITH (lists = LIST_COUNT, quantizer = 'SQ8');
Replace the following:
- TABLE: the table to add the index to
- EMBEDDING_COLUMN: a column that stores vector data
- DISTANCE_FUNCTION: the distance function to use with this index. Choose one of the following:
  - L2 distance: vector_l2_ops
  - Inner product: vector_ip_ops
  - Cosine distance: vector_cosine_ops
- LIST_COUNT: the number of lists to use with this index
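As a sketch, assuming the hypothetical messages table with a vector-typed message_vector column, the following statement creates a scalar-quantized index that uses cosine distance and 100 lists (an illustrative value; tune the list count for your data):

```sql
CREATE INDEX ON messages
USING ivf (message_vector vector_cosine_ops)
WITH (lists = 100, quantizer = 'SQ8');
```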
To create this index on an embedding column that uses the real[] data type instead of vector, cast the column into the vector data type:
CREATE INDEX ON TABLE
USING ivf ((CAST(EMBEDDING_COLUMN AS vector(DIMENSIONS))) DISTANCE_FUNCTION)
WITH (lists = LIST_COUNT, quantizer = 'SQ8');
Replace DIMENSIONS with the dimensional width of the embedding column.
The next section demonstrates an example of this kind of index.
Make a nearest-neighbor query with given text
After you have stored and indexed embeddings in your database, the full range of pgvector query functionality becomes available to you.
To find the nearest semantic neighbors to a given piece of text, you can use the embedding() function to translate the text into a vector. In the same query, you apply this vector to the pgvector nearest-neighbor operator, <->, to find the database rows with the most semantically similar embeddings.
Because embedding() returns a real array, you must explicitly cast the embedding() call to vector in order to use these values with pgvector operators.
SELECT RESULT_COLUMNS FROM TABLE
ORDER BY EMBEDDING_COLUMN
<-> embedding('MODEL_IDVERSION_TAG', 'TEXT')::vector
LIMIT ROW_COUNT;
Replace the following:
- RESULT_COLUMNS: the columns to display from semantically similar rows.
- TABLE: the table containing the embeddings to compare the text to.
- EMBEDDING_COLUMN: the column containing the stored embeddings.
- MODEL_ID: the ID of the model to query. If you are using the Vertex AI Model Garden, then specify textembedding-gecko or textembedding-gecko-multilingual here. These are the cloud-based models that AlloyDB can use for text embeddings. For more information, see Text embeddings.
- Optional: VERSION_TAG: the version tag of the model to query. Prepend the tag with @. If you are using one of the textembedding-gecko models with Vertex AI, then specify one of the version tags listed in Model versions. Google strongly recommends always specifying the version tag. Not doing so risks unexpected results.
- TEXT: the text you want to find the nearest stored semantic neighbors of.
- ROW_COUNT: the number of rows to return. Specify 1 if you want only the single best match.
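For example, the following query, which assumes the hypothetical messages table with a vector-typed message_vector column populated earlier, returns the three rows whose stored embeddings are semantically closest to a question:

```sql
SELECT message FROM messages
ORDER BY message_vector
<-> embedding('textembedding-gecko@001', 'How do I reset my password?')::vector
LIMIT 3;
```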
To run this query with a stored embedding column that uses the real[] data type instead of vector, cast the column into the vector data type as well:
SELECT RESULT_COLUMNS FROM TABLE
ORDER BY EMBEDDING_COLUMN::vector
<-> embedding('MODEL_IDVERSION_TAG', 'TEXT')::vector
LIMIT ROW_COUNT;
Preview the ScaNN algorithm
AlloyDB Omni version 15.5.1 and later includes a Technology Preview of postgres_ann, a PostgreSQL extension developed by Google that implements a highly efficient nearest-neighbor index powered by the ScaNN algorithm.
You can experiment with the scann index type as an alternative to index types provided by the pgvector extension, including the ivf index type whose use is described in Create an optimized nearest-neighbor index. The scann index type is compatible with columns using the pgvector-provided vector data type.
Because the scann index type is available as a Preview, we don't recommend applying it to production workloads.
To enable use of the scann index type, run the following CREATE EXTENSION DDL queries on a database running in an AlloyDB Omni cluster:
CREATE EXTENSION IF NOT EXISTS vector;
CREATE EXTENSION IF NOT EXISTS postgres_ann;
To apply an index using the ScaNN algorithm to a column containing stored vector embeddings, run the following DDL query:
CREATE INDEX ON TABLE
USING scann (EMBEDDING_COLUMN DISTANCE_FUNCTION)
WITH (num_leaves=LEAVES_COUNT, quantizer='sq8');
Replace the following:
- TABLE: the table to add the index to.
- EMBEDDING_COLUMN: a column that stores vector data.
- DISTANCE_FUNCTION: the distance function to use with this index. Choose one of the following:
  - L2 distance: l2
  - Dot product: dot_product
  - Cosine distance: cosine
- LEAVES_COUNT: the number of partitions to apply to this index. For information on finding an optimal value for this parameter, see Tune a scann index.
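As an illustrative sketch, assuming the hypothetical messages table with a vector-typed message_vector column, the following statement creates a scann index that uses cosine distance and 1000 partitions (an assumed value; see Tune a scann index for guidance):

```sql
CREATE INDEX ON messages
USING scann (message_vector cosine)
WITH (num_leaves = 1000, quantizer = 'sq8');
```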
To create this index on an embedding column that uses the real[] data type instead of vector, cast the column into the vector data type:
CREATE INDEX ON TABLE
USING scann ((CAST(EMBEDDING_COLUMN AS vector(DIMENSIONS))) DISTANCE_FUNCTION)
WITH (num_leaves=LEAVES_COUNT, quantizer='sq8');
Replace DIMENSIONS with the dimensional width of the embedding column.
The samples for building a scann index require 50 * LEAVES_COUNT * DIMENSIONS * 4 bytes of memory. This is because each partition of a scann index contains 50 samples, and the index stores its own data in 4-byte floats. Before building a scann index, make sure that your database's maintenance_work_mem flag is set to a value sufficient for the memory required.
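For example, building a scann index with num_leaves set to 1000 over 768-dimensional vectors requires roughly 50 * 1000 * 768 * 4 bytes, or about 154 MB, of memory. In that case you might set the flag to a comfortably larger value before building the index (the value shown here is illustrative):

```sql
SET maintenance_work_mem = '200MB';
```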
Optionally, you can set the following database flags to tune the behavior of the postgres_ann extension:
- scann.num_leaves_to_search: To improve the accuracy of queries made using the index, at the cost of query speed, increase this value prior to running the queries. The maximum meaningful value is the value of num_leaves that you specify when creating the index. The default value is 1.
- scann.pre_reordering_num_neighbors: To help your query achieve higher recall, set this flag to a value greater than the number of neighbors returned by the query. Increasing this value does carry a performance cost.
To set the value of these flags, use the SET PostgreSQL command.
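For example, the following statements set both flags to illustrative values for the current session; choose values based on your own index parameters and recall targets:

```sql
SET scann.num_leaves_to_search = 100;
SET scann.pre_reordering_num_neighbors = 50;
```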
After you create the index, you can run nearest-neighbor search queries that make use of the index by following the instructions in Make a nearest-neighbor query with given text.
Tune a scann index
To achieve both a high queries-per-second (QPS) rate and high recall with your nearest-neighbor queries, you must partition the tree of your scann index in a way that is most appropriate to your data and your queries. You do this by adjusting the values of the num_leaves index parameter and the scann.num_leaves_to_search database flag.
Through iterative tuning, you can find the optimal values of these variables for your workload. To tune your index, you build, test, adjust, and rebuild the index until your test queries strike the right balance of recall and QPS.
The following general recommendations apply when tuning your scann index parameters:
- Creating more partitions, also known as leaves, provides better recall. It also increases the size of the tree, and therefore the time required to build the index.
- For partitioning stability, each partition should have at least 100 data points, on average. For this reason, you shouldn't set the value of num_leaves to a number greater than the number of indexed rows divided by 100.
- For better performance, a nearest-neighbor query should spend most of its search time in leaf processing. To help ensure this, set num_leaves to a multiple of the square root of the indexed table's row count, as described by the following procedure.
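For example, for a hypothetical messages table containing one million rows, the square root is 1,000, so num_leaves = 1000 is a reasonable starting point. You can compute the starting point directly in SQL:

```sql
-- Round the square root up to get a whole-number starting value for num_leaves
SELECT ceil(sqrt(count(*))) AS starting_num_leaves FROM messages;
```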
To apply these recommendations to help you find the optimal values of num_leaves and scann.num_leaves_to_search for your dataset, follow these steps:
- Create the scann index with num_leaves set to the square root of the indexed table's row count.
- Run your test queries, increasing the value of scann.num_leaves_to_search, until you achieve your target recall range, for example, 95%.
- Take note of the ratio between scann.num_leaves_to_search and num_leaves.
- If your QPS is too low once your queries achieve a target recall, then follow these steps:
  - Recreate the index, increasing the value of num_leaves and scann.num_leaves_to_search according to the following guidance:
    - Set num_leaves to a larger factor of the square root of your row count. For example, if the index has num_leaves set to the square root of your row count, try setting it to double the square root. If it is already double, then try setting it to triple the square root.
    - Increase scann.num_leaves_to_search as needed in order to maintain its ratio with num_leaves, which you noted in an earlier step.
    - Don't set num_leaves to a value greater than the row count divided by 100.
  - Run the test queries again. While doing so, you can experiment with reducing scann.num_leaves_to_search, finding a value that increases QPS while keeping your recall high. You can try different values of scann.num_leaves_to_search without rebuilding the index.
  - Repeat this step until both the QPS and the recall range have reached acceptable values.
Use model version tags to avoid errors
Google strongly recommends that you always use a stable version of your chosen embeddings model. For most models, this means explicitly setting a version tag.
Calling the embedding() function without specifying the version tag of the model is syntactically valid, but it is also error-prone.
If you omit the version tag when using a model in the Vertex AI Model Garden, then Vertex AI uses the latest version of the model. This might not be the latest stable version. For more information about available Vertex AI model versions, see Model versions.
A given Vertex AI model version always returns the same embedding() response to given text input. If you don't specify model versions in your calls to embedding(), then a new published model version can abruptly change the returned vector for a given input, causing errors or other unexpected behavior in your applications.
To avoid these problems, always specify the model version.