This section describes how to invoke predictions and query and index embeddings using the pgvector extension. These machine learning-powered AI functions are available through AlloyDB AI, which is a suite of AlloyDB for PostgreSQL features that let you apply the semantic and predictive power of machine learning (ML) models to your data.
Learn more about AlloyDB AI at https://cloud.google.com/alloydb/docs/ai.
Invoke predictions
To integrate Vertex AI with AlloyDB Omni and run predictions on Vertex AI stored models, follow these steps.
Before you begin
- Enable Vertex AI online predictions in GDC.
Create a Kubernetes secret based on the service account key downloaded in the preceding steps by running the following command. Ensure that you create the Kubernetes secret in the same namespace as your DBCluster resource:

kubectl create secret generic SECRET_NAME \
  --from-file=PATH_TO_SERVICE_ACCOUNT_KEY/private-key.json \
  -n NAMESPACE
Replace the following:
- SECRET_NAME: the name of the secret used when you create a DBCluster manifest to enable AlloyDB Omni to access Distributed Cloud AI features. For example, vertex-ai-key-alloydb.
- PATH_TO_SERVICE_ACCOUNT_KEY: the path to the location where you downloaded the private-key.json service account key.
- NAMESPACE: the namespace of the database cluster.
Install the AlloyDB Omni operator using the steps listed in Choose a database engine type and create a database cluster.
Create a database cluster with AlloyDB AI, and set vertexAIKeyRef to the Kubernetes secret created in the preceding steps in the googleMLExtension field of the DBCluster manifest:

apiVersion: v1
kind: Secret
metadata:
  name: db-pw-DBCLUSTER_NAME
  namespace: USER_PROJECT
type: Opaque
data:
  DBCLUSTER_NAME: "BASE64_PASSWORD"
---
apiVersion: DBENGINE_NAME.dbadmin.gdc.goog/v1
kind: DBCluster
metadata:
  name: DBCLUSTER_NAME
  namespace: USER_PROJECT
spec:
  primarySpec:
    adminUser:
      passwordRef:
        name: db-pw-DBCLUSTER_NAME
    features:
      googleMLExtension:
        config:
          vertexAIKeyRef: SECRET_NAME
    version: "DB_VERSION"
    resources:
      memory: DB_MEMORY
      cpu: DB_CPU
      disks:
      - name: DataDisk
        size: DB_DATA_DISK
Replace the following variables:
- DBCLUSTER_NAME: the name of the database cluster.
- USER_PROJECT: the name of the user project where the database cluster will be created.
- BASE64_PASSWORD: the base64 encoding of the database's administrator password.
- DBENGINE_NAME: the name of the database engine. Set to alloydbomni.
- DB_VERSION: the version of the database engine.
- DB_MEMORY: the amount of memory allocated to the database cluster, for example 5Gi.
- DB_CPU: the number of CPUs allocated to the database cluster, for example 2.
- DB_DATA_DISK: the amount of space allocated to the database cluster, for example 10Gi.
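The BASE64_PASSWORD value is an ordinary base64 encoding of the plaintext password. As a minimal sketch (the password shown is a placeholder, not a recommended value), you can produce it with Python's standard library:

```python
import base64

# Placeholder password; replace with the real administrator password.
password = "my-admin-password"

# Kubernetes Secret data values must be base64-encoded.
encoded = base64.b64encode(password.encode("utf-8")).decode("ascii")
print(encoded)  # paste this value into the BASE64_PASSWORD field
```

You can achieve the same result on the command line, for example with `echo -n 'my-admin-password' | base64`; the `-n` matters because a trailing newline would be encoded into the secret.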
Apply the manifest.
kubectl apply -f DB_CLUSTER_YAML
Replace the following:
- DB_CLUSTER_YAML: the name of this database cluster manifest file, for example alloydb-omni-db-cluster.yaml.
Install the google_ml_integration extension:

CREATE EXTENSION google_ml_integration CASCADE;
Invoke a prediction
Invoke an online prediction using a Vertex AI model endpoint by running the following ml_predict_row() SQL function:

SELECT ml_predict_row('PREDICTION_ENDPOINT/PROJECT_NAMESPACE/ORGANIZATION/ZONE/DNS/DNS_SUFFIX', '{ "instances": [ INSTANCES ], "parameters": PARAMETERS }');
Replace the following:
- PREDICTION_ENDPOINT: the Vertex AI endpoint qualified name.
- PROJECT_NAMESPACE: the namespace in which the Vertex AI endpoint is deployed.
- ORGANIZATION: the name of the organization in which the Vertex AI endpoint is deployed.
- ZONE: the zone in which your Vertex AI endpoint is deployed.
- DNS: the DNS for your organization.
- DNS_SUFFIX: the suffix of the DNS object.
- INSTANCES: the inputs to the prediction call, in JSON format.
- PARAMETERS: the parameters to the prediction call, in JSON format.
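The INSTANCES and PARAMETERS placeholders are ordinary JSON fragments. As a sketch, the request body for a hypothetical text model could be assembled like this; the field names inside the instance (`content`) and the parameter (`temperature`) are assumptions, since the actual schema depends on the deployed model:

```python
import json

# Hypothetical inputs; the real instance schema depends on the model.
instances = [{"content": "What is AlloyDB AI?"}]
parameters = {"temperature": 0.2}

# Mirrors the JSON literal passed as the second argument
# of ml_predict_row().
body = {"instances": instances, "parameters": parameters}
print(json.dumps(body))
```

Whatever you substitute for INSTANCES must be valid inside a JSON array, and PARAMETERS must be a valid JSON value, or the function call fails to parse.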
Query and index embeddings using pgvector
The pgvector PostgreSQL extension lets you use vector-specific operators and functions when you store, index, and query text embeddings in your database. AlloyDB provides optimizations for working with pgvector, which let you create indexes that can speed up certain queries that involve embeddings.
Learn more about generating and storing vector embeddings based on an LLM with AlloyDB at https://cloud.google.com/alloydb/docs/ai/work-with-embeddings#index.
Create indexes and query vectors using ScaNN
This section shows you how to use stored embeddings to generate indexes and query
embeddings. You can create ScaNN
indexes with AlloyDB.
Before you begin
Before you can start creating indexes, you must complete the following prerequisites.
- Embedding vectors are added to a table in your AlloyDB database.
- The vector extension version 0.5.0 or later, based on pgvector and extended by Google for AlloyDB, is installed:

  CREATE EXTENSION IF NOT EXISTS vector;

- To generate ScaNN indexes, install the postgres_ann extension in addition to the vector extension:

  CREATE EXTENSION IF NOT EXISTS postgres_ann;
Create a ScaNN
index
You can create a ScaNN index for tables in your database. AlloyDB provides postgres_ann, a PostgreSQL extension developed by Google that implements a highly efficient nearest-neighbor index powered by the ScaNN algorithm.
The ScaNN index is a tree-based quantization index for approximate nearest neighbor search. It provides low index build times and a small memory footprint, and delivers high QPS, depending on the workload.
Two-level tree ScaNN
index
To apply a two-level tree index using the ScaNN algorithm to a column containing stored vector embeddings, run the following DDL query:
CREATE INDEX INDEX_NAME ON TABLE
USING scann (EMBEDDING_COLUMN DISTANCE_FUNCTION)
WITH (num_leaves=NUM_LEAVES_VALUE);
Replace the following:
- INDEX_NAME: the name of the index you want to create, for example my-scann-index. Index names are shared across your database, so ensure that each index name is unique to each table in your database.
- TABLE: the table to add the index to.
- EMBEDDING_COLUMN: a column that stores vector data.
- DISTANCE_FUNCTION: the distance function to use with this index. Choose one of the following:
  - L2 distance: l2
  - Dot product: dot_product
  - Cosine distance: cosine
- NUM_LEAVES_VALUE: the number of partitions to apply to this index. Set to any value between 1 and 1048576.
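The three distance functions differ only in how they score a pair of vectors. The following Python sketch uses the standard mathematical definitions (these illustrate the semantics of each option, not AlloyDB internals):

```python
import math

def l2(a, b):
    # Euclidean (L2) distance: smaller means closer.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dot_product(a, b):
    # Inner product: larger means more similar. pgvector's <#>
    # operator returns the *negative* inner product so that
    # ascending ORDER BY surfaces the most similar rows first.
    return sum(x * y for x, y in zip(a, b))

def cosine_distance(a, b):
    # 1 - cosine similarity: 0 for parallel vectors, 2 for opposite.
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1 - dot_product(a, b) / (norm_a * norm_b)

a, b = [1.0, 0.0], [0.0, 1.0]
print(l2(a, b), dot_product(a, b), cosine_distance(a, b))
```

Pick the function that matches how your embedding model was trained; for example, cosine distance ignores vector magnitude, which suits models that produce normalized embeddings.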
Three-level tree ScaNN
index
To create a three-level tree index using the ScaNN algorithm on a column containing stored vector embeddings, run the following DDL query:
CREATE INDEX INDEX_NAME ON TABLE
USING scann (EMBEDDING_COLUMN DISTANCE_FUNCTION)
WITH (num_leaves=NUM_LEAVES_VALUE, max_num_levels = MAX_NUM_LEVELS);
Replace the following:
- MAX_NUM_LEVELS: the maximum number of levels of the K-means clustering tree. Set to 1 (default) for two-level tree-based quantization, or to 2 for three-level tree-based quantization.
After you create the index, you can run nearest-neighbor search queries that make use of the index by following the instructions in Make a nearest-neighbor query with given text.
The index parameters must be set to strike the right balance between QPS and recall.
To create this index on an embedding column that uses the real[] data type instead of vector, cast the column into the vector data type:
CREATE INDEX INDEX_NAME ON TABLE
USING scann (CAST(EMBEDDING_COLUMN AS vector(DIMENSIONS)) DISTANCE_FUNCTION)
WITH (num_leaves=NUM_LEAVES_VALUE, max_num_levels = MAX_NUM_LEVELS);
Replace DIMENSIONS
with the dimensional width of the
embedding column.
To view the indexing progress, use the pg_stat_progress_create_index
view:
SELECT * FROM pg_stat_progress_create_index;
The phase column shows the current state of your index creation. The building index: tree training phase disappears after the index is created.
Run a query
After you have stored and indexed embeddings in your database, you can start
querying using the pgvector
query
functionality. You cannot run
bulk search queries using the postgres_ann
extension.
To find the nearest semantic neighbors for an embedding vector, you can run the following example query, where you set the same distance function that you used during the index creation.
SELECT * FROM TABLE
  ORDER BY EMBEDDING_COLUMN DISTANCE_FUNCTION_QUERY '[EMBEDDING]'
  LIMIT ROW_COUNT
Replace the following:
- TABLE: the table containing the embeddings to compare the text to.
- INDEX_NAME: the name of the index you want to use, for example my-scann-index.
- EMBEDDING_COLUMN: the column containing the stored embeddings.
- DISTANCE_FUNCTION_QUERY: the distance function to use with this query. Choose one of the following based on the distance function used while creating the index:
  - L2 distance: <->
  - Inner product: <#>
  - Cosine distance: <=>
- EMBEDDING: the embedding vector you want to find the nearest stored semantic neighbors of.
- ROW_COUNT: the number of rows to return. Specify 1 if you want only the single best match.
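Conceptually, the ORDER BY ... LIMIT query returns the ROW_COUNT rows whose stored vectors score lowest under the chosen distance operator; the index only makes that ranking approximate and fast. An exact, non-indexed Python sketch of the same ranking, using L2 distance and made-up rows:

```python
import math

def l2(a, b):
    # Euclidean (L2) distance, the metric behind the <-> operator.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical stored rows: (id, embedding) pairs.
rows = [
    (1, [0.9, 0.1]),
    (2, [0.0, 1.0]),
    (3, [1.0, 0.0]),
]

query = [1.0, 0.05]
row_count = 1  # LIMIT ROW_COUNT

# Equivalent of:
#   SELECT * FROM t ORDER BY embedding <-> '[1.0,0.05]' LIMIT 1;
nearest = sorted(rows, key=lambda r: l2(r[1], query))[:row_count]
print(nearest)  # row 3 is closest to the query vector
```

A ScaNN index avoids scanning every row by searching only the most promising partitions, which is why its results are approximate rather than exact.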
You can also use the embedding() function to translate the text into a vector. You apply the vector to one of the pgvector nearest-neighbor operators, such as <-> for L2 distance, to find the database rows with the most semantically similar embeddings. Note that you must first register the text embedding Gecko model to use this function.
Because embedding()
returns a real
array, you must explicitly cast the
embedding()
call to vector
in order to use these values with pgvector
operators.
CREATE EXTENSION google_ml_integration;
CREATE EXTENSION IF NOT EXISTS vector;
SELECT * FROM TABLE
ORDER BY EMBEDDING_COLUMN::vector
<-> embedding('MODEL_IDVERSION_TAG', 'TEXT')
LIMIT ROW_COUNT
Replace the following:
- MODEL_ID: the ID of the model to query. If you are using the Vertex AI Model Garden, then specify textembedding-gecko@003 as the model ID. These are the cloud-based models that Distributed Cloud can use for text embeddings.
- VERSION_TAG (optional): the version tag of the model to query. Prepend the tag with @. If you are using one of the textembedding-gecko English models with Vertex AI, then specify one of the version tags, for example textembedding-gecko@003. Google strongly recommends that you always specify the version tag. If you don't specify the version tag, then AlloyDB always uses the latest model version, which might lead to unexpected results.
- TEXT: the text to translate into a vector embedding.
Vector index metrics
This section lists the metrics related to the vector indexes that you generate in AlloyDB. You can view these metrics using the pg_stat_ann_indexes
view that is available when you install the postgres_ann
extension.
Usability metrics
The usability metrics help you understand the state of index utilization, such as the index configuration and the number of index scans.
| Metric name | Data type | Description |
|---|---|---|
| relid | OID | Unique identifier of the table that contains the vector index. |
| indexrelid | OID | Unique identifier of the vector index. |
| schemaname | NAME | Name of the schema to which the index belongs. |
| relname | NAME | Name of the table that contains the index. |
| indexrelname | NAME | Name of the index. |
| indextype | NAME | Type of the index. This value is always set to postgres_ann. |
| indexconfig | TEXT[] | Configuration, such as leaves count and quantizer, defined for the index when it was created. |
| indexsize | TEXT | Size of the index. |
| indexscan | BIGINT | Number of index scans initiated on the index. |
Tuning metrics
Tuning metrics provide insights into your current index optimization, allowing you to apply recommendations for faster query performance.
| Metric name | Data type | Description |
|---|---|---|
| insertcount | BIGINT | Number of insert operations on the index. This metric also includes any number of rows that existed before the index was created. |
| updatecount | BIGINT | Number of update operations on the index. This metric doesn't take into account any HOT updates. |
| deletecount | BIGINT | Number of delete operations on the index. |
| distribution | JSONB | Vector distributions across all partitions for the index. Note: Due to the inherent characteristics of the K-means clustering algorithm, there will always be some degree of variance in the distribution of vectors across partitions, even when the index is initially created. |
Tuning recommendations based on the metrics
- Mutation
- The insertcount, updatecount, and deletecount metrics together show the changes or mutations in the vectors for the index.
- The index is created with a specific number of vectors and partitions. When operations such as insert, update, or delete are performed on the vector index, only the initial set of partitions where the vectors reside is affected. Consequently, the number of vectors in each partition fluctuates over time, potentially impacting recall, QPS, or both.
- If you encounter slowness or accuracy issues, such as low QPS or poor recall, in your ANN search queries over time, then consider reviewing these metrics. A high number of mutations relative to the total number of vectors could indicate the need for reindexing.
- Distribution
- The distribution metric shows the vector distributions across all partitions.
- When you create an index, it is created with a specific number of vectors and fixed partitions. The partitioning process and subsequent distribution occur based on this consideration. If additional vectors are added, they are partitioned among the existing partitions, resulting in a different distribution compared to the distribution when the index was created. Because the final distribution does not consider all vectors simultaneously, the recall, QPS, or both might be affected.
- If you observe a gradual decline in the performance of your ANN search queries, such as slower response times or reduced accuracy in the results (measured by QPS or recall), then consider checking this metric and reindexing.