Build generative AI applications using AlloyDB AI

This section describes how to invoke predictions, and how to query and index embeddings using the pgvector extension. These machine-learning-powered AI functions are available through AlloyDB AI, a suite of AlloyDB for PostgreSQL features that let you apply the semantic and predictive power of machine learning (ML) models to your data.

Learn more about AlloyDB AI at https://cloud.google.com/alloydb/docs/ai.

Invoke predictions

To integrate Vertex AI with AlloyDB Omni and run predictions against models stored in Vertex AI, follow these steps.

Before you begin

  1. Enable Vertex AI online predictions in GDC.
  2. Create a Kubernetes secret based on the service account key downloaded in the preceding steps by running the following command. Ensure that you create the Kubernetes secret in the same namespace as your DBCluster resource.

    kubectl create secret generic SECRET_NAME \
    --from-file=PATH_TO_SERVICE_ACCOUNT_KEY/private-key.json \
    -n NAMESPACE

    Replace the following:

    • SECRET_NAME: the name of the secret used when you create a DBCluster manifest to enable AlloyDB Omni to access Distributed Cloud AI features. For example, vertex-ai-key-alloydb.

    • PATH_TO_SERVICE_ACCOUNT_KEY: the path to the location where you downloaded the private-key.json service account key.

    • NAMESPACE: the namespace of the database cluster.

  3. Install the AlloyDB Omni operator using steps listed in Choose a database engine type and create a database cluster.

  4. Create a database cluster with AlloyDB AI, and set the vertexAIKeyRef field under googleMLExtension in the DBCluster manifest to the Kubernetes secret that you created in the preceding steps.

    apiVersion: v1
    kind: Secret
    metadata:
      name: db-pw-DBCLUSTER_NAME
      namespace: USER_PROJECT
    type: Opaque
    data:
      DBCLUSTER_NAME: "BASE64_PASSWORD"
    ---
    apiVersion: DBENGINE_NAME.dbadmin.gdc.goog/v1
    kind: DBCluster
    metadata:
      name: DBCLUSTER_NAME
      namespace: USER_PROJECT
    spec:
      primarySpec:
        adminUser:
          passwordRef:
            name: db-pw-DBCLUSTER_NAME
        features:
          googleMLExtension:
            config:
              vertexAIKeyRef: SECRET_NAME
        version: "DB_VERSION"
        resources:
          memory: DB_MEMORY
          cpu: DB_CPU
          disks:
          - name: DataDisk
            size: DB_DATA_DISK
    

    Replace the following variables:

    • DBCLUSTER_NAME: the name of the database cluster.
    • USER_PROJECT: the name of the user project where the database cluster will be created.
    • BASE64_PASSWORD: the base64 encoding of the database's administrator password.
    • DBENGINE_NAME: the name of the database engine. Set to alloydbomni.
    • DB_VERSION: the version of the database engine.
    • DB_MEMORY: the amount of memory allocated to the database cluster, for example 5Gi.
    • DB_CPU: the amount of CPUs allocated to the database cluster, for example 2.
    • DB_DATA_DISK: the amount of space allocated to the database cluster, for example 10Gi.

    Apply the manifest.

    kubectl apply -f DB_CLUSTER_YAML

    Replace the following:

    • DB_CLUSTER_YAML: the name of this database cluster manifest file—for example, alloydb-omni-db-cluster.yaml.
  5. Install the google_ml_integration extension.

    CREATE EXTENSION google_ml_integration CASCADE;
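
    Optionally, confirm that the extension and its dependencies are installed by querying the standard pg_extension catalog:

    -- Lists the installed extensions and their versions.
    SELECT extname, extversion FROM pg_extension
    WHERE extname IN ('google_ml_integration', 'vector');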
    

Invoke a prediction

Invoke an online prediction using a Vertex AI model endpoint by running the following ml_predict_row() SQL function; a filled-in example follows the list of replacements:

  SELECT ml_predict_row('PREDICTION_ENDPOINT/PROJECT_NAMESPACE/ORGANIZATION/ZONE/DNS/DNS_SUFFIX', '{ "instances": [ INSTANCES ], "parameters":
  PARAMETERS }');

Replace the following:

  • PREDICTION_ENDPOINT: the Vertex AI endpoint qualified name

  • PROJECT_NAMESPACE: the namespace in which the Vertex AI endpoint is deployed

  • ORGANIZATION: the name of the organization in which the Vertex AI endpoint is deployed

  • ZONE: the zone in which your Vertex AI endpoint is deployed

  • DNS: the DNS for your organization

  • DNS_SUFFIX: the suffix of the DNS object

  • INSTANCES: the inputs to the prediction call, in JSON format

  • PARAMETERS: the parameters to the prediction call, in JSON format
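
For illustration, the following call fills in the template with hypothetical values; every path segment and the JSON payload are placeholders, not a real deployment:

  -- Hypothetical example: substitute your own endpoint path and payload.
  -- The endpoint segments follow the PREDICTION_ENDPOINT/... format above.
  SELECT ml_predict_row(
    'my-endpoint/my-namespace/my-org/us-east1/example/org.example.com',
    '{ "instances": [ { "prompt": "What is AlloyDB?" } ], "parameters": { "temperature": 0.2 } }'
  );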

Query and index embeddings using pgvector

The pgvector PostgreSQL extension lets you use vector-specific operators and functions when you store, index, and query text embeddings in your database. AlloyDB provides optimizations for working with pgvector, which let you create indexes that can speed up certain queries that involve embeddings.

Learn more about generating and storing vector embeddings based on an LLM at https://cloud.google.com/alloydb/docs/ai/work-with-embeddings#index.

Create indexes and query vectors using ScaNN

This section shows you how to generate indexes from stored embeddings and how to query those embeddings. You can create ScaNN indexes with AlloyDB.

Before you begin

Before you can start creating indexes, you must complete the following prerequisites.

  • Embedding vectors are added to a table in your AlloyDB database.

  • The vector extension, version 0.5.0 or later, is installed. This extension is based on pgvector and extended by Google for AlloyDB.

    CREATE EXTENSION IF NOT EXISTS vector;
    
  • To generate ScaNN indexes, install the postgres_ann extension in addition to the vector extension.

    CREATE EXTENSION IF NOT EXISTS postgres_ann;
    

Create a ScaNN index

You can create a ScaNN index for tables in your database.

AlloyDB provides postgres_ann, a PostgreSQL extension developed by Google that implements a highly efficient nearest-neighbor index powered by the ScaNN algorithm.

The ScaNN index is a tree-based quantization index for approximate nearest-neighbor search. It provides low index build times and a small memory footprint. In addition, it provides high QPS, depending on the workload.

Two-level tree ScaNN index

To apply a two-level tree index using the ScaNN algorithm to a column containing stored vector embeddings, run the following DDL query; an illustrative example follows the list of replacements:

CREATE INDEX INDEX_NAME ON TABLE
  USING scann (EMBEDDING_COLUMN DISTANCE_FUNCTION)
  WITH (num_leaves=NUM_LEAVES_VALUE);

Replace the following:

  • INDEX_NAME: the name of the index that you want to create, for example, my-scann-index. Index names are shared across your database, so ensure that each index name is unique within your database.

  • TABLE: the table to add the index to.

  • EMBEDDING_COLUMN: a column that stores vector data.

  • DISTANCE_FUNCTION: the distance function to use with this index. Choose one of the following:

    • L2 distance: l2

    • Dot product: dot_product

    • Cosine distance: cosine

  • NUM_LEAVES_VALUE: the number of partitions to apply to this index. Set to any value between 1 and 1048576.
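
For example, the following statement builds a two-level index; the documents table, embedding column, and leaf count are illustrative values, not part of the product's requirements:

-- Hypothetical table "documents" with a vector "embedding" column.
-- num_leaves is workload-dependent; 100 is only a starting point.
CREATE INDEX my_scann_index ON documents
  USING scann (embedding cosine)
  WITH (num_leaves = 100);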

Three-level tree ScaNN index

To apply a three-level tree index using the ScaNN algorithm to a column containing stored vector embeddings, run the following DDL query; an illustrative example follows the list of replacements:

CREATE INDEX INDEX_NAME ON TABLE
  USING scann (EMBEDDING_COLUMN DISTANCE_FUNCTION)
  WITH (num_leaves=NUM_LEAVES_VALUE, max_num_levels = MAX_NUM_LEVELS);

Replace the following:

  • MAX_NUM_LEVELS: the maximum number of levels of the K-means clustering tree. Set to 1 (the default) for two-level tree-based quantization, and to 2 for three-level tree-based quantization.
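
As an illustration, the following statement builds a three-level tree over the same hypothetical documents table; the parameter values are placeholders to tune for your workload:

-- max_num_levels = 2 selects three-level tree-based quantization.
CREATE INDEX my_scann_index_3l ON documents
  USING scann (embedding cosine)
  WITH (num_leaves = 1000, max_num_levels = 2);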

After you create the index, you can run nearest-neighbor search queries that make use of the index by following the instructions in Make a nearest-neighbor query with given text.

Set the index parameters to strike the right balance between QPS and recall.

To create this index on an embedding column that uses the real[] data type instead of vector, cast the column into the vector data type:

CREATE INDEX INDEX_NAME ON TABLE
  USING scann (CAST(EMBEDDING_COLUMN AS vector(DIMENSIONS)) DISTANCE_FUNCTION)
  WITH (num_leaves=NUM_LEAVES_VALUE, max_num_levels = MAX_NUM_LEVELS);

Replace DIMENSIONS with the dimensional width of the embedding column.
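
For instance, assuming a hypothetical documents table whose embedding column is a real[] of 768 dimensions:

-- Cast the real[] column to vector(768) so that scann can index it.
CREATE INDEX my_scann_real_index ON documents
  USING scann (CAST(embedding AS vector(768)) cosine)
  WITH (num_leaves = 100, max_num_levels = 1);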

To view the indexing progress, use the pg_stat_progress_create_index view:

SELECT * FROM pg_stat_progress_create_index;

The phase column shows the current state of your index creation, and the building index: tree training phase disappears after the index is created.

Run a query

After you have stored and indexed embeddings in your database, you can start querying using the pgvector query functionality. You cannot run bulk search queries using the postgres_ann extension.

To find the nearest semantic neighbors for an embedding vector, you can run the following query, setting the same distance function that you used during index creation. A filled-in example follows the list of replacements.

  SELECT * FROM TABLE
    ORDER BY EMBEDDING_COLUMN DISTANCE_FUNCTION_QUERY '[EMBEDDING]'
    LIMIT ROW_COUNT;

Replace the following:

  • TABLE: the table containing the embedding to compare the text to.

  • INDEX_NAME: the name of the index you want to use—for example, my-scann-index.

  • EMBEDDING_COLUMN: the column containing the stored embeddings.

  • DISTANCE_FUNCTION_QUERY: the distance function to use with this query. Choose one of the following based on the distance function used while creating the index:

    • L2 distance: <->

    • Inner product: <#>

    • Cosine distance: <=>

  • EMBEDDING: the embedding vector you want to find the nearest stored semantic neighbors of.

  • ROW_COUNT: the number of rows to return.

    Specify 1 if you want only the single best match.
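
For example, the following hypothetical query returns the five stored embeddings nearest, by cosine distance, to a query vector; the vector literal is shortened for readability and must match the column's dimensionality in practice:

  -- Hypothetical table and column; '<=>' matches an index built with "cosine".
  SELECT * FROM documents
    ORDER BY embedding <=> '[0.012, -0.034, 0.056]'
    LIMIT 5;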

You can also use the embedding() function to translate text into a vector. You apply the vector to one of the pgvector nearest-neighbor operators, such as <-> for L2 distance, to find the database rows with the most semantically similar embeddings. Note that you must first register the textembedding-gecko model to use this function.

Because embedding() returns a real array, you must explicitly cast the embedding() call to vector in order to use these values with pgvector operators.

  CREATE EXTENSION google_ml_integration;
  CREATE EXTENSION IF NOT EXISTS vector;

  SELECT * FROM TABLE
    ORDER BY EMBEDDING_COLUMN::vector
    <-> embedding('MODEL_IDVERSION_TAG', 'TEXT')::vector
    LIMIT ROW_COUNT;

Replace the following:

  • MODEL_ID: the ID of the model to query.

    If you are using the Vertex AI Model Garden, then specify textembedding-gecko@003 as the model ID. This is one of the cloud-based models that Distributed Cloud can use for text embeddings.

  • Optional: VERSION_TAG: the version tag of the model to query. Prepend the tag with @.

    If you are using one of the textembedding-gecko English models with Vertex AI, then specify one of the version tags—for example, textembedding-gecko@003.

    Google strongly recommends that you always specify the version tag. If you don't specify the version tag, then AlloyDB always uses the latest model version, which might lead to unexpected results.

  • TEXT: the text to translate into a vector embedding.
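
Putting the preceding template together, a hypothetical query against the illustrative documents table might look like the following; the table and column names are placeholders:

  -- textembedding-gecko@003 is the model ID and version tag described above.
  SELECT * FROM documents
    ORDER BY embedding::vector
    <-> embedding('textembedding-gecko@003', 'how do I tune a vector index?')::vector
    LIMIT 1;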

Vector index metrics

This section lists the metrics related to the vector indexes that you generate in AlloyDB. You can view these metrics using the pg_stat_ann_indexes view that is available when you install the postgres_ann extension.
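
For example, you can inspect these metrics for all of your vector indexes by querying the view directly:

-- List usability and tuning metrics for every vector index.
SELECT * FROM pg_stat_ann_indexes;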

Usability metrics

The usability metrics help you understand the state of index utilization through metrics such as the index configuration and the number of index scans.

Metric name    Data type  Description
relid          OID        Unique identifier of the table that contains the vector index.
indexrelid     OID        Unique identifier of the vector index.
schemaname     NAME       Name of the schema to which the index belongs.
relname        NAME       Name of the table that contains the index.
indexrelname   NAME       Name of the index.
indextype      NAME       Type of the index. This value is always set to postgres_ann.
indexconfig    TEXT[]     Configuration, such as leaf count and quantizer, defined for the index when it was created.
indexsize      TEXT       Size of the index.
indexscan      BIGINT     Number of index scans initiated on the index.

Tuning metrics

Tuning metrics provide insights into your current index optimization, allowing you to apply recommendations for faster query performance.

Metric name   Data type  Description
insertcount   BIGINT     Number of insert operations on the index. This metric also counts any rows that existed before the index was created.
updatecount   BIGINT     Number of update operations on the index. This metric doesn't take into account any HOT updates.
deletecount   BIGINT     Number of delete operations on the index.
distribution  JSONB      Vector distributions across all partitions for the index.

The following fields show the distribution:
  • maximum (INT8): Maximum number of vectors across all partitions.
  • minimum (INT8): Minimum number of vectors across all partitions.
  • average (FLOAT): Average number of vectors across all partitions.
  • outliers (INT8[]): Top outliers across all partitions. This value shows the top 20 outliers.

Note: Due to the inherent characteristics of the K-means clustering algorithm, there will always be some degree of variance in the distribution of vectors across partitions, even when the index is initially created.

Tuning recommendation based on the metrics

Mutation
The insertcount, updatecount, and deletecount metrics together show the changes or mutations in the vectors for the index.
The index is created with a specific number of vectors and partitions. When operations such as insert, update, or delete are performed on the vector index, they affect only the initial set of partitions where the vectors reside. Consequently, the number of vectors in each partition fluctuates over time, potentially impacting recall, QPS, or both.
If you encounter slowness or accuracy issues, such as low QPS or poor recall, in your ANN search queries over time, then consider reviewing these metrics. A high number of mutations relative to the total number of vectors could indicate the need for reindexing.
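
The following sketch, which uses only the columns documented in the tuning metrics table, surfaces the indexes with the most accumulated mutations; any threshold you act on is workload-dependent:

-- Rank vector indexes by accumulated mutations since index creation.
SELECT indexrelname,
       insertcount + updatecount + deletecount AS mutations
FROM pg_stat_ann_indexes
ORDER BY mutations DESC;
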
Distribution
The distribution metric shows the vector distributions across all partitions.
When you create an index, it is created with a specific number of vectors and a fixed number of partitions, and the vectors are distributed across those partitions at that time. If additional vectors are added later, they are assigned to the existing partitions, resulting in a distribution that differs from the one at index creation. Because the final distribution does not consider all vectors simultaneously, recall, QPS, or both might be affected.
If you observe a gradual decline in the performance of your ANN search queries, such as slower response times or reduced accuracy in the results (measured by QPS or recall), then consider checking this metric and reindexing.
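
As a sketch, the following query unpacks the distribution fields documented above to spot partition skew; a wide gap between maximum and average suggests that reindexing may help:

-- Extract per-partition distribution statistics from the JSONB column.
SELECT indexrelname,
       distribution->>'maximum' AS max_vectors,
       distribution->>'minimum' AS min_vectors,
       distribution->>'average' AS avg_vectors
FROM pg_stat_ann_indexes;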