Register and call remote AI models in AlloyDB Omni

This page describes a preview available with AlloyDB Omni that lets you experiment with registering AI models and invoking predictions with Model endpoint management. To use AI models in production environments, see Build generative AI applications using AlloyDB AI and Work with vector embeddings.

Overview

The Model endpoint management preview lets you register and manage your AI model metadata in your database cluster, and then interact with the model using SQL queries. It provides the google_ml_integration extension that includes functions to add and register the model metadata, and then use the models to generate vector embeddings or invoke predictions.

We recommend that you don't use these functions in production applications. For production workloads, consider using AlloyDB AI, which extends PostgreSQL syntax for querying models.

Some of the example model types that you can register using model endpoint management are as follows:

  • Vertex AI text embedding models
  • Embedding models provided by third-party providers, such as Anthropic, Hugging Face, or OpenAI
  • Custom-hosted text embedding models
  • Generic models with a JSON-based API—for example, facebook/bart-large-mnli model hosted on Hugging Face or gemini-pro model from the Vertex AI Model Garden.

How it works

You can use model endpoint management to register models that meet the following requirements:

  • Model input and output supports JSON format.
  • Model can be called using the REST protocol.

When you register a model with model endpoint management, it creates a unique model ID that references the model. You can use this model ID to query models to do the following:

  • Generate embeddings to translate text prompts to numerical vectors. You can apply the generated vector embeddings as input to pgvector functions. For more information, see Query and index embeddings with pgvector.

  • Invoke predictions to call a model using SQL within a transaction.

Your applications can access the model endpoint management using the google_ml_integration extension. This extension provides the following functions:

  • The google_ml.create_model() SQL function, which is used to register the model metadata that is used in the prediction or embedding function.
  • The google_ml.create_sm_secret() SQL function, which uses secrets in the Google Cloud Secret Manager, where the API keys are stored.
  • The google_ml.embedding() SQL function, which is a prediction function that handles common requirements that must be satisfied for a model type.
  • The google_ml.predict_row() SQL function, which supports calling generic models that support JSON input and output format.
  • Other helper functions for generating custom URLs, generating HTTP headers, or passing transform functions for your generic models.
  • Functions to manage the registered models and secrets.
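As a sketch of how these functions fit together, the following call registers a Vertex AI text embedding model. The model ID is a name you choose; parameter names such as model_qualified_name are illustrative assumptions, so check the function reference for the exact argument list in your AlloyDB Omni version.

```sql
-- Sketch: register a Vertex AI text embedding model with built-in support.
-- Parameter names beyond those described on this page are assumptions.
CALL google_ml.create_model(
  model_id => 'my_gecko_model',
  model_provider => 'google',
  model_qualified_name => 'textembedding-gecko',
  model_type => 'text-embedding',
  model_auth_type => 'alloydb_service_agent_iam');
```

After registration, you reference the model by its model ID in the prediction functions.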

Key concepts

Before you start using the model endpoint management, understand the concepts required to connect to and use the models.

Model provider

Model provider indicates the supported model hosting providers. The following table shows the model provider value you must set based on the model provider you use:

Model provider      | Set in function as…
--------------------|--------------------
Vertex AI           | google
Hugging Face models | custom
Anthropic models    | custom
OpenAI              | open_ai
Other models        | custom

The default model provider is custom.

The supported authentication method differs based on the provider type. Vertex AI models authenticate using the AlloyDB service account that is created during AlloyDB Omni installation, while other providers can use Secret Manager to authenticate. For more information, see Set up authentication.

Model type

Model type indicates the type of the AI model. The extension supports text embedding models as well as generic model types. The google_ml.embedding() SQL function provides support for the text embedding model type. For generic models, you must pass additional model metadata such as custom HTTP headers or endpoints.

The supported model types that you can set in the function are text-embedding and generic. Setting the model type is optional for generic models.

Models with built-in support
The model endpoint management provides built-in support for all versions of the textembedding-gecko model by Vertex AI and the text-embedding-ada-002 model by OpenAI. For these models, AlloyDB automatically sets up default transform functions.
Other text embedding models
The google_ml.embedding() function can be used for any text-embedding model along with the transform functions to handle the input and output formats that the model supports. Optionally, you can use the HTTP header generation function that generates custom headers required by your model.
Generic models
The model endpoint management supports all other models of any type through the google_ml.predict_row() function. You can set model metadata, such as request endpoint and HTTP headers that are specific to your model.
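For example, a generic model hosted on Hugging Face might be registered with a custom request endpoint, as the following sketch shows. The request URL is illustrative, and parameter names such as model_request_url and model_auth_id are assumptions:

```sql
-- Sketch: register a generic JSON model at a custom endpoint, with an
-- API key referenced from Secret Manager. Parameter names are assumptions.
CALL google_ml.create_model(
  model_id => 'bart_mnli',
  model_provider => 'custom',
  model_request_url => 'https://api-inference.huggingface.co/models/facebook/bart-large-mnli',
  model_auth_type => 'secret_manager',
  model_auth_id => 'hf_secret');
```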

Authentication

Auth types indicate the authentication method that you use to access your models through the google_ml_integration extension. Setting authentication is optional and is required only if your model endpoint requires authentication.

For Vertex AI models, the AlloyDB service account that is set up during AlloyDB Omni installation is used for authentication. For other models, an API key or a bearer token that is stored as a secret in Secret Manager can be used with the google_ml.create_sm_secret() SQL function.

The following table shows the auth types that you can set:

Authentication method | Set in function as…       | Model provider
----------------------|---------------------------|---------------
AlloyDB service agent | alloydb_service_agent_iam | Vertex AI provider
Secret Manager        | secret_manager            | Third-party providers, such as Anthropic, Hugging Face, or OpenAI
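As a sketch, registering a secret might look as follows. The secret_id is a name that you choose to reference later when registering a model; the secret path shown is a placeholder for an existing Secret Manager secret:

```sql
-- Sketch: register a Secret Manager secret that holds an API key.
-- The project, secret name, and version in secret_path are placeholders.
CALL google_ml.create_sm_secret(
  secret_id => 'hf_secret',
  secret_path => 'projects/my-project/secrets/hf-api-key/versions/latest');
```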

Prediction functions

The google_ml_integration extension includes the following prediction functions:

google_ml.embedding() function
Used to call registered text embedding models to generate embeddings. It includes built-in support for the textembedding-gecko model by Vertex AI and the text-embedding-ada-002 model by OpenAI.
For text embedding models without built-in support, the input and output parameters are unique to each model and must be transformed for the function to call the model. Create a transform input function that converts the prediction function's input into the model-specific input, and a transform output function that converts the model-specific output into the prediction function's output.
google_ml.predict_row() function
Used to call registered generic models, as long as they support JSON-based API, to invoke predictions.
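A minimal sketch of calling both prediction functions follows. The model IDs refer to models you registered previously, and named parameters such as content and request_body are assumptions for illustration:

```sql
-- Sketch: generate an embedding (returns real[]) from a registered model.
SELECT google_ml.embedding(
  model_id => 'my_gecko_model',
  content => 'AlloyDB Omni runs anywhere');

-- Sketch: invoke a registered generic model with a JSON request body.
SELECT google_ml.predict_row(
  model_id => 'bart_mnli',
  request_body => '{"inputs": "AlloyDB Omni runs anywhere"}'::json);
```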

Transform functions

Transform functions modify the input to a format that the model understands, and convert the model response to a format that the prediction function expects. The transform functions are used with the google_ml.embedding() prediction function for text embedding models without built-in support. The signature of the transform functions depends on the prediction function for the model type.

The following example shows the signatures for the prediction function for text embedding models:

-- Define custom, model-specific input and output transform functions.
CREATE OR REPLACE FUNCTION input_transform_function(model_id VARCHAR(100), input_text TEXT) RETURNS JSON;

CREATE OR REPLACE FUNCTION output_transform_function(model_id VARCHAR(100), response_json JSON) RETURNS real[];
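A pair of transform functions matching these signatures might look as follows. The JSON request and response shapes (the 'inputs' and 'embedding' keys) are hypothetical and don't correspond to any specific provider's API:

```sql
-- Sketch: wrap the prompt in the JSON shape a hypothetical model expects.
CREATE OR REPLACE FUNCTION my_input_transform(model_id VARCHAR(100), input_text TEXT)
RETURNS JSON
LANGUAGE plpgsql AS $$
BEGIN
  RETURN json_build_object('inputs', input_text);
END;
$$;

-- Sketch: unpack a hypothetical {"embedding": [...]} response into real[].
CREATE OR REPLACE FUNCTION my_output_transform(model_id VARCHAR(100), response_json JSON)
RETURNS real[]
LANGUAGE plpgsql AS $$
DECLARE
  embedding real[];
BEGIN
  SELECT array_agg(elem::real) INTO embedding
  FROM json_array_elements_text(response_json -> 'embedding') AS elem;
  RETURN embedding;
END;
$$;
```

You pass these function names when registering the model so that the prediction function can call them on each request and response.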

HTTP header generation function

The HTTP header generation function generates output as JSON key-value pairs that are used as HTTP headers. The prediction function that you use determines the signature of the header generation function.

The following example shows the signature for the google_ml.embedding() prediction function.

CREATE OR REPLACE FUNCTION generate_headers(model_id TEXT, input TEXT) RETURNS JSON;

For the google_ml.predict_row() prediction function, the signature is as follows:

CREATE OR REPLACE FUNCTION generate_headers(model_id TEXT, input JSON) RETURNS JSON;
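For example, a header generation function for an embedding model might emit a content type and a version header, as in the following sketch. The header names and values are placeholders:

```sql
-- Sketch: emit HTTP headers as JSON key-value pairs. The header names and
-- values here are placeholders for whatever your model endpoint requires.
CREATE OR REPLACE FUNCTION my_generate_headers(model_id TEXT, input TEXT)
RETURNS JSON
LANGUAGE plpgsql AS $$
BEGIN
  RETURN json_build_object(
    'Content-Type', 'application/json',
    'x-api-version', 'v1');
END;
$$;
```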

What's next