This page describes a technology preview available with AlloyDB Omni that lets you experiment with querying your database using natural language.
Overview
You can use AlloyDB Omni to preview a set of experimental features that allows your database-driven application to more securely execute natural-language queries from your application's users, such as "Where is my package?" or "Who is the top earner in each department?" AlloyDB Omni translates the natural-language input into a SQL query specific to your database, restricting the results only to what the user of your application is allowed to view.
The power and risks of natural-language queries
Large language models, such as Gemini Pro, can enable your application to run database queries based on natural-language queries created by your application's end users. For example, a model with access to your application's database schema can take end-user input like this:
What are the cheapest direct flights from Boston to Denver in July?
And translate it into a SQL query like this:
SELECT flight.id, flight.price, carrier.name, [...]
FROM [...]
WHERE [...]
ORDER BY flight.price ASC
LIMIT 10
Natural-language queries can provide your application a powerful tool for serving your users. However, this technology also comes with clear security risks that you must consider before you allow end users to run arbitrary queries on your database tables. Even if you have configured your application to connect to your database as a limited-access, read-only database user, an application that invites natural-language queries can be vulnerable to the following:
- Malicious users can submit prompt-injection attacks, trying to manipulate the underlying model to reveal all the data the application has access to.
- The model itself might generate SQL queries broader in scope than is appropriate, revealing sensitive data in response to even well-intentioned user queries.
Sanitize queries with parameterized secure views
To help mitigate the risks described in the previous section, Google has developed parameterized secure views, an experimental feature that you can preview using the techniques described on this page.
Parameterized secure views let you explicitly define the tables and columns that natural-language queries can pull data from, and add additional restrictions on the range of rows available to an individual application user. These restrictions let you tightly control the data that your application's users can see through natural-language queries, no matter how your users phrase these queries.
If you enable this technology preview, then you get access to an experimental
extension developed by Google called alloydb_ai_nl
. This extension provides
the following features:
- Parameterized secure views, a variant of SQL views for restricting the range of data that a query can access.
- The
google_exec_param_query()
function, which lets you query your parameterized secure views. - The
google_get_sql_current_schema()
function, which converts natural language queries into SQL queries of tables and views in your current schema.
The following sections describe these how to use these features, and demonstrate how they can work together.
Before you begin
Install AlloyDB Omni version 15.5.1 or later, including AI model integration. For more information, see Install AlloyDB Omni with AlloyDB for PostgreSQL AI.
Set up your database for parameterized secure views
Edit the contents of
/var/alloydb/config/postgresql.conf
so that the value of theshared_preload_libraries
directive includesalloydb_ai_nl
. The edited directive should look similar to the following:shared_preload_libraries='g_stats,google_job_scheduler,google_insights,pg_stat_statements,google_db_advisor,google_columnar_engine,alloydb_ai_nl'
Enable the
alloydb_ai_nl
extension:CREATE EXTENSION google_ml_integration; ALTER SYSTEM SET google_ml_integration.enable_model_support=on; ALTER SYSTEM SET alloydb_ai_nl.enabled=on; SELECT pg_reload_conf(); CREATE EXTENSION alloydb_ai_nl CASCADE;
Register a new language model based on the Gemini Pro API with Model endpoint management:
CALL google_ml.create_model( model_id => 'MODEL_ID', model_request_url => 'https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/publishers/google/models/gemini-pro:streamGenerateContent', model_provider => 'google', model_auth_type => 'alloydb_service_agent_iam');
Replace the following:
MODEL_ID
: an ID to assign to this model. For more information about Model endpoint management, see Register and call remote AI models in AlloyDB Omni.PROJECT_ID
: the ID of your Google Cloud project.
Create a new database user. Don't grant it any permissions or roles yet. A subsequent step in this procedure grants the user the permissions that it requires.
Parameterized secure views
A parameterized secure view works a lot like an ordinary PostgreSQL secure
view: a stored SELECT
statement, essentially. Parameterized secure views
additionally allow you to require one or more named parameter values passed to
the view when querying it, somewhat like bind variables with ordinary database
queries.
For example, imagine running an application whose database tracks shipments of items
to customers. A user logged into this application with the ID of
12345
types in the query Where is my package?
. Using parameterized secure
views, you can make sure that the following requirements apply to how
AlloyDB for PostgreSQL executes this query:
- The query can read only the database columns that you have explicitly
listed in your database's parameterized secure views. In this case, that
might be certain columns in your
items
,users
, andshipments
tables. - The query can read only the database rows associated with the user who
asked the query. In this case, that might require that returned rows have
a data relationship with the
users
table row whoseid
column value is12345
.
Create a parameterized secure view
To create a parameterized secure view, use the PostgreSQL CREATE VIEW
DDL
command with the following attributes:
- Create the view with the
security_barrier
option. - To restrict application users to seeing only the rows they're allowed to
see, add required parameters using the
$@PARAMETER_NAME
syntax in the WHERE clause. A common case is checking the value of a column usingWHERE COLUMN = $@PARAMETER_NAME
.
The following example parameterized secure view allows access to three columns
from a table named users
, and limits the results only to rows where
users.id
matches a required parameter:
CREATE VIEW user_psv WITH (security_barrier) AS
SELECT
username,
full_name,
birthday
FROM
users
WHERE
users.id = $@user_id;
The SELECT
statements at the core of parameterized secure views can be as
complex as the statements allowed by ordinary PostgreSQL views.
After you create a view, you must then grant the user you created earlier
permission to run SELECT
queries on the view:
GRANT SELECT ON VIEW_NAME TO NL_DB_USER;
Replace the following:
VIEW_NAME
: The name of the view that you created in the previous step.NL_DB_USER
: The name of the database user that you have designated to execute natural language queries.
Query a parameterized secure view
Despite their similarity to ordinary PostgreSQL views, you can't query parameterized
secure views directly. Instead, you use the google_exec_param_query()
function
provided by the alloydb_ai_nl
extension. The function has the following
syntax:
SELECT * FROM
alloydb_ai_nl.google_exec_param_query(
query => SQL_QUERY
param_names => ARRAY [PARAMETER_NAMES],
param_values => ARRAY [PARAMETER_VALUES]
)
Replace the following:
SQL_QUERY
: A SQL query whoseFROM
clause refers to one or more parameterized secure views.PARAMETER_NAMES
: A list of parameter names to pass in, as strings.PARAMETER_VALUES
: A list of parameter values to pass in. This list must be of the same size as theparam_names
list. The order of the values matches the order of the names.
The function returns a table of JSON objects. Each row in the table is
equivalent to the row_to_json()
value of the original query result row.
In typical use, the value of the query
argument is generated not by your
own code, but instead by an AI model that you have integrated your
AlloyDB for PostgreSQL database with.
The following example shows how you might query a parameterized secure view
in Python, and then display its results. It builds on the user_psv
example
view from the previous section:
# Assume a get_user_id() function that returns the ID of the current
# application user.
app_user_id = get_user_id()
pool = await asyncpg.create_pool(
host=INSTANCE_IP
user=NL_DB_USER
password=NL_DB_PASSWORD
database=DB_NAME
)
table_name = "user_psv"
query = f"""
SELECT
full_name,
birthday
FROM
{table_name}
"""
params = {
"user_id": app_user_id
}
param_query = f"""
SELECT * FROM alloydb_ai_nl.google_exec_param_query(
query => '{query}',
param_names => $1,
param_values => $2
);
"""
sql_results = await pool.execute(
param_query,
params.keys(),
params.values()
)
for row in sql_results:
print(json.loads(row))
Execute a natural-language query
Executing a natural-language query using parameterized secure views is a two-step process:
- As a database user with only
SELECT
access to the appropriate parameterized secure views, convert the natural language query to SQL using a large language model. - Use the
google_exec_param_query()
function to process the SQL, binding it to parameter values appropriate to the current user session.
The following sections describe these steps in further detail.
Convert natural language to SQL
To translate natural-language input into SQL, use the
google_get_sql_current_schema()
function that is included with the parameterized
secure views technology preview:
SELECT alloydb_ai_nl.google_get_sql_current_schema(
sql_text => 'NL_TEXT',
model_id => 'MODEL_ID',
prompt_text => 'HINT_TEXT'
);
Replace the following:
NL_TEXT
: The natural-language text to turn into a SQL query.MODEL_ID
: The ID of the model that you registered with the model catalog when setting up your database for parameterized secure views.HINT_TEXT
: Additional information about the database schema, expressed in natural language. This lets you give the model additional hints about important aspects of the schema that it might not extract only by analyzing the table, column, and relationship structures. As an example:When joining flights and seats, be sure to join on flights.id = seats.flight_id.
The output of the function is a string containing a SQL query.
Run the converted SQL using parameters
After converting the natural-language query to SQL, you can call
google_exec_param_query()
as described earlier on this page, passing in any
parameters that your parameterized secure views might need.
The function works if you pass it more parameters than it needs with a given query, so you can call it with all of the parameters used by all of the parameterized secure views that your application has values for. The function throws an exception if it tries to run a query requiring an undefined parameter.
An example of executing a natural-language query
This section demonstrates a complete flow from natural-language input to SQL result set. The code samples show the underlying SQL queries and functions that an application runs.
For this example flow, assume the following about your application:
- Your database-driven application tracks product shipments to customers.
- You have registered a Gemini Pro-based model named
my-gemini-model
in the Model Catalog. - You have defined a parameterized secure view in your database named
shipment_view
.- The view selects data from several tables relevant to shipments to customers.
- The view requires a
user_id
parameter, whose value is the ID of an end user of the application.
- An end user whose application user ID is
12345
types "Where is my package?" into your web application. Your application calls
google_get_sql_current_schema()
to translate the input into SQL:SELECT alloydb_ai_nl.google_get_sql_current_schema( sql_text => 'Where is my package?' model_id => 'my-gemini-model' );
This call returns a string containing a single SQL
SELECT
query. The query is limited only to the parameterized secure views visible to the database user that you created to work with parameterized secure views.The SQL generated from
Where is my package?
might resemble the following:SELECT current_location, ship_date, ship_eta FROM shipment_view;
Because
shipment_view
is a parameterized secure view and not an ordinary PostgreSQL view, your application must usegoogle_exec_param_query()
to securely run the query with theuser_id
parameter that it requires, as shown in the next step.Your application passes the SQL to
google_exec_param_query()
, along with the parameters that constrain the output. In our example, that is the ID of the application end user who provided the input:SELECT * FROM alloydb_ai_nl.google_exec_param_query( query => 'SELECT current_location, ship_date, ship_eta FROM shipment_view', param_names => ['user_id'], param_values => ['12345'] );
The output is a SQL result set, expressed as JSON data.
Your application handles the JSON data as needed.
Database design for natural-language handling
The google_get_sql_current_schema()
function provided with this technology
preview serves mainly to demonstrate the functionality of parameterized secure
views, giving you an early opportunity to experiment with this developing
technology. As with any Preview, you shouldn't apply this function to an
application in production.
With that in mind, you can apply the advice in this section to improve the
quality of google_get_sql_current_schema()
output during your experimentation
with it.
Design your schema for human comprehension
In general, give your database structures names and comments clear enough to allow a typical human developer to infer the purpose of its tables, columns, and relationships. This clarity can help a large language model generate more accurate SQL queries based on your schema.
Use descriptive names
Prefer descriptive names for tables, columns, and relationships. Avoid
abbreviations or acronyms. For example, the model works better with a table named users
than with
one named u
.
If it's not feasible to rename existing data structures, provide hints to the
model using the prompt_text
argument when calling google_get_sql_current_schema()
.
Use specific data types
The model can make better inferences about your data if you use more specific
data types with your columns. For example, if you use a column exclusively to
store true-or-false values, then use a boolean
data type with true
and
false
instead of an integer
with 1
and 0
.
Roll back with caution after enabling the preview
If you have enabled the parameterized secure views technology preview on your database, but then decide to roll back AlloyDB Omni to a version before 15.5.0, then you must take a few manual cleanup steps before downgrading.
If you
don't take these steps, then any attempt to query, modify, or drop a parameterized secure view results in a SQL error. This includes queries on your database's view catalog
that would otherwise include parameterized secure views in their results, such
as SELECT * FROM pg_views
.
To completely remove this technology preview from your database before an AlloyDB Omni rollback, follow these steps:
In
psql
, use theDROP VIEW
command to delete every parameterized secure view in your database.In
psql
, use theDROP EXTENSION
command to disable thealloydb_ai_nl
extension on your database.In your
postgresql.conf
file, remove the reference toalloydb_ai_nl
from theshared_preload_libraries
directive.
For more information about rolling back your AlloyDB Omni installation, see Roll back an upgrade.