Query your database using natural language

This page describes a technology preview available with AlloyDB Omni that lets you experiment with querying your database using natural language.

Overview

You can use AlloyDB Omni to preview a set of experimental features that allows your database-driven application to more securely execute natural-language queries from your application's users, such as "Where is my package?" or "Who is the top earner in each department?" AlloyDB Omni translates the natural-language input into a SQL query specific to your database, restricting the results only to what the user of your application is allowed to view.

The power and risks of natural-language queries

Large language models, such as Gemini Pro, can enable your application to run database queries based on natural-language queries created by your application's end users. For example, a model with access to your application's database schema can take end-user input like this:

What are the cheapest direct flights from Boston to Denver in July?

And translate it into a SQL query like this:

SELECT flight.id, flight.price, carrier.name, [...]
FROM [...]
WHERE [...]
ORDER BY flight.price ASC 
LIMIT 10

Natural-language queries can provide your application a powerful tool for serving your users. However, this technology also comes with clear security risks that you must consider before you allow end users to run arbitrary queries on your database tables. Even if you have configured your application to connect to your database as a limited-access, read-only database user, an application that invites natural-language queries can be vulnerable to the following:

Malicious users can submit prompt-injection attacks, trying to manipulate the underlying model to reveal all the data the application has access to.
The model itself might generate SQL queries broader in scope than is appropriate, revealing sensitive data in response to even well-intentioned user queries.

Sanitize queries with parameterized secure views

To help mitigate the risks described in the previous section, Google has developed parameterized secure views, an experimental feature that you can preview using the techniques described on this page.

Parameterized secure views let you explicitly define the tables and columns that natural-language queries can pull data from, and add additional restrictions on the range of rows available to an individual application user. These restrictions let you tightly control the data that your application's users can see through natural-language queries, no matter how your users phrase these queries.

If you enable this technology preview, then you get access to an experimental extension developed by Google called alloydb_ai_nl. This extension provides the following features:

Parameterized secure views, a variant of SQL views for restricting the range of data that a query can access.
The google_exec_param_query() function, which lets you query your parameterized secure views.
The google_get_sql_current_schema() function, which converts natural language queries into SQL queries of tables and views in your current schema.

The following sections describe these how to use these features, and demonstrate how they can work together.

Before you begin

Install AlloyDB Omni version 15.5.1 or later, including AI model integration. For more information, see Install AlloyDB Omni with AlloyDB for PostgreSQL AI.

Set up your database for parameterized secure views

Connect to your AlloyDB Omni cluster using psql.
Edit the contents of /var/alloydb/config/postgresql.conf so that the value of the shared_preload_libraries directive includes alloydb_ai_nl. The edited directive should look similar to the following:
```
shared_preload_libraries='g_stats,google_job_scheduler,google_insights,pg_stat_statements,google_db_advisor,google_columnar_engine,alloydb_ai_nl'
```
Note: If you are running the Preview of simplified installation method for AlloyDB Omni, then you might need to add the shared_preload_libraries directive to your postgresql.conf file, with 'alloydb_ai_nl' as its entire value.
Stop AlloyDB Omni.
Start AlloyDB Omni.

Enable the alloydb_ai_nl extension:

CREATE EXTENSION google_ml_integration with version '1.3';
ALTER SYSTEM SET google_ml_integration.enable_model_support=on;
ALTER SYSTEM SET alloydb_ai_nl.enabled=on;
SELECT pg_reload_conf();
CREATE EXTENSION alloydb_ai_nl;

CALL google_ml.create_model(
    model_id => 'MODEL_ID',
    model_request_url => 'https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/publishers/google/models/gemini-pro:streamGenerateContent',
    model_provider => 'google',
    model_auth_type => 'alloydb_service_agent_iam');

Replace the following:

MODEL_ID: an ID to assign to this model. For more information about Model endpoint management, see Use Model endpoint management for AI models.
PROJECT_ID: the ID of your Google Cloud project.

Create a new database user. Don't grant it any permissions or roles yet. A subsequent step in this procedure grants the user the permissions that it requires.

Parameterized secure views

A parameterized secure view works a lot like an ordinary PostgreSQL secure view: a stored SELECT statement, essentially. Parameterized secure views additionally allow you to require one or more named parameter values passed to the view when querying it, somewhat like bind variables with ordinary database queries.

For example, imagine running an application whose database tracks shipments of items to customers. A user logged into this application with the ID of 12345 types in the query Where is my package?. Using parameterized secure views, you can make sure that the following requirements apply to how AlloyDB for PostgreSQL executes this query:

The query can read only the database columns that you have explicitly listed in your database's parameterized secure views. In this case, that might be certain columns in your items, users, and shipments tables.
The query can read only the database rows associated with the user who asked the query. In this case, that might require that returned rows have a data relationship with the users table row whose id column value is 12345.

Create a parameterized secure view

To create a parameterized secure view, use the PostgreSQL CREATE VIEW DDL command with the following attributes:

Create the view with the security_barrier option.
To restrict application users to seeing only the rows they're allowed to see, add required parameters using the $@PARAMETER_NAME syntax in the WHERE clause. A common case is checking the value of a column using WHERE COLUMN = $@PARAMETER_NAME.

The following example parameterized secure view allows access to three columns from a table named users, and limits the results only to rows where users.id matches a required parameter:

CREATE VIEW user_psv WITH (security_barrier) AS 
SELECT 
  username,
  full_name,
  birthday
FROM 
  users
WHERE 
  users.id = $@user_id;

The SELECT statements at the core of parameterized secure views can be as complex as the statements allowed by ordinary PostgreSQL views.

After you create a view, you must then grant the user you created earlier permission to run SELECT queries on the view:

GRANT SELECT ON VIEW_NAME TO NL_DB_USER;

Replace the following:

VIEW_NAME: The name of the view that you created in the previous step.
NL_DB_USER: The name of the database user that you have designated to execute natural language queries.

Query a parameterized secure view

Despite their similarity to ordinary PostgreSQL views, you can't query parameterized secure views directly. Instead, you use the google_exec_param_query() function provided by the alloydb_ai_nl extension. The function has the following syntax:

SELECT * FROM
alloydb_ai_nl.google_exec_param_query(
    query => SQL_QUERY
    param_names => ARRAY [PARAMETER_NAMES],
    param_values => ARRAY [PARAMETER_VALUES]
)

Replace the following:

SQL_QUERY: A SQL query whose FROM clause refers to one or more parameterized secure views.
PARAMETER_NAMES: A list of parameter names to pass in, as strings.
PARAMETER_VALUES: A list of parameter values to pass in. This list must be of the same size as the param_names list. The order of the values matches the order of the names.

The function returns a table of JSON objects. Each row in the table is equivalent to the row_to_json() value of the original query result row.

In typical use, the value of the query argument is generated not by your own code, but instead by an AI model that you have integrated your AlloyDB for PostgreSQL database with.

The following example shows how you might query a parameterized secure view in Python, and then display its results. It builds on the user_psv example view from the previous section:

# Assume a get_user_id() function that returns the ID of the current
# application user.
app_user_id = get_user_id()

pool = await asyncpg.create_pool(
    host=INSTANCE_IP
    user=NL_DB_USER
    password=NL_DB_PASSWORD
    database=DB_NAME
)

table_name = "user_psv"

query = f"""
    SELECT 
      full_name,
      birthday
    FROM 
      {table_name} 
"""
params = {
    "user_id": app_user_id
}

param_query = f"""
SELECT * FROM alloydb_ai_nl.google_exec_param_query(
  query => '{query}',
  param_names => $1,
  param_values => $2
);
"""

sql_results = await pool.execute(
    param_query, 
    params.keys(), 
    params.values()
)

for row in sql_results:
    print(json.loads(row))

Execute a natural-language query

Executing a natural-language query using parameterized secure views is a two-step process:

As a database user with only SELECT access to the appropriate parameterized secure views, convert the natural language query to SQL using a large language model.
Use the google_exec_param_query() function to process the SQL, binding it to parameter values appropriate to the current user session.

The following sections describe these steps in further detail.

Convert natural language to SQL

To translate natural-language input into SQL, use the google_get_sql_current_schema() function that is included with the parameterized secure views technology preview:

SELECT alloydb_ai_nl.google_get_sql_current_schema(
  sql_text => 'NL_TEXT',
  model_id => 'MODEL_ID',
  prompt_text => 'HINT_TEXT'
);

Replace the following:

NL_TEXT: The natural-language text to turn into a SQL query.
MODEL_ID: The ID of the model that you registered with the model catalog when setting up your database for parameterized secure views.
HINT_TEXT: Additional information about the database schema, expressed in natural language. This lets you give the model additional hints about important aspects of the schema that it might not extract only by analyzing the table, column, and relationship structures. As an example: When joining flights and seats, be sure to join on flights.id = seats.flight_id.

The output of the function is a string containing a SQL query.

Run the converted SQL using parameters

After converting the natural-language query to SQL, you can call google_exec_param_query() as described earlier on this page, passing in any parameters that your parameterized secure views might need.

The function works if you pass it more parameters than it needs with a given query, so you can call it with all of the parameters used by all of the parameterized secure views that your application has values for. The function throws an exception if it tries to run a query requiring an undefined parameter.

An example of executing a natural-language query

This section demonstrates a complete flow from natural-language input to SQL result set. The code samples show the underlying SQL queries and functions that an application runs.

For this example flow, assume the following about your application:

Your database-driven application tracks product shipments to customers.
You have registered a Gemini Pro-based model named my-gemini-model in the Model Catalog.
You have defined a parameterized secure view in your database named shipment_view.
- The view selects data from several tables relevant to shipments to customers.
- The view requires a user_id parameter, whose value is the ID of an end user of the application.

An end user whose application user ID is 12345 types "Where is my package?" into your web application.
Your application calls google_get_sql_current_schema() to translate the input into SQL:
```
SELECT alloydb_ai_nl.google_get_sql_current_schema(
  sql_text => 'Where is my package?'
  model_id => 'my-gemini-model'
);
```
This call returns a string containing a single SQL SELECT query. The query is limited only to the parameterized secure views visible to the database user that you created to work with parameterized secure views.

The SQL generated from Where is my package? might resemble the following:
```
SELECT current_location, ship_date, ship_eta FROM shipment_view;
```
Because shipment_view is a parameterized secure view and not an ordinary PostgreSQL view, your application must use google_exec_param_query() to securely run the query with the user_id parameter that it requires, as shown in the next step.
Your application passes the SQL to google_exec_param_query(), along with the parameters that constrain the output. In our example, that is the ID of the application end user who provided the input:
```
SELECT * FROM
alloydb_ai_nl.google_exec_param_query(
    query => 'SELECT current_location, ship_date, ship_eta FROM shipment_view',
    param_names => ['user_id'],
    param_values => ['12345']
);
```
The output is a SQL result set, expressed as JSON data.
Your application handles the JSON data as needed.

Database design for natural-language handling

The google_get_sql_current_schema() function provided with this technology preview serves mainly to demonstrate the functionality of parameterized secure views, giving you an early opportunity to experiment with this developing technology. As with any Preview, you shouldn't apply this function to an application in production.

With that in mind, you can apply the advice in this section to improve the quality of google_get_sql_current_schema() output during your experimentation with it.

Design your schema for human comprehension

In general, give your database structures names and comments clear enough to allow a typical human developer to infer the purpose of its tables, columns, and relationships. This clarity can help a large language model generate more accurate SQL queries based on your schema.

Use descriptive names

Prefer descriptive names for tables, columns, and relationships. Avoid abbreviations or acronyms. For example, the model works better with a table named users than with one named u.

If it's not feasible to rename existing data structures, provide hints to the model using the prompt_text argument when calling google_get_sql_current_schema().

Use specific data types

The model can make better inferences about your data if you use more specific data types with your columns. For example, if you use a column exclusively to store true-or-false values, then use a boolean data type with true and false instead of an integer with 1 and 0.

Roll back with caution after enabling the preview

If you have enabled the parameterized secure views technology preview on your database, but then decide to roll back AlloyDB Omni to a version before 15.5.0, then you must take a few manual cleanup steps before downgrading.

If you don't take these steps, then any attempt to query, modify, or drop a parameterized secure view results in a SQL error. This includes queries on your database's view catalog that would otherwise include parameterized secure views in their results, such as SELECT * FROM pg_views.

To completely remove this technology preview from your database before an AlloyDB Omni rollback, follow these steps:

In psql, use the DROP VIEW command to delete every parameterized secure view in your database.
In psql, use the DROP EXTENSION command to disable the alloydb_ai_nl extension on your database.
In your postgres.conf file, remove the reference to alloydb_ai_nl from the shared_preload_libraries directive.

For more information about rolling back your AlloyDB Omni installation, see Roll back an upgrade.