Machine learning functions in GoogleSQL

GoogleSQL for Spanner supports the following machine learning (ML) functions.

Function list

Name	Summary
`ML.PREDICT`	Apply ML computations defined by a model to each row of an input relation.

`ML.PREDICT`

ML.PREDICT(input_model, input_relation[, model_parameters])

input_model:
  MODEL model_name

input_relation:
  { input_table | input_subquery }

input_table:
  TABLE table_name

model_parameters:
  STRUCT(parameter_value AS parameter_name[, ...])

Description

ML.PREDICT is a table-valued function that gives you access to registered machine learning (ML) models and uses them to generate predictions. It applies the ML computations defined by a model to each row of an input relation and returns the results of those predictions. You can also use ML.PREDICT for vector search, in which case it converts your natural language query text into an embedding.

Supported Argument Types

input_model: The model to use for predictions. Replace model_name with the name of the model. To create a model, see CREATE MODEL.
input_relation: A table or subquery to which the ML computations are applied. The columns of the input relation must include all input columns of the input model; otherwise, the input doesn't have enough data to generate predictions and the query doesn't compile. The input relation can also include arbitrary pass-through columns, which are included in the output. The order of the columns in the input relation doesn't matter. Each column of the input relation must be coercible to the type of the matching model column.
input_table: The table containing the input data for predictions, for example, a set of features. Replace table_name with the name of the table.
input_subquery: The subquery that's used to generate the prediction input data.
model_parameters: A STRUCT value that contains parameters supported by model_name. These parameters are passed to the model inference.

Return Type

A table with the following columns:

Model outputs
Pass-through columns from the input relation

Examples

The examples in this section reference a model called DiamondAppraise and an input table called Diamonds with the following columns:

DiamondAppraise model:

Input columns	Output columns
`value FLOAT64`	`value FLOAT64`
`carat FLOAT64`	`lower_bound FLOAT64`
`cut STRING`	`upper_bound FLOAT64`
`color STRING(1)`

Diamonds table:

Columns
`Id INT64`
`Carat FLOAT64`
`Cut STRING`
`Color STRING`

The following query predicts the value of a diamond based on the diamond's carat, cut, and color.

SELECT id, color, value
FROM ML.PREDICT(MODEL DiamondAppraise, TABLE Diamonds);

+----+-------+-------+
| id | color | value |
+----+-------+-------+
| 1  | I     | 280   |
| 2  | G     | 447   |
+----+-------+-------+
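
The input relation can also be a subquery, which is useful when only some rows should be scored. The following is a minimal sketch under assumptions rather than a tested query: it reuses the DiamondAppraise model and Diamonds table from above, and the 1.0 carat threshold is an arbitrary illustrative value. The id column is not a model input, so it acts as a pass-through column, while lower_bound and upper_bound come from the model outputs.

```sql
-- Sketch: use a subquery as the input relation to score only some rows.
-- The carat threshold (1.0) is a hypothetical value chosen for illustration.
SELECT id, carat, value, lower_bound, upper_bound
FROM ML.PREDICT(
  MODEL DiamondAppraise,
  (
    SELECT id, carat, cut, color
    FROM Diamonds
    WHERE carat >= 1.0
  ));
```

Because the order of columns in the input relation doesn't matter, the subquery can list them in any order.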

You can include model-specific parameters. For example, in the following query, the maxOutputTokens parameter limits output, the result of the model inference, to 10 or fewer tokens. This query succeeds because the model TextBison supports a parameter called maxOutputTokens.

SELECT prompt, output
FROM ML.PREDICT(
  MODEL TextBison,
  (SELECT "Is 13 prime?" AS prompt),
  STRUCT(10 AS maxOutputTokens));

+----------------+---------------------+
| prompt         | output              |
+----------------+---------------------+
| "Is 13 prime?" | "Yes, 13 is prime." |
+----------------+---------------------+
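
Because model_parameters is a STRUCT, it can presumably carry more than one parameter at a time. The sketch below assumes that the TextBison model also supports a temperature parameter; only maxOutputTokens is confirmed by the example above, so check which parameters your model accepts before relying on this.

```sql
-- Sketch: pass several model parameters in one STRUCT.
-- ASSUMPTION: the model supports a parameter named temperature;
-- only maxOutputTokens is confirmed by the preceding example.
SELECT prompt, output
FROM ML.PREDICT(
  MODEL TextBison,
  (SELECT "Is 13 prime?" AS prompt),
  STRUCT(10 AS maxOutputTokens, 0.2 AS temperature));
```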

The following example generates an embedding for a natural language query. The example then uses that embedding to find the most similar entries in a database that are indexed by vector embeddings.

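-- Generate the embedding from a natural language prompt.
-- This assumes that the model accepts a prompt input column and returns
-- an embeddings.values output column.
WITH embedding AS (
  SELECT embeddings.values
  FROM ML.PREDICT(
    MODEL DiamondAppraise,
    (SELECT "What is the most valuable diamond?" AS prompt)
  )
)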
-- Use embedding to find the most similar entries in the database
SELECT id, color, value,
  (APPROX_COSINE_DISTANCE(valueEmbedding,
  embedding.values,
  options => JSON '{"num_leaves_to_search": 10}')) as distance
FROM products @{force_index=valueEmbeddingIndex}, embedding
WHERE valueEmbedding IS NOT NULL
ORDER BY distance
LIMIT 5;

You can use ML.PREDICT in any DQL or DML statement, such as INSERT or UPDATE. For example:

INSERT INTO AppraisedDiamond (id, color, carat, value)
SELECT
  1 AS id,
  color,
  carat,
  value
FROM
  ML.PREDICT(MODEL DiamondAppraise,
  (
    SELECT
      @carat AS carat,
      @cut AS cut,
      @color AS color
  ));
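
An UPDATE statement can follow the same pattern by reading the prediction from a scalar subquery. This is a hedged sketch, not a verified statement: it assumes that the AppraisedDiamond table has the id, color, carat, and value columns used in the INSERT above, that the query returns exactly one row for each updated row, and that the cut is supplied as a query parameter because the table has no cut column.

```sql
-- Sketch: refresh the stored appraisal for one row.
-- ASSUMPTION: AppraisedDiamond has columns id, color, carat, and value.
UPDATE AppraisedDiamond
SET value = (
  -- Inside the subquery, value refers to the model output column.
  SELECT value
  FROM ML.PREDICT(
    MODEL DiamondAppraise,
    (
      SELECT
        AppraisedDiamond.carat AS carat,
        @cut AS cut,
        AppraisedDiamond.color AS color
    ))
)
WHERE id = 1;
```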