Generate ML predictions using SQL

Stay organized with collections Save and categorize content based on your preferences.

Objective

In this tutorial, you will learn how to:

  • Register a model hosted in the Vertex AI service in a Cloud Spanner schema using DDL statements.
  • Reference this model from SQL queries to generate predictions from data stored in Cloud Spanner.

Costs

This tutorial uses billable components of Google Cloud, including:

  • Cloud Spanner
  • Vertex AI

For more information about Cloud Spanner costs, see the Cloud Spanner pricing page.

For more information about Vertex AI costs, see the Vertex AI pricing page.

Before you begin

Ensure that you have a model already trained and deployed to a Vertex AI endpoint.

This tutorial uses classification and regression models deployed to the Vertex AI by following the steps described in Classification and regression overview.

For general information on ML model training and deployment, see Vertex AI Tabular Data Overview. However, for use with Spanner Vertex AI integration, a model doesn't need to be trained using Vertex AI, but only deployed to a Vertex AI endpoint. You can train the model by using any ML training framework or service, including your own ML training infrastructure, BigQuery ML or any third-party ML training service. If your data is stored in Cloud Spanner, you can use Federated queries. You can also deploy already pre-trained ML models available on public model repositories, such as TensorFlow Hub.

Configure access for Spanner Vertex AI integration to Vertex AI endpoints

To allow Spanner Vertex AI integration to access Vertex AI models, grant the Vertex AI User role to the Cloud Spanner's service agent account as follows:

  • If the Cloud Spanner service agent account doesn't exist for your project, create it by running the following command: gcloud beta services identity create --service=spanner.googleapis.com --project={PROJECT}.
  • Follow the steps described in the following tutorial to grant the Vertex AI User role to the Cloud Spanner service agent account service-{PROJECT}@gcp-sa-spanner.iam.gserviceaccount.com.

Register a remote model in a Cloud Spanner schema

You can reference a model from Cloud Spanner SQL queries after you register it in the Cloud Spanner schema. To create a reference to the Vertex AI classification model endpoint trained and deployed using the tutorial mentioned above execute the following DDL statement:

CREATE MODEL MyClassificationModel
INPUT (
  length FLOAT64,
  material STRING(MAX),
  tag_array ARRAY<STRING(MAX)>
)
OUTPUT (
  scores ARRAY<FLOAT64>,
  classes ARRAY<STRING(MAX)>
)
REMOTE
OPTIONS (
  endpoint = '//aiplatform.googleapis.com.googleapis.com/projects/PROJECT/locations/LOCATION/endpoints/ENDPOINT_ID'
)

Here, PROJECT, LOCATION, and ENDPOINT_ID correspond to your project ID, the region where you are using Vertex AI and the ID for the Vertex AI endpoint. These should be the same values that you specified during the model deployment.

Columns length, material and tag_array correspond to the input columns (features) of the sample classification model used in the tutorial.

Columns scores and classes are standard output columns produced for all classification models trained using AutoML. The output columns for other model types (such as regression) or models trained with other training frameworks might have completely different names and types. For a complete list of model's input/output column names and their types, check model's metadata.

To create a reference to the regression model deployed in the tutorial use the following DDL statement:

CREATE MODEL MyRegressionModel
INPUT (
  age FLOAT64,
  sq_ft INT64,
  code STRING(MAX)
)
OUTPUT (
  value FLOAT64,
  lower_bound FLOAT64,
  upper_bound FLOAT64
)
REMOTE
OPTIONS (
  endpoint = '//aiplatform.googleapis.com.googleapis.com/projects/PROJECT/locations/LOCATION/endpoints/ENDPOINT_ID'
)

Here, value, lower_bound and upper_bound are standard columns produced for all regression models trained with AutoML.

Use ML.PREDICT for ML serving

To generate prediction using registered models from DQL/DML queries use ML.PREDICT function.

For example, you can use the following SQL query to produce a classification model prediction request that is similar to the request to the Vertex AI API shown in the tutorial.

SELECT scores, classes
FROM ML.PREDICT(
  MODEL MyClassificationModel,
  SELECT
    3.6 AS length,
    'cotton' AS material,
    ['abc','def'] AS tag_array
  );

+---------------------------------------------+---------+
| scores                                      | classes |
+---------------------------------------------+---------+
| [0.96771615743637085, 0.032283786684274673] | [0, 1]  |
+---------------------------------------------+---------+

To generate a prediction from a regression model similar to the API request described in the Vertex AI tutorial, use the following query:

SELECT value, lower_bound, upper_bound
FROM ML.PREDICT(
  MODEL MyRegressionModel,
  SELECT
    3.6 AS age,
    5392 AS sq_ft,
    '90331' AS code
  );

+----------+-------------+-------------+
| value    | lower_bound | upper_bound |
+----------+-------------+-------------+
| 65.14233 | 4.6572      | 164.0279    |
+----------+-------------+-------------+

The ML.PREDICT function requires that an input relation contains all input columns of the target model. At the same time, it allows this relation to contain any other columns. In this case they will just be passed through to an output relation.

This helps to associate custom identifiers (that don't have any meaning for prediction process itself) with prediction instances. For instance, in the example below, we add additional column id to the input relation. The value of the id column uniquely identify a prediction instance and can be used to associate the generated prediction output row with the prediction input row in case an input relation contains several rows. In general, you can include any columns of an input relation to an output relation, as shown for the column age:

SELECT id, value, age
FROM ML.PREDICT(
  MODEL MyRegressionModel,
  SELECT
    1 AS id,
    3.6 AS age,
    5392 AS sq_ft,
    '90331' AS code
  );

+----+----------+-----+
| id | value    | age |
+----+----------+-----+
| 1  | 65.14233 | 3.6 |
+----+----------+-----+