Spanner Vertex AI integration overview

This page provides an overview of Spanner Vertex AI integration.

Spanner Vertex AI integration lets you access classification and regression ML models hosted on Vertex AI through the GoogleSQL interface. This lets you combine ML prediction serving with the general Spanner data access operations that you perform using DQL and DML queries.

Spanner Vertex AI integration uses the same SQL syntax as BigQuery ML, although it supports only a subset of the BigQuery ML syntax.

Benefits of Spanner Vertex AI integration

Generating ML predictions using Spanner Vertex AI integration provides multiple benefits compared to the approach where Spanner data access and access to the Vertex AI prediction endpoint are performed separately:

  • Performance:
    • Better latency: Spanner Vertex AI integration talks to the Vertex AI service directly, which eliminates additional round trips between a compute node running a Spanner client and the Vertex AI service.
    • Better throughput/parallelism: Spanner Vertex AI integration runs on top of Spanner's distributed query processing infrastructure, which supports highly parallelizable query execution.
  • User experience:
    • A single, simple, coherent, and familiar SQL interface covers both data transformation and ML serving scenarios at Spanner scale, which lowers the ML entry barrier and makes for a much smoother user experience.
  • Costs:
    • Spanner Vertex AI integration uses Spanner compute capacity to merge the results of ML computations and SQL query execution, which eliminates the need to provision additional compute (for example, in Compute Engine or Google Kubernetes Engine) for that purpose.

How does Spanner Vertex AI integration work?

Spanner Vertex AI integration doesn't host ML models, but relies on the Vertex AI service infrastructure instead. For a model to be used with Spanner Vertex AI integration, it must already be trained and deployed to Vertex AI.

Spanner Vertex AI integration also doesn't provide any special ML training functionality. To train models on data stored in Spanner, you can use either of the following:

After a model is deployed to the Vertex AI service, a database owner can register it using the CREATE MODEL DDL statement. The model can then be referenced from the ML.PREDICT function to produce predictions.
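
As a rough illustration of these two steps, the following Python sketch assembles a CREATE MODEL statement and an ML.PREDICT query as plain strings. The model name, column names, table name, project, location, and endpoint ID are hypothetical placeholders, and the exact DDL grammar should be checked against the Spanner CREATE MODEL reference. The sketch only builds text; it doesn't contact Spanner or Vertex AI.

```python
# Sketch only: builds SQL text for registering a model and querying it.
# Every identifier below (FraudModel, Transactions, my-project, ...) is a
# hypothetical placeholder, not a value taken from the documentation.


def endpoint_uri(project: str, location: str, endpoint_id: str) -> str:
    """Format a Vertex AI endpoint reference for the model OPTIONS clause."""
    return (
        "//aiplatform.googleapis.com/"
        f"projects/{project}/locations/{location}/endpoints/{endpoint_id}"
    )


def create_model_ddl(model: str, inputs, outputs, endpoint: str) -> str:
    """Build a CREATE MODEL statement that registers a remote model.

    `inputs` and `outputs` are lists of (column_name, sql_type) pairs.
    """
    in_cols = ", ".join(f"{name} {sql_type}" for name, sql_type in inputs)
    out_cols = ", ".join(f"{name} {sql_type}" for name, sql_type in outputs)
    return (
        f"CREATE MODEL {model}\n"
        f"INPUT ({in_cols})\n"
        f"OUTPUT ({out_cols})\n"
        "REMOTE\n"
        f"OPTIONS (endpoint = '{endpoint}')"
    )


def predict_query(model: str, columns, table: str) -> str:
    """Build a query that feeds rows of `table` to the registered model."""
    cols = ", ".join(columns)
    return (
        f"SELECT * FROM ML.PREDICT(MODEL {model}, "
        f"(SELECT {cols} FROM {table}))"
    )


ddl = create_model_ddl(
    "FraudModel",
    inputs=[("amount", "FLOAT64"), ("merchant_id", "STRING(MAX)")],
    outputs=[("is_fraud", "BOOL")],
    endpoint=endpoint_uri("my-project", "us-central1", "1234567890"),
)
query = predict_query("FraudModel", ["amount", "merchant_id"], "Transactions")

print(ddl)
print(query)
```

The DDL statement would be applied once by the database owner; the query can then run like any other GoogleSQL query against the database.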

See Generate ML predictions using SQL for a tutorial on using Spanner Vertex AI integration.

Pricing

There are no additional charges from Spanner for use of Spanner Vertex AI integration. However, there are other potential charges associated with this feature:

  • You pay the standard rates for Vertex AI online prediction. The total charge depends on the model type you use. Some model types have a flat per-hour rate that depends on the machine type and the number of nodes that you use. Other model types have per-call rates. We recommend that you deploy the latter in a dedicated project where you have set explicit prediction quotas.

  • You pay the standard rates for data transfer between Spanner and Vertex AI. The total charge depends on the region hosting the server executing the query and the region hosting the called endpoint. To minimize charges, deploy your Vertex AI endpoints in the same region as your Spanner instance. When using multi-regional instance configurations or multiple Vertex AI endpoints, deploy your endpoints on the same continent.

SLA

Because Vertex AI online prediction has lower availability than Spanner, you must properly configure Spanner ML models to maintain Spanner's high availability while using Spanner Vertex AI integration:

  1. Spanner ML models must use multiple Vertex AI endpoints on the backend to enable failover.
  2. Vertex AI endpoints must conform to the Vertex AI SLA.
  3. Vertex AI endpoints must provision enough capacity to handle incoming traffic.
  4. Vertex AI endpoints must use separate regions close to the Spanner database to avoid regional outages.
  5. Vertex AI endpoints should use separate projects to avoid issues with per-project prediction quotas.
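
Some of the requirements above (multiple endpoints, separate regions, separate projects) can be checked mechanically before a model is registered. The following Python sketch does that for a list of endpoint resource names. It assumes names of the form projects/PROJECT/locations/LOCATION/endpoints/ID; the function name and the example values are hypothetical. It can't verify SLA conformance, capacity, or proximity to the Spanner database.

```python
import re

# Assumed shape of a Vertex AI endpoint resource name.
_ENDPOINT_RE = re.compile(
    r"projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)"
    r"/endpoints/(?P<endpoint>[^/]+)$"
)


def check_endpoint_spread(endpoints):
    """Return a list of human-readable problems; an empty list means none found.

    Checks only what the names reveal: endpoint count, regions, and projects.
    """
    parsed = []
    for name in endpoints:
        match = _ENDPOINT_RE.search(name)
        if match is None:
            raise ValueError(f"unrecognized endpoint name: {name!r}")
        parsed.append(match)

    problems = []
    if len(parsed) < 2:
        problems.append("fewer than two endpoints, so failover is not possible")

    locations = [m["location"] for m in parsed]
    if len(set(locations)) < len(locations):
        problems.append("some endpoints share a region")

    projects = [m["project"] for m in parsed]
    if len(set(projects)) < len(projects):
        problems.append("some endpoints share a project (and its quotas)")

    return problems


# Hypothetical example: two endpoints in different regions and projects.
good = [
    "//aiplatform.googleapis.com/projects/ml-a/locations/us-east1/endpoints/111",
    "//aiplatform.googleapis.com/projects/ml-b/locations/us-east4/endpoints/222",
]
print(check_endpoint_spread(good))
```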

The number of redundant Vertex AI endpoints that you need depends on their SLA and on the number of rows in Spanner queries:

Spanner SLA   Vertex AI SLA   1 row   10 rows   100 rows   1000 rows
99.99%        99.9%           2       2         2          3
99.99%        99.5%           2       3         3          4
99.999%       99.9%           2       2         3          3
99.999%       99.5%           3       3         4          4
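
The page doesn't say how these counts are derived. One simple model that reproduces them is sketched below in Python, under assumptions that are ours rather than the documentation's: each row triggers its own prediction call, endpoints fail independently with probability 1 minus their SLA, a row fails only if all endpoints fail, and a query fails if any of its rows fails. The function name endpoints_needed is hypothetical.

```python
from fractions import Fraction


def endpoints_needed(spanner_sla: str, vertex_sla: str, rows: int,
                     max_endpoints: int = 10) -> int:
    """Smallest endpoint count whose query failure rate fits the target.

    SLAs are passed as decimal strings (for example "0.9999") so that
    Fraction can represent them exactly and avoid rounding at the boundary.
    """
    target_failure = 1 - Fraction(spanner_sla)
    endpoint_failure = 1 - Fraction(vertex_sla)
    for n in range(1, max_endpoints + 1):
        # Assumption: a row fails only when all n independent endpoints fail.
        row_failure = endpoint_failure ** n
        # Assumption: a query fails when at least one of its rows fails.
        query_failure = 1 - (1 - row_failure) ** rows
        if query_failure <= target_failure:
            return n
    raise ValueError("target not reachable within max_endpoints")


for spanner_sla, vertex_sla in [("0.9999", "0.999"), ("0.9999", "0.995"),
                                ("0.99999", "0.999"), ("0.99999", "0.995")]:
    counts = [endpoints_needed(spanner_sla, vertex_sla, rows)
              for rows in (1, 10, 100, 1000)]
    print(spanner_sla, vertex_sla, counts)
```

Under these assumptions the computed counts match the table, which suggests that the required redundancy grows with the number of rows because every extra row is another chance for a prediction call to fail.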

Vertex AI endpoints don't need to host exactly the same model. We recommend that you configure the Spanner ML model to have a primary, complex, and compute-intensive model as its first endpoint. Subsequent failover endpoints can point to simplified models that are less compute-intensive, scale better, and can absorb traffic spikes.

Compliance

Assured Workloads don't support the Vertex AI Prediction API. Enabling a restrict resource usage constraint disables the Vertex AI API and, in effect, the Spanner Vertex AI integration feature.

Additionally, we recommend that you create a VPC Service Controls perimeter to ensure your production databases cannot connect to Vertex AI endpoints in your non-production projects that might not have the proper compliance configuration.