Build generative AI applications using Cloud SQL

This page provides an overview of capabilities offered by Cloud SQL for PostgreSQL to help you build generative AI applications. To get started with a sample application, see Get started with using Cloud SQL for generative AI applications.

Retrieval-Augmented Generation (RAG) is a technique for optimizing the output of a large language model (LLM) by referencing an authoritative knowledge base before generating a response. RAG enhances generative AI applications by improving their accuracy. Cloud SQL databases offer capabilities curated for RAG and generative AI applications, as explained in this page.

Generate vector embeddings

Vector embeddings are essential for RAG because they enable semantic understanding and efficient similarity search. These embeddings are numerical representations of text, images, audio, and video. Embedding models generate vector embeddings so that, if two pieces of content are semantically similar, their respective embeddings are located near each other in the embedding vector space.

Cloud SQL integrates with Vertex AI. You can use the models that Vertex AI hosts to generate vector embeddings by using SQL queries.

Cloud SQL extends PostgreSQL syntax with an embedding function for generating vector embeddings of text. After you generate these embeddings, you can store them in a Cloud SQL database without needing a separate vector database.
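As an illustrative sketch, a query that generates and stores an embedding might look like the following. The model ID, table name, and column names here are assumptions, and the `embedding` function requires the Vertex AI integration to be set up for your instance:

```sql
-- Sketch: generate a vector embedding for a text string.
-- 'text-embedding-005' is a placeholder for a Vertex AI embedding
-- model available to your instance.
SELECT embedding('text-embedding-005', 'What is Cloud SQL?');

-- Store the generated embedding alongside the source text
-- (the documents table and its columns are illustrative).
INSERT INTO documents (content, embedding)
VALUES (
  'What is Cloud SQL?',
  embedding('text-embedding-005', 'What is Cloud SQL?')::vector
);
```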

You can also use Cloud SQL to store vector embeddings that are generated outside of Cloud SQL. For example, you can store vector embeddings that are generated by using pre-trained models in the Vertex AI Model Garden. You can use these vector embeddings as inputs to pgvector functions for similarity and semantic searches.

Store, index, and query vector embeddings with pgvector

You can store, index, and query vector embeddings in Cloud SQL by using the pgvector PostgreSQL extension.

For more information about configuring this extension, see Configure PostgreSQL extensions. For more information about storing, indexing, and querying vector embeddings, see Store a generated embedding and Query and index embeddings using pgvector.
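The following sketch shows the general pgvector workflow; the table name, column names, and embedding dimensionality are assumptions for illustration:

```sql
-- Enable pgvector (the extension is named "vector").
CREATE EXTENSION IF NOT EXISTS vector;

-- A table with a 768-dimensional embedding column; names are illustrative.
CREATE TABLE documents (
  id bigserial PRIMARY KEY,
  content text,
  embedding vector(768)
);

-- An HNSW index for approximate nearest-neighbor search on cosine distance.
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);

-- Return the five documents most similar to a query embedding,
-- where $1 is a parameter holding the query vector.
SELECT content
FROM documents
ORDER BY embedding <=> $1
LIMIT 5;
```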

Invoke online predictions using SQL queries

You can invoke online predictions using models stored in the Vertex AI Model Garden by using SQL queries.
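For example, a prediction call might look like the following sketch. The function name follows the `ml_predict_row` convention from Google's ML integration extension, and the endpoint path and request payload are placeholders, not working values:

```sql
-- Sketch: invoke an online prediction against a Vertex AI endpoint.
-- PROJECT_ID and ENDPOINT_ID are placeholders; the JSON payload must
-- match the format that the deployed model expects.
SELECT ml_predict_row(
  'projects/PROJECT_ID/locations/us-central1/endpoints/ENDPOINT_ID',
  '{ "instances": [ { "prompt": "Summarize Cloud SQL in one sentence." } ] }'
);
```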

Use the LangChain integration

Cloud SQL integrates with LangChain, an open-source LLM orchestration framework, to simplify developing generative AI applications. The LangChain packages for Cloud SQL for PostgreSQL include integrations for vector stores, document loaders, and chat message history.

Improve vector search performance

You can improve the performance of a vector search by using the following:

  • Data cache: use a built-in data cache that leverages a fast, local SSD to store frequently read data pages. You can get up to a three-times improvement in read performance compared to reading from a persistent disk.
  • Data cache metrics: optimize queries based on how effectively the data cache is used in a vector search.

    Cloud SQL provides the following metrics in the Metrics Explorer in Cloud Monitoring:

    Metric | Description | Metric label
    Data cache used | The data cache usage (in bytes) | database/data_cache/bytes_used
    Data cache quota | The maximum data cache size (in bytes) | database/data_cache/quota
    Data cache hit count | The total number of data cache hit read operations for an instance | database/postgresql/data_cache/hit_count
    Data cache miss count | The total number of data cache miss read operations for an instance | database/postgresql/data_cache/miss_count
    Data cache hit ratio | The ratio of data cache hit read operations to data cache miss read operations for an instance | database/postgresql/data_cache/hit_ratio
  • System Insights: view system metrics such as CPU utilization, disk utilization, and throughput to monitor the health of instances and troubleshoot issues that affect the performance of your generative AI applications. To view these metrics, use the Cloud SQL System Insights dashboard.

  • Query Insights: detect, diagnose, and prevent query performance problems. This helps you improve the performance of vector searches in your generative AI applications.

    You can use the Cloud SQL Query Insights dashboard to observe the performance of top queries and analyze them by using visual query plans. You can also monitor performance at the application level and trace the source of a problematic query across the application stack to the database by using SQLcommenter, an open-source library that auto-instruments object-relational mapping (ORM) frameworks.

    Query Insights can also integrate with your existing application performance monitoring (APM) tools so that you can troubleshoot query problems by using tools that you're already familiar with.

Benefits of using Cloud SQL for generative AI applications

Using Cloud SQL to build generative AI applications provides the following benefits:

  • Use PostgreSQL to build generative AI applications. Cloud SQL for PostgreSQL supports pgvector and integrates with both Vertex AI and LangChain.
  • Use a trusted service that has enterprise-grade data protection, security, and governance. By using Cloud SQL, you gain the following benefits:
    • A high availability SLA of 99.99% that's inclusive of maintenance
    • A managed service that provides you with features such as automatic backups, replication, patches, encryption, and automatic storage capacity increases
    • Security, governance, and compliance capabilities
  • Combine with contextual operational data. Use joins and filters on operational data while using vector embeddings to get contextual, accurate, and up-to-date responses in your generative AI applications.
  • Reduce operational toil. Use Cloud SQL as your vector database to avoid the operational toil of exporting data to a separate vector database.
  • Access the latest generative AI models. Use SQL queries to access the latest AI models that are hosted in Vertex AI.

Get started using Cloud SQL for generative AI applications

To get started building generative AI applications, use this sample app. The app uses Cloud SQL, Vertex AI, and either Google Kubernetes Engine (GKE) or Cloud Run. You can use the app to build a basic chatbot API that:

  • Integrates GKE or Cloud Run with Cloud SQL, Vertex AI, and pgvector
  • Demonstrates connectivity to Cloud SQL using Private Service Connect in a Virtual Private Cloud (VPC)
  • Uses Terraform to configure your infrastructure
  • Uses Python with asyncpg and FastAPI
  • Supports setting up Cloud SQL and an app that runs on either GKE or Cloud Run in separate Google Cloud projects

The solution includes the following:

  • Terraform templates to set up your infrastructure with best practices for security
  • A sample app for an LLM-powered Chatbot that you can deploy to GKE or Cloud Run

What's next