Generative AI RAG with Cloud SQL

This guide describes the Generative AI RAG with Cloud SQL template, an opinionated Google template for a chat application that uses retrieval-augmented generation (RAG). When users ask questions in the app, it responds based on information stored as vectors in a database.

Products used

The application uses the following Google Cloud products:

  • Vertex AI: A machine learning (ML) platform that lets you train and deploy ML models and AI applications, and customize LLMs for use in applications.
  • Cloud SQL: A cloud-based service for MySQL, PostgreSQL, and SQL Server databases that's fully managed on the Google Cloud infrastructure.
  • Cloud Run: A fully managed service that lets you build and deploy serverless containerized apps. Google Cloud handles scaling and other infrastructure tasks.
  • Secret Manager: A secure and convenient storage system for API keys, passwords, certificates, and other sensitive data.

Architecture

The following steps describe how the application prepares its data and processes a request:

  1. Data is loaded into a PostgreSQL database in Cloud SQL.
  2. Embeddings of text fields are created by using Vertex AI and stored as vectors.
  3. A user opens the application in a browser.
  4. The frontend service communicates with the backend service for a generative AI call.
  5. The backend service converts the request to an embedding and searches existing embeddings.
  6. Natural language results from the embeddings search, along with the original prompt, are sent to Vertex AI to create a response.
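
The flow above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the template's actual code: the `embed` function is a hypothetical bag-of-words stand-in for a Vertex AI embedding model, an in-memory list stands in for the PostgreSQL table in Cloud SQL, and the final call to Vertex AI is replaced by building the prompt that would be sent. All function names and the sample data are invented for the example.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: a bag-of-words vector.

    A real deployment would call an embedding model instead (assumption).
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_index(documents):
    """Steps 1-2: load the data and store an embedding next to each row.

    The returned list stands in for a database table (assumption).
    """
    return [{"text": d, "embedding": embed(d)} for d in documents]


def search(index, query, top_k=2):
    """Step 5: embed the request and rank stored embeddings by similarity."""
    q = embed(query)
    ranked = sorted(
        index,
        key=lambda row: cosine_similarity(q, row["embedding"]),
        reverse=True,
    )
    return [row["text"] for row in ranked[:top_k]]


def build_prompt(query, context):
    """Step 6: combine the retrieved text with the original prompt."""
    context_block = "\n".join(f"- {c}" for c in context)
    return (
        "Answer using only this context:\n"
        f"{context_block}\n\n"
        f"Question: {query}"
    )


# Invented sample data for the illustration.
DOCUMENTS = [
    "The red kite is sold in the outdoor toys section.",
    "Wooden train sets are suitable for children aged three and up.",
    "Board games are restocked every Monday.",
]

index = build_index(DOCUMENTS)
question = "Which train sets are suitable for young children?"
hits = search(index, question, top_k=1)
prompt = build_prompt(question, hits)
print(prompt)
```

In this sketch the retrieval step only narrows the data down to the most relevant rows; the language model would then receive both those rows and the user's question, which is what grounds the response in the stored data.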

What's next

Customize the template