Generative AI RAG with Cloud SQL

This guide describes the Generative AI RAG with Cloud SQL template, a Google-provided template for a chat application that uses retrieval-augmented generation (RAG). When a user asks a question in the app, it generates a response that is grounded in information stored as vectors in a database.

Products used

The application uses the following Google Cloud products:

  • Vertex AI: A machine learning (ML) platform that lets you train and deploy ML models and AI applications, and customize LLMs for use in applications.
  • Cloud SQL: A cloud-based service for MySQL, PostgreSQL, and SQL Server databases that's fully managed on Google Cloud infrastructure.
  • Cloud Run: A fully managed service that lets you build and deploy serverless containerized apps. Google Cloud handles scaling and other infrastructure tasks.
  • Secret Manager: A secure and convenient storage system for API keys, passwords, certificates, and other sensitive data.

Architecture

The following image shows the components and connections in the application:

Figure: A generative AI RAG application in the design canvas. The application includes frontend, retrieval, AI, secret management, and database components.

The application loads data and processes requests as follows:

  1. You load data to a PostgreSQL database in Cloud SQL.
  2. Vertex AI creates embeddings of the text fields, and the embeddings are stored as vectors in the database.
  3. A user opens the application in a browser.
  4. The frontend service communicates with the retrieval service for a generative AI call.
  5. The retrieval service (the backend) converts the request to an embedding and searches the existing embeddings for the closest matches.
  6. The retrieval service sends natural language results from the embeddings search, along with the original prompt, to Vertex AI to create a response.
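
The flow above can be sketched in a few lines of Python. This is a minimal, in-memory illustration and not the template's actual code: the function names (embed, load_rows, search, build_prompt) are hypothetical, the embed function is a toy hashed bag-of-words stand-in for the embedding model that Vertex AI would provide, and a Python list stands in for the PostgreSQL table in Cloud SQL. The final call to the LLM is left out; the sketch stops at the prompt that would be sent.

```python
import math
import re
import zlib

DIM = 256  # toy size; a real embedding model returns a model-specific dimension


def embed(text: str) -> list[float]:
    """Hypothetical stand-in for an embedding model call.

    Hashes each word into a fixed-size vector and L2-normalizes it, so the
    dot product of two embeddings equals their cosine similarity.
    """
    vec = [0.0] * DIM
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        vec[zlib.crc32(word.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def load_rows(texts: list[str]) -> list[tuple[str, list[float]]]:
    """Steps 1-2: keep each text next to its embedding (stands in for a table)."""
    return [(text, embed(text)) for text in texts]


def search(rows, question: str, k: int = 2) -> list[tuple[float, str]]:
    """Step 5: embed the question and rank stored rows by cosine similarity."""
    q = embed(question)
    scored = [(sum(a * b for a, b in zip(q, vec)), text) for text, vec in rows]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


def build_prompt(question: str, hits) -> str:
    """Step 6: combine the retrieved text with the original question."""
    context = "\n".join(f"- {text}" for _, text in hits)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


rows = load_rows([
    "The red toy car costs 12 dollars.",
    "The wooden puzzle is for ages 3 and up.",
    "Returns are accepted within 30 days.",
])
question = "How much does the red toy car cost?"
hits = search(rows, question)
prompt = build_prompt(question, hits)  # this string would be sent to the LLM
print(prompt)
```

In a deployed application, the list scan in search would typically be replaced by a similarity query that runs inside the database, so that only the top matches leave Cloud SQL.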

What's next