This guide helps you understand the Generative AI RAG with Cloud SQL template, an opinionated Google template for a chat app. The template demonstrates how to create a chat application that uses retrieval-augmented generation (RAG). When users ask questions in the app, it responds based on information stored as vectors in a database.
Products used
The application uses the following Google Cloud products:
- Vertex AI: A machine learning (ML) platform that lets you train and deploy ML models and AI applications, and customize LLMs for use in applications.
- Cloud SQL: A cloud-based service for MySQL, PostgreSQL and SQL Server databases that's fully managed on the Google Cloud infrastructure.
- Cloud Run: A fully managed service that lets you build and deploy serverless containerized apps. Google Cloud handles scaling and other infrastructure tasks.
- Secret Manager: A secure and convenient storage system for API keys, passwords, certificates, and other sensitive data.
Architecture
The following is the data loading and request processing flow of the application:
- Data is loaded into a PostgreSQL database in Cloud SQL.
- Embeddings of text fields are created by using Vertex AI and stored as vectors.
- A user opens the application in a browser.
- The frontend service communicates with the backend service for a generative AI call.
- The backend service converts the request to an embedding and searches existing embeddings.
- Natural language results from the embeddings search, along with the original prompt, are sent to Vertex AI to create a response.
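The flow above can be sketched in a few lines of Python. This is a minimal illustration and not the template's actual code: the `embed` function is a hypothetical bag-of-words stand-in for a Vertex AI embedding call, the in-memory `VECTOR_STORE` list stands in for the PostgreSQL table in Cloud SQL, and the sample documents, `retrieve`, and `build_prompt` are names invented for this example. The final call to a generative model is left out; the sketch stops at the prompt that would be sent to it.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: a bag-of-words vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Load data and store an embedding next to each row.
# (Assumption: sample rows; a real deployment would keep these in a database.)
DOCUMENTS = [
    "Cloud Run scales containers automatically based on traffic.",
    "Cloud SQL offers managed PostgreSQL databases.",
    "Secret Manager stores API keys and passwords.",
]
VECTOR_STORE = [(doc, embed(doc)) for doc in DOCUMENTS]


def retrieve(question: str, top_k: int = 1) -> list[str]:
    """Embed the question and return the top_k most similar stored texts."""
    query_vector = embed(question)
    ranked = sorted(
        VECTOR_STORE,
        key=lambda row: cosine_similarity(query_vector, row[1]),
        reverse=True,
    )
    return [doc for doc, _ in ranked[:top_k]]


def build_prompt(question: str, context: list[str]) -> str:
    """Combine retrieved text with the original question for the model."""
    context_block = "\n".join(f"- {item}" for item in context)
    return (
        "Answer using only this context:\n"
        f"{context_block}\n\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    question = "Where are passwords stored?"
    print(build_prompt(question, retrieve(question)))
```

In a real deployment, the similarity ranking would typically run inside the database rather than in application code, and the assembled prompt would be passed to a model for the final response.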