This guide helps you understand the Generative AI RAG with Cloud SQL template, which is a Google-provided template for a chat app. This template demonstrates how you can create a chat application that uses retrieval-augmented generation (RAG). When users ask questions in the app, it provides responses that are based on the information stored as vectors in a database.
Products used
The application uses the following Google Cloud products:
- Vertex AI: A machine learning (ML) platform that lets you train and deploy ML models and AI applications, and customize large language models (LLMs) for use in applications.
- Cloud SQL: A fully managed, cloud-based service for MySQL, PostgreSQL, and SQL Server databases that runs on Google Cloud infrastructure.
- Cloud Run: A fully managed service that lets you build and deploy serverless containerized apps. Google Cloud handles scaling and other infrastructure tasks.
- Secret Manager: A secure and convenient storage system for API keys, passwords, certificates, and other sensitive data.
Architecture
The following image shows the components and connections in the application:
The following is the request processing flow of the application:
- You load data to a PostgreSQL database in Cloud SQL.
- Vertex AI creates embeddings of the text fields, and the embeddings are stored as vectors in the database.
- A user opens the application in a browser.
- The frontend service communicates with the retrieval service for a generative AI call.
- The retrieval service, which acts as the backend, converts the request to an embedding and searches the existing embeddings for similar entries.
- The retrieval service sends natural language results from the embeddings search, along with the original prompt, to Vertex AI to create a response.
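The flow above can be sketched in a few lines of Python. This is a minimal illustration of the retrieval-augmented pattern, not the template's actual code. It makes the following assumptions: `embed` is a toy hashed bag-of-words function standing in for a Vertex AI embedding model, the in-memory `store` list stands in for the PostgreSQL database in Cloud SQL, and the function names, sample rows, and prompt wording are all hypothetical. The final call to an LLM is omitted, so the sketch stops at the prompt that would be sent to Vertex AI.

```python
import math
import re
import zlib

DIM = 256  # arbitrary vector size chosen for this toy example


def embed(text: str) -> list[float]:
    """Toy stand-in for an embedding model: hashed bag-of-words, unit length."""
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        # crc32 gives a deterministic bucket for each token
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def load_rows(rows: list[str]) -> list[tuple[str, list[float]]]:
    """Load data and keep each text row next to its embedding vector."""
    return [(row, embed(row)) for row in rows]


def search(store: list[tuple[str, list[float]]], question: str, k: int = 1) -> list[str]:
    """Embed the request and return the k rows with the highest cosine similarity."""
    q = embed(question)

    def score(item: tuple[str, list[float]]) -> float:
        # Vectors are unit length, so the dot product is the cosine similarity
        return sum(a * b for a, b in zip(q, item[1]))

    ranked = sorted(store, key=score, reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question: str, context: list[str]) -> str:
    """Combine the retrieved text with the original prompt for the LLM call."""
    lines = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{lines}\n\nQuestion: {question}"


# Hypothetical sample data
rows = [
    "The airport lounge opens at 6 am.",
    "Flight CY 123 departs from gate B4.",
    "The coffee shop sells espresso and pastries.",
]
store = load_rows(rows)
question = "Which gate does flight CY 123 depart from?"
context = search(store, question, k=1)
prompt = build_prompt(question, context)
print(prompt)
```

In a real deployment, the similarity search would run inside the database and the embeddings would come from a trained model, so results would reflect meaning rather than shared words as in this toy version.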
What's next
- Learn how to find and use other Google-provided templates.
- Understand how to customize templates to fit your specific needs.
- Identify general architectural best practices in the Google Cloud Architecture Framework.