Generative AI with RAG

Last reviewed 2025-09-22 UTC

Use the following architecture guides to design and deploy generative AI applications with retrieval-augmented generation (RAG) in Google Cloud.

Architecture guide	Description
RAG infrastructure for generative AI using Google Agentspace and Vertex AI	An agent-driven architecture that uses Google Agentspace as a unified platform to orchestrate an end-to-end RAG dataflow for enterprise applications that require real-time data availability and enriched contextual search.
RAG infrastructure for generative AI using Vertex AI and Vector Search	A fully managed, serverless architecture that provides optimized, high-performance vector search for large-scale applications.
RAG infrastructure for generative AI using Vertex AI and AlloyDB for PostgreSQL	An architecture that stores vector embeddings alongside your operational data in AlloyDB for PostgreSQL, a fully managed database.
Jump Start Solution: Generative AI RAG using Vertex AI and Cloud SQL	An architecture that stores vector embeddings alongside your operational data in Cloud SQL, a fully managed database.
RAG infrastructure for generative AI using GKE and Cloud SQL	A flexible, container-based architecture that provides maximum control to build custom applications with open source tools such as Ray, Hugging Face, and LangChain.
GraphRAG infrastructure for generative AI using Vertex AI and Spanner Graph	An advanced RAG architecture that combines vector search with knowledge graph queries to retrieve interconnected, contextual data, which results in more detailed and relevant generative AI responses.
Harness CI/CD pipeline for RAG applications	An architecture for a continuous integration (CI) and continuous deployment (CD) pipeline for a RAG application in Google Cloud.
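
The guides above differ in where embeddings are stored and how retrieval is served, but they share the same basic RAG flow: embed the query, retrieve the most similar stored passages, and pass those passages to the model as context. The following is a minimal, illustrative sketch of that flow only. It does not use any Google Cloud API; the in-memory `DOCS` store, the three-dimensional toy vectors, and the `retrieve` and `build_prompt` helpers are all hypothetical stand-ins for an embedding model and a vector store such as those named in the guides.

```python
import math

# Toy 3-dimensional "embeddings" (assumed values for illustration).
# A real deployment would get vectors from an embedding model and keep
# them in a vector index or database rather than in a dict.
DOCS = {
    "returns": ("Items can be returned within 30 days.", [0.9, 0.1, 0.0]),
    "shipping": ("Standard shipping takes 3 to 5 business days.", [0.1, 0.9, 0.0]),
    "warranty": ("Hardware carries a one-year warranty.", [0.0, 0.2, 0.9]),
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, k=2):
    """Return the k stored passages most similar to the query vector."""
    ranked = sorted(
        DOCS.items(),
        key=lambda item: cosine(query_vec, item[1][1]),
        reverse=True,
    )
    return [(doc_id, text) for doc_id, (text, _vec) in ranked[:k]]


def build_prompt(question, passages):
    """Assemble a grounded prompt from the retrieved passages."""
    context = "\n".join(f"- {text}" for _doc_id, text in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Hypothetical embedding of the user's question.
query_vec = [0.8, 0.2, 0.1]
passages = retrieve(query_vec)
prompt = build_prompt("What is the return policy?", passages)
print(prompt)
```

In this sketch the prompt would then be sent to a generative model. The architectures in the table replace the dict and the linear scan with a managed vector store, and the GraphRAG guide adds knowledge graph queries to the retrieval step.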