RAG infrastructure for generative AI applications in Google Cloud

Last reviewed 2025-09-22 UTC

The following reference architectures describe ways to deploy a generative AI application with retrieval-augmented generation (RAG) in Google Cloud.

| Reference architecture | Description |
| --- | --- |
| RAG infrastructure for generative AI using Google Agentspace and Vertex AI | An agent-driven architecture that uses Google Agentspace as a unified platform to orchestrate an end-to-end RAG dataflow for enterprise applications that require real-time data availability and enriched contextual search. |
| RAG infrastructure for generative AI using Vertex AI and Vector Search | A fully managed, serverless architecture that provides optimized, high-performance vector search for large-scale applications. |
| RAG infrastructure for generative AI using Vertex AI and AlloyDB for PostgreSQL | An architecture that stores vector embeddings alongside your operational data in a fully managed database such as Cloud SQL or AlloyDB for PostgreSQL. |
| RAG infrastructure for generative AI using GKE and Cloud SQL | A flexible, container-based architecture that provides maximum control to build custom applications with open source tools such as Ray, Hugging Face, and LangChain. |
| GraphRAG infrastructure for generative AI using Vertex AI and Spanner Graph | An advanced RAG architecture that combines vector search with knowledge graph queries to retrieve interconnected, contextual data, which results in more detailed and relevant generative AI responses. |
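
The architectures listed above differ in where embeddings are stored and how retrieval is served, but each follows the same basic RAG pattern: embed the query, retrieve the most similar stored content, and pass that content to the model as context. The following is a minimal, in-memory sketch of that pattern only. It does not call any Google Cloud service: the corpus, the bag-of-words `embed` function, and the helper names `retrieve` and `build_prompt` are all hypothetical stand-ins for a real embedding model and vector store.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus. A real deployment would keep documents and their
# embeddings in one of the storage backends named in the architectures above.
DOCUMENTS = [
    "Vector Search provides high-performance similarity search at scale.",
    "AlloyDB for PostgreSQL stores vector embeddings alongside operational data.",
    "Spanner Graph supports knowledge graph queries for GraphRAG.",
]


def embed(text):
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, k=1):
    """Return the k documents most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(query_vec, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query, context):
    """Assemble the augmented prompt that would be sent to a generative model."""
    joined = "\n".join(f"- {passage}" for passage in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"


if __name__ == "__main__":
    question = "Which database stores vector embeddings?"
    print(build_prompt(question, retrieve(question, DOCUMENTS)))
```

In a production system, `embed` would be replaced by an embedding model and `retrieve` by a query against a vector index or database; the prompt-assembly step stays conceptually the same.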