Introducing Vertex AI RAG Engine: Scale your Vertex AI RAG pipeline with confidence

January 9, 2025
Crispin Velez

Vertex AI Lead, Google Cloud

Lewis Liu

Group Product Manager, Google Cloud

Closing the gap between impressive model demos and real-world performance is crucial for successfully deploying generative AI in the enterprise. Despite the capabilities of generative AI, this perceived gap can keep many developers and enterprises from “productionizing” AI. This is where retrieval-augmented generation (RAG) becomes non-negotiable: it strengthens your enterprise applications by building trust in their AI outputs.

Today, we’re announcing the general availability of Vertex AI’s RAG Engine, a fully managed service that helps you build and deploy RAG implementations with your data and methods. With Vertex AI RAG Engine, you can:

  • Adapt to any architecture: Choose the models, vector databases, and data sources that work best for your use case. This flexibility ensures RAG Engine fits into your existing infrastructure rather than forcing you to adapt to it.

  • Evolve with your use case: Adding new data sources, updating models, or adjusting retrieval parameters happens through simple configuration changes. The system grows with you, maintaining consistency while accommodating new requirements.

  • Evaluate in simple steps: Set up multiple RAG engines with different configurations to find what works best for your use case.
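
To make the evaluation idea concrete, here is a minimal, illustrative sketch in plain Python. It does not use the RAG Engine API: `RetrievalConfig`, `word_overlap`, `hit_rate`, and the sample documents are all hypothetical names and data chosen for this example. It scores two retrieval configurations against a tiny labeled query set so you can compare them side by side.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class RetrievalConfig:
    """Hypothetical configuration under test (not a RAG Engine class)."""
    name: str
    top_k: int


def word_overlap(query: str, doc: str) -> int:
    """Toy relevance score: number of distinct lowercase words shared."""
    return len(set(query.lower().split()) & set(doc.lower().split()))


def top_documents(query: str, docs: List[str], config: RetrievalConfig) -> List[str]:
    """Return the config.top_k documents with the highest toy score."""
    ranked = sorted(docs, key=lambda d: word_overlap(query, d), reverse=True)
    return ranked[:config.top_k]


def hit_rate(config: RetrievalConfig, docs: List[str],
             eval_set: List[Tuple[str, str]]) -> float:
    """Fraction of queries whose expected document was retrieved."""
    hits = sum(expected in top_documents(query, docs, config)
               for query, expected in eval_set)
    return hits / len(eval_set)


def compare(configs: List[RetrievalConfig], docs: List[str],
            eval_set: List[Tuple[str, str]]) -> Dict[str, float]:
    """Score every configuration on the same labeled queries."""
    return {c.name: hit_rate(c, docs, eval_set) for c in configs}


# Made-up sample data: (query, document we expect to be retrieved).
DOCS = [
    "Refunds are issued within 14 days of a return request",
    "Return shipping is free for orders over 50 dollars",
    "Support is available by chat on weekdays",
]
EVAL_SET = [
    ("how many days until refunds are issued", DOCS[0]),
    ("is return shipping free", DOCS[1]),
    ("return request for orders", DOCS[0]),
]

if __name__ == "__main__":
    configs = [RetrievalConfig("top_k=1", 1), RetrievalConfig("top_k=2", 2)]
    for name, rate in compare(configs, DOCS, EVAL_SET).items():
        print(f"{name}: hit rate {rate:.2f}")
```

In this toy data, widening `top_k` from 1 to 2 recovers the one query that the narrower configuration misses. With a real corpus you would likely track richer metrics, but the compare-on-a-shared-query-set loop stays the same.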

Introducing Vertex AI RAG Engine

Vertex AI RAG Engine is a managed service that lets you build and deploy RAG implementations with your data and methods. Think of it as having a team of experts who have already solved complex infrastructure challenges such as efficient vector storage, intelligent chunking, optimal retrieval strategies, and precise augmentation — all while giving you the controls to customize for your specific use case.

https://storage.googleapis.com/gweb-cloudblog-publish/images/Vertex_RAG_Diagram.max-2200x2200.jpg

Figure 1: Vertex AI RAG Engine workflow.

Vertex AI’s RAG Engine offers a vibrant ecosystem with a range of options catering to diverse needs.

  • DIY capabilities: DIY RAG empowers users to tailor their solutions by mixing and matching different components. It works well for low- to medium-complexity use cases, with an easy-to-get-started API that enables fast experimentation, proofs of concept, and RAG-based applications with a few clicks.

  • Search functionality: Vertex AI Search stands out as a robust, fully managed solution. It supports a wide variety of use cases, from simple to complex, with high out-of-the-box quality, an easy start, and minimal maintenance.

  • Connectors: A rapidly growing list of connectors helps you quickly connect to various data sources, including Cloud Storage, Google Drive, Jira, Slack, or local files. RAG Engine handles the ingestion process (even for multiple sources) through an intuitive interface.

Customization

One of the defining strengths of Vertex AI’s RAG Engine is its capacity for customization. This flexibility allows you to fine-tune various components to perfectly align with your data and use case.

  • Parsing: When documents are ingested into an index, they are split into chunks. RAG Engine lets you tune chunk size and chunk overlap, and offers different chunking strategies to support different types of documents.

  • Retrieval: You might already be using Pinecone, or perhaps you prefer the open-source capabilities of Weaviate. Maybe you want to leverage Vertex AI Vector Search or our vector database. RAG Engine works with your choice, or if you prefer, can manage the vector storage entirely for you. This flexibility ensures you're never locked into a single approach as your needs evolve.

  • Generation: You can choose from hundreds of LLMs in Vertex AI Model Garden, including Google’s Gemini as well as Llama and Claude.
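
To illustrate how chunk size and chunk overlap interact, the sketch below splits text into overlapping windows. It is a simplified assumption, not RAG Engine's parser: `chunk_words` is a hypothetical helper, and it counts whitespace-separated words where a production system would typically count tokens.

```python
from typing import List


def chunk_words(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into chunks of at most chunk_size words.

    Consecutive chunks share chunk_overlap words, so a sentence that
    straddles a boundary still appears whole in at least one chunk
    (as long as it is shorter than the overlap).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    words = text.split()
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    sample = "one two three four five six seven eight nine ten"
    for chunk in chunk_words(sample, chunk_size=4, chunk_overlap=1):
        print(chunk)
```

A larger overlap preserves more context across chunk boundaries at the cost of storing and embedding more duplicated text, which is why both values are worth tuning per document type.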

Use Vertex AI RAG as a tool in Gemini

Vertex AI’s RAG Engine is natively integrated with the Gemini API as a tool. You can create grounded conversations that use RAG to provide contextually relevant answers. Simply initialize a RAG retrieval tool, configured with settings such as the number of documents to retrieve and whether to use an LLM-based ranker. This tool is then passed to a Gemini model.
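
The two settings mentioned above, the number of documents to retrieve and a ranker, can be pictured as a two-stage pipeline. The following is a conceptual sketch in plain Python, not the Vertex AI SDK: every function name is hypothetical, `density_reranker` is a cheap stand-in for an LLM-based ranker, and the model call itself is left out. It shows a first-stage top-k cut, an optional reranking pass, and the grounded prompt that would be handed to a model.

```python
from typing import Callable, List, Optional

Scorer = Callable[[str, str], float]


def tokenize(text: str) -> List[str]:
    """Lowercase words with surrounding punctuation stripped."""
    return [w.strip(".,?!").lower() for w in text.split()]


def overlap_score(query: str, chunk: str) -> float:
    """First-stage score: distinct query terms that appear in the chunk."""
    return float(len(set(tokenize(query)) & set(tokenize(chunk))))


def density_reranker(query: str, chunk: str) -> float:
    """Stand-in for an LLM-based ranker: overlap per word in the chunk."""
    return overlap_score(query, chunk) / max(len(tokenize(chunk)), 1)


def retrieve(query: str, chunks: List[str], top_k: int,
             reranker: Optional[Scorer] = None) -> List[str]:
    """Keep the top_k chunks by the cheap score, then optionally reorder."""
    candidates = sorted(chunks, key=lambda c: overlap_score(query, c),
                        reverse=True)[:top_k]
    if reranker is not None:
        candidates = sorted(candidates, key=lambda c: reranker(query, c),
                            reverse=True)
    return candidates


def build_grounded_prompt(query: str, contexts: List[str]) -> str:
    """Assemble the retrieved contexts and the question into one prompt."""
    numbered = "\n".join(f"[{i}] {c}" for i, c in enumerate(contexts, start=1))
    return ("Answer using only the context below.\n\n"
            f"Context:\n{numbered}\n\nQuestion: {query}")


# Made-up sample chunks for illustration.
CHUNKS = [
    "Sky themed wallpaper is popular, and the sky in art is often painted blue.",
    "The sky is blue because air scatters blue light most.",
    "Bananas are yellow.",
]

if __name__ == "__main__":
    question = "Why is the sky blue?"
    contexts = retrieve(question, CHUNKS, top_k=2, reranker=density_reranker)
    print(build_grounded_prompt(question, contexts))
```

The first stage is deliberately cheap and only narrows the candidate set; the ranker then spends more effort ordering the few chunks that remain, which is the usual reason to pair a top-k limit with a reranking step.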

Use Vertex AI Search as a retriever

Vertex AI Search provides a solution for retrieving and managing data within your Vertex AI RAG applications. By using Vertex AI Search as your retrieval backend, you can improve performance, scalability, and ease of integration.

  • Enhanced performance and scalability: Vertex AI Search is designed to handle large volumes of data with exceptionally low latency. This translates to faster response times and improved performance for your RAG applications, especially when dealing with complex or extensive knowledge bases.

  • Simplified data management: Import your data from various sources, such as websites, BigQuery datasets, and Cloud Storage buckets, which streamlines your data ingestion process.

  • Seamless integration: Vertex AI provides built-in integration with Vertex AI Search, which lets you select Vertex AI Search as the corpus backend for your RAG application. This simplifies the integration process and helps to ensure optimal compatibility between components.

  • Improved LLM output quality: By using the retrieval capabilities of Vertex AI Search, you can help to ensure that your RAG application retrieves the most relevant information from your corpus, which leads to more accurate and informative LLM-generated outputs.
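
The idea of selecting a retrieval backend for a corpus can be sketched as a small interface. This is an illustrative assumption about the design, not the Vertex AI SDK: `RetrievalBackend`, `InMemoryBackend`, `StubManagedSearchBackend`, and `Corpus` are hypothetical names, and the "managed" backend returns canned results instead of calling a real service.

```python
from typing import Dict, List, Protocol


class RetrievalBackend(Protocol):
    """Anything that can return the top_k passages for a query."""

    def search(self, query: str, top_k: int) -> List[str]:
        ...


class InMemoryBackend:
    """Toy default backend: ranks stored passages by shared words."""

    def __init__(self, passages: List[str]) -> None:
        self._passages = passages

    def search(self, query: str, top_k: int) -> List[str]:
        terms = set(query.lower().split())
        ranked = sorted(
            self._passages,
            key=lambda p: len(terms & set(p.lower().split())),
            reverse=True,
        )
        return ranked[:top_k]


class StubManagedSearchBackend:
    """Stand-in for a managed search service (returns canned results)."""

    def __init__(self, canned: Dict[str, List[str]]) -> None:
        self._canned = canned

    def search(self, query: str, top_k: int) -> List[str]:
        return self._canned.get(query, [])[:top_k]


class Corpus:
    """Calling code depends only on the RetrievalBackend interface."""

    def __init__(self, display_name: str, backend: RetrievalBackend) -> None:
        self.display_name = display_name
        self._backend = backend

    def retrieve(self, query: str, top_k: int = 3) -> List[str]:
        return self._backend.search(query, top_k)


if __name__ == "__main__":
    passage = "RAG grounds answers in your data"
    local = Corpus("local", InMemoryBackend([passage, "Unrelated note"]))
    managed = Corpus("managed", StubManagedSearchBackend({"what is rag": [passage]}))
    for corpus in (local, managed):
        print(corpus.display_name, corpus.retrieve("what is rag", top_k=1))
```

Because the application only calls `Corpus.retrieve`, swapping the backend is a one-line change at construction time, which is the kind of decoupling that lets a retrieval backend be chosen per corpus.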

Get started today

You can access Vertex AI’s RAG Engine through Vertex AI Studio. Visit the Google Cloud Console to get started, or reach out to us for a guided proof of concept. For more detail, see our RAG quick start documentation or take a look at our Vertex AI RAG Engine GitHub repository.