AI & Machine Learning

Google Cloud expands grounding capabilities on Vertex AI

June 27, 2024

Burak Gokturk

VP & GM, Cloud AI & Industry Solutions, Google Cloud

Introduced in April, Vertex AI Agent Builder gathers all the surfaces and tools developers need to build enterprise-ready generative AI experiences, apps, and agents.

Some of the most powerful tools are the components for retrieval augmented generation (RAG), and the unique ability to ground Gemini outputs with Google Search.

Today, I’m pleased to share that we are expanding these grounding capabilities to help our customers build more capable agents and apps:

Grounding with Google Search, now generally available, will soon offer dynamic retrieval, a new capability to help customers balance quality with cost efficiency by intelligently selecting when to use Google Search results and when to use the model’s training data.
Grounding with high-fidelity mode, announced in experimental preview today, is a new feature of our grounded generation API that will further reduce hallucinations.
Grounding with third-party datasets is coming in Q3 this year. These capabilities will help customers build AI agents and applications that offer more accurate and helpful responses. We are working with specialized providers like Moody's, MSCI, Thomson Reuters and Zoominfo to enable access to their datasets.
We’re also expanding Vector Search, the engine powering embeddings-based RAG, to offer hybrid search, now in Public Preview.

Grounding models in world knowledge with Google Search

When customers select Grounding with Google Search for their Gemini model, Gemini runs a Google Search and generates output that is grounded in the relevant search results. It is simple to use, and it makes the world’s knowledge available to Gemini.

These capabilities address some of the most significant hurdles limiting the adoption of generative AI in the enterprise: the fact that models do not know information outside their training data, and the tendency of foundation models to “hallucinate,” or generate convincing yet factually inaccurate information. Retrieval Augmented Generation (RAG), a technique developed to mitigate these challenges, first “retrieves” facts about a question, then provides those facts to the model before it “generates” an answer – this is what we mean by grounding. Getting relevant facts quickly to augment a model's knowledge is ultimately a search problem.
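To make the retrieve-then-generate flow concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not how Vertex AI implements grounding: the toy corpus, the word-overlap scoring, and the function names (`tokenize`, `retrieve`, `build_grounded_prompt`) are all hypothetical, and the final call to a model is left out.

```python
from collections import Counter

# Toy corpus for illustration only; a real system would query a search index.
DOCUMENTS = [
    "Vertex AI Agent Builder was introduced in April.",
    "Vector Search can scale to billions of vectors.",
    "Hybrid search combines vector-based and keyword-based techniques.",
]


def tokenize(text):
    """Lowercase the text and strip basic punctuation from each word."""
    words = (w.strip(".,?!").lower() for w in text.split())
    return [w for w in words if w]


def retrieve(question, documents, top_k=2):
    """Return up to top_k documents sharing at least one word with the question."""
    question_counts = Counter(tokenize(question))
    scored = []
    for doc in documents:
        # Multiset intersection counts the words the two texts have in common.
        overlap = sum((question_counts & Counter(tokenize(doc))).values())
        scored.append((overlap, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for overlap, doc in scored[:top_k] if overlap > 0]


def build_grounded_prompt(question, facts):
    """Place the retrieved facts ahead of the question, before generation."""
    context = "\n".join(f"- {fact}" for fact in facts)
    return (
        "Answer using only the facts below.\n"
        f"Facts:\n{context}\n"
        f"Question: {question}"
    )


question = "What does hybrid search combine?"
facts = retrieve(question, DOCUMENTS)
prompt = build_grounded_prompt(question, facts)
print(prompt)  # This prompt would then be sent to the model to generate an answer.
```

The quality of the final answer depends heavily on the `retrieve` step, which is why getting relevant facts to the model is treated as a search problem.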

Leading companies like Quora and Palo Alto Networks are using Google Cloud’s grounding capabilities to power generative AI experiences.

"Grounding with Google Search translates into more accurate, up-to-date, and trustworthy answers,” said Spencer Chan, Product Lead at Quora, which offers Grounding with Google Search on its Poe platform. “We’ve been delighted with the positive feedback so far, as users are now able to interact with Gemini bots with even greater confidence."

“We sought to optimize both the customer experience and maximize the efficiency of our support agents. In partnership with Google Cloud, this was achieved by integrating generative AI into Palo Alto Networks solutions which enhanced the ability to understand and respond to complex security inquiries,” said Alok Tongaonkar, Senior Director of Data Science at Palo Alto Networks. “This not only empowers customers with self-service troubleshooting, but also alleviates pressure on our support teams. By harnessing the grounding capabilities of Vertex AI Agent Builder alongside the power of Gemini models, we constructed our agents to deliver accurate and timely answers, all grounded in trustworthy data sources. The continuous advancements in Agent Builder's grounding functionalities promise further refinements in information retrieval and overall efficacy.”

Grounding with Google Search entails additional processing costs, but because Gemini’s training knowledge is very capable, grounding may not be needed for every query. To help customers balance the need for response quality with cost efficiency, Grounding with Google Search will soon offer dynamic retrieval, a novel capability that lets Gemini dynamically choose whether to ground user inquiries in Google Search or use the intrinsic knowledge of the models, which is more cost-efficient.

The model does this based on its ability to understand which prompts are likely to be related to never-changing, slowly-changing, or fast-changing facts. Consider scenarios like inquiring about the latest movies, where Grounding with Google Search can provide the most up-to-date information. Conversely, for general questions, like "Tell me the capital of France," Gemini can instantly draw from its extensive knowledge, providing responses without the need for external grounding.
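The decision logic behind dynamic retrieval can be pictured as a score compared against a threshold. The sketch below is a simplified stand-in, not the actual mechanism: in the real feature the model itself judges how fast-changing the relevant facts are, whereas here a hypothetical keyword heuristic (`FAST_CHANGING_CUES`, `volatility_score`) plays that role, and the `threshold` parameter is an assumption made for illustration.

```python
# Hypothetical cue words standing in for the model's own judgment about
# whether a prompt depends on fast-changing facts.
FAST_CHANGING_CUES = {"latest", "today", "current", "now", "recent"}


def volatility_score(prompt):
    """Return a score in [0, 1]; higher means the facts likely change quickly."""
    words = {w.strip(".,?!\"'").lower() for w in prompt.split()}
    hits = len(words & FAST_CHANGING_CUES)
    return min(1.0, hits / 2)


def should_ground(prompt, threshold=0.5):
    """Ground with search only when the score reaches the threshold."""
    return volatility_score(prompt) >= threshold


for prompt in ["What are the latest movies?", "Tell me the capital of France"]:
    route = "ground with search" if should_ground(prompt) else "use model knowledge"
    print(f"{prompt!r}: {route}")
```

Raising the threshold in a scheme like this sends fewer prompts to search, trading some freshness for lower cost.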

Grounding models in enterprise context

At Google Cloud, we firmly believe that the key to unlocking the full potential of generative AI lies in grounding it in "enterprise truth." This involves connecting AI models to a wealth of reliable information sources, including web data, company documents, operational and analytical databases, enterprise applications, and other relevant sources.

Because private data is not on the public internet, Google Search cannot find it. So in addition to Grounding with Google Search, we offer multiple ways to apply Google-quality search to your enterprise data. Vertex AI Search works out-of-the-box for most enterprise use cases. And for customers looking to build custom RAG workflows, create semantic search engines, or simply upgrade existing search capabilities, we offer our search component APIs for RAG. This suite of APIs, now generally available, provides high-quality implementations for document parsing, embedding generation, semantic ranking, and grounded answer generation, as well as a fact-checking service called check-grounding.

“Deloitte’s mission is to help our clients identify and realize tangible outcomes that can create differentiated business value. Using Vertex AI Agent Builder’s grounding capabilities, we have built both internal applications to accelerate our own knowledge base as well as external applications for industry clients, such as assisting the application process for an insurance provider-to-care provider search for a healthcare client,” said Gopal Srinivasan, Global Generative AI Leader for Alphabet Google alliance, Deloitte Consulting LLP. “Agent Builder offered us an out-of-the-box RAG system to build trustworthy and relevant generative applications at speed. The new search component APIs in Agent Builder can provide us with even more flexibility and control when creating applications, thereby streamlining the specialized needs of our internal and industry client teams.”

Grounding with high-fidelity mode

The answers generated with RAG-based agents and apps typically merge the provided context from enterprise data with the model’s internal training. While this may be helpful for many use cases, like a travel assistant, industries like financial services, healthcare, and insurance often require the generated response to be sourced from only the provided context. Grounding with high-fidelity mode, announced in experimental preview today, is a new feature of the Grounded Generation API that is purpose-built to support such grounding use cases.

The feature uses a Gemini 1.5 Flash model that has been fine-tuned to focus on customer-provided context to generate answers. The service supports key enterprise use cases such as summarization across multiple documents or data extraction against a corpus of financial data. This results in higher levels of factuality, and a reduction in hallucinations. When high-fidelity mode is enabled, sentences in the answer have sources attached to them, providing support for the stated claims. Grounding confidence scores are also provided.
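As a rough picture of what sentence-level sources and confidence scores make possible, the sketch below shows how an application might consume such an answer. The `GroundedSentence` type, its field names, the `min_confidence` cutoff, and the sample sentences are all hypothetical; they do not describe the actual response format of the Grounded Generation API.

```python
from dataclasses import dataclass, field


@dataclass
class GroundedSentence:
    """One answer sentence with its supporting sources (hypothetical shape)."""
    text: str
    source_ids: list = field(default_factory=list)
    confidence: float = 0.0


def render_with_citations(sentences, min_confidence=0.6):
    """Keep only sentences that cite a source and meet the confidence cutoff."""
    parts = []
    for sentence in sentences:
        if not sentence.source_ids or sentence.confidence < min_confidence:
            continue  # Drop claims that the provided context does not support.
        citations = "".join(f"[{i}]" for i in sentence.source_ids)
        parts.append(f"{sentence.text} {citations}")
    return " ".join(parts)


# Made-up example data, for illustration only.
answer = [
    GroundedSentence("The policy covers outpatient visits.", [1, 3], 0.92),
    GroundedSentence("Premiums are likely to fall next year.", [], 0.20),
]
print(render_with_citations(answer))
```

An application in a regulated industry could use a filter like this to show only the claims that trace back to the provided documents.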

https://storage.googleapis.com/gweb-cloudblog-publish/images/High_Fidelity_blog_post_images.max-2200x2200.png

Making it easier to use trusted third-party data for RAG

Additionally, we are announcing that starting next quarter, Vertex AI will offer a new service that will let customers ground their models and AI agents with specialized third-party data. This will help enterprises integrate third-party data into their generative AI agents to unlock unique use cases, and drive greater enterprise truth across their AI experiences. We are working with premier providers such as Moody’s, MSCI, Thomson Reuters, and Zoominfo to bring their data to this service.

"Google Cloud's third-party data grounding offerings will open up new applications for KPMG and our clients,” said Brad Brown, KPMG Global Tax & Legal CTO. “By seamlessly integrating specialized third-party data from industry leaders into our generative AI offerings, we can reduce time to insight, drive more informed decision-making, and ultimately deliver greater value using highly trustworthy data sources."

Building your own RAG systems

Embeddings are numerical representations that capture semantic relationships across complex data (text, images, etc.). Embeddings power multiple use cases, including recommendation systems, ad serving, and semantic search for RAG. For such use cases, Vertex AI offers Vector Search, which can scale to billions of vectors and find the nearest neighbors in a few milliseconds.
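The idea of finding nearest neighbors among embeddings can be shown with a small brute-force example. This is a sketch only: the three-dimensional vectors and item names are invented for illustration, and an exhaustive scan like this would not scale to billions of vectors, where a dedicated service such as Vector Search is needed.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_neighbors(query, index, k=2):
    """Exhaustively score every vector and return the k most similar keys."""
    scored = [(cosine_similarity(query, vec), key) for key, vec in index.items()]
    scored.sort(reverse=True)
    return [key for _, key in scored[:k]]


# Invented toy embeddings; real embeddings have hundreds of dimensions.
INDEX = {
    "soccer jersey": [0.9, 0.1, 0.0],
    "soccer ball": [0.8, 0.2, 0.1],
    "samsung tv": [0.0, 0.1, 0.9],
}

print(nearest_neighbors([1.0, 0.0, 0.0], INDEX))
```

Items whose vectors point in a similar direction to the query rank highest, which is how semantically related results surface even when no keywords match.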

I am pleased to share that we are expanding Vector Search to support hybrid search. Hybrid search combines vector-based and keyword-based search techniques to ensure the most relevant and accurate responses for users. It is now available in public preview.
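One common way to combine a vector-based ranking with a keyword-based ranking is reciprocal rank fusion (RRF). The sketch below uses RRF as an assumed fusion method for illustration; the post does not say which method Vector Search uses, and the document IDs and the constant `k = 60` (a conventional default) are illustrative choices.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists: each item scores 1 / (k + rank) per list it appears in."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Illustrative results from two separate retrievers for the same query.
vector_results = ["a", "b", "c"]
keyword_results = ["b", "d", "a"]

print(reciprocal_rank_fusion([vector_results, keyword_results]))
```

Documents that rank well in both lists, like "b" and "a" here, rise to the top, while documents found by only one technique still remain in the merged results.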

Additionally, our new text embedding models (text-embedding-004, text-multilingual-embedding-002) surpass our previous versions in terms of quality and are among the top-performing models on the MTEB leaderboard. They’re enabling AI models to better understand meaning, context, and similarity across diverse data types and improving the performance of embeddings and vector search-based applications.

“Our aim with our research platform, Factiva, was to make the information from our dataset of over 2 billion articles more accessible to our users. As a result, we needed to create a search experience that was optimized for relevance and reliability," said Clarence Kwei, SVP of Consumer Technology for Dow Jones. "By applying Google Cloud's text-embeddings model, Gecko, and Vector Search, Factiva is now enabled with semantic search, allowing it to generate responses to queries with greater quality and accuracy, leading to a better customer experience that we believe will drive further efficiencies and ultimately result in greater product adoption."

"Traditionally, our search logic relied heavily on word matching. This approach serves well for simple queries (like “Samsung TV”), but it doesn't work as well for more complicated searches like “a gift for my daughter who loves soccer and is a fan of Messi.” This is where we identified a need for a more powerful solution to find items semantically relevant to the user's intent," said Nicolas Presta, Sr. Engineering Manager at Mercado Libre. "To solve this problem, we started using embeddings and vector search technology. Most of our successful sales start with a search, so it is important that we give precise results that best match a user's query. These complex searches are getting better with the addition of the items retrieved from vector search, which will ultimately increase our conversion rate. Hybrid search will unlock more opportunities to uplevel our search engine so that we can create the best customer experience while improving our bottom line."

Bring it all together for the enterprise

The era of enterprise-grade generative AI has arrived. To help organizations build generative AI applications grounded in enterprise truth, Vertex AI Agent Builder meets customers at all levels of technical expertise, offering a no-code agents console, low-code APIs, and support for popular OSS frameworks like LangChain, LlamaIndex, and Firebase GenKit that enable developers to build production-ready solutions.

And if your data is already in one of Google's databases, such as Cloud SQL, Spanner, or BigQuery, you can access it through connectors to Vertex AI Search, or enable semantic search with the database's built-in vector search capabilities. Either way, you can build enterprise generative AI applications without moving or copying data.

As these technologies become even more capable, we are committed to helping businesses realize the full potential of grounded generative AI in the real world. Ready to take the next step? Reach out to your Google Cloud representative or check out Vertex AI Agent Builder.
