The new Tower of Babel? Using multilingual embeddings and vector search in BigQuery
Layolin Jesudhass
Generative AI Solutions Architect, Google
Ginny Gao
Customer Engineer - Data & Analytics, Google
In today's globalized marketplace, reviews of a product or business are often written in many languages, which makes it hard for customers to find and understand them in their preferred language. BigQuery is designed for managing and analyzing large datasets, including reviews. In this blog post, we present a solution that uses BigQuery multilingual embeddings, vector indexes, and vector search to let customers search for product or business reviews in their preferred language and receive results in that same language. These technologies convert text into numerical vectors, so a search can match reviews by meaning rather than by exact keywords, which improves the accuracy and relevance of the results.
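To make the idea of matching by meaning concrete, here is a minimal Python sketch of nearest-neighbor search over embedding vectors using cosine similarity. The three-dimensional vectors are hand-made for illustration and are not output from any real embedding model. They only mimic the property that a multilingual model places texts with similar meaning close together, whatever language they are written in.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, corpus, k=2):
    """Return the k (score, text) pairs most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), text)
              for text, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# ASSUMPTION: toy vectors invented for this example, not real embeddings.
TOY_REVIEWS = {
    "Best egg tarts in town, flaky and warm.": [0.90, 0.10, 0.00],
    "Las tartas de huevo son deliciosas.": [0.85, 0.15, 0.05],
    "Oil change was quick and cheap.": [0.00, 0.10, 0.95],
}

# Stand-in vector for a question asked in Chinese about egg tarts.
QUERY_VEC = [0.88, 0.12, 0.02]

results = top_k(QUERY_VEC, TOY_REVIEWS, k=2)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

The English and Spanish egg tart reviews both rank above the unrelated review even though neither shares a keyword with the query. That is the behavior the multilingual embeddings provide at scale.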
To make the retrieved results easier to read, our solution also uses the Translation API, which is integrated within BigQuery, to translate reviews from various languages into the language of the user's choice. This way, businesses can analyze and gain insights from reviews written in different languages, and users can access and understand reviews in their preferred language.
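As a rough sketch of what the translation step could look like, the Python helper below assembles a BigQuery SQL statement around the `ML.TRANSLATE` function. It only builds a string and does not call BigQuery. The model name, table name, and the `review_id` and `content` columns are hypothetical placeholders. The sketch also assumes a remote model over the Cloud Translation service has already been created in the dataset.

```python
import re


def translate_reviews_sql(translate_model, results_table, target_language):
    """Build a query that translates a table of reviews into one language.

    All identifiers passed in are hypothetical placeholders.
    """
    # Reject anything that does not look like a language code (e.g. "zh-CN"),
    # because the value is interpolated into the SQL text.
    if not re.fullmatch(r"[A-Za-z]{2,3}(-[A-Za-z0-9]{2,8})*", target_language):
        raise ValueError(f"unexpected language code: {target_language!r}")
    return f"""
SELECT review_id, text_content, ml_translate_result
FROM ML.TRANSLATE(
  MODEL `{translate_model}`,
  -- ML.TRANSLATE expects the text to translate in a column named text_content.
  (SELECT review_id, content AS text_content FROM `{results_table}`),
  STRUCT('translate_text' AS translate_mode,
         '{target_language}' AS target_language_code)
)""".strip()


sql = translate_reviews_sql(
    "my_project.reviews.translate_model",   # hypothetical remote model
    "my_project.reviews.search_results",    # hypothetical table of retrieved reviews
    "zh-CN",
)
print(sql)
```

The `ml_translate_result` column is returned as JSON, so a real query would typically extract the translated text from it before showing it to the user.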
The architecture diagram below provides a visual representation of this solution.
Multilingual Review Insights with BigQuery, Multilingual Embeddings, Vector Search and Translation API
To illustrate, we extracted Google Local review data (including ratings, text, etc.) and business metadata (such as address, category, etc.) for Texas businesses through September 2021. This dataset includes reviews written in various languages. For customers who prefer to read reviews in their own language, our solution enables them to pose questions in their native language and receive the most relevant reviews in their preferred language, even if those reviews were initially written in a different language.
For instance, to explore Texas bakeries, we posed the question "Where can I find authentic Egg Tarts and Cantonese-style buns in Houston?" These two bakery items are distinctive and widely available in Asia but less common in Houston, making it challenging to locate pertinent reviews among thousands of business profiles. With our solution, users can ask the question in Chinese and receive the most relevant results in Chinese, even if the reviews were originally written in English, Japanese, or another language. Whatever language the reviews were written in, the solution retrieves the most relevant ones and translates them into the language the user requested. This makes it much easier for users to extract insights from reviews written by people who speak different languages.
Before Translation:
After Translation in BigQuery:
In the demo below, presented as a GIF, we showcase the search functionality in three languages:
- Chinese
- English
- Spanish
The BigQuery built-in functions used for this solution are shown below:
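As a hedged sketch of how such built-in functions could fit together, the Python helpers below assemble SQL for three steps: generating embeddings with `ML.GENERATE_EMBEDDING`, building an index with `CREATE VECTOR INDEX`, and querying with `VECTOR_SEARCH`. This is not necessarily the exact code used in the solution. Every project, dataset, table, model, index, and column name is a hypothetical placeholder, and the helpers only build strings without calling BigQuery.

```python
def embed_reviews_sql(embedding_model, source_table, dest_table):
    """Step 1: embed every review and store the vectors in a new table."""
    return f"""
CREATE OR REPLACE TABLE `{dest_table}` AS
SELECT *
FROM ML.GENERATE_EMBEDDING(
  MODEL `{embedding_model}`,
  -- ML.GENERATE_EMBEDDING expects the input text in a column named content.
  (SELECT review_id, text AS content FROM `{source_table}`),
  STRUCT(TRUE AS flatten_json_output)
)""".strip()


def create_index_sql(index_name, embedded_table):
    """Step 2: build a vector index over the embedding column."""
    return f"""
CREATE OR REPLACE VECTOR INDEX {index_name}
ON `{embedded_table}`(ml_generate_embedding_result)
OPTIONS (index_type = 'IVF', distance_type = 'COSINE')""".strip()


def search_reviews_sql(embedding_model, embedded_table, top_k=5):
    """Step 3: embed the user's question and find the nearest reviews.

    @question is a BigQuery named query parameter supplied at run time.
    """
    return f"""
SELECT base.review_id, base.content, distance
FROM VECTOR_SEARCH(
  TABLE `{embedded_table}`, 'ml_generate_embedding_result',
  (
    SELECT ml_generate_embedding_result
    FROM ML.GENERATE_EMBEDDING(
      MODEL `{embedding_model}`,
      (SELECT @question AS content),
      STRUCT(TRUE AS flatten_json_output)
    )
  ),
  top_k => {int(top_k)},
  distance_type => 'COSINE'
)
ORDER BY distance""".strip()


# Hypothetical names, for illustration only.
MODEL = "my_project.reviews.embedding_model"
EMBEDDED = "my_project.reviews.reviews_embedded"

embed_sql = embed_reviews_sql(MODEL, "my_project.reviews.raw_reviews", EMBEDDED)
index_sql = create_index_sql("reviews_idx", EMBEDDED)
search_sql = search_reviews_sql(MODEL, EMBEDDED, top_k=3)
print(search_sql)
```

Passing the question as a query parameter rather than interpolating it into the SQL text keeps user input out of the statement itself. With the `google-cloud-bigquery` client, it could be supplied through a `ScalarQueryParameter` named `question`.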
Demonstration of the solution:
Multilingual Search on Review Datasets: Ask Questions and Get Results in Your Preferred Language with the Power of BigQuery!
With this solution, customers can search for and read reviews in their preferred language without language barriers. You can extend it with Gemini to summarize or classify the retrieved reviews. You can also apply it to any product reviews, business reviews, or other multilingual datasets by adding a search feature, allowing users to get their questions answered in their language of choice. Give it a try and imagine how you can develop other valuable data and AI tools using BigQuery!