Built with BigQuery and Google AI: How Glean enhances enterprise search quality and relevance for teams
Dr. Ali Arsanjani
Director, AI/ML Partner Engineering, Head of AI Center of Excellence, Google Cloud
T R Vishwanath
Co-Founder, Glean
Context
About Glean
Glean searches across all your company’s apps to help you find exactly what you need and discover the information you need to do your best work. It delivers powerful, unified enterprise search across all workplace applications, websites, and data sources used within an enterprise. Search results respect the existing permissions from your company’s systems, so users only see what they already have permission to see. Glean’s enterprise search also takes into account your role, projects, collaborators, and the language and acronyms specific to your company to deliver highly personalized results that surface the information most pertinent to you and your work. This greatly reduces time spent searching, helping you be more productive and experience fewer frustrations when looking for what you need to make progress.
Why Google Cloud is foundational for Glean
Crucial to the performance of Glean’s powerful and personalized enterprise search is the technology behind it. Glean is built on Google Cloud (see diagram 1) and leverages Google’s data cloud, a modern data stack with components such as BigQuery, Dataflow, and Vertex AI.
Use case 1: Processing and enriching pipelines of data through Dataflow.
Glean uses Google Cloud Dataflow to extract the relevant pieces from the content indexed from different sources of workplace knowledge. It then augments the data with various relevance signals before storing it in the search index that’s hosted in the project’s Google Kubernetes Engine. Additionally, Glean uses Dataflow to generate training data at scale for our models (also trained on Google Cloud). As a whole, Google Cloud Dataflow enables Glean to build complex and flexible data processing pipelines that autoscale efficiently when processing large corpora of data.
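To make the extract-then-enrich flow concrete, here is a minimal sketch in plain Python. It is not Dataflow or Apache Beam code and is not Glean's implementation; the record fields, the `view_counts` signal, and all function names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class IndexedDocument:
    """A document as it might look just before being written to a search index."""
    doc_id: str
    title: str
    body: str
    signals: dict = field(default_factory=dict)


def extract(raw: dict) -> IndexedDocument:
    """Keep only the fields the index needs from a raw source record."""
    return IndexedDocument(
        doc_id=raw["id"],
        title=raw.get("title", "").strip(),
        body=raw.get("content", "").strip(),
    )


def enrich(doc: IndexedDocument, view_counts: dict) -> IndexedDocument:
    """Attach relevance signals (hypothetical: view count and body length)."""
    doc.signals["views"] = view_counts.get(doc.doc_id, 0)
    doc.signals["body_length"] = len(doc.body)
    return doc


def run_pipeline(raw_records, view_counts):
    """Chain the stages lazily, one record at a time."""
    for raw in raw_records:
        yield enrich(extract(raw), view_counts)


# Hypothetical input records and signal table.
raw_records = [
    {"id": "doc-1", "title": " Onboarding guide ", "content": "Welcome to the team."},
    {"id": "doc-2", "title": "Expense policy", "content": "Submit receipts monthly."},
]
view_counts = {"doc-1": 42}

enriched = list(run_pipeline(raw_records, view_counts))
for doc in enriched:
    print(doc.doc_id, doc.title, doc.signals)
```

In a managed pipeline service, each stage would typically be a separate transform that the service can parallelize and autoscale; the generator chain above only mirrors the shape of that flow.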
Use case 2: Running analytical workloads with BigQuery and Looker Studio
Glean closely measures and optimizes the satisfaction of users who are using the product to find information. This involves understanding the actions taken by the Glean user in a session and identifying when the user was able to find content useful to them in the search results, as opposed to when the results were not helpful for the user. In order to compute this metric, Glean stores the anonymized actions taken in the product in BigQuery and uses BigQuery queries to compute user satisfaction metrics. These metrics are then visualized by building a product dashboard over the BigQuery data using Looker Studio.
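As a rough illustration of a session-level satisfaction metric, the sketch below counts a session as satisfied when a search is followed by a click on a result. That definition, the action names, and the sample log are assumptions made for this example; the actual metric and its BigQuery queries are not described in this post.

```python
from collections import defaultdict

# Hypothetical anonymized action log:
# (session_id, action, seconds since session start)
actions = [
    ("s1", "search", 0),
    ("s1", "click_result", 4),
    ("s2", "search", 0),
    ("s2", "search", 9),
    ("s3", "search", 0),
    ("s3", "click_result", 3),
    ("s4", "search", 0),
]


def satisfaction_rate(action_log):
    """Return the share of sessions in which a search was followed by a result click."""
    by_session = defaultdict(list)
    for session_id, action, ts in action_log:
        by_session[session_id].append((ts, action))

    satisfied = 0
    for events in by_session.values():
        searched = False
        for _, action in sorted(events):  # process each session in time order
            if action == "search":
                searched = True
            elif action == "click_result" and searched:
                satisfied += 1
                break  # count each session at most once
    return satisfied / len(by_session) if by_session else 0.0


rate = satisfaction_rate(actions)
print(f"satisfied sessions: {rate:.0%}")
```

The same grouping could be expressed as a SQL aggregation over a table of actions keyed by session, with a dashboard reading the resulting rate.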
Use case 3: Running ML models with Vertex AI.
Glean is able to train state-of-the-art language models adapted to enterprise/domain-specific language at scale by using TPUs through Vertex AI.
TPUs, or Tensor Processing Units, are custom-designed hardware accelerators developed by Google specifically for machine learning workloads. TPUs are designed to speed up and optimize the training and inference of deep neural networks.
Google offers TPUs as a cloud-based service to its customers, which enables users to train and run many machine learning models faster than on general-purpose CPUs or GPUs. For deep learning workloads they are well suited to, TPUs can offer higher performance, lower power consumption, and better cost efficiency. TPUs are also designed to work with machine learning frameworks such as TensorFlow, a popular open-source framework, and with Google Cloud services such as Vertex AI. This makes it easy for developers and data scientists to build, train, and deploy machine learning models using TPUs on Google Cloud.
Training data derived from the enterprise corpora is used for domain-adaptive pretraining and task-specific fine-tuning of large-scale models, with flexibility enabled by Vertex AI. Search is additionally powered by vector search, served with encoders and approximate nearest neighbor (ANN) indices trained and built through Vertex AI.
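The idea behind vector search can be sketched with an exact, brute-force nearest-neighbor lookup over a handful of embeddings. An ANN index approximates this same ranking so that it stays fast over millions of vectors. The three-dimensional vectors and document names below are made up for illustration; real embeddings come from a trained encoder and have many more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest(query_vec, index, k=2):
    """Exact top-k search: score every document, then keep the k best."""
    scored = [(cosine_similarity(query_vec, vec), doc_id) for doc_id, vec in index.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


# Hypothetical document embeddings.
index = {
    "vacation-policy": [0.9, 0.1, 0.0],
    "expense-policy": [0.7, 0.3, 0.1],
    "deploy-runbook": [0.0, 0.2, 0.9],
}
query = [1.0, 0.0, 0.0]  # hypothetical embedding of the user's query

top = nearest(query, index)
print(top)
```

Brute-force search is linear in the number of documents, which is why production systems trade a small amount of recall for speed by using an approximate index.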
A collaborative solution
Glean offers a variety of features and services for its users:
Feature 1: Search across all your company’s apps.
Glean understands context, language, behavior, and relationships with others, to constantly learn about what you need and instantly find personalized answers to your questions. For instance, Glean factors in signals like docs shared with you, documents trending in your team, the office you are based in, the applications you use the most, and the most common questions being asked and answered in the various communication apps to surface documents and answers that are relevant to your search query. In order to provide this personalized experience, Glean understands the interactions between enterprise users as well as actions performed relevant to information discovery. It uses Cloud Dataflow to join these signals with the parsed information from the different applications, and trains semantic embeddings using the information on Vertex AI.
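One simple way to picture how such signals could influence ranking is a weighted sum over per-document features. This is only a sketch under stated assumptions: the feature names, weights, and documents are hypothetical, and this post does not describe Glean's actual ranking function.

```python
# Hypothetical weights; a production system would more likely learn these from data.
WEIGHTS = {"text_match": 1.0, "shared_with_user": 0.5, "trending_in_team": 0.3}


def personalized_score(features):
    """Weighted sum of the known signals for one candidate document."""
    return sum(WEIGHTS[name] * value for name, value in features.items() if name in WEIGHTS)


# Hypothetical candidates for the query "roadmap", with signals already joined in.
candidates = {
    "q3-roadmap": {"text_match": 0.6, "shared_with_user": 1.0, "trending_in_team": 1.0},
    "old-roadmap": {"text_match": 0.8, "shared_with_user": 0.0, "trending_in_team": 0.0},
}

ranked = sorted(candidates, key=lambda d: personalized_score(candidates[d]), reverse=True)
print(ranked)
```

Here the document with the weaker text match still ranks first because it was shared with the user and is trending in their team, which is the kind of effect personalization signals are meant to have.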
Feature 2: Discover the things you should know
Glean makes it easy to get things done by surfacing the information you need to know and the people you need to meet. This is effectively a recommendation engine for end users that surfaces data and knowledge that is contextually relevant for them. Glean leverages the vector embeddings trained using Vertex AI to recommend enterprise information to employees that is timely and contextually relevant.
Feature 3: Glean is easy to use and ready to go, right out of the box.
Glean connects with all the apps you already use, so employees can continue working with the tools they already know and love. Glean uses fully managed and auto-scalable Google Cloud components like App Engine, Google Kubernetes Engine (GKE), and Cloud Tasks to ensure high reliability and low operational overhead for the infrastructure components of the search stack.
Glean uses a variety of Google Cloud components to build the product, including:
Cloud GKE
Cloud Dataflow
Cloud SQL
Cloud Storage
Cloud PubSub
Cloud KMS
Cloud Tasks
Cloud DNS
Cloud IAM
Compute Engine
Vertex AI
BigQuery
Stackdriver Logging/Monitoring/Tracing
Building the product on top of these components provides Glean with a reliable, secure, scalable and cost-effective platform. It enables us to focus on the core application and relevance features, and helps us stand out.
Building better data products with Google Cloud
Glean trusts Google Cloud as our primary and sole cloud provider. This is mainly because of four factors:
Factor 1: Security
Google Cloud provides fine-grained IAM roles as well as security features such as Cloud Armor, IAP-based authentication, encryption by default, Cloud Key Management Service, Shielded VMs, and Private Google Access. Together, these enable Glean to run a hardened, least-privilege configuration in which the customer is fully in control of the data and has a full view into access to the system.
Factor 2: Reliable and scalable infrastructure services
With fully managed, auto-scaling services like GKE, Cloud SQL, Cloud Storage, and Cloud Dataflow, we can focus on the core application logic. We do not have to worry about the system failing to handle peak load or stay available, or about manually scaling it down during periods of low use for cost efficiency.
Factor 3: Advanced data processing, analytics and AI/ML capabilities
For a search and discovery product like Glean, it's very important to be able to make use of flexible data processing and analytics features in a cost-effective manner. Glean builds on top of Google Cloud features like Cloud Dataflow, Vertex AI, and BigQuery to provide a highly personalized and relevant product experience to its users.
Factor 4: Support
The Google Cloud team has been a true partner to Glean, providing prompt support for production issues and for questions we have about the Google Cloud feature set. The team is also highly receptive to feedback and offers direct interaction with the product group, which lets us influence the product roadmap through new features.
Conclusion
At the time of writing, Glean is one of over 800 tech companies powering their products and businesses using data cloud products from Google, such as BigQuery, Dataflow, and Vertex AI. Google’s Built with BigQuery initiative helps ISVs like Glean get started building applications using data and machine learning products, and keep adding capabilities through new product features. By providing dedicated access to technology, expertise, and go-to-market programs, the Google Built-with initiatives (BigQuery, Google AI, etc.) can help tech companies accelerate, optimize, and amplify their success.
Glean’s enterprise search and knowledge management solutions are built on Google Cloud. By partnering with Google Cloud, Glean leverages an all-in-one cloud platform for data collection, data transformation, storage, analytics and machine learning capabilities.
Through Built with BigQuery, Google is enabling and co-innovating with tech companies like Glean to unlock product capabilities and build innovative applications using Google’s data cloud and AI products. The initiative simplifies access to the underlying technology, provides helpful and dedicated engineering support, and offers joint go-to-market programs. Participating companies can:
Get started fast with a Google-funded, pre-configured sandbox.
Accelerate product design and architecture through access to designated experts from the ISV Center of Excellence for Data Analytics and AI, who can provide insight into key use cases, architectural patterns, and best practices.
Amplify success with joint marketing programs to drive awareness, generate demand, and increase adoption.
We would like to thank Smitha Venkat and Eugenio Scafati on the Google team for contributing to this blog.