BenchSci helps pharma deliver new medicines—stat!—with Google Cloud
Aaron Gabow
Director of Engineering, BenchSci
Craig Newell
Principal Engineer, BenchSci
Every startup should have a lofty goal, even if they’re not 100% certain how they’ll reach it. Our company, BenchSci, is a Canadian biotech startup whose mission is to help scientists bring new medicines to patients 50% faster by 2025. Since founding the company in 2015, we’ve been building a platform to help scientists design better experiments by mining a vast catalog of public datasets, research articles, and proprietary customer datasets. And that platform is built entirely on Google Cloud, whose breadth and depth of features have supported us as we move toward our goal.
There's urgency to our mission because pharmaceutical R&D can be inefficient. Take preclinical research, for example: one study estimates that half of preclinical research spending is wasted, amounting to $28.2 billion annually in the U.S. alone and up to $48.6 billion globally [1]. And by our estimates, about 36.1% of that preclinical research waste comes from scientists using inappropriate reagents (materials, such as antibodies, used in life science experiments).
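To put those percentages in dollar terms, here is a rough back-of-the-envelope sketch. It assumes the 36.1% reagent share applies uniformly to both the U.S. and the global waste estimates, which the figures above do not state explicitly; the function and constant names are purely illustrative.

```python
# Figures quoted in the paragraph above, in billions of US dollars per year.
US_WASTE_BN = 28.2
GLOBAL_WASTE_BN = 48.6
# Estimated share of waste caused by inappropriate reagents.
# Assumption: the same share applies to both totals.
REAGENT_SHARE = 0.361


def reagent_waste_bn(total_waste_bn: float, share: float = REAGENT_SHARE) -> float:
    """Return the portion of wasted spending attributed to reagents, in $ billions."""
    return total_waste_bn * share


if __name__ == "__main__":
    print(f"U.S.:   ${reagent_waste_bn(US_WASTE_BN):.1f}B per year")
    print(f"Global: ${reagent_waste_bn(GLOBAL_WASTE_BN):.1f}B per year")
```

Under that assumption, inappropriate reagents would account for roughly $10 billion of wasted preclinical spending per year in the U.S. and about $17.5 billion globally.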
As such, our first product was an AI-assisted reagent selection tool. It collects relevant scientific papers and reagent catalogs, extracts relevant data points from them with proprietary machine learning models, and makes the results searchable to scientists from an easy-to-use interface. Scientists can quickly determine up front whether a particular reagent is a good fit for their experiment, based on existing experimental evidence. That way, they can focus on experiments with the greatest likelihood of productive results and bring new treatments to patients faster.
All this runs on Google Cloud. We collect papers, theses, product catalogs, medical and biological databases, and other data, and store them in Cloud Storage. We then organize and extract insights from the data, using a pipeline built from tools including Dataflow and BigQuery. Next, we process the data with our machine learning algorithms, and store results in Cloud SQL and Cloud Storage. Scientists access the results via a web interface built on Google Kubernetes Engine (GKE), Cloud Load Balancing, Identity-Aware Proxy, Cloud CDN, Cloud DNS, and other services. Finally, we use multiple cloud projects, IAM, and infrastructure as code to keep data secure and each customer isolated. As a result, we’ve eliminated the need for all but the most specialized R&D infrastructure, as well as for operational hardware, and slashed our management overhead.
The combination of Google Cloud’s managed services and easily scalable persistent containers and VMs also lets us prototype and test new capabilities, then bring them to production with minimal management on our part.
Google Cloud has also scaled with BenchSci's needs. The data we analyze has increased by an order of magnitude over three years, and switching to BigQuery and Cloud SQL, for example, removed a great deal of our operational overhead. We also appreciate the flexibility of BigQuery to drive critical steps in our text-processing ML pipeline and the stability of Cloud SQL to drive data access.
Over time, we’ve also evolved our data processing pipeline. We started out with Dataproc, a managed Hadoop service, but eventually rewrote this system in Dataflow, which uses Apache Beam. Dataflow can handle hundreds of terabytes, and lets us focus on implementing our business logic rather than managing the underlying infrastructure.
Recently, we’ve expanded our platform to support private datasets. Initially, we served all our customers different views of the same underlying public data. In time, though, some customers asked if we could include their proprietary pharmacological data in our system. Rather than managing multitenant systems with strict project isolation between them, we used GKE and Config Connector to create unique environments for each customer's data, without increasing the operational demand on our teams.
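For readers unfamiliar with Config Connector, it lets Google Cloud resources be declared as Kubernetes objects and applied to a GKE cluster. The sketch below shows one hypothetical shape that a per-customer environment could take: a dedicated project plus a storage bucket for that customer's datasets. It is not BenchSci's actual configuration. The `example-` naming scheme, the folder and billing identifiers, the bucket settings, and the helper names are all assumptions made for illustration.

```python
import json


def customer_environment(customer_id: str, folder_id: str, billing_account: str) -> list:
    """Build Config Connector manifests for one customer's isolated environment.

    All names and IDs are hypothetical placeholders.
    """
    project_id = f"example-{customer_id}"  # assumed naming scheme
    project = {
        "apiVersion": "resourcemanager.cnrm.cloud.google.com/v1beta1",
        "kind": "Project",
        "metadata": {"name": project_id},
        "spec": {
            "name": project_id,
            "folderRef": {"external": folder_id},
            "billingAccountRef": {"external": billing_account},
        },
    }
    bucket = {
        "apiVersion": "storage.cnrm.cloud.google.com/v1beta1",
        "kind": "StorageBucket",
        "metadata": {
            "name": f"{project_id}-datasets",
            # Tells Config Connector which project should own the bucket.
            "annotations": {"cnrm.cloud.google.com/project-id": project_id},
        },
        "spec": {"location": "US", "uniformBucketLevelAccess": True},
    }
    return [project, bucket]


def render(manifests: list) -> str:
    """Wrap the manifests in a Kubernetes List and serialize it as JSON."""
    return json.dumps({"apiVersion": "v1", "kind": "List", "items": manifests}, indent=2)


if __name__ == "__main__":
    # Placeholder identifiers; real values would come from the organization's setup.
    print(render(customer_environment("acme", "123456789012", "AAAAAA-BBBBBB-CCCCCC")))
```

Because each customer's resources are generated from the same template, onboarding a new customer in a setup like this amounts to rendering one more set of manifests and applying it (for example with `kubectl apply -f`), rather than provisioning infrastructure by hand.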
In short, Google Cloud has enabled us to focus on solving problems without being distracted by having to build and operate computing infrastructure and services. Looking ahead, running our company on Google Cloud gives us the confidence to grow by collecting more and broader data sources; extracting more information from each unit of data with ML algorithms; processing ever more extensive and more proprietary data; and serving a broader range of customer needs through a varied set of interfaces and access points. Our goal is still ambitious, but by partnering with Google Cloud, it feels attainable.
Learn more about healthcare and life sciences solutions on Google Cloud.
[1] https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002165