Recursion Pharmaceuticals:
AI-enabled pipeline finds treatments for rare diseases

About Recursion Pharmaceuticals

Founded in 2013, Recursion Pharmaceuticals combines experimental biology, bioinformatics, and artificial intelligence in a hybrid lab-to-cloud platform to identify treatments for any disease that can be modeled at the cellular level. Recursion's ultimate vision is building a map of human cellular biology to shift the pace and scale at which new treatments could benefit patients.

Industries: Life Sciences
Location: United States


Recursion Pharmaceuticals is creating an AI-enabled human biology map to accelerate discovery of treatments for rare, untreated diseases.

Google Cloud results

  • Accelerates cellular microscopy image processing with TPUs
  • Reduces deep learning model training from hours to minutes with TensorFlow
  • Integrates Recursion's local and cloud operations with Google Kubernetes Engine on-premises

First drug to clinical trial found by machine learning

Accelerating drug discovery

Recursion Pharmaceuticals is creating an AI-enabled map of human biology to accelerate discovering promising treatments for dozens of diseases with unmet medical need. The company, headquartered in Salt Lake City, Utah, has built a drug discovery platform that combines chemistry, automated biology, and cloud computing tools to reveal new therapeutic candidates, potentially cutting the time to discover and develop a new medicine by a factor of 10.

It's a paradox that observers call Eroom's Law: despite improvements in medical technology, drug discovery has become slower and more expensive. As a result, the number of FDA-approved new medicines reaching patients has stagnated, while the cost of pharmaceutical R&D has continued to increase.

There are approximately 6,000 rare diseases affecting an estimated 25 million people – many of them young children – in the United States alone. The high cost and extremely long timelines of conventional drug development are even more daunting for rare diseases. As a result, fewer than 5 percent of rare diseases have FDA-approved treatments.

Recursion is looking to address this paradox by combining a deep understanding of biology with the latest in machine learning tools to reimagine the drug discovery and development process. The company aims to use this approach to more rapidly and inexpensively discover new medicines for dozens of diseases, both rare and common.

In this quest, engineering, AI pattern recognition, and cloud computing have proven as necessary as biochemical expertise in crafting the solution. Recursion's data pipeline incorporates image processing, inference engines, and deep learning modules. Its engineers have built a platform that supports bursts of computational power reaching trillions of calculations per second.

Recursion is already getting results. In just under two years, Recursion has created hundreds of rare disease models and generated a shortlist of drug candidates across several diseases. One compound has moved into Phase 1 clinical development as a treatment for cerebral cavernous malformation, a rare hereditary stroke syndrome. The candidate is among the first discovered using machine learning to reach human testing. By comparison, it typically takes at least 10 years to develop a new drug.

"Our goal is to discover 100 clinical candidates in the first 10 years of the company, which would be orders of magnitude more than even the largest pharmaceutical companies can achieve right now," says Recursion's Senior Vice President, Translational Discovery/Chief Evangelist, Ron Alfa.

From wet biology to the cloud

Recursion has integrated a massive processing pipeline and neural networks into a platform that is scalable, cost-effective, and on track to yield potential treatments for both rare and common conditions in cardiology, neurology, dermatology, oncology, immunology, and ophthalmology, among others.

Recursion converts wet biology to high-resolution cellular images and then uses a combination of CellProfiler and Convolutional Neural Networks (CNNs) to extract and process the images for its Google Cloud environment built on Google Kubernetes Engine (GKE) and cloud-based Confluent Kafka streaming. The firm is looking to transition to GKE On-Prem for a subset of its image processing and to train its machine learning modules using Cloud TPU pods, which can directly access data from Cloud Storage. BigQuery manages analytics and metrics on the treatment candidates, processing, and more.

It starts with wet biology – plates of glass bottom wells containing thousands of healthy and diseased human cells. The firm's biologists run experiments on the cells, applying stains that help characterize and quantify the features of the cellular samples: their roundness, the thickness of their membrane, the shape of their mitochondria, and other characteristics.

Recursion takes high-resolution photos of miniaturized cell biology experiments – healthy, diseased, and treated cells – and extracts these to data models for ingestion by its cloud-based AI pipeline supported by Google Cloud.

A microscopy team captures this data by snapping high-resolution photos of the cells at several different light wavelengths. A data pipeline that sits on top of Google Kubernetes Engine (GKE) and Confluent Kafka, all running on Google Cloud, extracts and analyzes cell features from the images. Mathematical models with data that represent the cell features are deployed in packages to GKE containers. Then, data are processed by deep neural networks to find patterns, including those humans might not recognize. The neural nets are trained to compare healthy and diseased tissue signatures with those of tissues before and after a variety of drug treatments. This strategy returns a list of promising pharmaceutical remedies. Recursion calls it drug rediscovery because the process yields new uses for existing compounds.
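As a rough illustration of the comparison described above, the following Python sketch scores compounds by how far they move a diseased cell signature back toward the healthy one. Everything in it is an assumption made for illustration: the three features, the values, the compound names, and the projection-based `rescue_score` are hypothetical and are not Recursion's actual method, models, or data.

```python
def subtract(a, b):
    """Element-wise difference of two equal-length feature vectors."""
    return [x - y for x, y in zip(a, b)]


def dot(a, b):
    """Dot product of two equal-length feature vectors."""
    return sum(x * y for x, y in zip(a, b))


def rescue_score(healthy, diseased, treated):
    """Fraction of the disease shift that a treatment reverses.

    The treatment's effect is projected onto the healthy-to-diseased axis:
    1.0 means treated cells look healthy, 0.0 means no change.
    (Hypothetical metric, chosen only to illustrate the idea.)
    """
    disease_shift = subtract(diseased, healthy)
    treatment_shift = subtract(diseased, treated)
    denominator = dot(disease_shift, disease_shift)
    if denominator == 0:
        return 0.0  # healthy and diseased signatures are identical
    return dot(treatment_shift, disease_shift) / denominator


def rank_compounds(healthy, diseased, treated_by_compound):
    """Return (compound, score) pairs, most promising first."""
    scores = {
        name: rescue_score(healthy, diseased, signature)
        for name, signature in treated_by_compound.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical features: [roundness, membrane thickness, mitochondrial elongation]
healthy = [0.80, 0.30, 0.60]
diseased = [0.50, 0.50, 0.20]
treated = {
    "compound_a": [0.78, 0.32, 0.58],
    "compound_b": [0.52, 0.49, 0.22],
    "compound_c": [0.65, 0.40, 0.40],
}

ranking = rank_compounds(healthy, diseased, treated)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

In this toy data, `compound_a` restores almost all of the healthy signature and ranks first, while `compound_b` barely changes the diseased cells and ranks last. A production system would work on learned, high-dimensional image embeddings rather than three hand-picked numbers.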

Exploring a big data solution

In the last two years, Recursion has adapted its solution to manage the growing scale and complexity of its compute- and memory-intensive workloads, with data volumes approaching 20 terabytes per week.
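For a sense of scale, 20 terabytes per week can be converted to an average sustained data rate. The sketch below assumes decimal terabytes (10^12 bytes) and a perfectly even load across the week. The source states neither, and real lab output is likely to arrive in bursts.

```python
BYTES_PER_TERABYTE = 10**12          # assumption: decimal terabytes
SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604,800 seconds


def average_rate_mb_per_s(terabytes_per_week):
    """Average sustained rate in megabytes per second for a weekly volume."""
    bytes_per_second = terabytes_per_week * BYTES_PER_TERABYTE / SECONDS_PER_WEEK
    return bytes_per_second / 10**6


rate = average_rate_mb_per_s(20)
print(f"{rate:.1f} MB/s")  # prints "33.1 MB/s"
```

An average of roughly 33 MB/s around the clock is the floor a pipeline at this volume has to sustain before any allowance for peaks.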

In early 2017, Recursion expanded and improved the high-throughput lab operations responsible for generating the microscopy images. "That scale-up basically increased our data generation about tenfold," says Katherine Matsumoto, Director of Product Management at Recursion. The leap in scale meant rethinking its approach to managing big data. Recursion had to replace its existing batch processing solution with a distributed streaming model. After evaluating Spark and Storm, Recursion settled on a model that leveraged Confluent Kafka and GKE. "We selected Google Cloud as our cloud partner since Google Cloud has the most expertise with Kubernetes," says Ben Mabey, Vice President, Engineering at Recursion. "It reduced the amount of time it took us to spin things up and it saved a lot of money."

Today, Recursion is once again transitioning as the company evaluates solutions that are on the edge of cloud computing technology.

Moving from GPUs to TPUs

To train its deep learning models, Recursion uses on-premises GPUs. It then uses Google Cloud CPUs to perform inference on new images in the pipeline using these models. Recursion is currently evaluating cloud-based alternatives to better tackle both tasks.

A leading candidate is Google's Cloud TPU technology, which Recursion believes can accelerate and automate image processing. TPUs are an organic fit because Recursion is already using TensorFlow to train its neural networks in its proprietary biological domains. Google has provided reference model architectures optimized for Cloud TPUs, which have allowed Recursion to easily migrate workloads over to Cloud TPUs. "The potential of using Cloud TPU pods to accelerate our deep learning research while keeping operational costs and complexity low is a big draw," says Ben. "It takes us now a little over 24 hours to train models on our local GPU cluster. It will take us, depending on the size of the TPU pod, anywhere from 7 hours to 15 minutes. Getting answers to our researchers in an order of minutes or hours versus days is a definite value add for the business."
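The training times quoted above imply a wide range of speedups. The short Python sketch below works through the arithmetic, assuming the GPU baseline is exactly 24 hours (the quote says "a little over 24 hours", so the true ratios would be slightly higher). The constant and function names are illustrative only.

```python
GPU_TRAINING_HOURS = 24.0   # assumption: "a little over 24 hours", rounded down
TPU_SLOWEST_HOURS = 7.0     # smaller TPU pod, per the quote
TPU_FASTEST_HOURS = 0.25    # 15 minutes on a larger TPU pod, per the quote


def speedup(baseline_hours, new_hours):
    """How many times faster the new training run is than the baseline."""
    return baseline_hours / new_hours


low = speedup(GPU_TRAINING_HOURS, TPU_SLOWEST_HOURS)
high = speedup(GPU_TRAINING_HOURS, TPU_FASTEST_HOURS)
print(f"Speedup range: {low:.1f}x to {high:.0f}x")
```

Under these assumptions the estimate runs from about 3.4 times faster on the smaller pod to 96 times faster on the larger one.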

The efficiency of TPU processing over that of GPUs is huge. A TPU processes at 90 trillion operations per second, nearly twice that of GPUs, while consuming only one-third of the power. And, unlike GPUs, which are general-purpose accelerators, TPUs are designed to accelerate the pattern matching that drives machine learning. The turn-key nature of the TPU pod is also a big selling point as it means that teams no longer need to manage clusters of GPUs for training.

TPUs promise to help Recursion better execute its mission "to decode biology to radically improve lives". From its initial and continuing focus on drug repurposing to treat rare diseases, Recursion is broadening its platform to probe data for treatments for inflammation, infectious disease, oncology, and aging.

A tighter cloud integration

Moving its AI-module training to the cloud is a big step for Recursion. But not all of Recursion's operations – its image microscopy acquisition in particular – are candidates for cloud computing. "We'll always have a local component to our hybrid cloud solution," says Ben. "All of our data are generated locally so we need to have services that can ingest and pre-process that data before uploading it to the cloud."

To streamline the task of getting its data cloud-ready and achieve tighter integration with its cloud processing, the Recursion team is exploring the use of local Kubernetes clusters. "We want our hybrid environment to be as uniform as possible, and that's why we're looking into using GKE On-Prem," says Ben. GKE On-Prem is currently in Alpha release. "Using GKE On-Prem is attractive as it will allow us to manage all of our Kubernetes clusters with a single, easy-to-use console," he adds.

Recursion is in the final stages of switching out its Amazon storage solution for Cloud Storage. Doing so means that it can train its neural networks directly from cloud storage. "The Google file system and networks are much faster," says Ben. "And the ability to train our deep learning models from cloud storage is a huge win in terms of lower operational complexity and cost."

Two keys: Cloud partnering and open source

In its transition from another cloud environment, Recursion cites several push and pull factors for going with Google Cloud.

"One factor was support," says Ben. "The responsiveness and the kind of high-touch customer support provided by the Google team stood out from the other cloud providers." Another was the Google stewardship of the Kubernetes project, which means that Recursion could rely on Google for its expertise. Ben explains, "Right now, a number of cloud providers have Kubernetes solutions, but Google is by far the most mature. In particular Google Kubernetes Engine, the web console, and the CLI are all just more intuitive and the ergonomics surrounding them are a lot better."

Google Cloud is also proving a better fit for deep learning. "The storage is fast, and Cloud TPU is the only cloud solution for turnkey distributed training that we view as mature," says Ben. Another attraction for Recursion is Google's demonstrated commitment to the open source community. From the start, Recursion took an open source approach to building its solution. The philosophy informs its partnership strategy as well.

"The ongoing support of the open source community provided by Google really resonates with how we approach our business and the best practices we see for moving ahead," says Ben.

Contributors to this story

Ben Mabey: Vice President, Engineering, Recursion Pharmaceuticals. Ben earned his MSc in Computer Science at the University of Utah with an emphasis on image processing, computational geometry, and machine learning.

Ron Alfa: Senior Vice President, Translational Discovery/Chief Evangelist, Recursion Pharmaceuticals. Ron earned his MD and PhD (Neurosciences) from Stanford University and an MA from University College London. His research focus includes metabolic hormones and molecular therapeutics.

Katherine Matsumoto: Director, Product Management, Recursion Pharmaceuticals. Katherine earned her PhD in Linguistics from the University of Utah with a career focus on social media analytics, natural language processing, text analytics, and machine learning products.
