YouGenomics uses Google Genomics to bring genomics data analysis to labs without bioinformatics resources
When bioinformatics experts Dr. Avinash Abhyankar and Dr. Hemang Parikh looked at the genomics landscape, they noticed that labs and institutions were divided into haves and have-nots: those with the computational and analytical experience to deal with enormous genomics datasets, and those that could produce “omics” data but not interpret it. Dr. Abhyankar and Dr. Parikh launched YouGenomics with a vision to make these analytical capabilities available to hospitals and small research labs. Today, they parachute in to help clients with massive-scale genomics studies and provide high-quality data processing and analysis services through their system built on Google Cloud Platform. The next release of their platform takes advantage of Google Genomics.
“With genomics data increasing day by day, we’re not too far from a point when labs cannot keep up with the growing computational demand,” Dr. Abhyankar says. “With Google Cloud Platform, we don’t need to worry about running out of space or compute; we can focus on the science.”
Bioinformatics best practices for all
Dr. Abhyankar and Dr. Parikh have witnessed the evolution of the genomics field from small studies of individual genes a decade ago to large-scale studies of whole genomes now — and they’ve seen firsthand that many research labs didn’t have the computational resources to make that leap. “It’s hard to find bioinformaticians, and small labs can’t always afford them,” Dr. Abhyankar says.
YouGenomics gives research, government and hospital labs on-demand access to high-powered computational resources and bioinformatics expertise. The company focuses on whole genome sequencing data as well as other data types, such as epigenetics and gene expression profiling data. “Our main goal is to make analysis of these kinds of high-throughput studies available to everyone, not just a few big labs with big funding,” Dr. Abhyankar adds.
Dr. Abhyankar and Dr. Parikh knew it didn’t make sense for each lab or research facility to build its own compute infrastructure to manage these projects, since that would require too much maintenance and investment and wouldn’t scale as needed. “There is no way that the data or analysis can be handled locally by most labs,” Dr. Parikh says. They chose to work with MediaAgility, a digital consulting services company, which selected Google Cloud Platform as the computational backbone for YouGenomics because it’s scalable, easy to use, highly secure, and accessible all over the world.
Robust pipelines and virtually unlimited scalability
MediaAgility developed a platform for YouGenomics that automates the analysis of next-generation sequencing data, reducing turnaround time and saving on heavy investments. DNA/RNA sequence analysis typically involves extensive and time-consuming manual processing. This application automates that work, and it is scalable, secure and available to users around the world.
MediaAgility’s case for cloud computing was an obvious one, and the choice of Google Cloud Platform turned out to be easy too. “I’ve been following the Global Alliance for Genomics and Health for a while now,” Dr. Abhyankar says. “They have come up with global standards for genomic data, and Google Genomics was the first to implement each and every one of those recommendations.”
With Google Genomics, the YouGenomics team can use existing data-processing pipelines or create their own, which lets them compare different approaches and find the best one for each project rather than being stuck with a one-size-fits-most pipeline. For any new dataset, they can now choose between their proprietary pipeline and Google’s version of the GATK best practices pipeline for processing DNA sequence data. “The problem with genomic data analysis is that the amount of data increases after every step,” Dr. Parikh says. “Google is doing a great job of simplifying these problems and letting the users focus on the science.”
As DNA sequencers generate more data, YouGenomics recognizes that it will be increasingly important to streamline basic processing steps such as variant calling. “MediaAgility recommended that by using Google Cloud Platform along with the Google Genomics APIs, we can fairly easily automate and scale these steps,” Dr. Abhyankar says. “Once that part is done, we can use our resources more efficiently for developing new algorithms for data interpretation, automated diagnosis and so on.”
Dr. Parikh says that it was quite simple to get started with Google Genomics. “Their documentation is very good — much better than other cloud platforms,” he says. This ease of use also keeps YouGenomics lean, with no staff needed to manage the cloud platform.
Speedy queries for clients around the world
YouGenomics has provided many labs with lightning-fast data analysis. Google BigQuery has been a crucial component in that success, according to Dr. Abhyankar. “Storing whole genome and whole exome sequencing data in conventional databases makes it really slow to query, and performance takes a hit from having hundreds to thousands of samples in there,” he says, noting that a typical human genome contains about 4 million variants that must be queried for any analysis. “We use BigQuery for storing and querying genome and exome sequencing datasets, and it simplifies things quite a lot.”
Google Cloud Platform has also allowed the team to collaborate with labs all over the world. Clients anywhere can access the analysis and results through Google Genomics in a secure and HIPAA-compliant environment. “This would not have been feasible for us if we decided to build hardware on our own,” Dr. Abhyankar says.