DNAstack: Helping researchers find needles in genomics haystacks

About DNAstack

Toronto-based DNAstack develops a cloud-based platform for genomics data analysis and sharing. Through collaborations with Google and other organizations, DNAstack enables researchers, clinical laboratories, and pharmaceutical companies to more quickly and cost-effectively make sense of the world’s exponentially accumulating genomics data.

Industries: Healthcare & Life Sciences
Location: Canada

The DNAstack platform is built on Google Genomics and Google Cloud Platform, giving researchers fast, easy, and more secure access to genomics data for analysis and sharing.

Google Cloud Results

  • Scales easily to support a range of studies with differing resource needs
  • Enables the development of the world’s largest and ever-growing search engine for genetic mutations
  • Competes against much larger, highly funded organizations
  • Reduces customers’ genomics data infrastructure costs by about 70%

A single human genome contains about 6 billion DNA letters (base pairs) that must be analyzed and interpreted, and the data for each genome consumes about 200 GB of storage. With data volumes this large, helping researchers find the proverbial “needle in the haystack” (the genetic mutations that cause disease), and thereby dramatically change how medicine is practiced, is no minor task.

Enabling researchers to find needles in genomic haystacks is the mission of DNAstack, a Canadian startup with a fully managed cloud-based platform for the sharing, analysis, and management of genomic information. The goal is to help scientists discover and treat the causes of genetic diseases. The name “DNAstack” is a play on the phrase “needle in a haystack.” DNAstack customers include clinical diagnostic labs, pharmaceutical enterprises, agricultural companies, hospitals, and direct-to-consumer businesses.

From DNAstack’s founding in 2014, CEO and Co-founder Marc Fiume saw Google Cloud Platform (GCP) as the ideal foundation for the DNAstack platform. A decisive factor was Google’s release of Google Genomics the same year. The GCP offering enables life sciences organizations to more securely store, process, and share large datasets. Google Genomics also leverages open-source Application Programming Interfaces (APIs) developed by the Global Alliance for Genomics and Health, a standards-setting organization for genomic research.

“We needed a strong cloud platform to build on,” says Marc. “Google’s investment in open standards and its commitment to the field of genomics, along with the strengths of Google Cloud Platform, made going with Google an easy decision, especially since Google Genomics is incorporated into Google Cloud Platform.”

“Google is making significant investments that will have massive implications for the accessibility of highly valuable biomedical data in the cloud,” Marc adds. For example, Google is enabling researchers at government and academic institutions across the United States to cost-effectively leverage Google Cloud storage, computing, and machine learning to accelerate biomedical advances.

Along with Google Genomics, DNAstack uses Cloud Storage to store massive genome samples; Compute Engine to support and scale its computational workflows; and BigQuery, which DNAstack leverages for The Beacon Network, a global search engine for genetic mutations.

Scalable storage for big data

Because it’s easily scalable up to exabytes of data, Cloud Storage enables DNAstack to support a range of research studies with differing storage needs.

“We’re supporting a cystic fibrosis study that involves genomic data from 3,000 patients, an autism study involving tens of thousands of patients, and cancer studies that can go up to hundreds of thousands of patients,” says Marc. “The data storage requirements are disease-specific and Cloud Storage gives us the flexibility to quickly and affordably scale up to hundreds of gigabytes for a small study to hundreds of terabytes for a bigger one.”
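
As a rough illustration of how cohort size drives storage, the sketch below multiplies the figure of about 200 GB per genome quoted earlier by a number of patients. It assumes, purely for illustration, that every patient contributes one full genome at that size; real studies may store far less per patient. The function and constant names are hypothetical.

```python
# Rough storage estimate for a genomics study.
# Assumption: each patient contributes one genome of about 200 GB,
# the per-genome figure quoted earlier in this article.
GB_PER_GENOME = 200


def estimated_storage_tb(patients: int, gb_per_genome: int = GB_PER_GENOME) -> float:
    """Return the estimated storage in terabytes (1 TB = 1,000 GB)."""
    if patients < 0:
        raise ValueError("patients must be non-negative")
    return patients * gb_per_genome / 1000


if __name__ == "__main__":
    # Cohort sizes are illustrative orders of magnitude, not study data.
    for patients in (3_000, 30_000, 300_000):
        tb = estimated_storage_tb(patients)
        print(f"{patients:>7} patients: about {tb:,.0f} TB")
```

Under that upper-bound assumption, storage grows linearly with the number of patients, which is why a storage layer that scales without re-provisioning matters for studies of very different sizes.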

Shaping resources based on tasks

Compute Engine virtual machines also play a pivotal role for DNAstack and its customers.

Doctors can’t immediately interpret the genomics information that comes off a DNA sequencing machine. So, each genome sample goes through roughly 30 hours of computation in which the data fragments from a DNA sequencing machine are reassembled, “Humpty Dumpty” style, into a format that can be readily interpreted. In collaboration with Google and others, DNAstack developed a bioinformatics workflow execution engine in its Workflows app.

Workflows enables the execution of arbitrary computational pipelines, which allow tasks to be chained together into a cohesive set of computational operations. Each task has its own resource requirements, which necessitates assigning differing virtual machine environments. “Because Compute Engine is so scalable, people using the Workflows app can perfectly shape the computational resources they need according to task, then orchestrate the execution of thousands of workflows at any one time,” says Marc. “We’ve had customers ask if they could re-analyze the DNA of 1,000 patients over the weekend, and the good news is, they absolutely can because of the broad scalability of Compute Engine.”
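
To make the idea of chained tasks with different resource needs concrete, here is a minimal sketch. It is not the Workflows app's actual model: the task names, machine shapes, and the split of hours are hypothetical, chosen only so the total matches the roughly 30 hours per genome mentioned above, and the 48-hour weekend is likewise an assumption.

```python
from dataclasses import dataclass
from math import ceil


@dataclass(frozen=True)
class Task:
    """One pipeline step and the machine shape it asks for (all values hypothetical)."""
    name: str
    cpus: int
    memory_gb: int
    hours: float


# A hypothetical three-step chain; step names and durations are assumptions
# that add up to the ~30 hours of computation per genome cited in the article.
PIPELINE = [
    Task("align_reads", cpus=16, memory_gb=64, hours=20.0),
    Task("call_variants", cpus=8, memory_gb=32, hours=8.0),
    Task("annotate", cpus=2, memory_gb=8, hours=2.0),
]


def hours_per_sample(pipeline: list[Task]) -> float:
    """Tasks run one after another, so a sample's wall-clock time is their sum."""
    return sum(task.hours for task in pipeline)


def parallel_workflows_needed(samples: int, deadline_hours: float,
                              pipeline: list[Task]) -> int:
    """Minimum number of workflows that must run side by side to meet a deadline."""
    per_sample = hours_per_sample(pipeline)
    if per_sample > deadline_hours:
        raise ValueError("a single sample cannot finish before the deadline")
    # Each parallel slot can process this many samples back to back.
    samples_per_slot = int(deadline_hours // per_sample)
    return ceil(samples / samples_per_slot)


if __name__ == "__main__":
    weekend_hours = 48  # assumption: a weekend is two full days
    needed = parallel_workflows_needed(1_000, weekend_hours, PIPELINE)
    print(f"Parallel workflows needed for 1,000 genomes: {needed}")
```

With these assumed numbers, only one 30-hour workflow fits into a 48-hour window per slot, so re-analyzing 1,000 genomes over a weekend means running all 1,000 workflows concurrently. That is the kind of burst an elastic compute service is suited to.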

High-performance search

Using BigQuery, DNAstack has built the world’s largest search engine for genetic mutations, The Beacon Network. “We needed a high-performance search engine that could enable someone to quickly find, say, a harmful mutation in an autistic male on chromosome 7,” says Marc. “The ability of BigQuery to run queries across gigabytes and petabytes of data at super-fast speeds makes our search engine possible.”
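
The kind of lookup Marc describes can be sketched as a filtered SQL query. The example below uses Python's built-in sqlite3 module as a stand-in so that it runs anywhere; the table layout, column names, and rows are hypothetical and made up for illustration, and they do not reflect the actual schema of The Beacon Network or BigQuery.

```python
import sqlite3

# Hypothetical, simplified schema: one row per observed variant.
SCHEMA = """
CREATE TABLE variants (
    sample_id  TEXT,
    sex        TEXT,
    diagnosis  TEXT,
    chromosome TEXT,
    position   INTEGER,
    ref        TEXT,
    alt        TEXT,
    impact     TEXT
)
"""

# Made-up illustrative rows; positions and alleles are not real findings.
ROWS = [
    ("S1", "male", "autism", "7", 1000100, "G", "A", "harmful"),
    ("S2", "female", "autism", "7", 1000100, "G", "A", "harmful"),
    ("S3", "male", "autism", "12", 2000200, "C", "T", "benign"),
    ("S4", "male", "none", "7", 3000300, "A", "T", "harmful"),
]

# Parameterized query: filter by chromosome, sex, diagnosis, and impact.
QUERY = """
SELECT sample_id, position, ref, alt
FROM variants
WHERE chromosome = ? AND sex = ? AND diagnosis = ? AND impact = ?
ORDER BY position
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database holding the illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO variants VALUES (?, ?, ?, ?, ?, ?, ?, ?)", ROWS)
    return conn


def find_variants(conn, chromosome, sex, diagnosis, impact):
    """Return (sample_id, position, ref, alt) tuples matching every filter."""
    return conn.execute(QUERY, (chromosome, sex, diagnosis, impact)).fetchall()


if __name__ == "__main__":
    db = build_demo_db()
    for row in find_variants(db, "7", "male", "autism", "harmful"):
        print(row)
```

The query shape stays the same whether the table holds four rows or billions; what a warehouse such as BigQuery contributes is the ability to run that scan quickly at the larger scale.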

Going to market faster

Managed services in GCP have enabled DNAstack to spin up its business quickly and to compete against much larger organizations. “With GCP, we already have the hardware and software capacity we need to run our platform,” Marc says. “We can go to market much faster with a globally competitive genome data platform. For example, we didn’t have to create a best-in-class security model. Google already did that.”

John Kelly, President of DNAstack, adds: “The support of Google Cloud Platform has helped enable DNAstack to come to the forefront of genomics data science in a very competitive manner. GCP is a powerful tool, which augments DNAstack’s position as a global thought leader in this space and gives notice that we are pushing forward rapidly into the market.”

The high performance of GCP is also enabling DNAstack to compete successfully against bigger players. “We’re competing against genomic data infrastructures that have had tens of millions of dollars invested in them,” Marc says. “And yet, we’re a startup that’s blowing them out of the water, performance-wise, because Google is making that infrastructure investment for us.”

Saving customers 70%

GCP is enabling DNAstack to save its customers money on infrastructure. “Because Google Cloud Platform enables us to offer a ready-made genomics infrastructure, our customers spend about 70% less compared to what they’d spend developing and maintaining their own infrastructure,” Marc says. “Right away, they can access an out-of-the-box genomics operating system that would have taken them years to develop.”

A virtuous cycle

Saving customers money creates a virtuous cycle. “The lower our prices are, the more customers we can attract,” says Marc. “The more customers we have, the more genomics data that comes into our cloud. And the more genomics data there is to share and analyze, the greater the chances are for medical breakthroughs.”

Larger datasets make statistical analysis and machine learning more powerful, which helps DNAstack power the discovery of new biomarkers in the genome linked to both rare and common diseases. Offering a powerful platform for genomics data sharing and analysis is a bit like the Facebook model, Marc says. “The more users there are on Facebook, the more useful it is, and this is the same for the genomics cloud.”

Enhancing security of genomics data

Genomics data is extremely sensitive information, representing the molecular blueprints of individuals. As a result, strict privacy and security laws protect how the data flows, what can be done with it, and where it’s stored.

“The ability to launch instances of Compute Engine and specify where a customer’s Cloud Storage buckets reside lets us scale globally and yet be in compliance with privacy and security laws that specify where data should be stored,” says Marc. For example, the recent addition of Montréal as a Google Cloud Platform region enabled DNAstack to co-launch with other organizations the Canadian Genomics Cloud, the first public cloud platform for genomics in Canada.
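
One way to picture region-pinned storage and compute is as a small policy check that runs before any bucket or virtual machine is created. The sketch below is a hypothetical illustration: the jurisdiction codes and the mapping are assumptions and not legal guidance, and only the region identifiers themselves are real Google Cloud region names.

```python
from typing import Optional

# Hypothetical residency policy: which cloud regions may hold data for each
# jurisdiction. The mapping is an illustrative assumption, not legal guidance.
ALLOWED_REGIONS = {
    "CA": {"northamerica-northeast1"},  # the Montréal region mentioned above
    "US": {"us-central1", "us-east1"},
}


def choose_region(jurisdiction: str, preferred: Optional[str] = None) -> str:
    """Return a region that satisfies the residency policy, or raise ValueError."""
    allowed = ALLOWED_REGIONS.get(jurisdiction)
    if not allowed:
        raise ValueError(f"no approved region for jurisdiction {jurisdiction!r}")
    if preferred is not None:
        if preferred not in allowed:
            raise ValueError(
                f"region {preferred!r} is not approved for {jurisdiction!r}"
            )
        return preferred
    # No preference given: pick deterministically so results are repeatable.
    return sorted(allowed)[0]


if __name__ == "__main__":
    print(choose_region("CA"))
    print(choose_region("US", preferred="us-east1"))
```

The chosen region would then be supplied as the location when the customer's bucket and compute instances are created, so that data and processing stay inside the approved geography.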

A sharing economy for genomics data

DNAstack is actively testing Google machine learning technologies, exploring ways in which machine learning can connect to genomes to drive new discoveries and insights.

“The future will be about incorporating information from genomes to make predictions about how patients should be treated medically,” says Marc. “But there’s far too much information locked up in medical records and genomes for those discoveries and insights to be achieved manually. We’re very confident that Google’s thought leadership in machine learning and Google Cloud Platform will give us what we’ll need to make that future a reality.”

Similar to how other companies helped launch a sharing economy for cars, homes, and other services, Marc envisions a sharing economy of genomics data enabled by the cloud. The impact of easy, fast, and more secure genomics data sharing and analysis could be enormous.

“Leukemia could be cured, if you can find the right match for a bone marrow transplant,” Marc says. “Finding a match is difficult if you don’t have a sufficient number of donors to tap into. But with a genomics data sharing economy, finding that match could be much easier. It could help cure someone’s leukemia. We’re excited by possibilities like this, and without Google Cloud Platform supporting our mission, it wouldn’t be possible.”
