National Institute on Aging: Accelerating the fight against Parkinson’s Disease

About National Institute on Aging

Established in 1974, the National Institute on Aging (NIA) is part of the National Institutes of Health, where it supports research to understand the biology of aging and to improve health and activity as people age. NIA-funded research ranges from age-related cellular changes to age-related conditions such as Alzheimer’s and Parkinson’s disease.

Industries: Healthcare

Location: United States

Products: Cloud Life Sciences

Google Cloud Results

Uses Broad Institute’s GATK on Google Genomics for exome data processing
Processes nearly 200TB of data for 6,500 exomes in just 3.5 weeks, compared to months on local infrastructure
Plans to share data with researchers at 50+ institutions around the world, with user access control to enhance security

New discoveries in weeks versus months

The National Institute on Aging works with the International Parkinson’s Disease Genomics Consortium, a broad collaboration of scientists striving to characterize molecular changes associated with the debilitating disease. A recent study compiled information from thousands of exomes (an exome is the DNA sequence of all protein-coding regions in an individual’s genome), drawing on data generated at various research institutes on different sequencing platforms over several years.

To make real scientific discoveries possible from so many sources of data, the data had to be reanalyzed for consistency. To reduce the possibility of technical artifacts, scientists had to perform realignment, recalibration, and re-genotyping of the exomes. But there was a problem: none of the consortium members had enough local computational resources to process all 6,500 exomes.

“Cloud computing allowed us to speed up discovery. We collaborated with Google Genomics to test varying implementations of the standard processing pipeline for exome sequence data on the cohort and population scale. The cloud was really our only option for this.”

—Mike Nalls, PhD, Scientist, National Institute on Aging

The team decided to use Google Genomics, a fully managed service on Google Cloud Platform. Scientist Mike Nalls ran Broad Institute’s GATK Best Practices pipeline using Google Genomics, processing the full 200TB set of 6,500 exomes—starting with raw, unaligned sequence data and leading to a set of variant calls—in just three and a half weeks. The dataset was subsequently used to identify six new risk loci for Parkinson’s disease, helping scientists better understand genetic risks for the disease.
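
To put those figures in perspective, the quoted totals can be turned into rough averages. This is only a back-of-the-envelope sketch: it assumes decimal units, treats "nearly 200TB" as exactly 200 TB, and assumes processing ran continuously for the full three and a half weeks, none of which the case study states.

```python
# Back-of-the-envelope averages from the numbers quoted in the case study.
# Assumptions (not stated in the source): decimal units (1 TB = 1,000 GB),
# exactly 200 TB of input, and continuous processing over 3.5 weeks.
TOTAL_TB = 200
EXOMES = 6_500
WEEKS = 3.5

days = WEEKS * 7                         # 24.5 days
gb_per_exome = TOTAL_TB * 1_000 / EXOMES  # average raw data per exome
exomes_per_day = EXOMES / days            # average processing rate
tb_per_day = TOTAL_TB / days              # average data throughput

print(f"Average raw data per exome: {gb_per_exome:.1f} GB")
print(f"Average throughput: {exomes_per_day:.0f} exomes/day, {tb_per_day:.1f} TB/day")
```

Under these assumptions, each exome averages about 31 GB of raw data, and the pipeline worked through roughly 265 exomes (about 8 TB) per day.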

Analyzing massive genetic datasets

Mike could have run the analysis even faster, but opted to limit the number of virtual machines and disks to take advantage of sustained use discounts and reduce costs. Even if hardware could have been procured, the effort would have taken months of compute time using local infrastructure. With Google Genomics on Google Cloud Platform, the National Institute on Aging can now analyze massive datasets, giving scientists access to virtually unlimited compute resources for large-scale projects.

Extensive controls for data access

Because the consortium spans more than 50 research institutes across Europe and the U.S., cloud computing was helpful in providing access to the dataset and analysis results. That’s important for a large consortium, where members may not have equal access to all data.

“We used Google Cloud Platform to share data between sites,” adds Mike. “The partitioning of data in the cloud, in terms of permissions for different buckets, allows us to have control over who can see what data. We can maintain privacy of individual samples and how they need to be treated in the cloud.”
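
The bucket-level partitioning Mike describes can be pictured with a small model. The sketch below is illustrative only: the bucket names, group names, and the `can_read` helper are hypothetical, it does not call any Google Cloud API, and it does not describe the consortium's actual configuration.

```python
# Illustrative model only: bucket names, group names, and can_read() are
# hypothetical and do not reflect the consortium's real setup or any cloud API.
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Bucket:
    name: str
    readers: Set[str] = field(default_factory=set)  # groups allowed to read


def can_read(bucket: Bucket, user_groups: Set[str]) -> bool:
    """Return True if the user belongs to at least one group with read access."""
    return bool(bucket.readers & user_groups)


# Aggregate results are shared widely; per-sample data stays restricted.
summary = Bucket("summary-statistics", readers={"all-consortium-members"})
samples = Bucket("site-a-individual-samples", readers={"site-a-analysts"})

collaborator = {"all-consortium-members"}
site_a_analyst = {"all-consortium-members", "site-a-analysts"}

for who, groups in [("collaborator", collaborator), ("site A analyst", site_a_analyst)]:
    for bucket in (summary, samples):
        print(f"{who} -> {bucket.name}: {can_read(bucket, groups)}")
```

Keeping permissions at the bucket level means a new collaborator gains or loses access to a whole class of data by changing group membership, without touching individual files.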

The cloud environment also allows for greater flexibility in manipulating data. Mike, for instance, could perform analyses on and check the status of the Parkinson’s dataset from any computer with Internet access or even his cell phone, rather than relying on a massive cluster.

Powering future studies

Today, the scientists of the International Parkinson’s Disease Genomics Consortium have a high-quality dataset that is securely accessible and will power a number of future studies into the biological underpinnings of the disease. With cloud computing, the consortium can begin generating results much sooner and more cost-effectively than it could with local compute resources.

“We’re using the dataset for a number of projects that will attempt to identify and refine both novel and known risk loci for Parkinson’s disease,” says Mike.

By accessing Google Cloud Platform through Google Genomics, researchers at the National Institute on Aging can more securely store, process, explore, and share large biological datasets.
