Accelerating breakthroughs in understanding autism with Google Cloud Platform
Robert Ring
Chief Science Officer, Autism Speaks
Editor's note: Today’s guest blog is by Autism Speaks Chief Science Officer Robert Ring. As the world’s largest autism science and advocacy organization, Autism Speaks has committed more than $500 million to its mission, the majority in science and medical research.
An estimated 1 in 68 children in the U.S. is on the autism spectrum. Caused by a combination of genetic and environmental influences, autism is characterized, in varying degrees, by deficits in social communication and interaction, along with the presence of repetitive patterns of behavior, interests or activities. Many individuals with autism also face a lifetime of associated medical conditions (e.g. anxiety, sleep problems, seizures and/or GI symptoms) that frequently contribute to poor outcomes.
With the participation of our amazing autism community, Autism Speaks has worked for 15 years to assemble one of the largest open-access collections of DNA samples from families affected by autism. Our Autism Genetic Resource Exchange (AGRE) holds the DNA of 12,000 individuals affected by autism and their parents and siblings, as well as information on the autism symptoms and autism-related medical conditions of these individuals.
Building on AGRE, Autism Speaks launched the AUT10K program in collaboration with the University of Toronto’s Hospital for Sick Children’s Centre for Applied Genomics. AUT10K has already completed the sequencing of 1,000 cases, and currently has close to 2,000 additional samples nearing completion.
From the beginning, we realized that the amount of data collected by AUT10K would create many challenges. We needed to find a way to store and analyze massive data sets, while allowing remote access to this unprecedented resource for autism researchers around the world.
At first, we shared genomic information by shipping hard drives around the world. Downloading even one individual’s whole genome in a conventional manner can take hours, the equivalent of downloading a hundred feature films. And we knew that by the time AUT10K reached its milestone of 10,000 genomes, we’d have a database on the petabyte scale.
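To give a rough sense of the scale involved, the sketch below works through the arithmetic. The per-genome size and the connection speed are illustrative assumptions, not figures from AUT10K; only the 10,000-genome target comes from the program itself.

```python
# Back-of-the-envelope estimate. GENOME_SIZE_GB and LINK_SPEED_MBPS are
# illustrative assumptions, not measured AUT10K figures.
GENOME_SIZE_GB = 100      # assumed size of one whole genome's sequencing data
LINK_SPEED_MBPS = 100     # assumed download speed, in megabits per second
TARGET_GENOMES = 10_000   # the AUT10K milestone


def download_hours(size_gb: float, speed_mbps: float) -> float:
    """Hours needed to transfer size_gb gigabytes at speed_mbps megabits/s."""
    megabits = size_gb * 8 * 1000  # gigabytes -> megabits (decimal units)
    return megabits / speed_mbps / 3600


def total_petabytes(genomes: int, size_gb: float) -> float:
    """Total storage in petabytes for the given number of genomes."""
    return genomes * size_gb / 1_000_000  # 1 PB = 1,000,000 GB


if __name__ == "__main__":
    hours = download_hours(GENOME_SIZE_GB, LINK_SPEED_MBPS)
    petabytes = total_petabytes(TARGET_GENOMES, GENOME_SIZE_GB)
    print(f"One genome: about {hours:.1f} hours to download")
    print(f"{TARGET_GENOMES:,} genomes: about {petabytes:.1f} PB of storage")
```

Under these assumptions, a single genome takes a little over two hours to download, and the full collection lands at roughly one petabyte.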
Now, Autism Speaks is using Google Cloud Platform to store its data and enable real-time, collaborative access among researchers around the world. We are in the process of uploading 100 terabytes of data to Google Cloud Storage, and from there, we can import it into Google Genomics. Google Genomics will allow scientists to access the data via the Genomics API, explore it interactively using Google BigQuery, and perform custom analysis using Google Compute Engine.
Researchers will spend less time moving data around and more time analyzing data and collaborating with colleagues. We hope this will enable us to make discoveries and drive innovation faster than ever.
The insight and expertise the Google team has already brought to the table have been unmatched. Our work with them has been a game-changer for AUT10K. Together, we have the capability to accelerate breakthroughs in understanding the causes and subtypes of autism in ways that can advance diagnosis and treatment as never before.