Harnessing big genomic data to find the missing answers of Autism Spectrum Disorder

About Autism Speaks

Autism Speaks is the world's leading science and advocacy organization dedicated to funding research into the causes, prevention, and treatment of autism spectrum disorder. Thanks to thousands of participating families, Autism Speaks manages the world's largest private collection of autism-related DNA samples. To unlock the full potential of this invaluable resource, Autism Speaks launched the MSSNG project in 2014 to sequence 10,000 whole genomes from these samples and make the data available for further analysis by the global autism research community. To learn more, visit http://www.mss.ng.

Google Genomics results:

  • Tools and libraries to generate APIs and client libraries from an App Engine application
  • Processed data using a standard alignment and variant-calling pipeline and imported it into Google Genomics for efficient and scalable access and analysis
  • Provided secure web-based data access via the GA4GH API
  • Built a graphical research portal tailored to the needs of the global autism research community that frees staff to focus on products, not infrastructure
  • Imported phenotypic data into BigQuery for interactive analysis with the genetic variants

Scaling to thousands of genomes

    As the MSSNG project team developed its roadmap to sequence the whole genomes of thousands of individuals, it quickly realized that the volume of data generated by a project of this size would exceed the capacity and capabilities of its usual partners. At 100 to 200 gigabytes per raw genome, the 10,000 genomes planned for the MSSNG project would amount to between 1 and 2 petabytes of data.
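As a rough check on that estimate, the sketch below multiplies the per-genome range quoted above by the project's stated target of 10,000 genomes. It assumes decimal units (1 petabyte = 1,000,000 gigabytes) and counts raw data only, ignoring replication and derived files; the helper name is illustrative and not part of any MSSNG tooling.

```python
# Back-of-the-envelope storage estimate for the MSSNG project.
# Assumption: decimal units, 1 PB = 1,000,000 GB; raw data only.
GB_PER_PB = 1_000_000


def total_petabytes(genomes: int, gb_per_genome: float) -> float:
    """Return total raw storage in petabytes (hypothetical helper)."""
    return genomes * gb_per_genome / GB_PER_PB


TARGET_GENOMES = 10_000  # project goal stated by Autism Speaks

low = total_petabytes(TARGET_GENOMES, 100)   # lower bound per genome
high = total_petabytes(TARGET_GENOMES, 200)  # upper bound per genome

print(f"Estimated raw data: {low:.1f} to {high:.1f} PB")
```

Under these assumptions, even the low end of the range reaches a full petabyte once all 10,000 genomes are sequenced.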

    "To manage this scale, we had to reach beyond academia and the life sciences. We had to forge a new collaboration with experts in storing, analyzing, and providing access to big data," said Dr. Robert Ring, chief science officer of Autism Speaks. "Connecting biological discoveries with Google expertise in extracting value from huge amounts of information will advance not only autism research, but the entire field of genomic medicine."

    Working through Google Genomics, the MSSNG project has access to the same technologies that power Google Search and Maps. Using these technologies, the MSSNG team is creating solutions for securely storing, processing, exploring, and sharing complex biological datasets. Autism Speaks has already uploaded nearly 100 terabytes of data from more than 1,300 genomes to Google Cloud Storage and has an additional 2,000 samples in the sequencing queue. When complete, the MSSNG database will hold information from the whole genomes of 10,000 individuals, making it the world's largest single repository of autism-related DNA sequencing data.

    Enabling open science

      An important part of the MSSNG project is sharing this data with the global autism research community. Until now, transferring genomic data between collaborators meant physically shipping hard drives, a costly and time-consuming process. The MSSNG database gives the autism research community immediate web-based access to genomic data from thousands of individuals, together with new online analysis tools.

      In January 2015, Nature Medicine published results from a MSSNG project-led study that revealed new insights into the diversity of autism. The study, the largest autism genome study of its kind, showed that the disorder's genetic underpinnings are even more complex than previously thought: Most siblings who have autism spectrum disorder have different autism-linked genes. The study's de-identified data has been uploaded to Google Cloud Platform and is being made available to scientists for global research.

      "I am immensely excited because for the first time, any scientist anywhere in the world will be able to collaborate and perform analyses with these data in a 'common cloud'," said Dr. Stephen Scherer, Ph.D., D.Sc., FRSC, MSSNG program director. "Thanks to Google Cloud Platform and our work with the Google Genomics team, this vast sea of information will be made accessible for free to researchers everywhere. This is an exemplar for a future when open-access genomics will lead to personalized treatments for many developmental and medical disorders."

      The MSSNG portal, which is built on Google Cloud Platform and Google Genomics, will allow qualified researchers to access the sequencing data using any modern web browser. Once logged on, researchers and bioinformaticians can query the data using tools such as BigQuery or use the Google Genomics API for batch analysis pipelines. Running on Google Cloud Platform also prepares the MSSNG project to handle the ever-changing query workloads of a global research effort.

      “This collaboration between Google and Autism Speaks has the potential to transform the autism research landscape in exceptional ways.”

      Stephen Scherer, Ph.D., D.Sc., FRSC, MSSNG program director, Autism Speaks

      Finding the missing answers in autism

        Over the last five years, scientists have identified a number of rare gene changes, or mutations, associated with autism. A small number of these are sufficient to cause autism by themselves. Most cases of autism, however, appear to be caused by a combination of autism risk genes and environmental factors influencing early brain development. Researchers will use the MSSNG project data to study and help answer some of the most vexing questions about the genotype-phenotype relationships in autism.

        Each individual genome sequenced and stored in the MSSNG database will be associated with an array of detailed clinical information about the donor, collected in a standardized way. This clinical information includes diagnoses and a rich diversity of related medical and research information. By combining it with DNA sequencing data, researchers will be able to ask better questions and get faster answers about how genetic mutations lead to the development of autism and its many associated medical conditions.
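The kind of combined question described here can be sketched as a SQL join between a table of variants and a table of clinical records. The example below uses Python's built-in sqlite3 module as a local stand-in for a hosted query service such as BigQuery. The table names, column names, gene labels, and rows are all made-up placeholders for illustration; they are not the MSSNG schema or data.

```python
import sqlite3

# In-memory database standing in for a cloud data warehouse.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE variants (sample_id TEXT, gene TEXT);
    CREATE TABLE phenotypes (sample_id TEXT, diagnosis TEXT);
""")

# Placeholder rows (hypothetical, for illustration only).
conn.executemany("INSERT INTO variants VALUES (?, ?)", [
    ("S1", "GENE_A"), ("S2", "GENE_A"), ("S3", "GENE_B"),
])
conn.executemany("INSERT INTO phenotypes VALUES (?, ?)", [
    ("S1", "ASD"), ("S2", "ASD"), ("S3", "unaffected"),
])


def samples_per_gene_and_diagnosis(connection):
    """Count distinct samples for each (gene, diagnosis) pair."""
    query = """
        SELECT v.gene, p.diagnosis, COUNT(DISTINCT v.sample_id) AS n
        FROM variants AS v
        JOIN phenotypes AS p ON p.sample_id = v.sample_id
        GROUP BY v.gene, p.diagnosis
        ORDER BY v.gene, p.diagnosis
    """
    return connection.execute(query).fetchall()


rows = samples_per_gene_and_diagnosis(conn)
for gene, diagnosis, n in rows:
    print(gene, diagnosis, n)
```

Joining on a shared sample identifier is what lets a single query relate genetic variants to clinical records; against a real service, only the connection and the schema would differ.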

        “The insight and expertise the Google team has brought to the table in terms of innovative new ways to look at datasets this large has been unmatched,” said Dr. Ring. “Together, we hold the capability of accelerating breakthroughs in understanding the causes and subtypes of autism in ways that can advance diagnosis and treatment as never before. This is an incredibly important moment in autism genomic discovery, and we are poised to write the next chapter together.”

