Building a global biomedical data ecosystem with the National Institutes of Health
Biomedical research faces enormous challenges as the volume of genomic, transcriptomic, metabolomic, phenotypic, and other data generated in research labs across the world continues to grow. According to the National Center for Biotechnology Information, the total amount of sequence data alone is doubling every seven months.
Although analyzing this staggering amount of data could have an enormous positive impact, sharing data among researchers has historically been frustratingly difficult. Data is difficult to aggregate, even virtually, because of its size and privacy concerns. Patient consents can be irregular or not machine-readable. Researcher identity often has to be established before datasets can be accessed. And datasets are not always housed in an environment rich in compute resources. Cloud computing can help overcome these challenges by providing scalable storage, elastic computing resources, fast global networks, and tools for data preparation and analysis.
Today we’re announcing that we are joining the National Institutes of Health (NIH) as a partner on the STRIDES (Science and Technology Research Infrastructure for Discovery, Experimentation, and Sustainability) Initiative to bring the power of Google Cloud to biomedical research. As part of this agreement, we’ll make some of the most important NIH-funded datasets available to users with appropriate privacy controls. To simplify access to these datasets, we’ll integrate researcher authentication and authorization mechanisms with Google Cloud credentials. And we’re working with the Global Alliance for Genomics & Health and the BioCompute Consortium to support industry standards for data access, discovery, and cloud computation.
In addition to this partnership, we’re continuing to develop a suite of open source tools, such as Variant Transforms, to structure and integrate biomedical datasets using BigQuery, our highly scalable managed data warehouse solution.
We are proud to be working with the NIH to dramatically increase the number of biomedical datasets available on Google Cloud. By helping researchers discover and authenticate against these datasets using open standards, and by making these datasets ready for scalable analytics and data science, we hope to usher in the next generation of biomedical discoveries. For more information on healthcare and life sciences solutions on Google Cloud, visit our website. We’re also looking for community input on which datasets to host first, and we welcome your feedback.