How Google Cloud is helping COVID-19 academic research
Joe Corkery, M.D.
Director of Product Management, Healthcare & Life Sciences, Google Cloud
As COVID-19 continues to grow in impact, healthcare and life science researchers are in a race to understand more about the novel coronavirus, and are increasingly turning to cloud technologies to aid them in their work.
We’re so grateful for the work of these experts, and want to support them with tools and technologies that can help them combat this pandemic. Today, we’re sharing more on a number of initiatives that we’re engaged with to support researchers and the organizations and communities they serve.
Helping researchers forecast COVID-19 spread and impact
The Laboratory for the Modelling of Biological + Sociotechnical Systems (MoBS) in the Network Science Institute at Northeastern University started running large-scale, data-driven model simulations on Google Cloud in January to estimate how mitigation strategies such as travel restrictions and social distancing policies would impact the spread of infection. The models are tremendously complex, containing dozens of parameters and huge amounts of data, and require enormous amounts of compute power, data processing, and storage.
By using Google Cloud’s High Performance Computing (HPC) capabilities, including batch processing via the Cloud Life Sciences API, Northeastern University researchers have been able to simultaneously run thousands of preemptible Virtual Machines (PVMs) to power their work. This has reduced the time it takes to run complex simulations from days to hours. Furthermore, when the simulations are complete, they can then analyze the results using BigQuery and quickly share these insights with researchers and public health agencies around the world to accelerate the shared understanding of how the virus is spreading.
The benefit is tremendous. To date, Northeastern University researchers have been able to generate over nine million different models and analyze more than 5,500 terabytes of resulting data. They also assessed the relative risk of importing cases (visualized using Google’s free visualization tool Data Studio), and published their findings in Science.
“Developing data-driven models for predicting COVID-19 infection spread and potential impact is monumental as we race to slow the virus,” said Dr. Matteo Chinazzi, Associate Research Scientist at MoBS.
Continuing to support critical research
We are mobilizing $20 million in Google Cloud credits to enable researchers to harness the power of the cloud in their fight against COVID-19. To administer these credits effectively, we are partnering with the Harvard Global Health Institute to identify promising research opportunities and apply Google Cloud’s capabilities to support them. Harvard Global Health Institute has gathered a team of scientific advisors from a diverse range of disciplines to review submissions. Researchers who need Google Cloud capacity for work on COVID-19 can submit proposals directly to us. Applications will be considered on a rolling basis.
“With academic researchers racing to discover potential treatments and therapies, collaboration is more important than ever. Our partnership with Google provides these researchers much needed resources to speed up the global response to COVID-19,” said Dr. Ashish K. Jha of the Harvard Global Health Institute. “We’re considering all different types of research approaches like clinical research, bench science research, drug delivery and therapeutics research, health services and policy research, and epidemiological research to address the urgency of the pandemic.”
We are also supporting researchers at the University of Virginia Biocomplexity Institute who are running daily epidemic simulations on Google Cloud. These simulations produce datasets that help state, local, and national governments track the spread of COVID-19, assess the impact of interventions, decide how and when to relax those interventions, and determine how and where to allocate resources.
Bringing data analytics and machine learning to more researchers
To make data more widely available and accessible for researchers, Google Cloud launched the COVID-19 Public Dataset Program, which enables free querying of COVID-19 related datasets in BigQuery. This includes the widely referenced Johns Hopkins University cases data (which can also be visualized in Google Sheets as a dashboard), as well as datasets that may prove relevant in COVID-19 research, such as the American Community Survey and OpenStreetMap. Additionally, we have introduced seven new Social Determinants of Health (SDoH) datasets in the program that can help researchers identify which communities in the United States are most vulnerable to the pandemic.
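As a rough illustration of what querying one of these public datasets can look like, here is a minimal Python sketch. The table name `bigquery-public-data.covid19_jhu_csse.summary` and the column names `date`, `confirmed`, and `country_region` are assumptions about how the Johns Hopkins data is published, so check them in the BigQuery console before relying on them. The helper names `build_country_query` and `run_country_query` are hypothetical. Running the query requires the `google-cloud-bigquery` package and Google Cloud credentials.

```python
# Assumed location of the Johns Hopkins data in BigQuery; verify before use.
JHU_TABLE = "bigquery-public-data.covid19_jhu_csse.summary"


def build_country_query(table: str = JHU_TABLE) -> str:
    """Return SQL that totals confirmed cases per day for one country.

    The country is passed as the named query parameter @country rather than
    being formatted into the SQL string.
    """
    return (
        "SELECT date, SUM(confirmed) AS confirmed\n"
        f"FROM `{table}`\n"
        "WHERE country_region = @country\n"
        "GROUP BY date\n"
        "ORDER BY date"
    )


def run_country_query(country: str):
    """Run the query and return the result rows as a list."""
    # Imported lazily so build_country_query works without the client library.
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("country", "STRING", country)
        ]
    )
    job = client.query(build_country_query(), job_config=job_config)
    return list(job.result())
```

With credentials configured, `run_country_query("Italy")` would return one row per date, assuming the table and columns exist as named above.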
In March, the White House and supporting institutions called upon the AI community to develop new text and data mining techniques to examine the COVID-19 Open Research Dataset (CORD-19), the most extensive machine-readable coronavirus literature collection to date. To help, we asked our Kaggle community of data scientists to join the effort, and to also take part in additional challenges to forecast the spread of COVID-19. The contributions from those efforts, including an ML-curated literature review, can be found here.
Accelerating drug discovery research efforts at lower costs
Researchers are working around the clock to better understand COVID-19 and minimize its impact on both our health and the global economy. By distributing their work across tens of thousands of virtual machines on Google Cloud, researchers are able to speed up their models and analyses, resulting in substantial savings in both time and resources. Google Cloud preemptible VMs are a great way to run these types of easily distributed, fault-tolerant research applications, enabling researchers to accelerate the computational portion of their research at a fraction of the cost of standard VMs.
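To see why easily distributed, fault-tolerant workloads suit preemptible VMs, consider the toy Python simulation below. It is only a sketch under stated assumptions and does not call any Google Cloud API: the function names, the 20% preemption probability, and the squaring "simulation" are all hypothetical stand-ins. The point it illustrates is that when tasks are independent, a task interrupted by a reclaimed VM can simply be put back on the queue and retried, so the full job still completes.

```python
import random
from collections import deque


def simulate(task: int) -> int:
    """Stand-in for one independent unit of research work (hypothetical)."""
    return task * task


def run_with_preemption(tasks, n_workers=4, preempt_prob=0.2, seed=0):
    """Process every task even though a worker may be preempted mid-task.

    Returns (results, attempts), where attempts counts every try,
    including tries that were lost to a simulated preemption.
    """
    rng = random.Random(seed)  # seeded so the run is reproducible
    pending = deque(tasks)
    results = {}
    attempts = 0
    while pending:
        # Each round, up to n_workers tasks are picked up in parallel.
        batch = [pending.popleft() for _ in range(min(n_workers, len(pending)))]
        for task in batch:
            attempts += 1
            if rng.random() < preempt_prob:
                # The simulated VM was reclaimed: requeue the task.
                pending.append(task)
            else:
                results[task] = simulate(task)
    return results, attempts
```

Calling `run_with_preemption(range(100))` returns a result for all 100 tasks, with the attempt count showing how much work was repeated because of preemptions. The trade-off is some repeated work in exchange for lower-cost capacity.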
With the goal of accelerating as many COVID-19 related research projects as possible, Google is expanding access to preemptible VMs through PVM-specific credits to support COVID-19 initiatives, in addition to the general cloud credits mentioned earlier in this post. As we receive COVID-19 research proposals, Google will work with researchers to identify ways they can accelerate and scale up their work through the use of preemptible VMs, as is the case in the following example.
Developing a new drug in the United States typically costs between 2 and 3 billion dollars and takes about ten years. Teams at Harvard Medical School and Dana-Farber Cancer Institute (DFCI) are using VirtualFlow, an open-source, scalable virtual drug discovery platform that runs on Google Cloud using preemptible VMs, to narrow down promising drug targets more quickly and accurately and so accelerate the discovery of therapies and treatments for COVID-19 patients.
VirtualFlow is helping them target billions of drug compounds against SARS-CoV-2 proteins in a matter of days, greatly increasing their capacity to study and analyze potential therapies for COVID-19.
“The virtual testing approaches we are using have massively reduced the time required for drug and treatment discovery and will hopefully lead to faster development of therapeutics for diseases,” said Christoph Gorgulla, a postdoctoral research fellow at Harvard Medical School.
“Leveraging the abundance of structural data available on the SARS-CoV-2 proteins, we are using Google Cloud’s technology to identify inhibitors of viral proteins. The use of hundreds of thousands of computational cores at Google Cloud allows us to finish this task of screening a billion compounds (~12 billion docking instances) in a couple of weeks. To accomplish this on a standard laptop would take 1,500 years,” said Haribabu Arthanari, an assistant professor at Harvard Medical School.
SARS-CoV-2 main protease with a virtual hit compound docked into the protein active site.
Once a shortlist of promising pharmaceutical compounds has been identified, the team from Harvard Medical School will work with researchers at other institutions with facilities in place to begin testing. At the same time, the VirtualFlow team will run additional screens against databases of already-approved drugs to see if any contain these compounds. Harvard Medical School also has a number of other research collaborations running in parallel with other institutions to match the most promising drug compounds, which will allow their work to progress more rapidly.
Continuing to make data privacy and security a priority
Data is the cornerstone of educational and academic research, and the privacy and security of that data are critically important. Our Trust Principles ensure data on Google Cloud is handled in accordance with widely recognized patient privacy and data security practices, and businesses and organizations that use Google Cloud remain in complete control of their data.
Google Cloud’s commitment to supporting educational and academic research is core to our DNA, and we’ll continue to find ways to help researchers and organizations apply cloud technologies for the benefit of all.