COVID-19 public dataset program: Making data freely accessible for better public outcomes

Data always plays a critical role in the ability to research, study, and combat public health emergencies, and nowhere is this more true than in the case of a global crisis. Access to datasets, and to tools that can analyze that data at cloud scale, is increasingly essential to the research process, and is particularly necessary in the global response to the novel coronavirus (COVID-19).

To aid researchers, data scientists, and analysts in the effort to combat COVID-19, we are making a hosted repository of public datasets free to access and query through our COVID-19 Public Dataset Program. The repository includes our COVID-19 Open Data dataset, the Global Health Data from the World Bank, and OpenStreetMap data. Researchers can also use BigQuery ML to train advanced machine learning models with this data right inside BigQuery at no additional cost.

“Making COVID-19 data open and available in BigQuery will be a boon to researchers and analysis in the field,” says Sam Skillman, Head of Engineering at Descartes Labs. “In particular, having queries be free will allow greater participation, and the ability to quickly share results and analysis with colleagues and the public will accelerate our shared understanding of how the virus is spreading.”

These datasets remove barriers and provide access to critical information quickly and easily, eliminating the need to search for and onboard large data files. Researchers can access the datasets from within the Google Cloud Console, along with a description of the data and sample queries to advance research. All data we include in the program will be public and freely available. The program will remain in effect until September 15, 2020. 
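
As an illustration of what querying one of these datasets might look like, here is a minimal sketch using the BigQuery Python client library. The table path `bigquery-public-data.covid19_open_data.covid19_open_data` and the column names `location_key`, `date`, `new_confirmed`, and `cumulative_confirmed` are assumptions rather than details from this post, so check them against the dataset description in the Cloud Console. The helper names `build_daily_cases_query` and `run_daily_cases` are hypothetical.

```python
from datetime import date

# Assumed table path; verify it in the Cloud Console before relying on it.
TABLE = "bigquery-public-data.covid19_open_data.covid19_open_data"


def build_daily_cases_query(table: str = TABLE) -> str:
    """Return a parameterized SQL query for daily case counts at one location.

    The column names used here are assumptions about the table schema.
    """
    return (
        "SELECT date, new_confirmed, cumulative_confirmed\n"
        f"FROM `{table}`\n"
        "WHERE location_key = @location_key\n"
        "  AND date BETWEEN @start_date AND @end_date\n"
        "ORDER BY date"
    )


def run_daily_cases(location_key: str, start: date, end: date):
    """Run the query with the BigQuery client library.

    Requires the google-cloud-bigquery package and Google Cloud credentials.
    """
    # Imported lazily so the query builder works without the library installed.
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("location_key", "STRING", location_key),
            bigquery.ScalarQueryParameter("start_date", "DATE", start),
            bigquery.ScalarQueryParameter("end_date", "DATE", end),
        ]
    )
    rows = client.query(build_daily_cases_query(), job_config=job_config).result()
    return [
        (row["date"], row["new_confirmed"], row["cumulative_confirmed"])
        for row in rows
    ]
```

Calling `run_daily_cases("US", date(2020, 3, 1), date(2020, 3, 31))` would return one tuple per day, provided the assumed schema matches the hosted table. Query parameters are used instead of string formatting so that user-supplied values never end up in the SQL text.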

“Developing data-driven models for the spread of this infectious disease is critical,” said Matteo Chinazzi, Associate Research Scientist, Northeastern University. “Our team is working intensively to model and better understand the spread of the COVID-19 outbreak. By making COVID-19 data open and available in BigQuery, researchers and public health officials can better understand, study, and analyze the impact of this disease.”

The contents of these datasets are provided to the public strictly for educational and research purposes. We are not onboarding or managing protected health information (PHI) or personally identifiable information (PII) as part of the COVID-19 Public Dataset Program. Google has practices and policies in place to ensure that data is handled in accordance with widely recognized patient privacy and data security policies.

We on the Google Cloud team sincerely hope that the COVID-19 Public Dataset Program will enable better and faster research to combat the spread of this disease. Get started today.

Update: We recently made training available to help teach the fundamentals of working with these datasets on Google Cloud. Get started here.