What are the newest datasets in Google Cloud?
Michael Hamamoto Tribble
Head of Datasets for Google Cloud
Product Marketing, Google Cloud
Editor’s note: With Google Cloud’s datasets solution, you can access an ever-expanding resource of the newest datasets to support and empower your analyses and ML models, as well as frequently updated best practices on how to get the most out of any of our datasets. We will be regularly updating this blog with new datasets and announcements, so be sure to bookmark this link and check back often.
New dataset: Regional Carbon-free Energy Data
Google Cloud data centers rely on electricity from nearby grids, which produce more or less carbon emissions depending on the types of power plants generating that electricity. In 2020, we set a goal to match our energy consumption with carbon-free energy (CFE), every hour and in every region, by 2030. As we work towards our 24/7 CFE goal, we want to provide transparency on our progress, which is why we are publishing the CFE Score (one of the key metrics for our 24/7 methodology) for each region we operate in. To learn more about this work, and to see visualizations, keep reading.
New dataset: U.S. Climate Gridded Dataset (NClimGrid)
NClimGrid is a gridded dataset derived from NOAA NCEI's Global Historical Climatology Network (GHCN). It consists of four climate variables derived from the GHCN-D dataset: maximum temperature, minimum temperature, average temperature and precipitation. The data files provide monthly values in a 5km by 5km lat/lon grid for the Continental United States. This information is helpful in analyzing historical and regional climate trends.
New dataset: U.S. Climate Normals
Earlier in May, NOAA NCEI released the new 30-year U.S. Climate Normals for the time period 2011-2020. Climate normals are a statistically smoothed, quality-controlled, 30-year average of recent climate conditions. The U.S. Climate Normals collection is available for the following time periods: 1901-1930, 1911-1940, and so on through 1991-2020. Because they are updated once per decade, the Normals gradually come to reflect the "new normal" of climate change caused by global warming. Users of this data include agriculture, construction, infrastructure and energy industry planners, to name a few.
New dataset: Ocean Climate Stations Moorings (KEO & PAPA)
The mission of the Ocean Climate Stations (OCS) Project is to make meteorological and oceanic measurements from autonomous platforms. Calibrated, quality-controlled, and well-documented climatological measurements are available on the OCS webpage and the OceanSITES Global Data Assembly Centers (GDACs), with near real-time data available prior to release of the complete, downloaded datasets. Kuroshio Extension Observatory (KEO) and Ocean Weather Station Papa (PAPA) are two stations providing data to this project.
New dataset: Google Cloud Release Notes
Looking for a new way to access Google Cloud Release Notes besides the docs, the XML feed, and Cloud Console? Check out the Google Cloud Release Notes dataset. With up-to-date release notes for all generally available Google Cloud products, this dataset allows you to use BigQuery to programmatically analyze release notes across all products, exploring security bulletin updates, fixes, changes, and the latest feature releases.
Access the BigQuery release notes dataset from https://cloud.google.com/release-notes/all
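As a quick illustration, here is a sketch of a query you might run against the release notes dataset, for example to surface recent security bulletins. The table path is the public dataset referenced above, but the column names (product_name, description, release_note_type, published_at) are assumptions based on its published schema; executing the query requires the google-cloud-bigquery client and GCP credentials, so this sketch only constructs the SQL string.

```python
# Build a BigQuery SQL query against the public release notes dataset.
# Column names are assumptions from the dataset's documented schema.
RELEASE_NOTES_TABLE = "bigquery-public-data.google_cloud_release_notes.release_notes"

def security_bulletin_query(days: int = 90) -> str:
    """SQL listing security-bulletin release notes from the last `days` days."""
    return (
        "SELECT product_name, description, published_at\n"
        f"FROM `{RELEASE_NOTES_TABLE}`\n"
        "WHERE release_note_type = 'SECURITY_BULLETIN'\n"
        f"  AND published_at >= DATE_SUB(CURRENT_DATE(), INTERVAL {days} DAY)\n"
        "ORDER BY published_at DESC"
    )

print(security_bulletin_query())
```

You could pass the resulting string to `client.query()` in the BigQuery Python client, or paste it into the BigQuery console.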
Best practice: Use Google Trends data for common business needs
The Google Trends dataset represents the first time we’re adding Google-owned Search data into Datasets for Google Cloud. The Trends data allows users to measure interest in a particular topic or search term across Google Search, from around the United States, down to the city level. You can learn more about the dataset and explore the accompanying Looker dashboard. These tables are valuable in their own right, but when you blend them with other actionable data, you can unlock whole new areas of opportunity for your team. To learn how to make informed decisions with Google Trends data, keep reading.
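One hedged sketch of what "blending" might look like: joining the public `top_terms` Trends table against a table of your own by week and market area. The Trends table path is the public dataset described here, while `my_project.retail.daily_sales` and its columns are hypothetical stand-ins for your own data; the Trends column names (term, week, rank, dma_id) are assumptions from its schema.

```python
# Sketch: blend Google Trends public data with a hypothetical sales table.
TRENDS_TABLE = "bigquery-public-data.google_trends.top_terms"
SALES_TABLE = "my_project.retail.daily_sales"  # hypothetical stand-in

BLEND_QUERY = f"""
SELECT t.term, t.week, t.rank, s.revenue
FROM `{TRENDS_TABLE}` AS t
JOIN `{SALES_TABLE}` AS s
  ON s.week = t.week AND s.dma_id = t.dma_id
WHERE t.rank <= 5
ORDER BY t.week DESC, t.rank
"""

print(BLEND_QUERY)
```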
New dataset: COVID-19 Vaccination Search Insights
With COVID-19 vaccinations being a topic of interest around the United States, this dataset shows aggregated, anonymized trends in searches related to COVID-19 vaccination and is intended to help public health officials design, target, and evaluate public education campaigns. Check out this interactive dashboard to explore searches for COVID-19 vaccination topics by region.
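For analysts who prefer SQL over the dashboard, here is a sketch of a per-state trend query. The table path and column names (date, sub_region_1, sub_region_2, sni_covid19_vaccination) are assumptions based on the dataset's published schema; verify them against the dataset listing before running.

```python
# Sketch: query aggregated vaccination-search trends for one U.S. state.
# Table and column names are assumptions from the published schema.
INSIGHTS_TABLE = (
    "bigquery-public-data.covid19_vaccination_search_insights."
    "covid19_vaccination_search_insights"
)

def state_trend_query(state: str) -> str:
    """SQL returning the state-level vaccination search index over time."""
    return (
        "SELECT date, sni_covid19_vaccination\n"
        f"FROM `{INSIGHTS_TABLE}`\n"
        f"WHERE sub_region_1 = '{state}' AND sub_region_2 IS NULL\n"
        "ORDER BY date"
    )

print(state_trend_query("California"))
```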
New dataset: Google Diversity Annual Report 2021
Since 2014, Google has disclosed data on the diversity of its workforce in an effort to bring candid transparency to the challenges technology companies like Google face in recruitment and retention of underrepresented communities. In an effort to make this data more accessible and useful, we've loaded it into BigQuery for the first time ever. To view Google's Diversity Annual Report and learn more, check it out.
New dataset: Google Trends Top 25 Search terms
The most popular and surging Google Search terms are now available in BigQuery as a public dataset. View the Top 25 and Top 25 rising queries from Google Trends over the past 30 days, along with five years of historical data across the 210 Designated Market Areas (DMAs) in the US. Keep reading.
Top 25 Google Search terms, ranked by search volume (1 through 25), with each term's average search index score across the geographic areas (DMAs) in which it was searched.
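A table like the one pictured can be reproduced with a short query. This sketch pulls the latest Top 25 terms for a single DMA; the table path matches the dataset announced here, but the column names (term, rank, week, dma_name, refresh_date) are assumptions from its schema, and the DMA name passed in is just an example.

```python
# Sketch: latest Top 25 search terms for one DMA from the public Trends dataset.
# Column names are assumptions from the dataset's schema.
TOP_TERMS_TABLE = "bigquery-public-data.google_trends.top_terms"

def top_terms_query(dma_name: str) -> str:
    """SQL returning the most recent Top 25 terms for the given DMA."""
    return (
        "SELECT term, rank, week\n"
        f"FROM `{TOP_TERMS_TABLE}`\n"
        f"WHERE dma_name = '{dma_name}'\n"
        "  AND refresh_date = (SELECT MAX(refresh_date)\n"
        f"                      FROM `{TOP_TERMS_TABLE}`)\n"
        "ORDER BY week DESC, rank\n"
        "LIMIT 25"
    )

print(top_terms_query("New York NY"))
```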
New dataset: COVID-19 Vaccination Access
With metrics quantifying travel times to COVID-19 vaccination sites, this dataset is intended to help public health officials, researchers, and healthcare providers identify areas with insufficient access, deploy interventions, and research these issues. Check out how this data is being used in a number of new tools.
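To give a flavor of how the travel-time metrics could be queried, here is a sketch that looks for facility catchments covering a given point within a travel-time budget. The table path and column names (facility_place_id, mode_of_transportation, travel_time, facility_catchment_boundary) are assumptions about this dataset's schema, and the coordinates are arbitrary; check the dataset listing before running.

```python
# Sketch: find vaccination facilities whose catchment area covers a point,
# within a travel-time budget. Schema details here are assumptions.
ACCESS_TABLE = "bigquery-public-data.covid19_vaccination_access.facility_boundary_us_all"

def reachable_facilities_query(lon: float, lat: float, minutes: int = 30) -> str:
    """SQL listing facilities reachable from (lon, lat) within `minutes`."""
    return (
        "SELECT facility_place_id, mode_of_transportation, travel_time\n"
        f"FROM `{ACCESS_TABLE}`\n"
        f"WHERE travel_time <= {minutes}\n"
        f"  AND ST_CONTAINS(facility_catchment_boundary, ST_GEOGPOINT({lon}, {lat}))"
    )

print(reachable_facilities_query(-122.33, 47.61))
```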
(Image courtesy of Vaccine Equity Planner, https://vaccineplanner.org/)
Best practice: Leveraging BigQuery Public Boundaries datasets for geospatial analytics
Geospatial data is a critical component of a comprehensive analytics strategy. Whether you are trying to visualize data using geospatial parameters or do deeper analysis or modeling on customer distribution or proximity, most organizations have some type of geospatial data they would like to use, whether that's customer ZIP codes, store locations, or shipping addresses. However, converting geographic data into the correct format for analysis and aggregation at different levels can be difficult. In this post, we’ll walk through some examples of how you can leverage Google Cloud alongside Google Cloud Public Datasets to perform robust analytics on geographic data. Keep reading.
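As one concrete illustration of the pattern, this sketch aggregates point data by ZIP code using a public boundaries table and BigQuery's geography functions. The `geo_us_boundaries.zip_codes` table (with `zip_code` and `zip_code_geom` columns) is a real public dataset, while `my_project.sales.customers` and its longitude/latitude columns are hypothetical stand-ins for your own data.

```python
# Sketch: count customers per ZIP code by joining a hypothetical point table
# against the public US ZIP code boundaries dataset.
ZIP_TABLE = "bigquery-public-data.geo_us_boundaries.zip_codes"
CUSTOMERS_TABLE = "my_project.sales.customers"  # hypothetical stand-in

GEO_QUERY = f"""
SELECT z.zip_code, COUNT(*) AS customer_count
FROM `{CUSTOMERS_TABLE}` AS c
JOIN `{ZIP_TABLE}` AS z
  ON ST_CONTAINS(z.zip_code_geom, ST_GEOGPOINT(c.longitude, c.latitude))
GROUP BY z.zip_code
ORDER BY customer_count DESC
"""

print(GEO_QUERY)
```

`ST_CONTAINS` plus `ST_GEOGPOINT` is the standard point-in-polygon join in BigQuery GIS; for large tables, consider clustering the point table to keep the join cost down.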
Get the metadata and try BigQuery sandbox
Once you’ve learned about many of our datasets and pre-built solutions from across Google, you may be ready to start querying them. Check out the full dataset directory and read all the metadata at g.co/cloud/marketplace-datasets, then dig into the data with our free-to-use BigQuery sandbox account, or with $300 in credits from our Google Cloud free trial.