New geospatial data comes to BigQuery public datasets with CARTO collaboration

At Google Cloud, we host many public datasets, including weather, traffic, housing and other data, in BigQuery, our enterprise data warehousing platform. You can use this public data to experiment with data analytics and join it with your own data to find insights. We’re pleased to announce a new collaboration with CARTO to bring valuable location-based geospatial datasets to the BigQuery public datasets program. Spatial data is something that requires a community effort, and we’re excited to open up new possibilities for you to access, analyze and visualize GIS data.

This collaboration makes it easier for users to access data and do geospatial analysis with CARTO Data Observatory 2.0, a location intelligence platform that’s powered by BigQuery. 

The first available dataset is the U.S. Census Bureau American Community Survey (ACS). The American Community Survey is one of the most valuable public datasets in the world. Much like the decennial census, it provides demographic, population, and housing data at an incredibly high spatial resolution. Unlike the census, though, this data is collected, aggregated and updated every year, which makes it a powerful tool to support business, non-governmental, or academic initiatives.

For example, the query below retrieves the median income for Brooklyn from the 2010 and 2017 ACS five-year tables, calculates the difference, and joins the result to a census block groups dataset so that it can be visualized on a map.

-- Calculate the difference in median income in Brooklyn by block group from 2010 to 2017
WITH acs_2017 AS (
  SELECT geo_id, median_income AS median_income_2017
  FROM `bigquery-public-data.census_bureau_acs.blockgroup_2017_5yr`
  WHERE geo_id LIKE '36047%' -- Select Brooklyn
),

acs_2010 AS (
  SELECT geo_id, median_income AS median_income_2010
  FROM `bigquery-public-data.census_bureau_acs.blockgroup_2010_5yr`
  WHERE geo_id LIKE '36047%' -- Select Brooklyn
),

acs_diff AS (
  SELECT
    a17.geo_id,
    a17.median_income_2017,
    a10.median_income_2010,
    geo.blockgroup_geom,
    a17.median_income_2017 - a10.median_income_2010 AS median_income_diff
  FROM acs_2017 a17
  JOIN acs_2010 a10
    ON a17.geo_id = a10.geo_id
  JOIN `bigquery-public-data.geo_census_blockgroups.us_blockgroups_national` geo
    ON a17.geo_id = geo.geo_id
)

SELECT * FROM acs_diff WHERE median_income_diff IS NOT NULL
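
The `LIKE '36047%'` filter selects Brooklyn because the first five digits of a block group identifier are the state FIPS code (36, New York) followed by the county FIPS code (047, Kings County). The Python sketch below illustrates that structure. It assumes the `geo_id` values in these tables follow the standard 12-digit Census layout (state, county, tract, block group), which is worth confirming against the data. The helper names and the sample identifier are hypothetical.

```python
from typing import NamedTuple


class BlockGroupId(NamedTuple):
    state_fips: str   # 2 digits, e.g. "36" for New York
    county_fips: str  # 3 digits, e.g. "047" for Kings County (Brooklyn)
    tract: str        # 6 digits
    block_group: str  # 1 digit


def parse_block_group_geo_id(geo_id: str) -> BlockGroupId:
    """Split a block group identifier into its FIPS components.

    Assumption: the identifier uses the standard 12-digit layout
    state (2) + county (3) + tract (6) + block group (1).
    """
    if len(geo_id) != 12 or not geo_id.isdigit():
        raise ValueError(f"expected a 12-digit geo_id, got {geo_id!r}")
    return BlockGroupId(geo_id[:2], geo_id[2:5], geo_id[5:11], geo_id[11])


def in_county(geo_id: str, state_fips: str, county_fips: str) -> bool:
    """Python equivalent of the SQL filter geo_id LIKE '<state><county>%'."""
    return geo_id.startswith(state_fips + county_fips)


# Hypothetical identifier, used only for illustration.
example = parse_block_group_geo_id("360470001001")
print(example)
print(in_county("360470001001", "36", "047"))
```

Changing the five-digit prefix in the query is therefore enough to run the same comparison for another county.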

To see this in action, the CARTO team made a short Google Colab Python notebook that runs this SQL query in BigQuery and visualizes the result with CARTOframes. To run it yourself, open the following Google Colab and authenticate with a Google account that has access to BigQuery. After running the query, you can see a few Brooklyn neighborhoods stand out right away, as shown here:

[Figure: Calculating the median income difference]
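
The notebook itself is not reproduced here. As a rough sketch of the same idea, the Python below builds the income-difference query for a given county prefix and pair of years, then runs it with the `google-cloud-bigquery` client library. It assumes the library (plus pandas) is installed and that default Google Cloud credentials are configured. `build_income_diff_query` and `run_income_diff` are hypothetical helper names, and the assumption that a `blockgroup_<year>_5yr` table exists for every year you pass in should be checked against the dataset. The CARTOframes visualization step is omitted.

```python
QUERY_TEMPLATE = """
WITH acs_end AS (
  SELECT geo_id, median_income AS median_income_end
  FROM `bigquery-public-data.census_bureau_acs.blockgroup_{end_year}_5yr`
  WHERE geo_id LIKE '{prefix}%'
),
acs_start AS (
  SELECT geo_id, median_income AS median_income_start
  FROM `bigquery-public-data.census_bureau_acs.blockgroup_{start_year}_5yr`
  WHERE geo_id LIKE '{prefix}%'
)
SELECT
  e.geo_id,
  geo.blockgroup_geom,
  e.median_income_end - s.median_income_start AS median_income_diff
FROM acs_end e
JOIN acs_start s
  ON e.geo_id = s.geo_id
JOIN `bigquery-public-data.geo_census_blockgroups.us_blockgroups_national` geo
  ON e.geo_id = geo.geo_id
WHERE e.median_income_end - s.median_income_start IS NOT NULL
"""


def build_income_diff_query(county_prefix: str, start_year: int, end_year: int) -> str:
    """Return the SQL text for one county (5-digit state + county FIPS prefix)."""
    # Table names cannot be bound as query parameters, so the inputs are
    # validated strictly before being formatted into the SQL text.
    if not (county_prefix.isdigit() and len(county_prefix) == 5):
        raise ValueError("county_prefix must be a 5-digit FIPS prefix, e.g. '36047'")
    if not (isinstance(start_year, int) and isinstance(end_year, int)):
        raise ValueError("years must be integers")
    return QUERY_TEMPLATE.format(
        prefix=county_prefix, start_year=start_year, end_year=end_year
    )


def run_income_diff(county_prefix: str, start_year: int, end_year: int):
    """Run the query and return a pandas DataFrame (needs credentials and network)."""
    # Imported lazily so the query builder works without the client library.
    from google.cloud import bigquery

    client = bigquery.Client()
    sql = build_income_diff_query(county_prefix, start_year, end_year)
    return client.query(sql).to_dataframe()


# Example (not executed here): df = run_income_diff("36047", 2010, 2017)
```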

You can start using this ACS dataset in your BigQuery analyses, or join your own geospatial data with public datasets using any of the filters or predicates available in BigQuery GIS.
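
As one illustration of a spatial predicate, the sketch below looks up the census block group that contains a given point, using the BigQuery GIS functions `ST_CONTAINS` and `ST_GEOGPOINT` (which takes longitude first, then latitude). It assumes the `google-cloud-bigquery` client library is installed and credentials are configured. The helper names `validate_point` and `find_block_group` are hypothetical, and the coordinates in the usage comment are only an example.

```python
POINT_IN_BLOCKGROUP_SQL = """
SELECT geo.geo_id
FROM `bigquery-public-data.geo_census_blockgroups.us_blockgroups_national` AS geo
WHERE ST_CONTAINS(geo.blockgroup_geom, ST_GEOGPOINT(@lon, @lat))
"""


def validate_point(lon: float, lat: float) -> tuple:
    """Check that the coordinates are plausible and return them as floats."""
    lon, lat = float(lon), float(lat)
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude out of range: {lon}")
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    return lon, lat


def find_block_group(lon: float, lat: float) -> list:
    """Return the geo_id of each block group containing the point.

    Needs credentials and network access when called.
    """
    # Imported lazily so validate_point works without the client library.
    from google.cloud import bigquery

    lon, lat = validate_point(lon, lat)
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("lon", "FLOAT64", lon),
            bigquery.ScalarQueryParameter("lat", "FLOAT64", lat),
        ]
    )
    client = bigquery.Client()
    rows = client.query(POINT_IN_BLOCKGROUP_SQL, job_config=job_config).result()
    return [row["geo_id"] for row in rows]


# Example (not executed here): find_block_group(-73.95, 40.65)
```

The returned `geo_id` can then be joined to the ACS tables in the same way as in the Brooklyn query.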

Three additional public datasets will be available in the coming weeks, with many more to follow:

  • Bureau of Labor Statistics (BLS) economic data: The Bureau of Labor Statistics is the U.S. government's authoritative source on economic and employment data. The department provides extremely detailed data on the strength of the U.S. labor market, aggregated at various time periods and geographies. CARTO applies its technology to make this data easier to understand and use.

  • TIGER/Line U.S. Coastlines, clipped by CARTO: Each year, the U.S. Census Bureau publishes detailed boundary files that describe the political and statistical boundaries in the U.S. Because the Census Bureau publishes files to define the national coastline boundaries, these do not always cleanly align with the boundary between the shore and the ocean. CARTO applies its expertise to clip the boundary so that it aligns more accurately with the coastline, letting you better connect your data with the $7.9 trillion economy of the U.S. coastline.

  • Who's on First: An open-source gazetteer (essentially a long list) of places around the globe, Who's on First is a combination of original works and existing open datasets that results in a massive, flexible, and incredibly detailed dictionary of places. Each place in the dataset has a stable identifier and some number of descriptive properties about that location. The dataset is carefully structured and updated, so you can depend on it to support a variety of projects.

Using CARTO Data Observatory 2.0 and BigQuery GIS
CARTO’s Data Observatory 2.0, the latest version of its spatial data repository, helps GIS professionals and data scientists save time by simplifying access to public data and easing data joins for spatial analysis through a common geography base. Importing and wrangling geospatial datasets can present challenges, like needing to validate file formats or geometries. With CARTO’s team creating these datasets as well-maintained references in BigQuery, it gets a lot easier to use them in either CARTO or BigQuery. Plus, the CARTO team takes advantage of BigQuery’s native GIS functionality in its own technology stack.

“We chose BigQuery to power Data Observatory because it allows us to carry out geospatial analysis at scale for a wide range of use cases,” says Javier de la Torre, founder and chief strategy officer at CARTO. “And we like that Google Cloud hosts these datasets and covers the storage costs on behalf of customers. Finally, we love that public datasets can be referenced in analyses with the same ease and performance as a customer’s own internal data. No loading, no copying; just use the data and enjoy.”

Here’s a look at how CARTO incorporates Google Cloud into its architecture:

[Figure: CARTO spatial data infrastructure]

Read more about CARTO’s spatial data infrastructure, powered by BigQuery and other Google Cloud services.

We’re excited to make these new datasets available and bring new possibilities to your geospatial analytics projects. To get started, check out the BigQuery GIS documentation and start integrating these new datasets from the CARTO Data Observatory or our Google Cloud datasets marketplace.