Google Cloud Big Data and Machine Learning Blog

Innovation in data processing and machine learning technology

Crowning the rat capital of New York: importing data for analysis with Google BigQuery

Thursday, February 9, 2017

By Reto Meier, Google Developer Advocate

In this installment of #TILwBQ, import a new table into BigQuery to investigate the increase in rat complaints in the open NYC dataset.

The NYC 311 dataset in BigQuery indicates a dramatic increase in complaints about rats, starting in 2013.



To dig deeper, I uploaded the Rodent Inspection table from the NYC Pest Control Database to BigQuery; watch the video to see how I did this, and learn how to compare the NYC 311 data with your own dataset.

Manhattanites are the most likely to call in a false alarm

The NYC Department of Health sends inspectors to investigate all rodent complaints — 75% of the time, they don’t find anything.
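As a rough illustration of how a false-alarm share like this could be computed, the sketch below counts inspection outcomes in a tiny hand-made sample. The rows, the result labels, and the function name `false_alarm_share` are assumptions for illustration; they are not taken from the Rodent Inspection table.

```python
from collections import Counter

# Hypothetical sample rows: (borough, inspection result). The labels are
# illustrative and may not match the values in the real table.
SAMPLE_INSPECTIONS = [
    ("Manhattan", "Passed"),
    ("Manhattan", "Passed"),
    ("Manhattan", "Active Rat Signs"),
    ("Brooklyn", "Active Rat Signs"),
    ("Brooklyn", "Passed"),
    ("Bronx", "Passed"),
    ("Bronx", "Active Rat Signs"),
    ("Queens", "Passed"),
]


def false_alarm_share(inspections, active_label="Active Rat Signs"):
    """Return the fraction of inspections that did not find active rat signs."""
    if not inspections:
        raise ValueError("no inspections to summarize")
    # Counter keys are True (active signs found) and False (false alarm).
    counts = Counter(result == active_label for _, result in inspections)
    return counts[False] / len(inspections)


share = false_alarm_share(SAMPLE_INSPECTIONS)
print(f"{share:.0%} of sample inspections were false alarms")
```

Filtering the rows by borough before calling the function would give a per-borough share, which is one way to compare Manhattan with the rest of the city.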

The heatmap below on the left highlights locations where false alarms — inspections where no active rat signs were found — significantly outnumber those that find active rat signs. The blooms in Downtown Manhattan and Harlem show the largest number of false alarms.

The heatmap on the right highlights areas where inspectors found signs of active rats more often than not. Here, Williamsburg and Bushwick have the highest concentration of inspections that found rat signs relative to false alarms.



Does the increase in 311 rat complaints mean more rats, or more complaints?

In 2016 it’s both. But mainly it’s more rats. The graph below shows that since 2014 there’s been an increase in clean inspections (initial inspections that pass), but a larger increase in those finding active rat signs — with 2016 in particular seeing a significant bump.

In contrast, the spike in complaints in 2012 actually corresponded with fewer inspections that detected active rat signs that year.

The Bronx dethroned Manhattan as rat-central in 2011, but that lead is under threat

The graphs below show the change in inspections finding active rat signs per borough. On the left, we show year-on-year percentage change, and on the right the overall number.

Note that up to and including 2015, the year-on-year change was typically within ±30%, but 2016 saw a big increase everywhere but the Bronx.
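For readers who want to reproduce a year-on-year figure, here is a minimal sketch of the calculation. The counts are made up for illustration and the function name `year_on_year_change` is hypothetical; only the percentage-change formula itself is standard.

```python
def year_on_year_change(counts_by_year):
    """Map each year to its percentage change from the previous year.

    Years whose previous year is missing or has a zero count are skipped,
    because the percentage change is undefined in those cases.
    """
    changes = {}
    for year in sorted(counts_by_year):
        previous = counts_by_year.get(year - 1)
        if previous:  # skips both None (missing) and 0
            changes[year] = 100.0 * (counts_by_year[year] - previous) / previous
    return changes


# Hypothetical counts of inspections finding active rat signs in one borough.
example_counts = {2013: 1000, 2014: 1100, 2015: 990, 2016: 1485}
for year, pct in year_on_year_change(example_counts).items():
    print(f"{year}: {pct:+.1f}%")
```

Running the same function once per borough gives the series plotted in a percentage-change chart, while the raw counts give the overall-number chart.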



If the current rate of increase continues across New York, Brooklyn will be crowned the Rat Capital by 2019, with The Bronx falling to 4th place.
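The post does not spell out how this projection was made. One simple reading, sketched below, is to hold each borough's latest growth rate constant and compound it forward. The borough names, counts, and growth rates here are placeholders chosen only to show how a lead can change, and `project` and `rank_boroughs` are hypothetical helper names.

```python
def project(count, annual_growth, years):
    """Extrapolate a count forward assuming a constant annual growth rate."""
    return count * (1.0 + annual_growth) ** years


def rank_boroughs(latest_counts, growth_rates, years):
    """Rank boroughs by projected count, highest first."""
    projected = {
        borough: project(latest_counts[borough], growth_rates[borough], years)
        for borough in latest_counts
    }
    return sorted(projected, key=projected.get, reverse=True)


# Placeholder inputs, not real inspection data.
latest = {"Borough A": 1000, "Borough B": 800}
growth = {"Borough A": 0.00, "Borough B": 0.30}

print(rank_boroughs(latest, growth, years=0))  # ranking today
print(rank_boroughs(latest, growth, years=1))  # ranking one year out
```

A constant growth rate is a strong assumption, so a projection like this is best read as a "what if" rather than a forecast.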

Looking at the heatmaps below from 2010 through 2016, areas of darker red indicate a concentration of inspections that found active rat signs.

You can see a gradual migration of the areas of greatest rat sign concentration out of Manhattan (West) towards The Bronx (North East), and more recently, a hotspot forming in Brooklyn around Williamsburg (South East).



If the current trend continues, I’d expect the heatmaps for 2017 onwards to have significant hotspots in Downtown Manhattan and Brooklyn, with The Bronx becoming less prominent.

Next steps

To learn what’s causing the increase in rats, we need more data.

For example, when we explored the effect of weather on NYC 311 complaints, we found a weak correlation between temperature and rat complaints. 2015 and 2016 were also significantly warmer than 2013 and 2014, so perhaps that’s a factor.
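To make "weak correlation" concrete, the sketch below computes a Pearson correlation coefficient between two series. The temperature and complaint values are invented for illustration, and this is not necessarily how the earlier weather analysis was done.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical daily mean temperatures (°C) and rat complaint counts.
temperatures = [2, 8, 15, 21, 27, 30]
complaints = [40, 38, 45, 41, 50, 44]
print(f"r = {pearson(temperatures, complaints):.2f}")
```

Values near 0 indicate little linear relationship and values near 1 or -1 a strong one. Inside BigQuery, the `CORR` aggregate function should give the same coefficient without exporting the data.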

What other factors might be influencing the increase in rat population? Check out the New York City Open Data collection, and import your own data into BigQuery to see if you can figure it out. Then share the results with us using the Today I Learned with BigQuery hashtag, #TILwBQ.

If you’re new to BigQuery:

Sign up, or sign in, to BigQuery today to create and share your own NYC analysis and visualizations. You can also learn more about BigQuery in person at a BigQuery bootcamp and technical sessions at Google Cloud NEXT ’17 in San Francisco in March.