Google Cloud Platform
New York City public datasets now available on Google BigQuery
This rich dataset makes it easy to learn how to explore and visualize data using BigQuery.
New York City is home to 8.5 million residents, and more than 50 million people visit this vibrant and dynamic city each year. With so many sights and sounds, it’s easy to get lost in the details and lose sight of the big picture: How do New Yorkers actually survive in the “concrete jungle”?
Thanks to NYC Open Data, which makes public data generated by city agencies available for public use, and Citi Bike, we've incorporated over 150 GB of data in 5 open datasets into Google BigQuery Public Datasets, including:
- Over 8 million 311 service requests from 2012-2016 (updated daily)
- More than 1 million motor vehicle collisions 2012-present (updated regularly)
- Citi Bike stations and 30 million Citi Bike trips 2013-present (updated regularly)
- Over 1 billion Yellow and Green Taxi rides from 2009-present (updated regularly)
- Over 500,000 sidewalk trees surveyed decennially in 1995, 2005, and 2015
On which New York City streets are you most likely to find a loud party?
If there's something strange in your neighborhood, the right number to call is 311, a line created specifically for non-emergency municipal inquiries and non-urgent community concerns. What does that include?
The graph below shows the top five reasons New Yorkers have called 311 over the past four years.
SELECT
  EXTRACT(YEAR FROM created_date) AS year,
  -- Uppercase the type and map "HEATING" to "HEAT/HOT WATER" so both labels count as one complaint
  REPLACE(UPPER(complaint_type), "HEATING", "HEAT/HOT WATER") AS complaint,
  COUNT(*) AS count
FROM
  `bigquery-public-data-staging.new_york.311_service_requests_all`
GROUP BY complaint, year
ORDER BY count DESC
LIMIT 1000
(To run this query yourself, you can copy/paste the above SQL into BigQuery, or follow this link to my shared query.)
Call volume tells us that it gets noisy in New York, and it also gets very cold. By joining the 311 calls to the NOAA GSOD weather table, we confirm that most calls about faulty heat and hot water happen when the temperature drops — while noise remains a constant annoyance.
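The shape of that join can be sketched on a small scale. The example below is only an illustration, not the query behind the chart: it uses Python's built-in sqlite3 module in place of BigQuery, and the table names (calls, weather), the column names, and every row are invented for the example. It counts heat complaints per day and joins each day to its mean temperature.

```python
import sqlite3

# In-memory database standing in for BigQuery; all tables and rows are toy data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE calls (created_date TEXT, complaint_type TEXT);
    CREATE TABLE weather (obs_date TEXT, mean_temp_f REAL);

    INSERT INTO calls VALUES
        ('2016-01-05', 'HEAT/HOT WATER'),
        ('2016-01-05', 'HEAT/HOT WATER'),
        ('2016-01-05', 'Noise - Residential'),
        ('2016-07-05', 'HEAT/HOT WATER'),
        ('2016-07-05', 'Noise - Residential');

    INSERT INTO weather VALUES
        ('2016-01-05', 20.0),
        ('2016-07-05', 80.0);
""")

# Keep only heat complaints, match each call to that day's weather reading,
# then count calls per day alongside the temperature.
query = """
    SELECT w.obs_date, w.mean_temp_f, COUNT(*) AS heat_calls
    FROM calls AS c
    JOIN weather AS w ON c.created_date = w.obs_date
    WHERE c.complaint_type = 'HEAT/HOT WATER'
    GROUP BY w.obs_date, w.mean_temp_f
    ORDER BY w.obs_date
"""
heat_calls_by_day = conn.execute(query).fetchall()
for obs_date, mean_temp_f, heat_calls in heat_calls_by_day:
    print(obs_date, mean_temp_f, heat_calls)
```

With real data, the daily call counts and temperatures produced this way are what you would plot against each other to see the cold-weather spike.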
There were also 267,887 calls about dead, damaged or dying trees, so you might wonder if there are any healthy trees left in NYC.
Can you find the Virginia Pines in New York City?
The decennial NYC tree surveys from 1995, 2005, and 2015 are all available in BigQuery. Preliminary data from the 2015 survey show that London planetrees, honeylocusts and Callery pears make up almost a third of all trees outside of parks.
Where was the only collision caused by an animal that injured a cyclist?
There’s a lot of traffic in New York, and while the number of accidents has slowly increased each year, the number of injuries has remained fairly consistent. Fortunately, the number of deaths has dropped by an average of 9% each year.
As you can see below, “Driver Inattention/Distraction” is the most common cause of accidents and injuries, but disregarding traffic control (such as running a red light) is the most common cause of death.
The following graphs show that most traffic accidents happen in Brooklyn, but it’s Midtown and Downtown Manhattan that have the highest concentration of collisions — and Staten Island the highest proportion of deaths per accident.
With motor vehicle accidents resulting in 6 motorist deaths for each cyclist death (and no Citi Bike rider deaths), you might be safer taking a Citi Bike.
What’s the Citi Bike record for the Longest Distance in the Shortest Time (on a route with at least 100 rides)?
Comparing the average duration of five of the most popular Citi Bike routes with that of taxi journeys beginning and ending within approximately 50 meters of the corresponding Citi Bike stations, we see that for trips under 10 minutes there’s not much difference between taking a taxi and riding a bike.
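One way to approximate the 50-meter matching is a great-circle distance check between each taxi pickup or drop-off point and the station. The sketch below is an assumption about how such a filter could look, not the method used for the comparison above; the function names, coordinates, and trip records are all hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def mean_duration_near(trips, start_station, end_station, radius_m=50):
    """Average duration in seconds of trips that start and end within radius_m of the stations."""
    durations = [
        t["duration_s"]
        for t in trips
        if haversine_m(*t["pickup"], *start_station) <= radius_m
        and haversine_m(*t["dropoff"], *end_station) <= radius_m
    ]
    return sum(durations) / len(durations) if durations else None


# Hypothetical station coordinates and taxi trips (toy values, not real data).
start_station = (40.7500, -73.9900)
end_station = (40.7600, -73.9800)
taxi_trips = [
    {"pickup": (40.7501, -73.9901), "dropoff": (40.7601, -73.9799), "duration_s": 420},
    {"pickup": (40.7502, -73.9899), "dropoff": (40.7599, -73.9801), "duration_s": 480},
    # Pickup is several hundred meters from the start station, so this trip is excluded.
    {"pickup": (40.7550, -73.9900), "dropoff": (40.7600, -73.9800), "duration_s": 300},
]

avg_taxi = mean_duration_near(taxi_trips, start_station, end_station)
print(avg_taxi)
```

The resulting taxi average can then be set beside the average Citi Bike duration for the same pair of stations.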
Next steps
There are countless ways to slice, dice, join and visualize this data, and we’re just getting started. If you’re new to BigQuery, here are some concepts to keep in mind while working with the New York City datasets:
- With BigQuery, everyone can query one terabyte of data every month at no charge. If you've never tried BigQuery before, follow these getting started instructions.
- SQL not enough? Learn how BigQuery allows you to run arbitrary JavaScript code inside SQL to enable a full range of possibilities.