NYPD Motor Vehicle Collisions Data

How to query public data sets using BigQuery

BigQuery is a fully managed data warehouse and analytics platform. Public datasets are available for you to analyze using SQL queries. You can access BigQuery public data sets using the web UI the command-line tool, or by making calls to the BigQuery REST API using a variety of client libraries such as Java, .NET, or Python.

To get started using a BigQuery public dataset, create or select a project. The first terabyte of data processed per month is free, so you can start querying public datasets without enabling billing. If you intend to go beyond the free tier, you should also enable billing.

  1. Sign in to your Google account.

    If you don't already have one, sign up for a new account.

  2. Select or create a Cloud Platform project.

    Go to the Manage resources page

  3. Enable billing for your project.

    Enable billing

  4. BigQuery is automatically enabled in new projects. To activate BigQuery in a pre-existing project, Enable the BigQuery API.

    Enable the API

Dataset overview

This dataset includes details of Motor Vehicle Collisions in New York City provided by the Police Department (NYPD) from 2012 to the present. This dataset is updated daily.

You can start exploring this data in the BigQuery console:

Go to NYPD Motor Vehicle Collisions Data

Sample queries

Here are some examples of SQL queries you can run on this data in BigQuery.

These samples use BigQuery’s support for standard SQL. Use the #standardSQL tag to let BigQuery know you want to use standard SQL. For more information about the #standardSQL prefix, see Setting a query prefix.

What is the most common factor in a motor vehicle collision in NYC?

This query counts the number of collisions for each collision type, in descending order.

Web UI

Open the following query in the Web UI

#standardSQL
SELECT
  contributing_factor_vehicle_1 AS collision_factor,
  COUNT(*) num_collisions
FROM
  `bigquery-public-data.new_york.nypd_mv_collisions`
WHERE
  contributing_factor_vehicle_1 != "Unspecified"
  AND contributing_factor_vehicle_1 != ""
GROUP BY
  1
ORDER BY
  num_collisions DESC

Command-line

bq query '
#standardSQL
SELECT
  contributing_factor_vehicle_1 AS collision_factor,
  COUNT(*) num_collisions
FROM
  `bigquery-public-data.new_york.nypd_mv_collisions`
WHERE
  contributing_factor_vehicle_1 != "Unspecified"
  AND contributing_factor_vehicle_1 != ""
GROUP BY
  1
ORDER BY
  num_collisions DESC
'

Driver inattention and/or distraction is by far the most common factor:

+-------------------------------------------------------+----------------+
|                   collision_factor                    | num_collisions |
+-------------------------------------------------------+----------------+
| Driver Inattention/Distraction                        |         117312 |
| Fatigued/Drowsy                                       |          48199 |
| Failure to Yield Right-of-Way                         |          39752 |
| Other Vehicular                                       |          29323 |
| Backing Unsafely                                      |          25833 |
| Turning Improperly                                    |          22659 |
| Lost Consciousness                                    |          20036 |
| Prescription Medication                               |          15881 |
| Traffic Control Disregarded                           |          11414 |
| Driver Inexperience                                   |          11074 |
                                ...
+-------------------------------------------------------+----------------+

What are the most dangerous streets for motor vehicle collisions in NYC?

This query counts the number of fatalities by streets.

Web UI

Open the following query in the Web UI

#standardSQL
SELECT
  on_street_name,
  SUM(number_of_persons_killed) AS deaths
FROM
  `bigquery-public-data.new_york.nypd_mv_collisions`
WHERE
  on_street_name <> ''
GROUP BY
  on_street_name
ORDER BY
  deaths DESC
LIMIT
  10

Command-line

bq query '
#standardSQL
SELECT
  on_street_name,
  SUM(number_of_persons_killed) AS deaths
FROM
  `bigquery-public-data.new_york.nypd_mv_collisions`
WHERE
  on_street_name <> ""
GROUP BY
  on_street_name
ORDER BY
  deaths DESC
LIMIT
  10
'

The results are shown here:

+--------------------+--------+
|   on_street_name   | deaths |
+--------------------+--------+
| BROADWAY           |     21 |
| ATLANTIC AVENUE    |     16 |
| HYLAN BOULEVARD    |     15 |
| LINDEN BOULEVARD   |     14 |
| ROCKAWAY BOULEVARD |     12 |
| GRAND CONCOURSE    |     12 |
| FLATBUSH AVENUE    |     11 |
| BRUCKNER BOULEVARD |     11 |
| 3 AVENUE           |     11 |
| NORTHERN BOULEVARD |     10 |
+--------------------+--------+

About the data

Dataset Source: NYC Open Data

Category: New York City

Use: This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - https://data.cityofnewyork.us/ - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

Update Frequency: Daily

View in BigQuery: Go to NYPD Motor Vehicle Collisions Data

Monitor your resources on the go

Get the Google Cloud Console app to help you manage your projects.

Send feedback about...