NHTSA Traffic Fatality Data

How to query public data sets using BigQuery

BigQuery is a fully managed data warehouse and analytics platform. Public datasets are available for you to analyze using SQL queries. You can access BigQuery public data sets using the web UI the command-line tool, or by making calls to the BigQuery REST API using a variety of client libraries such as Java, .NET, or Python.

To get started using a BigQuery public dataset, create or select a project. The first terabyte of data processed per month is free, so you can start querying public datasets without enabling billing. If you intend to go beyond the free tier, you should also enable billing.

  1. Sign in to your Google account.

    If you don't already have one, sign up for a new account.

  2. Select or create a Cloud Platform project.

    Go to the Manage resources page

  3. Enable billing for your project.

    Enable billing

  4. BigQuery is automatically enabled in new projects. To activate BigQuery in a pre-existing project, Enable the BigQuery API.

    Enable the API

Dataset overview

This public dataset was created by the Unites States Department of Transportation's National Highway Traffic Safety Administration (NHTSA) and includes 20 tables that describe numerous aspects of traffic accidents that resulted in fatalities. Aspects of traffic accidents include: the types of cars and roads, the maneuvers that preceded the accident, and the involvement of pedestrians and cyclists.

You can start exploring this data in the BigQuery console:

Go to NHTSA traffic fatality data

Sample queries

Here are some examples of SQL queries you can run on this data in BigQuery.

These samples use BigQuery’s support for standard SQL. Use the #standardSQL tag to let BigQuery know you want to use standard SQL. For more information about the #standardSQL prefix, see Setting a query prefix.

Which states had the most fatal accidents in 2015?

NHTSA assigns a unique index number to each accident, found in the consecutive_number column in most of the dataset's tables. This column can be used for counting unique incidents and for joining across tables. In the following query, you count the number of accidents per state, tally the number of fatalities, and then calculate a fatalities-per-accident metric.

Web UI

Open the following query in the Web UI

#standardSQL
SELECT
  state_name,
  COUNT(consecutive_number) AS accidents,
  SUM(number_of_fatalities) AS fatalities,
  SUM(number_of_fatalities)/COUNT(consecutive_number) AS fatalities_per_accident
FROM
  `bigquery-public-data.nhtsa_traffic_fatalities.accident_2015`
GROUP BY
  state_name
ORDER BY
  fatalities_per_accident DESC

Command-line

bq query --use_legacy_sql=false '
SELECT
  state_name,
  COUNT(consecutive_number) AS accidents,
  SUM(number_of_fatalities) AS fatalities,
  SUM(number_of_fatalities)/COUNT(consecutive_number) AS fatalities_per_accident
FROM
  `bigquery-public-data.nhtsa_traffic_fatalities.accident_2015`
GROUP BY
  state_name
ORDER BY
  fatalities_per_accident DESC'

Sample results are shown here:

+----------------+----------------+-------------+--------------------------+
| state_name     | accidents      | fatalities  | fatalities_per_accident  |
+----------------+----------------+-------------+--------------------------+
| North Dakota   | 111            | 131         | 1.18018018               |
| South Dakota   | 115            | 133         | 1.156521739              |
| Vermont        | 50             | 57          | 1.14                     |
| Iowa           | 282            | 320         | 1.134751773              |
| Nebraska       | 218            | 246         | 1.128440367              |
| Texas          | 3124           | 3516        | 1.125480154              |
| Arkansas       | 472            | 531         | 1.125                    |
| Wyoming        | 129            | 145         | 1.124031008              |
| Mississippi    | 604            | 677         | 1.120860927              |
| New Mexico     | 269            | 298         | 1.107806691              |
+----------------+----------------+-------------+--------------------------+

Click the following link to see a sample visualization of this data in Google Data Studio.

Which non-occupant actions were the most dangerous?

The dataset includes information on injuries and fatalities for people who were not the occupant of a vehicle but were harmed in the accident nonetheless. It also includes a table for the actions of non-occupants surrounding an accident leading up to the event. By joining these tables, you can determine which pre-accident actions were most frequently associated with injuries and deaths.

Web UI

Open the following query in the Web UI

#standardSQL
SELECT
  a.injury_severity_name AS severity,
  b.non_motorist_action_circumstances_name AS prior_action,
  COUNT(a.person_number) AS total
FROM
  `bigquery-public-data.nhtsa_traffic_fatalities.person_2015` a
JOIN
  `bigquery-public-data.nhtsa_traffic_fatalities.nmprior_2015` b
ON
  a.person_number=b.person_number
  AND a.consecutive_number=b.consecutive_number
WHERE
  a.injury_severity = 3
  OR a.injury_severity = 4
GROUP BY
  severity,
  prior_action
ORDER BY
  total DESC

Command-line

bq query --use_legacy_sql=false '
SELECT
  a.injury_severity_name AS severity,
  b.non_motorist_action_circumstances_name AS prior_action,
  COUNT(a.person_number) AS total
FROM
  `bigquery-public-data.nhtsa_traffic_fatalities.person_2015` a
JOIN
  `bigquery-public-data.nhtsa_traffic_fatalities.nmprior_2015` b
ON
  a.person_number=b.person_number
  AND a.consecutive_number=b.consecutive_number
WHERE
  a.injury_severity = 3
  OR a.injury_severity = 4
GROUP BY
  severity,
  prior_action
ORDER BY
  total DESC'

Sample results are shown here:

+-------------------+-------------------------------------------------------------------------+----------------+
| severity          | prior_action                                                            | total          |
+-------------------+-------------------------------------------------------------------------+----------------+
| Fatal Injury (K)  | Crossing Roadway                                                        | 3429           |
| Fatal Injury (K)  | Movement Along Roadway with Traffic (In or Adjacent to Travel Lane)     | 889            |
| Fatal Injury (K)  | In Roadway-Other (Working, Playing, etc.)                               | 739            |
| Fatal Injury (K)  | Disabled Vehicle Related (Working on, Pushing, Leaving/Approaching)     | 299            |
| Fatal Injury (K)  | Adjacent to Roadway (e.g., Shoulder, Median)                            | 249            |
| Fatal Injury (K)  | Other                                                                   | 248            |
| Fatal Injury (K)  | Unknown                                                                 | 234            |
| Fatal Injury (K)  | Movement Along Roadway Against Traffic (In or Adjacent to Travel Lane)  | 230            |
| Fatal Injury (K)  | Movement Along Roadway – Direction Unknown (Since 2012)                 | 119            |
+-------------------+-------------------------------------------------------------------------+----------------+

In the results of this query, you can see that "crossing the road" resulted in more fatalities than the next several causes combined. Also of note is that walking beside a roadway with the direction of traffic, instead of against the direction of traffic, resulted in almost four times more fatalities. There are no numbers available on how often people engage in these various activities so you can't assign a risk evaluation using these results.

What are the fatal accident rates per capita, state by state, for distracted driving?

What are the rates of distracted driving fatalities on a state by state basis? In this query, you join tables from two other public data sets: the US Census Bureau zipcode-by-zipcode population information and another table that maps zip codes to state names. This allows you to show the rates of fatal accidents for several distraction criteria per capita. While this is a more complex query, it demonstrates the power of linking a variety of public datasets together to achieve insights not necessarily available in a single dataset.

Web UI

Open the following query in the Web UI

#standardSQL
SELECT
  COUNT(a.consecutive_number) AS total,
  b.driver_distracted_by_name AS distraction,
  a.state_name AS state,
  c.state_pop AS population,
  ROUND((COUNT(a.consecutive_number) / c.state_pop) * 100000, 3) AS rate_per_100000
FROM
  `bigquery-public-data.nhtsa_traffic_fatalities.accident_2015` a
JOIN
  `bigquery-public-data.nhtsa_traffic_fatalities.distract_2015` b
ON
  a.consecutive_number = b.consecutive_number
JOIN (
  SELECT
    SUM(d.population) AS state_pop,
    e.state_name AS state
  FROM
    `bigquery-public-data.census_bureau_usa.population_by_zip_2010` d
  JOIN
    `bigquery-public-data.utility_us.zipcode_area` e
  ON
    d.zipcode=e.zipcode
  GROUP BY
    state ) c
ON
  c.state=a.state_name
WHERE
  b.driver_distracted_by_name != 'Not Distracted'
  AND b.driver_distracted_by_name != 'Unknown if Distracted'
  AND b.driver_distracted_by_name != 'Not Reported'
GROUP BY
  distraction,
  state,
  population,
  c.state_pop
ORDER BY
  rate_per_100000 DESC

Command-line

bq query --use_legacy_sql=false '
SELECT
  COUNT(a.consecutive_number) AS total,
  b.driver_distracted_by_name AS distraction,
  a.state_name AS state,
  c.state_pop AS population,
  ROUND((COUNT(a.consecutive_number) / c.state_pop) * 100000, 3) AS rate_per_100000
FROM
  `bigquery-public-data.nhtsa_traffic_fatalities.accident_2015` a
JOIN
  `bigquery-public-data.nhtsa_traffic_fatalities.distract_2015` b
ON
  a.consecutive_number = b.consecutive_number
JOIN (
  SELECT
    SUM(d.population) AS state_pop,
    e.state_name AS state
  FROM
    `bigquery-public-data.census_bureau_usa.population_by_zip_2010` d
  JOIN
    `bigquery-public-data.utility_us.zipcode_area` e
  ON
    d.zipcode=e.zipcode
  GROUP BY
    state ) c
ON
  c.state=a.state_name
WHERE
  b.driver_distracted_by_name != 'Not Distracted'
  AND b.driver_distracted_by_name != 'Unknown if Distracted'
  AND b.driver_distracted_by_name != 'Not Reported'
GROUP BY
  distraction,
  state,
  population,
  c.state_pop
ORDER BY
  rate_per_100000 DESC'

Sample results are shown here:

+-----------+-----------------------------------------------------+----------------+-------------+---------------------+
| total     | distraction                                         | state          | population  | rate_per_100000     |
+-----------+-----------------------------------------------------+----------------+-------------+---------------------+
| 117       | Inattention (Inattentive), Details Unknown          | New Mexico     | 6161043     | 1.8990291091946607  |
| 149       | Inattention (Inattentive), Details Unknown          | Kentucky       | 12977133    | 1.1481734833109902  |
| 50        | Inattention (Inattentive), Details Unknown          | Kansas         | 8551884     | 0.5846664898635202  |
| 131       | Inattention (Inattentive), Details Unknown          | New Jersey     | 26375682    | 0.496669621661347   |
| 23        | Inattention (Inattentive), Details Unknown          | Idaho          | 4702578     | 0.4890934291786335  |
| 14        | Looked But Did Not See                              | Montana        | 2965035     | 0.4721698057527146  |
| 106       | Other Distraction                                   | Virginia       | 24003717    | 0.44159827413396013 |
| ...                                                                                                                  |
+-----------+-----------------------------------------------------+----------------+-------------+---------------------+

While the query results provide interesting information, it might be more valuable to re-run the query without the distraction type in the SELECT and GROUP BY statements. Removing the distraction type leaves you with a breakdown of the most dangerous states per capita for distracted driving. Sample results for the modified query are shown here:

+-----------+----------------+-------------+---------------------+
| total     | state          | population  | rate_per_100000     |
+-----------+----------------+-------------+---------------------+
| 128       | New Mexico     | 6161043     | 2.077570307         |
| 207       | Kentucky       | 12977133    | 1.595113497         |
| 43        | Montana        | 2965035     | 1.450235832         |
| 152       | Louisiana      | 13598490    | 1.117771164         |
| 16        | Wyoming        | 1683354     | 0.9504833802        |
| 14        | Kansas         | 8551884     | 0.8769997348        |
| 75        | Washington     | 19240062    | 0.8264006634        |
| ...                                                            |
+-----------+----------------+-------------+---------------------+

About the data

Dataset Source: NHTSA

Category: Science

Use: This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://www.data.gov/privacy-policy#data_policy - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

Update Frequency: Annual

View in BigQuery: Go to NHTSA traffic fatality data

Monitor your resources on the go

Get the Google Cloud Console app to help you manage your projects.

Send feedback about...