Chicago Crime Data

How to query public data sets using BigQuery

BigQuery is a fully managed data warehouse and analytics platform. Public datasets are available for you to analyze using SQL queries. You can access BigQuery public data sets using the web UI the command-line tool, or by making calls to the BigQuery REST API using a variety of client libraries such as Java, .NET, or Python.

To get started using a BigQuery public dataset, create or select a project. The first terabyte of data processed per month is free, so you can start querying public datasets without enabling billing. If you intend to go beyond the free tier, you should also enable billing.

  1. Sign in to your Google account.

    If you don't already have one, sign up for a new account.

  2. Select or create a Cloud Platform project.

    Go to the Manage resources page

  3. Enable billing for your project.

    Enable billing

  4. BigQuery is automatically enabled in new projects. To activate BigQuery in a pre-existing project, Enable the BigQuery API.

    Enable the API

Dataset overview

This dataset reflects reported incidents of crime (with the exception of murders where data exists for each victim) that occurred in the City of Chicago from 2001 to present, minus the most recent seven days. Data is extracted from the Chicago Police Department's CLEAR (Citizen Law Enforcement Analysis and Reporting) system. In order to protect the privacy of crime victims, addresses are shown at the block level only and specific locations are not identified.

This data includes unverified reports supplied to the Police Department. The preliminary crime classifications may be changed at a later date based upon additional investigation and there is always the possibility of mechanical or human error. Therefore, the Chicago Police Department does not guarantee (either expressed or implied) the accuracy, completeness, timeliness, or correct sequencing of the information. The information should not be used for comparison purposes over time.

You can start exploring this data in the BigQuery console:

Go to Chicago crime data

Sample queries

Here are some examples of SQL queries you can run on this data in BigQuery.

These samples use BigQuery’s support for standard SQL. Use the #standardSQL tag to let BigQuery know you want to use standard SQL. For more information about the #standardSQL prefix, see Setting a query prefix.

What categories of crime exhibited the greatest year-over-year increase between 2015 and 2016?

This query summarizes the number of arrests by year for 2015 and 2016 and the percentage increase between them. The query limits the dataset to crime classifications with more than 100 arrests in 2015 and shows the top 10 rates of increase.

To see the categories with the greatest decline in arrests, invert the sort by removing the DESC keyword.

Web UI

Open the following query in the Web UI

#standardSQL

standardSQL

SELECT primary_type, description, COUNTIF(year = 2015) AS arrests_2015, COUNTIF(year = 2016) AS arrests_2016, FORMAT('%3.2f', (COUNTIF(year = 2016) - COUNTIF(year = 2015)) / COUNTIF(year = 2015)*100) AS pct_change_2015_to_2016 FROM `bigquery-public-data.chicago_crime.crime` WHERE arrest = TRUE AND year IN (2015, 2016) GROUP BY primary_type, description HAVING COUNTIF(year = 2015) > 100 ORDER BY (COUNTIF(year = 2016) - COUNTIF(year = 2015)) / COUNTIF(year = 2015) DESC

Command-line

bq query --use_legacy_sql=false '
SELECT
  primary_type,
  description,
  COUNTIF(year = 2015) AS arrests_2015,
  COUNTIF(year = 2016) AS arrests_2016,
  FORMAT("%3.2f", (COUNTIF(year = 2016) - COUNTIF(year = 2015)) / COUNTIF(year = 2015)*100) AS pct_change_2015_to_2016
FROM
  `bigquery-public-data.chicago_crime.crime`
WHERE
  arrest = TRUE
  AND year IN (2015,
    2016)
GROUP BY
  primary_type,
  description
HAVING
  COUNTIF(year = 2015) > 100
ORDER BY
  (COUNTIF(year = 2016) - COUNTIF(year = 2015)) / COUNTIF(year = 2015) DESC'

Sample results are shown here:

+----------------------------------+-------------------------------------------------+--------------+--------------+-------------------------+
|           primary_type           |                   description                   | arrests_2015 | arrests_2016 | pct_change_2015_to_2016 |
+----------------------------------+-------------------------------------------------+--------------+--------------+-------------------------+
| OTHER OFFENSE                    | VEHICLE TITLE/REG OFFENSE                       |          288 |          418 | 45.14                   |
| OTHER OFFENSE                    | FALSE/STOLEN/ALTERED TRP                        |          299 |          418 | 39.80                   |
| NARCOTICS                        | FOUND SUSPECT NARCOTICS                         |          651 |          842 | 29.34                   |
| HOMICIDE                         | FIRST DEGREE MURDER                             |          164 |          201 | 22.56                   |
| ASSAULT                          | AGGRAVATED: HANDGUN                             |          468 |          521 | 11.32                   |
| MOTOR VEHICLE THEFT              | AUTOMOBILE                                      |          748 |          830 | 10.96                   |
| OTHER OFFENSE                    | OTHER VEHICLE OFFENSE                           |          186 |          200 | 7.53                    |
| LIQUOR LAW VIOLATION             | LIQUOR LICENSE VIOLATION                        |          134 |          144 | 7.46                    |
| ROBBERY                          | ARMED: HANDGUN                                  |          227 |          242 | 6.61                    |
| MOTOR VEHICLE THEFT              | THEFT/RECOVERY: AUTOMOBILE                      |          164 |          171 | 4.27                    |
| BATTERY                          | AGGRAVATED DOMESTIC BATTERY: OTHER DANG WEAPON  |          251 |          256 | 1.99                    |
| WEAPONS VIOLATION                | UNLAWFUL POSS OF HANDGUN                        |         2079 |         2075 | -0.19                   |
| THEFT                            | RETAIL THEFT                                    |         4932 |         4851 | -1.64                   |
| CRIMINAL TRESPASS                | TO VEHICLE                                      |          245 |          239 | -2.45                   |
| ROBBERY                          | STRONGARM - NO WEAPON                           |          353 |          338 | -4.25                   |
| THEFT                            | FROM BUILDING                                   |          148 |          141 | -4.73                   |
| BURGLARY                         | FORCIBLE ENTRY                                  |          367 |          348 | -5.18                   |
| ...                                                                                                                                        |
+----------------------------------+-------------------------------------------------+--------------+--------------+-------------------------+

Which month generally has the greatest number of motor vehicle thefts?

The following query summarizes the number of MOTOR VEHICLE THEFT incidents for each year and month, and ranks the month’s total from 1 to 12. Then, the outer SELECT clause limits the final result set to the first overall ranking for each year. According to the data, in 3 of the past 10 years, December had the highest number of car thefts.

Web UI

Open the following query in the Web UI

#standardSQL
SELECT
  year,
  month,
  incidents
FROM (
  SELECT
    year,
    EXTRACT(MONTH
    FROM
      date) AS month,
    COUNT(1) AS incidents,
    RANK() OVER (PARTITION BY year ORDER BY COUNT(1) DESC) AS ranking
  FROM
    `bigquery-public-data.chicago_crime.crime`
  WHERE
    primary_type = 'MOTOR VEHICLE THEFT'
    AND year <= 2016
  GROUP BY
    year,
    month )
WHERE
  ranking = 1
ORDER BY
  year DESC

Command-line

bq query --use_legacy_sql=false '
SELECT
  year,
  month,
  incidents
FROM (
  SELECT
    year,
    EXTRACT(MONTH
    FROM
      date) AS month,
    COUNT(1) AS incidents,
    RANK() OVER (PARTITION BY year ORDER BY COUNT(1) DESC) AS ranking
  FROM
    `bigquery-public-data.chicago_crime.crime`
  WHERE
    primary_type = 'MOTOR VEHICLE THEFT'
    AND year <= 2016
  GROUP BY
    year,
    month )
WHERE
  ranking = 1
ORDER BY
  year DESC'

Sample results are shown here:

+--------+--------+-----------+
| year   | month  | incidents |
+--------+--------+-----------+
| 2016   | 12     | 1112      |
| 2015   | 8      | 966       |
| 2014   | 10     | 922       |
| 2013   | 1      | 1470      |
| 2012   | 6      | 1469      |
| 2011   | 1      | 1862      |
| 2010   | 12     | 1880      |
| 2009   | 12     | 1539      |
| 2008   | 7      | 2015      |
| 2007   | 10     | 1709      |
| ...                         |
+--------+--------+-----------+

How does temperature affect the incident rate of violent crime (assault or battery)?

In this query, the Chicago Crime Data is joined with the NOAA Global Surface Summary of the Day Weather Data so we can identify the impact of extreme cold or hot weather on incidents of violent crime. Each day is ranked from coldest to hottest within each year, the incident rates for the coldest and hottest 10% of days in the year are bucketed, and the rates are compared against the middle 80%.

Web UI

Open the following query in the Web UI

#standardSQL
SELECT
  year,
  coldest_days,
  incidents_coldest,
  ROUND(incidents_coldest / coldest_days, 2) AS rate_coldest,
  middle_days,
  incidents_middle,
  ROUND(incidents_middle / middle_days, 2) AS rate_middle,
  hottest_days,
  incidents_hottest,
  ROUND(incidents_hottest / hottest_days, 2) AS rate_hottest
FROM (
  SELECT
    w.year AS year,
    COUNTIF(rank <= 0.10) AS coldest_days,
    SUM(IF(rank <= 0.10, c.incidents, 0)) AS incidents_coldest,
    COUNTIF(rank > 0.10
      AND rank <= 90) AS middle_days,
    SUM(IF(rank > 0.10
        AND rank <= 90, c.incidents, 0)) AS incidents_middle,
    COUNTIF(rank > 0.90
      AND rank <= 90) AS hottest_days,
    SUM(IF(rank > 0.90, c.incidents, 0)) AS incidents_hottest
  FROM (
    SELECT
      TIMESTAMP_TRUNC(TIMESTAMP(CONCAT(year, '-', mo, '-', da)),
        DAY) AS date,
      year,
      PERCENT_RANK() OVER (PARTITION BY year ORDER BY AVG(temp)) AS rank,
      AVG(temp) AS temperature
    FROM
      `bigquery-public-data.noaa_gsod.gsod*` AS w
    JOIN
      `bigquery-public-data.noaa_gsod.stations` AS s
    ON
      w.stn = s.usaf
      AND s.name = "CHICAGO O'HARE INTERNATIONAL"
    WHERE
      w._TABLE_SUFFIX BETWEEN '2014'
      AND '2016'
    GROUP BY
      date,
      year ) AS w
  LEFT JOIN (
    SELECT
      TIMESTAMP_TRUNC(date,
        DAY) AS date,
      year,
      COUNT(1) AS incidents
    FROM
      `bigquery-public-data.chicago_crime.crime`
    WHERE
      primary_type IN ('BATTERY',
        'ASSAULT')
      AND year BETWEEN 2014
      AND 2016
    GROUP BY
      date,
      year ) AS c
  ON
    w.date = c.date
  GROUP BY
    year)
ORDER BY
  year DESC

Command-line

bq query --use_legacy_sql=false '
SELECT
  year,
  coldest_days,
  incidents_coldest,
  ROUND(incidents_coldest / coldest_days, 2) AS rate_coldest,
  middle_days,
  incidents_middle,
  ROUND(incidents_middle / middle_days, 2) AS rate_middle,
  hottest_days,
  incidents_hottest,
  ROUND(incidents_hottest / hottest_days, 2) AS rate_hottest
FROM (
  SELECT
    w.year AS year,
    COUNTIF(rank <= 0.10) AS coldest_days,
    SUM(IF(rank <= 0.10, c.incidents, 0)) AS incidents_coldest,
    COUNTIF(rank > 0.10
      AND rank <= 90) AS middle_days,
    SUM(IF(rank > 0.10
        AND rank <= 90, c.incidents, 0)) AS incidents_middle,
    COUNTIF(rank > 0.90
      AND rank <= 90) AS hottest_days,
    SUM(IF(rank > 0.90, c.incidents, 0)) AS incidents_hottest
  FROM (
    SELECT
      TIMESTAMP_TRUNC(TIMESTAMP(CONCAT(year, '-', mo, '-', da)),
        DAY) AS date,
      year,
      PERCENT_RANK() OVER (PARTITION BY year ORDER BY AVG(temp)) AS rank,
      AVG(temp) AS temperature
    FROM
      `bigquery-public-data.noaa_gsod.gsod*` AS w
    JOIN
      `bigquery-public-data.noaa_gsod.stations` AS s
    ON
      w.stn = s.usaf
      AND s.name = "CHICAGO O'HARE INTERNATIONAL"
    WHERE
      w._TABLE_SUFFIX BETWEEN '2014'
      AND '2016'
    GROUP BY
      date,
      year ) AS w
  LEFT JOIN (
    SELECT
      TIMESTAMP_TRUNC(date,
        DAY) AS date,
      year,
      COUNT(1) AS incidents
    FROM
      `bigquery-public-data.chicago_crime.crime`
    WHERE
      primary_type IN ('BATTERY',
        'ASSAULT')
      AND year BETWEEN 2014
      AND 2016
    GROUP BY
      date,
      year ) AS c
  ON
    w.date = c.date
  GROUP BY
    year)
ORDER BY
  year DESC'

Sample results are shown here:

+-------+---------------+--------------------+---------------+--------------+-------------------+--------------+---------------+--------------------+---------------+
| year  | coldest_days  | incidents_coldest  | rate_coldest  | middle_days  | incidents_middle  | rate_middle  | hottest_days  | incidents_hottest  | rate_hottest  |
+-------+---------------+--------------------+---------------+--------------+-------------------+--------------+---------------+--------------------+---------------+
| 2016  | 37            | 5415               | 146.35        | 329          | 63575             | 193.24       | 37            | 7569               | 204.57        |
| 2015  | 37            | 5232               | 141.41        | 328          | 60716             | 185.11       | 37            | 7690               | 207.84        |
| 2014  | 37            | 5120               | 138.38        | 328          | 61222             | 186.65       | 36            | 7701               | 213.92        |
+-------+---------------+--------------------+---------------+--------------+-------------------+--------------+---------------+--------------------+---------------+

To see this visually, alter the query to display the daily figures over the three year sampling period. You can use this query to create a report in Google Data Studio.

For a tutorial on using Google Data Studio with BigQuery, see Visualizing BigQuery Data Using Google Data Studio.

The altered query looks like the following.

Web UI

#standardSQL
SELECT
  w.date AS date,
  temperature,
  incidents
FROM (
  SELECT
    TIMESTAMP_TRUNC(TIMESTAMP(CONCAT(year, '-', mo, '-', da)),
      DAY) AS date,
    year,
    PERCENT_RANK() OVER (PARTITION BY year ORDER BY AVG(temp)) AS rank,
    AVG(temp) AS temperature
  FROM
    `bigquery-public-data.noaa_gsod.gsod*` AS w
  JOIN
    `bigquery-public-data.noaa_gsod.stations` AS s
  ON
    w.stn = s.usaf
    AND s.name = "CHICAGO O'HARE INTERNATIONAL"
  WHERE
    w._TABLE_SUFFIX BETWEEN '2014'
    AND '2016'
  GROUP BY
    date,
    year ) AS w
LEFT JOIN (
  SELECT
    TIMESTAMP_TRUNC(date,
      DAY) AS date,
    year,
    COUNT(1) AS incidents
  FROM
    `bigquery-public-data.chicago_crime.crime`
  WHERE
    primary_type IN ('BATTERY',
      'ASSAULT')
    AND year BETWEEN 2014
    AND 2016
  GROUP BY
    date,
    year ) AS c
ON
  w.date = c.date
ORDER BY
  Date

Command-line

bq query --use_legacy_sql=false '
SELECT
  w.date AS date,
  temperature,
  incidents
FROM (
  SELECT
    TIMESTAMP_TRUNC(TIMESTAMP(CONCAT(year, '-', mo, '-', da)),
      DAY) AS date,
    year,
    PERCENT_RANK() OVER (PARTITION BY year ORDER BY AVG(temp)) AS rank,
    AVG(temp) AS temperature
  FROM
    `bigquery-public-data.noaa_gsod.gsod*` AS w
  JOIN
    `bigquery-public-data.noaa_gsod.stations` AS s
  ON
    w.stn = s.usaf
    AND s.name = "CHICAGO O'HARE INTERNATIONAL"
  WHERE
    w._TABLE_SUFFIX BETWEEN '2014'
    AND '2016'
  GROUP BY
    date,
    year ) AS w
LEFT JOIN (
  SELECT
    TIMESTAMP_TRUNC(date,
      DAY) AS date,
    year,
    COUNT(1) AS incidents
  FROM
    `bigquery-public-data.chicago_crime.crime`
  WHERE
    primary_type IN ('BATTERY',
      'ASSAULT')
    AND year BETWEEN 2014
    AND 2016
  GROUP BY
    date,
    year ) AS c
ON
  w.date = c.date
ORDER BY
  Date'

Crime by temperature

The trend is clear from the plot. Other than January 1st (which is represented by the three outliers in the less than 25 degrees quadrant), lower temperatures are generally associated with lower rates of violent crime.

About the data

Dataset Source: City of Chicago

Category: Chicago, Public Safety

Use: This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - https://data.cityofchicago.org - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

Update Frequency: Daily

View in BigQuery: Go to Chicago crime data

Monitor your resources on the go

Get the Google Cloud Console app to help you manage your projects.

Send feedback about...