San Francisco Police Reports Data

How to query public data sets using BigQuery

BigQuery is a fully managed data warehouse and analytics platform. Public datasets are available for you to analyze using SQL queries. You can access BigQuery public data sets using the web UI the command-line tool, or by making calls to the BigQuery REST API using a variety of client libraries such as Java, .NET, or Python.

To get started using a BigQuery public dataset, create or select a project. The first terabyte of data processed per month is free, so you can start querying public datasets without enabling billing. If you intend to go beyond the free tier, you should also enable billing.

  1. Sign in to your Google account.

    If you don't already have one, sign up for a new account.

  2. Select or create a Cloud Platform project.

    Go to the Manage resources page

  3. Enable billing for your project.

    Enable billing

  4. BigQuery is automatically enabled in new projects. To activate BigQuery in a pre-existing project, Enable the BigQuery API.

    Enable the API

Dataset overview

This data includes incidents from the San Francisco Police Department (SFPD) Crime Incident Reporting system, from January 2003 until the present (2 weeks ago from current date). The dataset is updated daily. Please note: the SFPD has implemented a new system for tracking crime. This dataset is still sourced from the old system, which is in the process of being retired (a multi-year process).

You can start exploring this data in the BigQuery console:

Go to SF Police Reports data

Sample queries

Here are some examples of SQL queries you can run on this data in BigQuery.

These samples use BigQuery’s support for standard SQL. Use the #standardSQL tag to let BigQuery know you want to use standard SQL. For more information about the #standardSQL prefix, see Setting a query prefix.

Which category of police incidents have historically been the most common in San Francisco?

This query groups by the category column to show the most common incidents since 2003.

Web UI

Open the following query in the Web UI

#standardSQL
SELECT
  category,
  count(*) as incident_count
FROM
  `bigquery-public-data.san_francisco.sfpd_incidents`
GROUP BY
  category
ORDER BY
  incident_count DESC
LIMIT
  10

Command-line

bq query --use_legacy_sql=false '
SELECT
  category,
  count(*) as incident_count
FROM
  `bigquery-public-data.san_francisco.sfpd_incidents`
GROUP BY
  category
ORDER BY
  incident_count DESC
LIMIT
  10'

The results are shown here:

+------------------------+-------------------+
| category               | Incident_count    |
+------------------------+-------------------+
| LARCENY/THEFT          | 424697            |
| OTHER OFFENSES         | 287776            |
| NON-CRIMINAL           | 217741            |
| ASSAULT                | 178106            |
| VEHICLE THEFT          | 120175            |
| DRUG/NARCOTIC          | 115311            |
| VANDALISM              | 104786            |
| WARRANTS               | 95207             |
| BURGLARY               | 84335             |
| SUSPICIOUS OCC         | 73339             |
+------------------------+-------------------+

What were the most common police incidents in the category of LARCENY/THEFT in 2016?

Using the most common category of incident from the above query, this next query finds the most common incident within that category in a specific year - 2016.

Web UI

Open the following query in the Web UI

#standardSQL
SELECT
  descript,
  COUNT(*) AS incident_count_2016
FROM
  `bigquery-public-data.san_francisco.sfpd_incidents`
WHERE
  category="LARCENY/THEFT"
  AND EXTRACT(YEAR FROM timestamp) = 2016
GROUP BY
  descript
ORDER BY
  incident_count_2016 DESC
LIMIT
  10

Command-line

bq query --use_legacy_sql=false '
SELECT
  descript,
  COUNT(*) AS incident_count_2016
FROM
  `bigquery-public-data.san_francisco.sfpd_incidents`
WHERE
  category="LARCENY/THEFT"
  AND EXTRACT(YEAR FROM timestamp) = 2016
GROUP BY
  descript
ORDER BY
  incident_count_2016 DESC
LIMIT
  10'

The results are shown here:

+--------------------------------+----------------------+
| descript                       | Incident_count_2016  |
+--------------------------------+----------------------+
| GRAND THEFT FROM LOCKED AUTO   | 17723                |
| PETTY THEFT OF PROPERTY        | 4403                 |
| PETTY THEFT FROM LOCKED AUTO   | 3993                 |
| GRAND THEFT OF PROPERTY        | 2189                 |
| GRAND THEFT FROM UNLOCKED AUTO | 1912                 |
| PETTY THEFT FROM A BUILDING    | 1856                 |
| PETTY THEFT SHOPLIFTING        | 1697                 |
| GRAND THEFT FROM PERSON        | 1457                 |
| GRAND THEFT FROM A BUILDING    | 1101                 |
| GRAND THEFT PICKPOCKET         | 840                  |
+------------------------+------------------------------+

Which non-criminal incidents saw the biggest reporting change from 2015 to 2016?

We can also query for year over year change in number of incidents in any category. This query looks at the biggest year over year change for incidents in the category of NON-CRIMINAL.

Web UI

Open the following query in the Web UI

#standardSQL
SELECT
  descript,
  COUNTIF(EXTRACT(YEAR FROM timestamp) = 2016) -
  COUNTIF(EXTRACT(YEAR FROM timestamp) = 2015) AS yoy_change,
  COUNTIF(EXTRACT(YEAR FROM timestamp) = 2016) AS count_2016
FROM
  `bigquery-public-data.san_francisco.sfpd_incidents`
WHERE
  category != "NON-CRIMINAL"
GROUP BY
  descript
ORDER BY
  ABS(yoy_change) DESC
LIMIT
  10

Command-line

bq query --use_legacy_sql=false '
SELECT
  descript,
  COUNTIF(EXTRACT(YEAR FROM timestamp) = 2016) -
  COUNTIF(EXTRACT(YEAR FROM timestamp) = 2015) AS yoy_change,
  COUNTIF(EXTRACT(YEAR FROM timestamp) = 2016) AS count_2016
FROM
  `bigquery-public-data.san_francisco.sfpd_incidents`
WHERE
  category != "NON-CRIMINAL"
GROUP BY
  descript
ORDER BY
  ABS(yoy_change) DESC
LIMIT
  10'

The results are shown here:

+-----------------------------------------+------------+-------------+
| descript                                | yoy_change | count_2016 |
+-----------------------------------------+------------+-------------+
| DRIVERS LICENSE, SUSPENDED OR REVOKED   | -1435      | 3374        |
| STOLEN AUTOMOBILE                       | -1392      | 3599        |
| WARRANT ARREST                          | -990       | 3004        |
| MALICIOUS MISCHIEF, VANDALISM           |  957       | 4248        |
| GRAND THEFT FROM UNLOCKED AUTO          | -868       | 1912        |
| TRAFFIC VIOLATION                       |  496       | 1820        |
| TRAFFIC VIOLATION ARREST                |  449       | 2227        |
| CREDIT CARD, THEFT BY USE OF            | -331       | 1351        |
| FOUND PERSON                            | -308       | 1950        |
| PETTY THEFT FROM LOCKED AUTO            | -307       | 3993        |
+-----------------------------------------+------------+-------------+

About the data

Dataset Source: SF OpenData

Category: San Francisco, Crime

Use: This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://sfgov.org/ - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

Update Frequency: Daily

View in BigQuery: Go to SF Police Reports data

Send feedback about...