San Francisco Fire Department Service Calls Data

How to query public data sets using BigQuery

BigQuery is a fully managed data warehouse and analytics platform. Public datasets are available for you to analyze using SQL queries. You can access BigQuery public data sets using the web UI the command-line tool, or by making calls to the BigQuery REST API using a variety of client libraries such as Java, .NET, or Python.

To get started using a BigQuery public dataset, create or select a project. The first terabyte of data processed per month is free, so you can start querying public datasets without enabling billing. If you intend to go beyond the free tier, you should also enable billing.

  1. Sign in to your Google account.

    If you don't already have one, sign up for a new account.

  2. Select or create a Cloud Platform project.

    Go to the Manage resources page

  3. Enable billing for your project.

    Enable billing

  4. BigQuery is automatically enabled in new projects. To activate BigQuery in a pre-existing project, Enable the BigQuery API.

    Enable the API

Dataset overview

This data includes fire unit responses to calls from April 2000 to present and is updated daily. Data contains the call number, incident number, address, unit identifier, call type, and disposition. Relevant time intervals are also included. Because this dataset is based on responses, and most calls involved multiple fire units, there are multiple records for each call number. Addresses are associated with a block number, intersection or call box.

You can start exploring this data in the BigQuery console:

Go to SF Fire Department Service Calls data

Sample queries

Here are some examples of SQL queries you can run on this data in BigQuery.

These samples use BigQuery’s support for standard SQL. Use the #standardSQL tag to let BigQuery know you want to use standard SQL. For more information about the #standardSQL prefix, see Setting a query prefix.

What are the top 10 incident types that the San Francisco Fire Department responds to?

This query uses the call_type column to find the top 10 incidents that the San Francisco Fire Department responded to for the life of this dataset (since April 2000).

Web UI

Open the following query in the Web UI

#standardSQL
SELECT
  call_type,
  COUNT(*) AS call_type_count
FROM
  `bigquery-public-data.san_francisco.sffd_service_calls`
WHERE
  call_type != ''
GROUP BY
  call_type
ORDER BY
  call_type_count DESC
LIMIT
  10

Command-line

bq query --use_legacy_sql=false '
SELECT
  call_type,
  COUNT(*) AS call_type_count
FROM
  `bigquery-public-data.san_francisco.sffd_service_calls`
WHERE
  call_type != ''
GROUP BY
  call_type
ORDER BY
  call_type_count DESC
LIMIT
  10'

The results are shown here:

+-------------------------------+-------------------+
| call_type                     | call_type_count   |
+-------------------------------+-------------------+
| Medical Incident              | 2730241           |
| Structure Fire                | 579746            |
| Alarms                        | 455460            |
| Traffic Collision             | 167331            |
| Other                         | 67993             |
| Citizen Assist / Service Call | 64829             |
| Outside Fire                  | 48482             |
| Administrative                | 30131             |
| Vehicle Fire                  | 20560             |
| Water Rescue                  | 19371             |
+-------------------------------+-------------------+

How many medical incidents and structure fires are there in each neighborhood?

This query used the call_type and neighborhood_district columns to group responses to medical incidents and structure fires by location.

Web UI

Open the following query in the Web UI

#standardSQL
SELECT
  neighborhood_district,
  COUNTIF(call_type = "Medical Incident") AS medical_incident_count,
  COUNTIF(call_type = "Structure Fire") AS structure_fire_count,
  Count(*) as total_count
FROM
  `bigquery-public-data.san_francisco.sffd_service_calls`
GROUP BY
  neighborhood_district
ORDER BY
  total_count DESC

Command-line

bq query --use_legacy_sql=false '
SELECT
  neighborhood_district,
  COUNTIF(call_type = "Medical Incident") AS medical_incident_count,
  COUNTIF(call_type = "Structure Fire") AS structure_fire_count,
  Count(*) as total_count
FROM
  `bigquery-public-data.san_francisco.sffd_service_calls`
GROUP BY
  neighborhood_district
ORDER BY
  total_count DESC'

The results are shown here:

+---------------------------------+------------------------+----------------------+-------------+
| neighborhood_district           | medical_incident_count | structure_fire_count | total_count |
+---------------------------------+------------------------+----------------------+-------------+
| Tenderloin                      | 417221                 | 60026                | 560393      |
| South of Market                 | 292262                 | 41835                | 405339      |
| Mission                         | 262486                 | 59086                | 391954      |
| Financial District/South Beach  | 152376                 | 31416                | 286230      |
| Bayview Hunters Point           | 145772                 | 38783                | 233917      |
                                       ...
+---------------------------------+------------------------+----------------------+-------------+

What’s the average response time for each type of dispatched vehicle?

This query uses the on_scene_timestamp and recieved_timestamp columns to determine latency for each unit_type dispatched to a call. Try joining this query with our NOAA weather data to see how weather impacts latency.

Web UI

Open the following query in the Web UI

#standardSQL
SELECT
  unit_type,
  ROUND(AVG(TIMESTAMP_DIFF(on_scene_timestamp, received_timestamp, MINUTE)), 2)
    as latency,
  Count(*) as total_count
FROM
  `bigquery-public-data.san_francisco.sffd_service_calls`
WHERE
  EXTRACT(DATE from received_timestamp) = EXTRACT(DATE from on_scene_timestamp)
GROUP BY
  unit_type
ORDER BY
  latency ASC

Command-line

bq query --use_legacy_sql=false '
SELECT
  unit_type,
  ROUND(AVG(TIMESTAMP_DIFF(on_scene_timestamp, received_timestamp, MINUTE)), 2)
    as latency,
  Count(*) as total_count
FROM
  `bigquery-public-data.san_francisco.sffd_service_calls`
WHERE
  EXTRACT(DATE from received_timestamp) = EXTRACT(DATE from on_scene_timestamp)
GROUP BY
  unit_type
ORDER BY
  latency ASC'

The results are shown here:

+--------------+----------+-------------+
| unit_type    | latency  | total_count |
+--------------+----------+-------------+
| ENGINE       | 5.91     | 1237866     |
| TRUCK        | 6.71     | 248277      |
| RESCUE SQUAD | 7.01     | 32823       |
| CHIEF        | 7.59     | 192531      |
| MEDIC        | 9.86     | 1012575     |
                 ...
+--------------+----------+-------------+

About the data

Dataset Source: SF OpenData

Category: San Francisco

Use: This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://sfgov.org/ - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

Update Frequency: Daily

View in BigQuery: Go to SF Fire Department Service Calls data

Send feedback about...