Bay Area Bike Share Trips Data

How to query public data sets using BigQuery

BigQuery is a fully managed data warehouse and analytics platform. Public datasets are available for you to analyze using SQL queries. You can access BigQuery public data sets using the web UI the command-line tool, or by making calls to the BigQuery REST API using a variety of client libraries such as Java, .NET, or Python.

To get started using a BigQuery public dataset, create or select a project. The first terabyte of data processed per month is free, so you can start querying public datasets without enabling billing. If you intend to go beyond the free tier, you should also enable billing.

  1. Sign in to your Google account.

    If you don't already have one, sign up for a new account.

  2. Select or create a Cloud Platform project.

    Go to the Manage resources page

  3. Enable billing for your project.

    Enable billing

  4. BigQuery is automatically enabled in new projects. To activate BigQuery in a pre-existing project, Enable the BigQuery API.

    Enable the API

Dataset overview

This data includes all Bay Area Bike Share trips from August 2013 to the present, and is updated daily.

You can start exploring this data in the BigQuery console:

Go to Bay Area Bike Share Trips Dataset

Sample queries

Here are some examples of SQL queries you can run on this data in BigQuery.

These samples use BigQuery’s support for standard SQL. Use the #standardSQL tag to let BigQuery know you want to use standard SQL. For more information about the #standardSQL prefix, see Setting a query prefix.

How long was the average ride length each year?

First let’s look at the average ride length for each year since 2013. This query uses the duration_sec column to calculate average ride length.

Web UI

Open the following query in the Web UI

#standardSQL
SELECT
  EXTRACT(YEAR FROM start_date) AS year,
  ROUND(AVG(duration_sec/60), 2) AS avg_duration_min,
  COUNT(*) AS yearly_trips
FROM
  `bigquery-public-data.san_francisco.bikeshare_trips`
GROUP BY
  YEAR
ORDER BY
  YEAR DESC

Command-line

bq query --use_legacy_sql=false '
SELECT
  EXTRACT(YEAR FROM start_date) AS year,
  ROUND(AVG(duration_sec/60), 2) AS avg_duration_min,
  COUNT(*) AS yearly_trips
FROM
  `bigquery-public-data.san_francisco.bikeshare_trips`
GROUP BY
  YEAR
ORDER BY
  YEAR DESC'

The results are shown here:

+---------+------------------+--------------+
|  year   | avg_duration_min | yearly_trips |
+---------+------------------+--------------+
| 2016    | 13.82            | 210494       |
| 2015    | 15.68            | 346252       |
| 2014    | 18.87            | 326339       |
| 2013    | 21.97            | 100563       |
+---------+------------------+--------------+

Which bikes have been ridden the most?

Next, using the bike_number identifier, let’s find which bikes have logged the most time by riders.

Web UI

Open the following query in the Web UI

#standardSQL
SELECT
  bike_number,
  ROUND(SUM(duration_sec/60)) AS total_duration_min
FROM
  `bigquery-public-data.san_francisco.bikeshare_trips`
WHERE
  EXTRACT(DATE from start_date)=EXTRACT(DATE from end_date)
GROUP BY
   bike_number
ORDER BY
  total_duration_min DESC
LIMIT 10

Command-line

bq query --use_legacy_sql=false '
SELECT
  bike_number,
  ROUND(SUM(duration_sec/60)) AS total_duration_min
FROM
  `bigquery-public-data.san_francisco.bikeshare_trips`
WHERE
  EXTRACT(DATE from start_date)=EXTRACT(DATE from end_date)
GROUP BY
   bike_number
ORDER BY
  total_duration_min DESC
LIMIT 10'

The results are shown here:

+-------------+--------------------+
| bike_number | total_duration_min |
+-------------+--------------------+
| 625         | 44125.0            |
| 613         | 42746.0            |
| 553         | 42557.0            |
| 388         | 42366.0            |
| 389         | 42104.0            |
| 468         | 41764.0            |
| 631         | 41718.0            |
| 623         | 41525.0            |
| 587         | 41175.0            |
| 542         | 41139.0            |
+-------------+--------------------+

The data also includes the station name for where each trip starts and ends. Let’s look at the most popular start and end station pairing. Note, if you need to map stations, you can use the bikeshare_stations table which contains geo points.

Web UI

Open the following query in the Web UI

#standardSQL
SELECT
  start_station_name,
  end_station_name,
  count(*) as count
FROM `bigquery-public-data.san_francisco.bikeshare_trips`
group by start_station_name, end_station_name
ORDER by count desc
LIMIT 10

Command-line

bq query --use_legacy_sql=false '
SELECT
  start_station_name,
  end_station_name,
  count(*) as count
FROM `bigquery-public-data.san_francisco.bikeshare_trips`
group by start_station_name, end_station_name
ORDER by count desc
LIMIT 10'

The results are shown here:

+------------------------------------------+------------------------------------------+-------+
|  start_station_name                      | end_station_name                         | count |
+------------------------------------------+------------------------------------------+-------+
| Harry Bridges Plaza (Ferry Building)     | Embarcadero at Sansome                   | 9150  |
| San Francisco Caltrain 2 (330 Townsend)  | Townsend at 7th                          | 8508  |
| 2nd at Townsend            | 18.87       | Harry Bridges Plaza (Ferry Building)     | 7620  |
| Harry Bridges Plaza (Ferry Building)     | 2nd at Townsend                          | 6888  |
| Embarcadero at Sansome                   | Steuart at Market| 100563                | 6874  |
| Townsend at 7th                          | San Francisco Caltrain 2 (330 Townsend)  | 6836  |
| Embarcadero at Folsom                    | San Francisco Caltrain (Townsend at 4th) | 6351  |
| San Francisco Caltrain (Townsend at 4th) | Harry Bridges Plaza (Ferry Building)     | 6215  |
| Steuart at Market                        | 2nd at Townsend                          | 6039  |
| Steuart at Market                        | San Francisco Caltrain (Townsend at 4th) | 5959  |
+------------------------------------------+------------------------------------------+-------+

About the data

Dataset Source: Bay Area Bike Share

Category: San Francisco, Transportation

Use: This dataset is publicly available for anyone to use and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

Update Frequency: Daily

View in BigQuery: Go to Bay Area Bike Share Trips Dataset

Send feedback about...