NYC Taxi and Limousine Trips

This dataset is collected by the NYC Taxi and Limousine Commission (TLC) and includes trip records from all trips completed in yellow and green taxis in NYC from 2009 to 2015. Records include fields capturing pick-up and drop-off dates/times, pick-up and drop-off locations, trip distances, itemized fares, rate types, payment types, and driver-reported passenger counts. For detailed information about this dataset, go to TOC Trip Record Data

You can start exploring this data for yellow and green taxis in the BigQuery console:

Go to the yellow taxi dataset

Go to the green taxi dataset

Sample queries

Here are some examples of SQL queries you can run on this data in BigQuery.

2014 trips by month (yellow cabs)

Web UI

#legacySQL
SELECT
  LEFT(STRING(pickup_datetime), 7) month,
  COUNT(*) trips
FROM
  [nyc-tlc:yellow.trips]
WHERE
  YEAR(pickup_datetime) = 2014
GROUP BY
  1
ORDER BY
  1

Command-line

bq query --use_legacy_sql=true '
SELECT
  LEFT(STRING(pickup_datetime), 7) month,
  COUNT(*) trips
FROM
  [nyc-tlc:yellow.trips]
WHERE
  YEAR(pickup_datetime) = 2014
GROUP BY
  1
ORDER BY
  1'

yellow cab trips per month 2014

Average speed per hour of day (yellow cabs)

Web UI

#legacySQL
SELECT
  HOUR(pickup_datetime) hour,
  INTEGER(100*AVG(trip_distance/((dropoff_datetime-pickup_datetime)/3600000000)))/100 speed
FROM
  [nyc-tlc:yellow.trips]
WHERE
  fare_amount/trip_distance BETWEEN 2
  AND 10
GROUP BY
  1
ORDER BY
  1

Command-line

bq query --use_legacy_sql=true '
SELECT
  HOUR(pickup_datetime) hour,
  INTEGER(100*AVG(trip_distance/((dropoff_datetime-pickup_datetime)/3600000000)))/100 speed
FROM
  [nyc-tlc:yellow.trips]
WHERE
  fare_amount/trip_distance BETWEEN 2
  AND 10
GROUP BY
  1
ORDER BY
  1'

During the day the average speed is around 11-12 MPH, but at 5 AM the average speed almost doubles to 22 MPH.

average cab speed hourly

Average speed by day of the week

Web UI

#legacySQL
SELECT
  DAYOFWEEK(pickup_datetime) dayofweek,
  INTEGER(100*AVG(trip_distance/((dropoff_datetime-pickup_datetime)/3600000000)))/100 speed
FROM
  [nyc-tlc:yellow.trips]
WHERE
  fare_amount/trip_distance BETWEEN 2
  AND 10
  AND HOUR(pickup_datetime) BETWEEN 8
  AND 18
GROUP BY
  1
ORDER BY
  1

Command-line

bq query --use_legacy_sql=true '
SELECT
  DAYOFWEEK(pickup_datetime) dayofweek,
  INTEGER(100*AVG(trip_distance/((dropoff_datetime-pickup_datetime)/3600000000)))/100 speed
FROM
  [nyc-tlc:yellow.trips]
WHERE
  fare_amount/trip_distance BETWEEN 2
  AND 10
  AND HOUR(pickup_datetime) BETWEEN 8
  AND 18
GROUP BY
  1
ORDER BY
  1'

Sundays are the fastest days of the week, followed by Saturday and Monday. Why are Mondays faster than all the other week days?

average cab speed by day of week

About the data

Dataset Source: NYC OpenData

Category: Geographic, Transportation

Use: This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source — https://data.cityofnewyork.us/ — and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

View in BigQuery: Go to NYC TLC Yellow Trips Go to NYC TLC Green Trips

Send feedback about...

BigQuery Documentation