How to forecast demand with Google BigQuery, public datasets and TensorFlow

Demand forecasting is something that every business does. If you're a restaurant owner, you need to forecast how many diners you'll have tomorrow and what foods they'll order so that you know what ingredients to shop for and how many cooks to have in your kitchen. If you sell shirts, you need to predict in advance how many shirts of each color you need to order from your clothing manufacturer. Usually, business owners forecast demand using their gut-feel (“people are going to order more souffles than omelettes”) or rules of thumb (“stock more red turtlenecks around Christmas”).

The problem with gut-feel or rules of thumb is that they're rarely quantitative. How many more souffles will be ordered? How many more red turtlenecks should we stock? Could you use machine learning to forecast demand more precisely instead of relying on gut-feel or rules of thumb? If you have enough historical data from your business, you can. In this blog post, we'll show you how.

Machine Learning

First: what is machine learning? Normally, when you want a computer to do something for you, you have to program the computer to do it with an explicit set of rules. For example, if you want a computer to look at an image of a screw on a manufacturing line and figure out whether the screw is faulty or not, you have to code up a set of rules: Is the screw bent? Is the screw head broken? Is the screw discolored? And so on.


With machine learning, you turn the problem on its head. Instead of coming up with all kinds of logical rules for why a screw might be bad, you show the computer a whole bunch of data. Maybe you show it 5,000 images of good screws and 5,000 images of faulty screws that your (human) operators discarded for one reason or another. Then, you let the computer figure out how to tell a bad screw from a good one. The computer is the “machine” and it's “learning” to make decisions based on data.

Forecasting taxicab demand in New York City


Let’s say you're the logistics manager for a taxicab company in New York, and you need to decide how many drivers you will ask to come in this coming Thursday. You know something about taxis in New York. You know, for example, that demand will depend on the day-of-the-week (demand on Thursdays is different from demand on Mondays) and on the weather forecast for Thursday. These are our predictors — the things we'll use to create our prediction.

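We also need to decide what it is that we wish to predict. Let’s say that we'll predict the total number of taxicab rides (city-wide) on a particular day. We could assume that we'll get our typical share of those rides, and call in that many drivers. Put another way, this is our machine learning problem: given the day of the week and the weather for a particular day (minimum and maximum temperature, and rain), predict the total number of taxicab rides in the city on that day.

Figure: Predictors and target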

Google BigQuery public datasets include both overall taxicab rides in New York (as the table nyc-tlc:green) and NOAA weather data (as the table fh-bigquery:weather_gsod), so we decide to use those as our input datasets. The overall number of taxicab rides is only a proxy for the actual demand: the demand may have been higher than the actual number of rides if there were not enough taxis on the street, or if the taxis were in different parts of the city from where the demand existed. However, this dataset is a good place to start if we assume the market for taxis in New York is efficient. If your business does not involve taxicabs, or depends on something other than weather, you would load your own historical data into BigQuery.

In Google Cloud Datalab, you can run BigQuery queries and get the results back in a form that's usable from within Python, for example to pull the daily weather observations (the full Datalab notebook, along with a more detailed commentary, is on GitHub).


Similarly, run a BigQuery query to get the number of taxicab rides by daynumber (which represents the day of the year; for example, daynumber=1 is New Year’s Day).
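To make the idea of a daynumber concrete, here is a minimal local sketch of the same aggregation in plain Python. It is not the BigQuery query from the notebook: the pickup timestamps are made up, and the helper name `rides_per_daynumber` is hypothetical.

```python
from collections import Counter
from datetime import datetime

def rides_per_daynumber(pickup_times):
    """Count rides per day of the year (daynumber 1 = January 1)."""
    return Counter(t.timetuple().tm_yday for t in pickup_times)

# Made-up pickup timestamps, for illustration only.
pickups = [
    datetime(2015, 1, 1, 0, 15),
    datetime(2015, 1, 1, 8, 30),
    datetime(2015, 1, 2, 9, 0),
    datetime(2015, 2, 1, 17, 45),
]
counts = rides_per_daynumber(pickups)
print(sorted(counts.items()))  # [(1, 2), (2, 1), (32, 1)]
```

In practice you would let BigQuery do this grouping, since the full trips table is far too large to aggregate comfortably on a laptop.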


By merging the weather and trips datasets, we end up with the complete dataset to use for machine learning.
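The merge itself can be sketched with Pandas as below. This assumes both query results are already in DataFrames that share a `daynumber` column; the other column names (`dayofweek`, `mintemp`, `maxtemp`, `rain`, `numtrips`) and all the values are assumptions made for illustration, not the notebook's actual output.

```python
import pandas as pd

# Made-up values, for illustration only. Column names other than
# `daynumber` are assumptions.
weather = pd.DataFrame({
    "daynumber": [1, 2, 3],
    "dayofweek": [5, 6, 7],
    "mintemp": [27.0, 35.1, 33.1],
    "maxtemp": [39.0, 42.1, 41.0],
    "rain": [0.0, 0.0, 0.71],
})
trips = pd.DataFrame({
    "daynumber": [1, 2, 4],
    "numtrips": [62000, 43000, 41000],
})

# Inner join on daynumber: keep only days present in both tables.
data = pd.merge(weather, trips, on="daynumber")
print(data)
```

An inner join (the default for `pd.merge`) silently drops days that are missing from either table, which is usually what you want here: a day without weather or without a ride count can't be used for training.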


That’s our historical data, and we can use it to predict taxicab demand based on the weather.


When doing machine learning, it's a good idea to have a benchmark. This will be a simple model, or perhaps what your gut instinct would tell you. We can evaluate whether the machine learning model is better than this benchmark by trying out both the simple model and the machine learning one on a test dataset.

In order to create this test dataset, we'll collect all our historical data, and then split it 80:20. We'll train the model on 80% of the data, and use the remaining 20% to evaluate how well the machine learning model does.
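A minimal sketch of such a split in plain Python follows. The post doesn't say whether the split is random or chronological, so the random shuffle here is an assumption, and the helper name `train_test_split` and the fixed seed are ours.

```python
import random

def train_test_split(rows, train_fraction=0.8, seed=42):
    """Shuffle rows and split them into training and test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

days = list(range(1, 366))  # stand-in for one row per day of the year
train, test = train_test_split(days)
print(len(train), len(test))  # 292 73
```

Fixing the seed matters more than it might seem: with only a few hundred days of data, a different split can noticeably change the error you measure.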

A reasonable benchmark would be to use the average taxicab demand over the entire time period. If we can do better than just going with the average, our machine learning model is skillful. To measure how well a model does, we'll use the root-mean-square error. You could choose some other measure that's pertinent to the business problem you're solving. For example, you could compute the overall loss in revenue by having too few or too many drivers on a day and use that as your measure.
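As a sketch of how that benchmark could be scored, the snippet below computes the RMSE of always predicting the average. The ride counts are made up, and using the average of the training rows only (rather than of the whole period) is our assumption.

```python
import math

def rmse(actual, predicted):
    """Root-mean-square error between two equal-length sequences."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Made-up daily ride counts, for illustration only.
train_trips = [40000, 50000, 60000]
test_trips = [45000, 65000]

# Benchmark: predict the training-set average for every test day.
average = sum(train_trips) / len(train_trips)
benchmark_rmse = rmse(test_trips, [average] * len(test_trips))
print(round(benchmark_rmse))  # 11180
```

Because the errors are squared before averaging, one badly missed day hurts the score more than several small misses, which suits a staffing problem where large misses are the expensive ones.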


The root-mean-square error (RMSE) when using the average is about 12,700, so that's the number we want to beat using machine learning. In other words, we want an RMSE that's lower than 12,700.


TensorFlow is a software library that was open-sourced by Google in 2015. One of the things that it excels at is conducting machine learning using neural networks, especially deep neural networks. You can play with neural network architectures on the TensorFlow playground.

Even though the TensorFlow code for this may look intimidating, most of it is boilerplate (see the Datalab notebook for the complete code; Google Cloud Machine Learning, now in alpha, provides a simpler way to do this from Datalab). I'm using a neural network with one hidden layer (limiting the number of layers because we don’t have millions of examples, just a few hundred days), choosing intermediate nodes to be rectified linear units (relu), and setting the output node to be an identity node (because this is a regression problem, not a classification one).
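To show what that architecture computes, here is a small NumPy sketch of the forward pass only. It is not the TensorFlow code from the notebook and it does no training: the hidden-layer size of 5, the random weights, and the input values are all assumptions chosen for illustration.

```python
import numpy as np

def predict(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass: one ReLU hidden layer, then an identity output node."""
    hidden = np.maximum(0.0, x @ w_hidden + b_hidden)  # ReLU activation
    return hidden @ w_out + b_out  # identity output: no activation applied

rng = np.random.default_rng(0)
n_inputs, n_hidden = 4, 5  # 4 predictors; the hidden size is arbitrary
w_hidden = rng.normal(size=(n_inputs, n_hidden))
b_hidden = np.zeros(n_hidden)
w_out = rng.normal(size=(n_hidden, 1))
b_out = np.zeros(1)

# One made-up day: day of week, min temperature, max temperature, rain.
x = np.array([[5.0, 30.0, 45.0, 0.0]])
print(predict(x, w_hidden, b_hidden, w_out, b_out).shape)  # (1, 1)
```

The identity output is what makes this a regression network: the final value is an unbounded number (a ride count) rather than a probability squashed into the range 0 to 1.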


We save the model and run it on the test dataset, and verify that we're doing better than our benchmark.

An RMSE of about 8,200 is much better than the RMSE of 12,700 we got by simply using the historical average.

Running a trained model

Once we have trained a model, running it on new predictor data is quite straightforward. For example, let’s say we have the weather forecast for the next three days. We can simply pass in the predictor variables (day of week, min. and max. temperature, rain) to the neural network and obtain the predicted taxicab demand for the next three days.


It appears that we should tell some of our taxi drivers to take the day off on Wednesday (day=4) and be there in full strength on Thursday (day=5). Thursdays are usually "slow" days (taxi demand in New York peaks on the weekends), but the machine learning model tells us to expect heavy demand this particular Thursday because of the weather.

Google Cloud Platform made this demand forecasting problem particularly straightforward to carry out. Cloud Datalab provides an interactive Python notebook that's well-integrated with BigQuery, Pandas, and TensorFlow. The public datasets on GCP include historical weather observation data from NOAA. To learn more about GCP and its Big Data and Machine Learning capabilities, register for a training course.