How to use custom holidays for time-series forecasting in BigQuery ML

July 19, 2023
Honglin Zheng

Software Engineer

Anant Prakash

Product Manager, BigQuery ML

Time-series forecasting is one of the most important modeling tasks across a variety of industries, such as retail, telecom, entertainment, and manufacturing. It serves many use cases, such as forecasting revenue and predicting inventory levels. It’s no surprise that time series is one of the most popular model types in BigQuery ML. Defining holidays is important in any time-series forecasting model because holidays account for many of the variations and fluctuations in time-series data. In this blog post, we discuss how you can take advantage of recent enhancements to define custom holidays and get better explainability for your forecasting models in BigQuery ML.

You could already specify HOLIDAY_REGION when creating a time-series model. The model would use the holiday information within that HOLIDAY_REGION to capture the holiday effect. However, we heard from our customers that they want to understand the holiday effect in detail: which holidays are used in modeling and how much each individual holiday contributes to the model. They also want the ability to customize the built-in holidays or create their own holidays for modeling.

To address these needs, we recently launched a preview of custom holiday modeling capabilities in ARIMA_PLUS and ARIMA_PLUS_XREG. With these capabilities, you can now do the following:

  1. Access all the built-in holiday data by querying the BigQuery public dataset bigquery-public-data.ml_datasets.holidays_and_events_for_forecasting or by using the table-valued function ML.HOLIDAY_INFO, so that you can inspect the holiday data used for fitting your forecasting model

  2. Customize the holiday data (e.g. primary date and holiday effect window) using standard GoogleSQL to improve time series forecasting accuracy

  3. Explain the contribution of each holiday to the forecasting result
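As a rough sketch of the first capability, the two queries below read the built-in holiday data and then the holidays used by a trained model. The column names in the first query are assumptions based on the holiday fields this post mentions (primary date and holiday effect window), and `bqml_tutorial.forecast_googleio` is a hypothetical name for an ARIMA_PLUS model that has already been trained with a HOLIDAY_REGION.

```sql
-- Built-in holidays for one region, read from the public dataset.
-- Column names are assumed; check the table schema before relying on them.
SELECT
  region,
  holiday_name,
  primary_date,
  preholiday_days,
  postholiday_days
FROM
  `bigquery-public-data.ml_datasets.holidays_and_events_for_forecasting`
WHERE
  region = 'US'
ORDER BY
  primary_date;

-- Holidays that were used to fit a trained model.
-- The model name is hypothetical.
SELECT
  *
FROM
  ML.HOLIDAY_INFO(MODEL `bqml_tutorial.forecast_googleio`);
```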

Before we dive into these features, let’s look at what custom holiday modeling is and why you might need it. Suppose you want to forecast the number of daily page views of the Wikipedia page for Google I/O, Google’s flagship event for developers. Given the large attendance at Google I/O, you can expect significantly increased traffic to this page around the event days. Because these dates are specific to Google and are not included in the default HOLIDAY_REGION, a model that relies only on the built-in holidays cannot account for the spikes around those dates. Specifying custom holidays in your model lets it capture and explain these spikes, so you can build more accurate and more explainable time-series forecasting models in BigQuery ML.

The following sections walk through an example of the new custom holiday modeling in BigQuery ML. In this example, we explore the bigquery-public-data.wikipedia dataset, which includes page views for the Google I/O article, create a custom holiday for the Google I/O event, and then use the model to forecast daily page views based on historical data and the customized holiday calendar.

“The bank would like to utilize a custom holiday calendar as it has ‘tech holidays’ due to various reasons like technology freezes, market instability freeze etc. And, it would like to incorporate those freeze calendars while training the ML model for Arima,” said a data scientist of a large US based financial institution.

An example: forecast Wikipedia daily page views for Google I/O

Step 1: Create the dataset

BigQuery hosts hourly Wikipedia page view data across all languages. As a first step, we aggregate the views for the Google I/O page by day, summing across all languages.
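The aggregation described above might look like the following sketch. The destination table `bqml_tutorial.googleio_page_views`, the wildcard table name `pageviews_*`, the column names `datehour`, `title`, and `views`, and the page title `'Google_I/O'` are all assumptions for illustration, so verify them against the public dataset before running the query.

```sql
-- Aggregate hourly page views into daily totals across all languages.
-- Table, column, and title names are assumptions, not confirmed by this post.
-- Note: the source tables are large, so this query may scan a lot of data.
CREATE OR REPLACE TABLE `bqml_tutorial.googleio_page_views` AS
SELECT
  DATE(datehour) AS date,
  SUM(views) AS views
FROM
  `bigquery-public-data.wikipedia.pageviews_*`
WHERE
  datehour >= TIMESTAMP '2017-01-01'
  AND datehour < TIMESTAMP '2023-01-01'
  AND title = 'Google_I/O'
GROUP BY
  date;
```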

https://storage.googleapis.com/gweb-cloudblog-publish/images/image_2_5dLDFkW.max-2000x2000.jpg

Step 2: Forecast without custom holiday

Next, we run a regular forecast that uses only the built-in holidays. We train on the daily page view data from 2017 to 2021 and forecast into 2022.
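A baseline model along these lines could be sketched as follows. The model and table names are hypothetical, and the choice of `'US'` as the holiday region, the 365-day horizon, and the 0.9 confidence level are assumptions rather than values confirmed by this post.

```sql
-- Baseline ARIMA_PLUS model that uses only the built-in US holidays.
-- Model and table names are hypothetical.
CREATE OR REPLACE MODEL `bqml_tutorial.forecast_googleio`
  OPTIONS (
    model_type = 'ARIMA_PLUS',
    holiday_region = 'US',
    time_series_timestamp_col = 'date',
    time_series_data_col = 'views',
    data_frequency = 'DAILY',
    horizon = 365)
AS
SELECT
  date,
  views
FROM
  `bqml_tutorial.googleio_page_views`
WHERE
  date < '2022-01-01';

-- Forecast one year ahead, with the decomposed components for plotting.
SELECT
  *
FROM
  ML.EXPLAIN_FORECAST(
    MODEL `bqml_tutorial.forecast_googleio`,
    STRUCT(365 AS horizon, 0.9 AS confidence_level));
```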

We can visualize the result from ML.EXPLAIN_FORECAST in Looker Studio and get the following graph:

https://storage.googleapis.com/gweb-cloudblog-publish/images/image_3_iLSRc9S.max-2000x2000.png

As we can see, the forecasting model captures the general trend well. However, it does not capture the increased traffic related to previous Google I/O events, and it cannot generate an accurate forecast for the 2022 event either.

Step 3: Forecast with custom holiday

As shown below, Google I/O took place on the following dates between 2017 and 2022. We would like to instruct the forecasting model to take these dates into account as well.

https://storage.googleapis.com/gweb-cloudblog-publish/images/image_4_KlDrM3E.max-2000x2000.jpg

To do this, we provide the full list of Google I/O event dates to the forecasting model. We also adjust the holiday effect window to cover four days around the event date, so that the model can capture additional traffic just before and after the event.
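A model with a custom holiday could be sketched as below. The model and table names are hypothetical. The dates are the publicly known first days of each Google I/O (the 2020 event was cancelled) and are an assumption, because the original date table is not reproduced in the text. The split of the four-day window into one day before and two days after the primary date is also an assumption.

```sql
-- ARIMA_PLUS model with a custom Google I/O holiday added to the US holidays.
-- Names, dates, and the window split are assumptions for illustration.
CREATE OR REPLACE MODEL `bqml_tutorial.forecast_googleio_with_custom_holiday`
  OPTIONS (
    model_type = 'ARIMA_PLUS',
    holiday_region = 'US',
    time_series_timestamp_col = 'date',
    time_series_data_col = 'views',
    data_frequency = 'DAILY',
    horizon = 365)
AS (
  training_data AS (
    SELECT
      date,
      views
    FROM
      `bqml_tutorial.googleio_page_views`
    WHERE
      date < '2022-01-01'
  ),
  custom_holiday AS (
    SELECT
      'US' AS region,
      'GoogleIO' AS holiday_name,
      primary_date,
      -- 1 day before + the primary date + 2 days after = a 4-day window.
      1 AS preholiday_days,
      2 AS postholiday_days
    FROM
      UNNEST([
        DATE '2017-05-17',
        DATE '2018-05-08',
        DATE '2019-05-07',
        -- No event in 2020.
        DATE '2021-05-18',
        DATE '2022-05-11']) AS primary_date
  )
);
```

Including the 2022 date, which falls after the training data ends, is what lets the model apply the learned holiday effect to the forecast period.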

After visualizing in Looker Studio, we get the following chart:

https://storage.googleapis.com/gweb-cloudblog-publish/images/image_5.max-2000x2000.png

As the chart shows, the custom holiday significantly improved the forecasting model, which now captures the increase in page views caused by Google I/O.

Step 4: Explain fine-grained holiday effect

You can further inspect the holiday effect contributed by each individual holiday by using ML.EXPLAIN_FORECAST:
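A query of this kind might look like the sketch below. The model name is hypothetical, and the per-holiday column `holiday_effect_GoogleIO` is an assumption that follows the pattern of `holiday_effect_` plus the holiday name; check the actual output schema of your model before using it.

```sql
-- Show the overall holiday effect and the part attributed to Google I/O.
-- The model name and the per-holiday column name are assumptions.
SELECT
  time_series_timestamp,
  holiday_effect,
  holiday_effect_GoogleIO
FROM
  ML.EXPLAIN_FORECAST(
    MODEL `bqml_tutorial.forecast_googleio_with_custom_holiday`,
    STRUCT(365 AS horizon))
WHERE
  holiday_effect != 0
ORDER BY
  time_series_timestamp;
```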

The results look similar to the following. As we can see, on the custom holiday dates, Google I/O accounts for a large share of the holiday effect in the overall forecast.

https://storage.googleapis.com/gweb-cloudblog-publish/images/image_6_Vl5SUIm.max-2000x2000.png

Step 5: Compare model performance

Finally, we use ML.EVALUATE to compare the performance of the previous model, created without the custom holiday, and the new model, created with the custom holiday. Specifically, we want to see how the new model performs when forecasting a future custom holiday, so we restrict the evaluation time range to the week of Google I/O in 2022.
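The comparison could be sketched as follows. The model and table names are hypothetical, and the exact date range chosen for "the week of Google I/O in 2022" is an assumption.

```sql
-- Evaluate both models on the same held-out week and stack the results.
-- Model names, table name, and the date range are assumptions.
SELECT
  'without_custom_holiday' AS model_variant,
  *
FROM
  ML.EVALUATE(
    MODEL `bqml_tutorial.forecast_googleio`,
    (
      SELECT date, views
      FROM `bqml_tutorial.googleio_page_views`
      WHERE date >= '2022-05-08' AND date < '2022-05-15'
    ),
    STRUCT(365 AS horizon, TRUE AS perform_aggregation))
UNION ALL
SELECT
  'with_custom_holiday' AS model_variant,
  *
FROM
  ML.EVALUATE(
    MODEL `bqml_tutorial.forecast_googleio_with_custom_holiday`,
    (
      SELECT date, views
      FROM `bqml_tutorial.googleio_page_views`
      WHERE date >= '2022-05-08' AND date < '2022-05-15'
    ),
    STRUCT(365 AS horizon, TRUE AS perform_aggregation));
```

With `perform_aggregation` set to TRUE, each model returns a single row of aggregated error metrics, which makes the two variants easy to compare side by side.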

We get the following result, which shows a large performance improvement for the new model:

https://storage.googleapis.com/gweb-cloudblog-publish/images/image_7_b9HOdAc.max-2000x2000.jpg

Conclusion

In the previous example, we demonstrated how to use custom holidays in forecasting and how to evaluate their impact on a forecasting model. The public dataset and the ML.HOLIDAY_INFO table-valued function are also helpful for understanding which holidays are used to fit your model. Some of the gains brought by this feature are as follows:

  1. You can configure custom holidays easily using standard GoogleSQL, enjoying BigQuery scalability, data governance, etc.

  2. You get elevated transparency and explainability of time series forecasting in BigQuery.

What’s next?

Custom holiday modeling in forecasting models is now available for you to try in preview. Check out the tutorial in BigQuery ML to learn how to use it. For more information, please refer to the documentation.

Acknowledgements: Thanks to Xi Cheng, Haoming Chen, Jiashang Liu, Amir Hormati, Mingge Deng, Eric Schmidt and Abhinav Khushraj from the BigQuery ML team. Thanks also to Weijie Shen and Jean Ortega from the Fargo team of the Resource Efficiency Data Science team.