Seasonality - recurring but not necessarily periodic data patterns - is a staple of time series modeling. Since capturing true seasonality greatly enhances model accuracy, we wanted to share our thoughts and experience on the detection and modeling of such data patterns.
Seasonality exists in many flavors:
- Single (e.g. annual, like monthly sales in retail) or multiple (e.g. weekly and annual, like The Voice online votes).
- Periodic (e.g. New Year's Day) or not periodic (e.g. Black Friday).
- Integer or non-integer periods. For example, weekly data with annual seasonality recurs every 365.25 / 7 ≈ 52.18 weeks.
- Multiple and nested (e.g. the daily and weekly seasonalities of hourly electricity demand) or multiple and not nested (e.g. the weekly and yearly seasonalities of daily electricity demand).
Obviously, mixed-flavor seasonality is harder to detect. Rather than spending too much time on cumbersome data analysis, Datapred data scientists favor a two-step approach:
- Preliminary detection with simple statistical methods.
- Fine-tuning based on modeling results.
The mere visualization of autocorrelation and partial autocorrelation plots, combined with some understanding of the corresponding real-world phenomenon, will often yield a fair estimate of the seasonality you are dealing with.
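As a rough illustration of this first look, here is a minimal sketch of a sample autocorrelation function in plain Python. The synthetic weekly series, the `autocorrelation` helper and the `max_lag` value are our own assumptions for the example. The sketch covers autocorrelation only, not partial autocorrelation, and in practice you would plot the values rather than just take the maximum.

```python
import math

def autocorrelation(series, max_lag):
    """Sample autocorrelation for lags 1..max_lag (biased estimator)."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    variance = sum(c * c for c in centered)
    acf = {}
    for lag in range(1, max_lag + 1):
        cov = sum(centered[t] * centered[t - lag] for t in range(lag, n))
        acf[lag] = cov / variance
    return acf

# Synthetic daily series with a weekly cycle (illustrative data only).
series = [10 + 3 * math.sin(2 * math.pi * t / 7) for t in range(140)]
acf = autocorrelation(series, max_lag=20)

# The lag with the highest autocorrelation is a first candidate period.
candidate_period = max(acf, key=acf.get)
```

On real data the peak is rarely this clean, which is why some understanding of the underlying phenomenon remains useful when reading the plot.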
You will confirm this visual hunch with a quick round of time-series decomposition, for example with moving averages:
- De-trend your data with a centered moving average the size of your estimated seasonality.
- Isolate the seasonal component with one moving average per relevant time-step (e.g. one moving average per day of the week for a weekly seasonality, or one per month for an annual seasonality).
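The two steps above can be sketched as follows. This is a simplified version under our own assumptions: an additive series, an integer period, and a plain mean per position in the cycle rather than a moving average per time-step. The function names are hypothetical.

```python
def centered_moving_average(series, period):
    """Centered moving average; end points get half weight when period is even."""
    n = len(series)
    half = period // 2
    trend = [None] * n  # undefined near the edges of the series
    for t in range(half, n - half):
        window = series[t - half : t + half + 1]
        if period % 2 == 1:
            trend[t] = sum(window) / period
        else:
            # 2 x period moving average: half-weight the two end points.
            total = sum(window) - 0.5 * (window[0] + window[-1])
            trend[t] = total / period
    return trend

def seasonal_component(series, period):
    """Average the de-trended values for each position in the cycle."""
    trend = centered_moving_average(series, period)
    buckets = [[] for _ in range(period)]
    for t, (x, m) in enumerate(zip(series, trend)):
        if m is not None:
            buckets[t % period].append(x - m)
    means = [sum(b) / len(b) for b in buckets]
    offset = sum(means) / period  # center the seasonal effects on zero
    return [m - offset for m in means]

# Illustrative data: linear trend plus a zero-mean weekly pattern.
pattern = [3, 1, -2, -4, -1, 0, 3]
series = [0.5 * t + pattern[t % 7] for t in range(70)]
seasonal = seasonal_component(series, period=7)
```

If the recovered seasonal profile looks flat or erratic, the estimated period is probably wrong and worth revisiting.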
This will give you enough knowledge to select the initial batch of seasonal models you will work from during step 2.
There are four main families of basic seasonal models:
- ExponenTial Smoothing (ETS) models, including Holt-Winters models (the 1960s seasonality stars).
- Seasonal ARIMA (SARIMA) models.
- Models based on Fourier series, where seasonalities are represented by linear combinations of cosine and sine terms.
- Regression models based on dummy variables. These variables are typically binary and indicate the occurrence of a specific season (e.g. a day of the week or a calendar event).
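To make the last two families more concrete, here is a small sketch of the features they rely on. The helper names `fourier_terms` and `day_of_week_dummies` and the choice of two harmonics are assumptions made for the example; the resulting features would then be fed to whatever regression model you use.

```python
import math

def fourier_terms(t, period, order):
    """Return sin/cos pairs for harmonics 1..order at time index t."""
    features = []
    for k in range(1, order + 1):
        angle = 2 * math.pi * k * t / period
        features.extend([math.sin(angle), math.cos(angle)])
    return features

def day_of_week_dummies(t):
    """One-hot encoding of the position of t in a 7-step cycle."""
    return [1 if t % 7 == d else 0 for d in range(7)]

# Fourier terms accept a non-integer period, e.g. annual seasonality
# in weekly data.
weeks_per_year = 365.25 / 7
fourier_row = fourier_terms(t=10, period=weeks_per_year, order=2)

# Dummy variables need an integer number of seasons.
dummy_row = day_of_week_dummies(t=9)
```

The contrast matters in practice: dummy variables need one column per season, whereas a handful of Fourier harmonics can describe a long or fractional period with far fewer parameters.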
From our experience, you will not need more sophisticated models for seasonality analysis:
- Integer single seasonalities are captured by simple ETS models, and non-integer single seasonalities by models based on Fourier series.
- (Generalizations of) Holt-Winters models are good for nested and integer multiple seasonalities.
- Non-periodic cycles are particularly well addressed by decision trees based on dummy variables (although you may need to de-trend your data with a moving average first).
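For readers who have not met the first family, here is a minimal sketch of additive Holt-Winters smoothing for a single integer period. The initialization from the first two cycles and the smoothing parameters are illustrative assumptions, not tuned values, and the multi-seasonal generalizations mentioned above are not covered.

```python
def holt_winters_additive(series, period, alpha, beta, gamma, horizon):
    """Additive Holt-Winters forecast; needs at least two full cycles."""
    # Simple initialization from the first two cycles (an assumption).
    first = series[:period]
    second = series[period : 2 * period]
    level = sum(first) / period
    trend = (sum(second) - sum(first)) / period ** 2
    seasonal = [x - level for x in first]

    for t in range(period, len(series)):
        x = series[t]
        s = seasonal[t % period]
        prev_level = level
        # Update level, trend and the seasonal index for this position.
        level = alpha * (x - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonal[t % period] = gamma * (x - level) + (1 - gamma) * s

    n = len(series)
    return [
        level + h * trend + seasonal[(n + h - 1) % period]
        for h in range(1, horizon + 1)
    ]

# Illustrative data: a purely periodic series with period 4.
pattern = [5, 3, 8, 6]
series = pattern * 5
forecasts = holt_winters_additive(
    series, period=4, alpha=0.3, beta=0.1, gamma=0.2, horizon=4
)
```

On a purely periodic series like this one the forecasts reproduce the pattern; real data will require fitting `alpha`, `beta` and `gamma`.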
Backtesting these relatively simple models will confirm or refute the conclusions of step 1 and provide a roadmap for potential further investigations.
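A backtest of this kind can be sketched with a rolling origin and a deliberately naive forecaster. The seasonal-naive model, the candidate periods and the `start` index below are assumptions chosen to keep the example short; any of the models above could take the place of `seasonal_naive`.

```python
import math

def seasonal_naive(history, period):
    """Forecast the next value as the value observed one period ago."""
    return history[-period]

def backtest(series, period, start):
    """Mean absolute error of one-step-ahead forecasts from index start on."""
    errors = []
    for t in range(start, len(series)):
        # Only data strictly before t is visible to the forecaster.
        forecast = seasonal_naive(series[:t], period)
        errors.append(abs(series[t] - forecast))
    return sum(errors) / len(errors)

# Illustrative data: weekly cycle plus a mild upward trend.
series = [10 + 3 * math.sin(2 * math.pi * t / 7) + 0.05 * t for t in range(140)]

# Compare the period suggested by step 1 with its neighbors.
scores = {p: backtest(series, p, start=30) for p in (5, 6, 7, 8)}
best_period = min(scores, key=scores.get)
```

If the period found in step 1 does not come out clearly ahead of its neighbors, that is a signal to revisit the preliminary analysis.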
The case of complex seasonalities
Complex seasonality modeling will usually require the combination of multiple seasonal models. Datapred data scientists like cascades of regression models based on dummy variables, which they find easy to set up, accurate and scalable:
- We define multiple layers of dummy variables and models - each layer corresponding to one seasonality.
- We apply the first layer to the original time series and extract the first seasonality. The residual obtained by removing this seasonality from the original time series is returned.
- We then apply the second layer to the first residual, the third layer to the second residual, etc.
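The cascade described above can be sketched in a few lines. This is an illustration under stated assumptions, not Datapred's implementation: each layer is an ordinary least-squares regression on a full set of one-hot dummies without intercept, which reduces to taking the mean of the series per season. The hourly data and the function names are hypothetical.

```python
import math

def fit_dummy_layer(series, key):
    """Least-squares fit on one-hot dummies: the mean of the series per season."""
    groups = {}
    for t, x in enumerate(series):
        groups.setdefault(key(t), []).append(x)
    return {season: sum(v) / len(v) for season, v in groups.items()}

def cascade(series, keys):
    """Fit one layer per seasonality, each on the previous layer's residual."""
    residual = list(series)
    layers = []
    for key in keys:
        effects = fit_dummy_layer(residual, key)
        residual = [x - effects[key(t)] for t, x in enumerate(residual)]
        layers.append(effects)
    return layers, residual

def hour_of_day(t):
    return t % 24

def day_of_week(t):
    return (t // 24) % 7

# Illustrative hourly data: a daily profile plus a weekend uplift, 4 weeks.
hourly = [5 * math.sin(2 * math.pi * h / 24) for h in range(24)]
daily = [0, 0, 0, 0, 0, 2, 3]
series = [hourly[hour_of_day(t)] + daily[day_of_week(t)] for t in range(24 * 7 * 4)]

layers, residual = cascade(series, keys=[hour_of_day, day_of_week])
```

Each layer only sees what the previous layers left unexplained, so adding a further seasonality means adding one more key function to the list.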
Finally, as often with time series modeling, we recommend automating as much of the dumb stuff as possible (e.g. integer vs. non-integer period? single vs. multiple seasonality?), and quickly testing a scalable implementation of simple models.
Interested in machine learning for time series? Check out our list of online resources, or download our detailed presentation on the specificities of time series modeling.