Our definition of time series is: sequentially revealed, time-stamped and time-critical data.
Examples include stock prices, logistics scans, machine logs, retail sales and website visits. Counter-examples include images, speech, consumer preferences and social graphs.
Why do machine learning algorithms applied to time series underperform when using standard training, validation and testing procedures?
This comes from three specificities of time series.
- They have no stable underlying structure. In other (technical) terms, you can't assume that data points are independent and identically distributed ("IID"). It is safer, and more realistic, to assume frequently shifting time dependencies.
- Their sequence matters for learning. Suppose you are learning to recognize cats based on one million pictures; the order in which your algorithm processes them won't change the outcome of the learning process. But if you take a stock chart, break it into daily variations and feed these in random order to your algorithm, there will be nothing useful left to learn.
- Their sequence matters for interpretation. With time series, the sequence of events is as meaningful as the events themselves. For example, the fact that a delivery delay occurred today and not yesterday is potentially crucial information for a supply chain manager. By contrast, the fact that I posted my picture today and not yesterday doesn't affect Facebook's ability to tag me.
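The second point is easy to check on synthetic data. The sketch below is only an illustration, not a description of any particular product: it builds an artificial autoregressive series (the 0.9 coefficient, the series length and the helper name `lag1_autocorrelation` are all assumptions made for the example) and measures how strongly each value is related to the previous one, before and after shuffling.

```python
import random


def lag1_autocorrelation(values):
    """Sample correlation between each value and the one that follows it."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values)
    covariance = sum(
        (values[i] - mean) * (values[i + 1] - mean) for i in range(n - 1)
    )
    return covariance / variance


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Synthetic series (an assumption for illustration): each value is
# 0.9 times the previous value plus Gaussian noise.
series = [0.0]
for _ in range(1999):
    series.append(0.9 * series[-1] + rng.gauss(0.0, 1.0))

# Same values, random order: the time dependency is gone.
shuffled = series[:]
rng.shuffle(shuffled)

ordered_acf = lag1_autocorrelation(series)
shuffled_acf = lag1_autocorrelation(shuffled)

print(f"lag-1 autocorrelation, original order: {ordered_acf:.2f}")
print(f"lag-1 autocorrelation, shuffled order: {shuffled_acf:.2f}")
```

In the original order the dependency between consecutive values is strong. After shuffling it is close to zero, even though the set of values is exactly the same. An algorithm fed the shuffled version has no temporal structure left to learn from.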
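The standard training, validation and testing procedure is based on the assumption that data points are IID: it typically shuffles the data before splitting it, which lets a model train on observations that come after the ones it is tested on. And standard feature engineering techniques don't work natively on sequential data.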
Hence the two usual consequences of ignoring the specificities of time series when building your machine learning solution:
- Without proper training and testing procedures, you will lose 30-50% of your solution's maximum achievable accuracy. Of course for low-impact applications such reduced performance may be acceptable - something is often better than nothing.
- Less tolerable is the fact that you will have zero confidence in your solution's ability to withstand contact with reality: while it is possible to ignore the specificities of time series in the test-tube context of prototypes, they will hit you in the face as soon as your solution is in production.
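One common way to respect time order during evaluation is a walk-forward (expanding window) split, where every test block comes strictly after the data used for training. The sketch below is a minimal illustration under stated assumptions: the function name `walk_forward_splits` and its parameters are hypothetical, and it is not presented as the procedure used by any specific tool.

```python
def walk_forward_splits(n_samples, n_folds, min_train):
    """Yield (train_indices, test_indices) pairs in chronological order.

    Each training set contains every observation before its test block,
    so no fold trains on data from the future of its test set.
    """
    if min_train < 1 or min_train >= n_samples:
        raise ValueError("min_train must be between 1 and n_samples - 1")
    test_size = (n_samples - min_train) // n_folds
    if test_size < 1:
        raise ValueError("not enough samples for the requested folds")
    for fold in range(n_folds):
        train_end = min_train + fold * test_size
        test_end = train_end + test_size
        yield list(range(train_end)), list(range(train_end, test_end))


# Ten time-ordered observations, three folds, at least four training points.
splits = list(walk_forward_splits(n_samples=10, n_folds=3, min_train=4))
for train_idx, test_idx in splits:
    print("train:", train_idx, "test:", test_idx)
```

The training window grows with each fold while the test block moves forward in time, which mimics how a model in production only ever sees the past. scikit-learn's `TimeSeriesSplit` provides a comparable ready-made splitter if you prefer not to write your own.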
So where should you start? Our Vital Checklist for Time Series Modeling gives you a few tips. You can also check this page for a list of resources on machine learning for time series. And don't hesitate to contact us for more information on the time series modeling capabilities of the Datapred suite.