Machine learning for demand prediction is all the rage: industrial companies are suddenly waking up to the potential of machine learning in that area, proofs of concept are being launched everywhere, consulting companies are making millions…
Let’s take a step back and discuss, based on our experience, what works and what doesn’t (in no particular order).
What works
1. Careful preprocessing
Sales data can be messy: missing data points, outliers resulting from data entry glitches... While clients tend to worry too much about data quality (they see too many « data lake » vendors), a little cleaning up is always useful.
You can automate part of that cleanup, but not all of it. Human understanding of the business reality behind the data is required. For example:
- Why are French e-commerce sales always higher on the fifth day of the month? (Because of social security payments.)
- When a daily sales number is zero, does it mean zero sales on that day or missing sales data?
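The zero-versus-missing question can be made explicit in code once the business rule is known. Here is a minimal sketch, assuming a hypothetical opening calendar (`store_open`) and a hypothetical rule that an open store never records exactly zero sales. Both are illustrative assumptions: for slow-moving products, the rule would be wrong, which is exactly why a human has to choose it.

```python
def clean_daily_sales(sales, store_open):
    """Separate true zeros from missing records in a daily sales series.

    sales: list of daily totals, with None when no record exists.
    store_open: list of booleans from a (hypothetical) opening calendar.
    Returns a new list where None means "missing" and 0 means "no sales".
    """
    cleaned = []
    for value, is_open in zip(sales, store_open):
        if value is None and not is_open:
            # Closed store, no record: a genuine zero.
            cleaned.append(0)
        elif value == 0 and is_open:
            # Assumed business rule: an open store with zero sales
            # is a recording glitch, so treat the point as missing.
            cleaned.append(None)
        else:
            cleaned.append(value)
    return cleaned


sales = [120, 0, 130, 0, None, None]
store_open = [True, False, True, True, True, False]
print(clean_daily_sales(sales, store_open))
```

The missing points can then be imputed or ignored by the model, instead of being learned as real demand drops.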
Preprocessing will also involve stationarization: isolating stable underlying patterns (such as long-term trends and seasonality) from apparently unstable data, so that what remains is easier to model. Leveraging such patterns improves prediction accuracy.
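One simple way to stationarize a daily series is seasonal differencing: subtract the value observed one season earlier. The sketch below assumes a weekly season (`period=7`) and is only one of several possible techniques, not necessarily the one to use on your data.

```python
def seasonal_difference(series, period=7):
    """Return series[t] - series[t - period] for every t >= period.

    This removes a repeating pattern of length `period` and turns
    a linear trend into a constant.
    """
    return [series[t] - series[t - period] for t in range(period, len(series))]


def invert_seasonal_difference(history, predicted_diff, period=7):
    """Turn a predicted difference back into a sales forecast for the next step."""
    return history[-period] + predicted_diff


# Synthetic example: weekly pattern plus a linear trend.
series = [10 * (t % 7) + 2 * t for t in range(28)]
diffs = seasonal_difference(series)
print(diffs[:5])
print(invert_seasonal_difference(series, diffs[-1]))
```

The model is then trained on the differenced series, and its predictions are converted back to sales units with the inverse step.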
2. Human expertise
Combining human expertise and machine learning is very valuable in demand prediction, since the companies that care most about demand prediction are often big practitioners of demand management.
And while some demand management initiatives may be anticipated algorithmically (e.g. seasonal sales), most are less predictable (« let’s have an Italian week in March »). But managers will know about them in advance and adjust their demand expectations accordingly. Capturing these expectations will improve prediction accuracy.
Human expertise will also help with data preprocessing and model selection. Auto-ML is fun for quick and dirty prototypes, but let’s be serious: you are a leader in your industry, your professionals are world-class experts - if they have an opinion on product seasonality or the impact of promotions, you probably want to listen.
3. Multiple models
The universal model doesn't exist. For a given data structure (e.g. demand profile), some models will perform better than others.
We have noticed that leading B2C or B2B companies often use a legacy demand prediction model that is super-optimized for « standard conditions » (best-selling products, average prediction horizon, normal business environment...).
A single machine learning model will have a hard time beating that benchmark. But aggregating multiple models (using all of them simultaneously) will let you scrape so much performance outside of these standard conditions that, overall, the multi-model machine learning solution will massively outperform the legacy model.
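To make « using all of them simultaneously » concrete, here is a minimal sketch of one classic aggregation scheme, the exponentially weighted average: each model's weight shrinks as its cumulative squared error grows. The function name, the `learning_rate` value and the data layout are illustrative assumptions, and this is not presented as the aggregation method used by any particular product.

```python
import math


def aggregate_online(model_forecasts, actuals, learning_rate=0.1):
    """Combine several models' forecasts, one time step at a time.

    model_forecasts: one list per time step, holding each model's forecast.
    actuals: the realized demand, revealed after each forecast is made.
    Returns (combined forecasts, weights used at the last step).
    """
    n_models = len(model_forecasts[0])
    cumulative_loss = [0.0] * n_models
    combined, weights = [], []
    for forecasts, actual in zip(model_forecasts, actuals):
        # Weights depend only on past errors, never on the current actual.
        best = min(cumulative_loss)
        raw = [math.exp(-learning_rate * (loss - best)) for loss in cumulative_loss]
        total = sum(raw)
        weights = [r / total for r in raw]
        combined.append(sum(w * f for w, f in zip(weights, forecasts)))
        # The actual is now known: update each model's cumulative squared error.
        for i, forecast in enumerate(forecasts):
            cumulative_loss[i] += (forecast - actual) ** 2
    return combined, weights


# Two hypothetical models: the first is accurate, the second is biased.
forecasts = [[100.0, 110.0]] * 20
actuals = [100.0] * 20
combined, weights = aggregate_online(forecasts, actuals)
print(combined[0], combined[-1], weights)
```

The combination starts as a plain average and shifts toward whichever model has been most accurate so far, so a model that only shines outside standard conditions gets weight exactly when it matters.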
4. Change-point detection
Breaking news: what’s truly random can’t be predicted. The goal of machine learning with regard to random but structural events affecting your demand is not to predict them, but to adapt to them as fast as possible. This is where « change-point detection » methods will help.
Such methods can be quite complex but, when done right and integrated into your machine learning solution, will add precious accuracy points to your demand predictions.
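As an illustration of the idea, here is a minimal sketch of one of the simplest change-point detectors, a two-sided CUSUM applied to forecast residuals (actual minus forecast). It assumes the residuals have been standardized beforehand, and the `drift` and `threshold` values are arbitrary illustrative choices. Production-grade methods are considerably more sophisticated.

```python
def cusum_change_points(residuals, drift=0.5, threshold=5.0):
    """Return the indices at which a two-sided CUSUM raises an alarm.

    residuals: standardized forecast errors, in time order.
    drift: slack subtracted at each step so that small noise does not accumulate.
    threshold: level of accumulated deviation that triggers an alarm.
    """
    upward, downward = 0.0, 0.0
    alarms = []
    for t, residual in enumerate(residuals):
        # Accumulate persistent under-forecasting (upward) and over-forecasting (downward).
        upward = max(0.0, upward + residual - drift)
        downward = max(0.0, downward - residual - drift)
        if upward > threshold or downward > threshold:
            alarms.append(t)
            # Reset after an alarm; in practice this is where the model would be adapted.
            upward, downward = 0.0, 0.0
    return alarms


# Forecasts are fine for 20 days, then demand shifts upward.
residuals = [0.0] * 20 + [3.0] * 10
print(cusum_change_points(residuals))
```

The alarm is the signal to react, for example by retraining on recent data only or by temporarily giving more weight to fast-adapting models.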
5. Backtesting
Sales are archetypal time series: sequentially revealed, time-stamped, time-critical data. Because of the specifics of time series, you can’t expect your standard train-validate-test data science process to work. Two things will happen if you use that process:
- You won’t get maximum prediction accuracy.
- Your machine learning prototype will crash in production.
The alternative is to backtest your solution: at every moment in your data set, you must train your model on known/past data at that moment, and test it on unknown/future data at that moment.
Rigorous backtests will give you an immediate prediction accuracy boost of 10-15%, and ensure that your machine learning solution is production-ready.
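The backtesting loop described above can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical `fit_and_predict(past, horizon)` callable that trains on the past and returns one forecast. The `min_train` value and the two toy models are placeholders, not recommendations.

```python
def backtest(series, fit_and_predict, min_train=14, horizon=1):
    """Walk forward through the series and return the mean absolute error.

    At each step t, the model only sees series[:t] (the past) and is
    scored on the value `horizon` steps ahead (the future at that moment).
    """
    errors = []
    for t in range(min_train, len(series) - horizon + 1):
        past = series[:t]
        forecast = fit_and_predict(past, horizon)
        actual = series[t + horizon - 1]
        errors.append(abs(forecast - actual))
    return sum(errors) / len(errors)


def seasonal_naive(past, horizon, period=7):
    """Toy model: repeat the latest observed value for the same day of the week."""
    return past[len(past) - period + (horizon - 1) % period]


def last_value(past, horizon):
    """Toy model: repeat the most recent observation."""
    return past[-1]


# Synthetic series with a pure weekly pattern.
series = [t % 7 for t in range(35)]
print(backtest(series, seasonal_naive))
print(backtest(series, last_value))
```

Because the model never sees data from after the forecast date, the error measured this way is comparable to what you would observe in production.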
What doesn't work
1. Insisting on a machine learning solution when the prediction challenge is straightforward...
Suppose for example that you want national monthly demand predictions for ice cream.
At such an aggregated level (national not local, monthly not daily, any flavor not every flavor) for such a mainstream and seasonal product, chances are that classic statistics in Excel will do the job.
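For reference, this is the kind of « classic statistics » we have in mind, written here in Python rather than Excel: take the same month last year and scale it by year-over-year growth. The function name and the sample figures are made up for illustration.

```python
def seasonal_baseline(monthly_sales, months_ahead=1):
    """Forecast = same month last year x year-over-year growth.

    monthly_sales: at least 24 monthly totals, oldest first.
    months_ahead: 1 for next month, up to 12.
    """
    if len(monthly_sales) < 24:
        raise ValueError("need at least two full years of history")
    if not 1 <= months_ahead <= 12:
        raise ValueError("months_ahead must be between 1 and 12")
    last_year = sum(monthly_sales[-12:])
    previous_year = sum(monthly_sales[-24:-12])
    growth = last_year / previous_year
    same_month_last_year = monthly_sales[len(monthly_sales) - 12 + months_ahead - 1]
    return same_month_last_year * growth


# Made-up ice cream sales: strong summer peak, volumes doubling year over year.
year_1 = [10, 10, 20, 40, 80, 120, 160, 150, 90, 40, 20, 10]
year_2 = [2 * v for v in year_1]
print(seasonal_baseline(year_1 + year_2, months_ahead=7))
```

If a baseline like this already meets your accuracy needs, a machine learning project is unlikely to pay for itself.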
The more complexity, the more you can expect machine learning to outperform classic methods. For demand prediction challenges, complexity usually means: multiple locations, combination of short- and long-term predictions, diverse product mix, massive product portfolio, navigation across a multi-level product hierarchy, impact of contextual data…
2. ...or when it's impossible
Sometimes, a prospect will tell us: "Our demand predictions are pretty good, except for product XYZ that we only sell once every three years. Could machine learning help?"
The answer is no. Machine learning requires meaningful historical data points for your prediction target. Consider the impossibility of predicting the demand for such products as a cost of doing business.
3. Deep learning
Deep learning is trendy, and does indeed deliver notable results in a number of areas - image recognition and natural language processing for example.
So by all means use it in these areas, or to shine at dinner parties. For maximum demand prediction accuracy, however, try something else because, as we previously highlighted on this blog:
- Most deep learning models are not sequential.
- They need much more data than your sales generate.
- Their predictions are hard to explain.
4. Withholding information
Some people imagine that a true test of machine learning for demand prediction is to benchmark the machine learning solution against their existing forecasts, depriving the machine learning solution of key information that existing forecasts leverage.
One of our clients waited until the end of a demand prediction proof of concept to reveal that their legacy monthly demand forecast was based on… their order book for the next six weeks! They still had a prediction error of 25%, and the Datapred-based machine learning solution, without the order book, still outperformed their forecast by 5%, but what a waste…
That attitude probably stems from the misconception of machine learning as some kind of mathematical alchemy that creates something out of nothing.
It doesn’t work like that: critical information will contribute to the performance of your machine learning solution just like it contributes to your existing forecasts. The question is: based on the same data (and potentially additional data that would be available in production and that only machine learning can process), does machine learning outperform your existing forecasts?
Are you eager to build a machine learning solution for demand prediction? Datapred will accelerate your POC-to-Production cycle by a factor of 10. Contact us to discuss.
For resources on machine learning for time series, this page is a good starting point.