Machine learning for time series

Key concepts and useful resources.

Introduction

Time series data is sequentially revealed, time-stamped and time-critical. It is also called "streaming data", "event streams" or "sequential data".

Just like "software is eating the world", time series are eating static data sets:

  • The proliferation of time-stamped data follows naturally from the digitization of industry. The ongoing deployment of billions of connected sensors will only accelerate the trend.
  • As a consequence, many decision-making processes that used to be fairly static (based on stable information) are becoming dynamic (based on streaming data).
The problem for data scientists is that time series modeling differs significantly from standard machine learning practices.
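One concrete difference is validation. Standard practice shuffles rows before splitting them into training and test sets, which lets a time series model train on the future of the period it is tested on. Below is a minimal sketch of an expanding-window ("walk-forward") split, in which every test fold comes strictly after its training data. The function name `walk_forward_splits` and its parameters are illustrative, not taken from any particular library (scikit-learn's `TimeSeriesSplit` implements a similar scheme).

```python
from typing import Iterator, List, Tuple


def walk_forward_splits(
    n_samples: int, n_folds: int, test_size: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) pairs in chronological order.

    Each training set contains every observation before its test window,
    so the model never sees data from the future of the evaluated period.
    """
    first_test_start = n_samples - n_folds * test_size
    if first_test_start <= 0:
        raise ValueError("not enough samples for the requested folds")
    for fold in range(n_folds):
        test_start = first_test_start + fold * test_size
        train_idx = list(range(0, test_start))
        test_idx = list(range(test_start, test_start + test_size))
        yield train_idx, test_idx


for train, test in walk_forward_splits(n_samples=10, n_folds=3, test_size=2):
    print(f"train={train[0]}..{train[-1]}  test={test}")
# train=0..3  test=[4, 5]
# train=0..5  test=[6, 7]
# train=0..7  test=[8, 9]
```

The training window grows with each fold, mimicking how a deployed model is retrained as new data arrives.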

The problem for operational managers is that standard machine learning solutions applied to time series underperform.

***

This page provides short explanations and links to interesting resources about the three main aspects of machine learning for time series: preprocessing, modeling and post-processing.


Preprocessing

You don't need perfect data streaming from a brand-new data lake to build a valuable machine learning solution for time series. But you do need to handle your data carefully.

The resources below deal with the preprocessing steps required for efficient time series modeling.
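A typical first preprocessing step is sequentialization: reshaping a raw series into fixed-length input windows, each paired with a future target, so that a standard regressor can consume it. The sketch below assumes a univariate, regularly sampled series with no gaps; `make_windows`, `window` and `horizon` are illustrative names.

```python
from typing import List, Sequence, Tuple


def make_windows(
    series: Sequence[float], window: int, horizon: int = 1
) -> Tuple[List[List[float]], List[float]]:
    """Turn a univariate series into supervised (inputs, target) pairs.

    Each input holds `window` consecutive past values; the target is the
    value `horizon` steps after the last value of that input.
    """
    if window < 1 or horizon < 1:
        raise ValueError("window and horizon must be positive")
    inputs, targets = [], []
    for end in range(window, len(series) - horizon + 1):
        inputs.append(list(series[end - window:end]))
        targets.append(series[end + horizon - 1])
    return inputs, targets


X, y = make_windows([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], window=3)
# X == [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
# y == [4.0, 5.0, 6.0]
```

Every window uses only values that precede its target, so no future information leaks into the inputs.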


Modeling

Modeling is the core of the data scientist's job. The possibilities are endless, and the state of the art is fast-moving.

Surprisingly, given how pervasive time series are, the specificities of time series modeling are not well known. The resources below cover the main aspects of time series modeling.
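As one concrete example of a time-series-specific model, the sketch below fits a classical autoregressive AR(p) model, which predicts each value from a constant plus a linear combination of the p previous values, using ordinary least squares in NumPy. It is a textbook technique chosen for illustration, not a recommendation over other approaches; `fit_ar` and `forecast_next` are hypothetical helper names.

```python
import numpy as np


def fit_ar(series: np.ndarray, p: int) -> np.ndarray:
    """Fit y_t = c + a_1*y_{t-1} + ... + a_p*y_{t-p} by least squares.

    Returns the coefficient vector [c, a_1, ..., a_p].
    """
    y = np.asarray(series, dtype=float)
    n = len(y)
    if n <= p:
        raise ValueError("series must be longer than the model order")
    # Column k (k = 1..p) holds the lag-k values y_{t-k} for t = p..n-1.
    lags = np.column_stack([y[p - k : n - k] for k in range(1, p + 1)])
    design = np.column_stack([np.ones(n - p), lags])
    coef, *_ = np.linalg.lstsq(design, y[p:], rcond=None)
    return coef


def forecast_next(series: np.ndarray, coef: np.ndarray) -> float:
    """One-step-ahead forecast from the last p observations."""
    p = len(coef) - 1
    recent = np.asarray(series, dtype=float)[-p:][::-1]  # y_{t-1}, ..., y_{t-p}
    return float(coef[0] + coef[1:] @ recent)


# Noiseless AR(1) process y_t = 0.5 + 0.8 * y_{t-1}, starting from 0.
y = [0.0]
for _ in range(30):
    y.append(0.5 + 0.8 * y[-1])
coef = fit_ar(np.array(y), p=1)
print(coef)  # approximately [0.5, 0.8]
print(forecast_next(np.array(y), coef))
```

On real data the series would be noisy and the recovered coefficients only approximate; the order `p` is usually chosen with walk-forward validation rather than fixed in advance.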


Post-processing

Congratulations, your machine learning prototype is complete. Does that mean it is ready for production? Unfortunately not. Preprocessing and modeling are only half of the job. Software packaging ("post-processing") is the other half: technical, tedious, but absolutely necessary.

The resources below describe the main post-processing tasks required for machine learning on time series.
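Performance monitoring is one such task: once a model is deployed, its accuracy can degrade as the incoming data drifts. Below is a minimal sketch, assuming predictions and actual values arrive one at a time. The class name, the use of mean absolute error and the fixed alert threshold are all illustrative choices, not a prescribed design.

```python
from collections import deque


class RollingErrorMonitor:
    """Track the mean absolute error (MAE) of the last `window` predictions.

    `update` returns True when the rolling MAE exceeds `threshold`, which
    could trigger an alert or a retraining job.
    """

    def __init__(self, window: int, threshold: float) -> None:
        if window < 1:
            raise ValueError("window must be positive")
        self.errors = deque(maxlen=window)  # oldest errors drop out automatically
        self.threshold = threshold

    def rolling_mae(self) -> float:
        return sum(self.errors) / len(self.errors) if self.errors else 0.0

    def update(self, actual: float, predicted: float) -> bool:
        self.errors.append(abs(actual - predicted))
        # Only alert once the window is full, to avoid noisy early readings.
        full = len(self.errors) == self.errors.maxlen
        return full and self.rolling_mae() > self.threshold


monitor = RollingErrorMonitor(window=3, threshold=1.0)
stream = [(10, 10.2), (11, 10.9), (12, 11.5), (15, 12.0), (16, 13.0)]
alerts = [monitor.update(actual, predicted) for actual, predicted in stream]
# alerts == [False, False, False, True, True]
```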


Features

  • Data sequentialization
  • Stationarization and filtering
  • Built-in models
  • Compatibility with ML libraries
  • Aggregation and stacking
  • Custom cost functions
  • Parallelization and distribution
  • Connectors
  • Graph structure
  • Performance monitoring
  • Checkpoints and backups
  • Continual improvement
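Stationarization, one of the features listed above, can be illustrated with differencing, a common and simple technique: model the changes between observations instead of the raw levels, then invert the transform to bring forecasts back to the original scale. The sketch below is a generic example of that technique, not a description of how any particular tool implements it; `difference` and `undifference` are hypothetical names.

```python
from typing import List, Sequence


def difference(series: Sequence[float], lag: int = 1) -> List[float]:
    """Return y_t - y_{t-lag}.

    lag=1 removes a linear trend; lag equal to the season length removes a
    fixed seasonal pattern.
    """
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def undifference(
    diffs: Sequence[float], history: Sequence[float], lag: int = 1
) -> List[float]:
    """Invert `difference`, given the `lag` values preceding the differenced segment."""
    restored = list(history[-lag:])
    for d in diffs:
        restored.append(restored[-lag] + d)
    return restored[lag:]


sales = [3.0, 5.0, 8.0, 12.0, 17.0]
diffs = difference(sales)  # [2.0, 3.0, 4.0, 5.0]
assert undifference(diffs, sales[:1]) == sales[1:]
```

In a forecasting pipeline, the model would predict future differences, and `undifference` would be called with the last observed values as `history` to recover forecasts in the original units.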