The purpose of these notebooks is to demonstrate how time series clustering techniques can split the target time series (TTS) data into homogeneous subsets, so that models trained individually on each subset with the Forecast service may produce more accurate forecasts. The intuition is that a Forecast model trained on clustered data can learn stronger patterns from a homogeneous subset of the time series than from the full, mixed dataset.

We provide examples of several time series clustering techniques:

  1. DTW-based clustering with the tslearn.clustering module of the Python package tslearn, which clusters the time series dataset using the K-means algorithm with Dynamic Time Warping (DTW) distance as the metric and DTW Barycenter Averaging (DBA) to compute the cluster centroids.
  2. Feature-based clustering with the sklearn.cluster module of the Python package scikit-learn, which clusters a tabular dataset using the K-means algorithm. To transform the time series data into ordinary tabular data, we use the TSFresh Python package, which automatically calculates a large number of time series characteristics (so-called features).
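To illustrate why DTW is a useful metric for the first technique, here is a minimal pure-Python sketch of the DTW distance using the common squared-difference formulation. It is not the tslearn implementation used in the notebooks, and the toy series are made up for illustration; it only shows that two series with the same shape, shifted in time, are close under DTW while far apart under Euclidean distance.

```python
import math


def dtw_distance(a, b):
    """Dynamic-programming DTW between two 1-D sequences (illustrative only)."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated squared error aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of the three allowed predecessor alignments
            cost[i][j] = local + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return math.sqrt(cost[n][m])


def euclidean_distance(a, b):
    """Point-by-point distance, for comparison with DTW."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Made-up toy series: the same spike shifted by one step, plus a flat series
spike = [0.0, 0.0, 5.0, 0.0, 0.0, 0.0]
shifted = [0.0, 0.0, 0.0, 5.0, 0.0, 0.0]
flat = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]

dtw_shifted = dtw_distance(spike, shifted)
euclid_shifted = euclidean_distance(spike, shifted)
dtw_flat = dtw_distance(spike, flat)
```

Under DTW the shifted spike is treated as the same pattern, whereas the flat series remains clearly different. This is the property that lets DTW-based K-means group series by shape rather than by exact timing.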

Dataset: We use the open source UCI Online Retail II Data Set for this demonstration.

The collection includes 3 notebooks:

  1. 01. Optional - Data Cleaning and Preparation: an optional notebook covering data cleaning and processing.
  2. 02. Time Series Clustering Using DTW KMeans: time series clustering with the DTW-based K-means approach.
  3. 03. Time Series Clustering using TSFresh + KMeans: time series clustering with TSFresh features and K-means.
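The feature-based route of notebook 03 rests on turning each time series into one row of a table. The sketch below shows that transformation with a few hand-picked summary statistics from the Python standard library. These are a simplified stand-in for the much larger feature set that TSFresh computes, and the item ids and values are hypothetical.

```python
import statistics


def summarize(values):
    """Return a few summary features for one series (stand-in for TSFresh)."""
    n = len(values)
    mean = statistics.fmean(values)
    # Least-squares slope of the values against the time index 0..n-1
    t_mean = (n - 1) / 2
    denom = sum((t - t_mean) ** 2 for t in range(n))
    slope = sum((t - t_mean) * (v - mean) for t, v in enumerate(values)) / denom
    return {
        "mean": mean,
        "std": statistics.pstdev(values),
        "max": max(values),
        "slope": slope,
    }


# Hypothetical item ids with made-up demand values
series_by_item = {
    "item_a": [10, 12, 11, 13, 12, 14],
    "item_b": [0, 0, 30, 0, 0, 30],
    "item_c": [5, 5, 5, 5, 5, 5],
}

feature_names = ["mean", "std", "max", "slope"]
feature_table = {item: summarize(vals) for item, vals in series_by_item.items()}

# One fixed-length row per item: the tabular input a K-means step expects
rows = [
    [feature_table[item][name] for name in feature_names]
    for item in series_by_item
]
```

Each entry of `rows` has the same length regardless of how long the original series was, which is what allows a tabular algorithm such as K-means to be applied. In practice the features would typically be scaled before clustering so that no single feature dominates the distance.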

Please note that these notebooks cover only the preprocessing and data preparation steps related to clustering the time series data. For model training and evaluation, refer to the Forecast Developer Guide.
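Once cluster labels are available, the final preparation step is to split the TTS data into one subset per cluster. The following is a minimal sketch under stated assumptions: the row layout (item id, timestamp, target value), the `cluster_of` mapping, and the helper name `split_by_cluster` are all hypothetical and not taken from the notebooks.

```python
from collections import defaultdict

# Made-up TTS rows in long format: (item_id, timestamp, target_value)
tts_rows = [
    ("item_a", "2021-01-01", 10.0),
    ("item_a", "2021-01-02", 12.0),
    ("item_b", "2021-01-01", 0.0),
    ("item_b", "2021-01-02", 30.0),
    ("item_c", "2021-01-01", 11.0),
    ("item_c", "2021-01-02", 13.0),
]

# Hypothetical output of a clustering step: item_id -> cluster label
cluster_of = {"item_a": 0, "item_b": 1, "item_c": 0}


def split_by_cluster(rows, labels):
    """Group TTS rows by the cluster label of their item id."""
    subsets = defaultdict(list)
    for row in rows:
        item_id = row[0]
        subsets[labels[item_id]].append(row)
    return dict(subsets)


tts_by_cluster = split_by_cluster(tts_rows, cluster_of)
```

Each value in `tts_by_cluster` could then be written out as its own dataset and used to train a separate model.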

References: