Merge pull request #379 from unit8co/develop

unit8co · Jul 9, 2021 · cb3a602
2 parents 39e0081 + 49302a8

Showing 75 changed files with 5,477 additions and 2,089 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -4,8 +4,36 @@
 Darts is still in an early development phase and we cannot always guarantee backwards compatibility. Changes that may **break code which uses a previous release of Darts** are marked with a "&#x1F534;".
 
 ## [Unreleased](https://github.com/unit8co/darts/tree/develop)
-[Full Changelog](https://github.com/unit8co/darts/compare/0.8.1...develop)
+[Full Changelog](https://github.com/unit8co/darts/compare/0.9.0...develop)
 
+## [0.9.0](https://github.com/unit8co/darts/tree/0.9.0) (2021-07-09)
+### For users of the library:
+
+**Added:**
+- Multiple forecasting models can now produce probabilistic forecasts by specifying a `num_samples` parameter when calling `predict()`. Stochastic forecasts are stored by utilizing the new `samples` dimension in the refactored `TimeSeries` class (see 'Changed' section). Models supporting probabilistic predictions so far are `ARIMA`, `ExponentialSmoothing`, `RNNModel` and `TCNModel`.
+- Introduced `LikelihoodModel` class which is used by probabilistic `TorchForecastingModel` classes in order to make predictions in the form of parametrized distributions of different types.
+- Added new abstract class `TorchParametricProbabilisticForecastingModel` to serve as parent class for probabilistic models.
+- Introduced new `FilteringModel` abstract class alongside `MovingAverage`, `KalmanFilter` and `GaussianProcessFilter` as concrete implementations.
+- Future covariates are now used by `TorchForecastingModel`s when the forecasting horizon exceeds the `output_chunk_length` of the model. Previously, `TorchForecastingModel` instances could only predict beyond their `output_chunk_length` if they were not trained on covariates, i.e. only if they predicted all the data they needed as input. This restriction has been lifted: when producing long predictions, a model now consumes not only its own output but also the covariates known in the future, if available.
+- Added a new `RNNModel` class which uses an RNN module as both encoder and decoder. This new class natively supports the use of the most recent future covariates when making a forecast. See the documentation for more details.
+- Introduced an optional `epochs` parameter to the `TorchForecastingModel.fit()` method which, if provided, overrides the `n_epochs` attribute of the model instance for that particular training session.
+- Added support for `TimeSeries` with a `pandas.RangeIndex` instead of just allowing `pandas.DatetimeIndex`.
+- `ForecastingModel.gridsearch` now makes use of parallel computation.
+- Introduced a new `force_reset` parameter to `TorchForecastingModel.__init__()` which, if left at `False`, prevents the user from overwriting existing model data with the same name and directory.
+
+
+**Fixed:**
+- Fixed a bug occurring when training `NBEATSModel` on a GPU.
+- Fixed a crash when running `NBEATSModel` with `log_tensorboard=True`.
+- Fixed a bug occurring when training a `TorchForecastingModel` instance with a `batch_size` larger than the available number of training samples.
+- Some fixes in the documentation, including adding more details.
+- Other minor bug fixes
+
+**Changed:**
+- &#x1F534; The `TimeSeries` class has been refactored to support stochastic time series representation by adding an additional dimension to a time series, namely `samples`. A time series is now based on a 3-dimensional `xarray.DataArray` with shape `(n_timesteps, n_components, n_samples)`. This overhaul also includes a change of the constructor which is incompatible with the old one. However, factory methods have been added to create a `TimeSeries` instance from a variety of data types, including `pd.DataFrame`. Please refer to the documentation of `TimeSeries` for more information.
+- &#x1F534; The old version of `RNNModel` has been renamed to `BlockRNNModel`.
+- The `historical_forecasts()` and `backtest()` methods of `ForecastingModel` have been reorganized to use new wrapper methods for fitting and predicting models.
+- Updated `README.md` to reflect the new additions to the library.
 
 ## [0.8.1](https://github.com/unit8co/darts/tree/0.8.1) (2021-05-22)
 **Fixed:**

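The 0.9.0 entry on future covariates describes an auto-regressive rollout. When the horizon `n` exceeds `output_chunk_length`, the model feeds its own predictions back in as input and, where available, also reads covariate values that are known in the future. The NumPy sketch below illustrates that loop conceptually. It is not Darts' implementation: `toy_model`, `rollout_predict`, and the assumption that covariates are aligned with the target and sliced over the same input window are all hypothetical.

```python
import numpy as np


def toy_model(target_window, covariate_window, output_chunk_length):
    """Hypothetical stand-in for a trained model: predicts a flat chunk equal to
    the last target value plus the mean covariate value over the input window."""
    level = target_window[-1] + covariate_window.mean()
    return np.full(output_chunk_length, level)


def rollout_predict(target, covariates, n, input_chunk_length, output_chunk_length):
    """Auto-regressive forecast of `n` steps using covariates known in the future.

    Assumption of this sketch: `covariates` is aligned with `target` and
    extends at least `n` steps past its end.
    """
    if len(target) < input_chunk_length:
        raise ValueError("target is shorter than input_chunk_length")
    if len(covariates) < len(target) + n:
        raise ValueError("covariates must extend n steps past the end of target")
    history = list(target)
    start = len(target)
    while len(history) - start < n:
        t = len(history)
        # Most recent input window, which may already contain earlier predictions
        target_window = np.asarray(history[t - input_chunk_length:t])
        # Covariates over the same window; once the window moves past the
        # observed data, these are future-known values, not model outputs
        covariate_window = np.asarray(covariates[t - input_chunk_length:t])
        chunk = toy_model(target_window, covariate_window, output_chunk_length)
        # Feed the model's own output back in as input for the next chunk
        history.extend(chunk)
    return np.asarray(history[start:start + n])


# Usage: 4 observed steps, covariates known for 6 more, horizon 6 > output_chunk_length 2
forecast = rollout_predict(np.zeros(4), np.ones(10), n=6,
                           input_chunk_length=2, output_chunk_length=2)
```

Because each chunk's input window is re-sliced from the covariate array, changing covariate values beyond the observed history changes the long-horizon forecast. This is exactly what was impossible before the restriction was lifted.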
diff --git a/README.md b/README.md
@@ -16,8 +16,8 @@ It contains a variety of models, from classics such as ARIMA to deep neural netw
 The models can all be used in the same way, using `fit()` and `predict()` functions,
 similar to scikit-learn. The library also makes it easy to backtest models,
 and combine the predictions of several models and external regressors. Darts supports both
-univariate and multivariate time series and models, and the neural networks can be trained
-on multiple time series.
+univariate and multivariate time series and models. The neural networks can be trained
+on multiple time series, and some of the models offer probabilistic forecasts.
 
 ## Documentation
 * [Examples & Tutorials](https://unit8co.github.io/darts/examples.html)
@@ -44,29 +44,33 @@ Create a `TimeSeries` object from a Pandas DataFrame, and split it in train/vali
 ```python
 import pandas as pd
 from darts import TimeSeries
+
+# Read a pandas DataFrame
 df = pd.read_csv('AirPassengers.csv', delimiter=",")
+
+# Create a TimeSeries, specifying the time and value columns
 series = TimeSeries.from_dataframe(df, 'Month', '#Passengers')
-train, val = series.split_after(pd.Timestamp('19580101'))
-```
 
-Fit an exponential smoothing model, and make a prediction over the validation series' duration:
+# Set aside the last 36 months as a validation series
+train, val = series[:-36], series[-36:]
+```
 
+Fit an exponential smoothing model, and make a (probabilistic) prediction over the validation series' duration:
 ```python
 from darts.models import ExponentialSmoothing
 
 model = ExponentialSmoothing()
 model.fit(train)
-prediction = model.predict(len(val))
+prediction = model.predict(len(val), num_samples=1000)
 ```
 
-Plot:
+Plot the median, 5th and 95th percentiles:
 ```python
 import matplotlib.pyplot as plt
 
-series.plot(label='actual')
-prediction.plot(label='forecast', lw=2)
+series.plot()
+prediction.plot(label='forecast', low_quantile=0.05, high_quantile=0.95)
 plt.legend()
-plt.xlabel('Year')
 ```
 
 <div style="text-align:center;">
@@ -81,17 +85,8 @@ the [examples](https://github.com/unit8co/darts/tree/master/examples) directory.
 
 Currently, the library contains the following features:
 
-**Forecasting Models:**
-
-* Exponential smoothing,
-* ARIMA & auto-ARIMA,
-* Facebook Prophet,
-* Theta method,
-* FFT (Fast Fourier Transform),
-* Recurrent neural networks (vanilla RNNs, GRU, and LSTM variants),
-* Temporal convolutional network.
-* Transformer
-* N-BEATS
+**Forecasting Models:** A large collection of forecasting models, from statistical models (such as
+ARIMA) to deep learning models (such as N-BEATS). See the table of models below.
 
 **Data processing:** Tools to easily apply (and revert) common transformations on time series data (scaling, boxcox, …)
 
@@ -100,11 +95,40 @@ from R2-scores to Mean Absolute Scaled Error.
 
 **Backtesting:** Utilities for simulating historical forecasts, using moving time windows.
 
-**Regressive Models:** Possibility to predict a time series from several other time series
-(e.g., external regressors), using arbitrary regressive models
+**Regressive Models:** Possibility to predict a time series from lagged versions of itself
+and of some external covariate series, using arbitrary regression models (e.g. scikit-learn models)
 
 **Multivariate Support:** Tools to create, manipulate and forecast multivariate time series.
 
+**Probabilistic Support:** `TimeSeries` objects can (optionally) represent stochastic
+time series; this can for instance be used to get confidence intervals.
+
+**Filtering Models:** Darts offers three filtering models: `KalmanFilter`, `GaussianProcessFilter`,
+and `MovingAverage`, which allow filtering time series and, in some cases, obtaining probabilistic
+inferences of the underlying states/values.
+
+## Forecasting Models
+Here's a breakdown of the forecasting models currently implemented in Darts. We are constantly working
+on bringing more models and features.
+
+Model | Univariate | Multivariate | Probabilistic | Multiple-series training | Past-observed covariates support | Future-known covariates support
+--- | --- | --- | --- | --- | --- | --- |
+`ARIMA` | x | | x | | | |
+`VARIMA` | x | x | | | | |
+`AutoARIMA` | x | | | | | |
+`ExponentialSmoothing` | x | | x | | | |
+`Theta` and `FourTheta` | x | | | | | |
+`Prophet` | x | | | | | |
+`FFT` (Fast Fourier Transform) | x | | | | | |
+Regression Models (incl. `RandomForest` and `LinearRegressionModel`) | x | | | | | |
+`RNNModel` (incl. LSTM and GRU); equivalent to DeepAR in its probabilistic version | x | x | x | x | x | x |
+`BlockRNNModel` (incl. LSTM and GRU) | x | x | | x | x | (x) |
+`NBEATSModel` | x | x | | x | x | (x) |
+`TCNModel` | x | x | x | x | x | (x) |
+`TransformerModel` | x | x | | x | x | (x) |
+Naive Baselines | x | | | | | |
+
+
 ## Contribute
 
 The development is ongoing, and there are many new features that we want to add.

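The README example now plots the median together with the 5th and 95th percentiles of a forecast drawn with `num_samples=1000`. According to the 0.9.0 changelog, a stochastic `TimeSeries` holds an array of shape `(n_timesteps, n_components, n_samples)`. Conceptually, the plotted curves are per-timestep quantiles taken across the `samples` axis. The NumPy sketch below shows that reduction on synthetic sample paths. The helper `summarize_samples` and the random data are hypothetical; Darts computes these quantiles itself when plotting.

```python
import numpy as np


def summarize_samples(values, low_quantile=0.05, high_quantile=0.95):
    """Reduce a stochastic series of shape (n_timesteps, n_components, n_samples)
    to per-timestep median, low and high quantiles, each of shape
    (n_timesteps, n_components)."""
    if values.ndim != 3:
        raise ValueError("expected shape (n_timesteps, n_components, n_samples)")
    median = np.quantile(values, 0.5, axis=2)
    low = np.quantile(values, low_quantile, axis=2)
    high = np.quantile(values, high_quantile, axis=2)
    return median, low, high


# Hypothetical forecast: 36 steps, 1 component, 1000 sample paths
rng = np.random.default_rng(0)
n_timesteps, n_components, n_samples = 36, 1, 1000
trend = np.linspace(400.0, 500.0, n_timesteps).reshape(-1, 1, 1)
noise = rng.normal(0.0, 20.0, size=(n_timesteps, n_components, n_samples))
samples = trend + noise  # broadcasts to (36, 1, 1000)

median, low, high = summarize_samples(samples)
print(median.shape)  # (36, 1)
```

Because the quantiles are taken independently at each timestep, the resulting band describes the marginal uncertainty at each step. It does not describe the range of any single sample path.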
diff --git a/darts/dataprocessing/transformers/boxcox.py b/darts/dataprocessing/transformers/boxcox.py
@@ -15,6 +15,8 @@
 
 logger = get_logger(__name__)
 
+# TODO: extend to stochastic series
+
 
 class BoxCox(FittableDataTransformer, InvertibleDataTransformer):
 
@@ -28,7 +30,7 @@ def __init__(self,
                  verbose: bool = False):
         """
         Box-Cox data transformer.
-        See https://otexts.com/fpp2/transformations.html#mathematical-transformations for more information
+        See https://otexts.com/fpp2/transformations.html#mathematical-transformations for more information.
 
         Parameters
         ----------
@@ -94,7 +96,7 @@ def ts_fit(series: TimeSeries,
         if lmbda is None:
             # Compute optimal lmbda for each dimension of the time series. In this case, the return type is
             # a pd.core.series.Series, which does not inherit from collections.abc.Sequence
-            lmbda = series._df.apply(boxcox_normmax, method=method)
+            lmbda = series.pd_dataframe(copy=False).apply(boxcox_normmax, method=method)
         elif isinstance(lmbda, Sequence):
             raise_if(len(lmbda) != series.width,
                      "lmbda should have one value per dimension (ie. column or variable) of the time series",
@@ -109,19 +111,19 @@ def ts_fit(series: TimeSeries,
     def ts_transform(series: TimeSeries, lmbda: Union[Sequence[float], pd.core.series.Series]) -> TimeSeries:
 
         def _boxcox_wrapper(col):
-            idx = series._df.columns.get_loc(col.name)  # get index from col name
+            idx = series.pd_dataframe(copy=False).columns.get_loc(col.name)  # get index from col name
             return boxcox(col, lmbda[idx])
 
-        return TimeSeries.from_dataframe(series._df.apply(_boxcox_wrapper))
+        return TimeSeries.from_dataframe(series.pd_dataframe(copy=False).apply(_boxcox_wrapper))
 
     @staticmethod
     def ts_inverse_transform(series: TimeSeries, lmbda: Union[Sequence[float], pd.core.series.Series]) -> TimeSeries:
 
         def _inv_boxcox_wrapper(col):
-            idx = series._df.columns.get_loc(col.name)  # get index from col name
+            idx = series.pd_dataframe(copy=False).columns.get_loc(col.name)  # get index from col name
             return inv_boxcox(col, lmbda[idx])
 
-        return TimeSeries.from_dataframe(series._df.apply(_inv_boxcox_wrapper))
+        return TimeSeries.from_dataframe(series.pd_dataframe(copy=False).apply(_inv_boxcox_wrapper))
 
     def fit(self, series: Union[TimeSeries, Sequence[TimeSeries]]) -> 'FittableDataTransformer':
         # adding lmbda and optim_method params

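The Box-Cox transformer fits one `lmbda` per component and applies it column by column. This is what the `columns.get_loc(col.name)` lookup in `_boxcox_wrapper` achieves. The sketch below reproduces the forward and inverse transforms with NumPy on a 2-D array of shape `(n_timesteps, n_components)`, using the standard definition: `log(x)` when λ = 0, otherwise `(x^λ - 1)/λ`. It stands in for the `boxcox`/`inv_boxcox` calls used in the file, and the helper names are hypothetical. Like the file itself (see its TODO), the sketch only handles deterministic, non-stochastic data.

```python
import numpy as np


def boxcox_1d(x, lmbda):
    """Box-Cox transform of a strictly positive 1-D array."""
    if np.any(x <= 0):
        raise ValueError("Box-Cox requires strictly positive values")
    if lmbda == 0:
        return np.log(x)
    return (np.power(x, lmbda) - 1.0) / lmbda


def inv_boxcox_1d(y, lmbda):
    """Inverse Box-Cox transform."""
    if lmbda == 0:
        return np.exp(y)
    return np.power(lmbda * y + 1.0, 1.0 / lmbda)


def boxcox_columns(values, lmbdas, inverse=False):
    """Apply one lambda per column of a (n_timesteps, n_components) array."""
    if len(lmbdas) != values.shape[1]:
        raise ValueError("lmbdas should have one value per component")
    fn = inv_boxcox_1d if inverse else boxcox_1d
    # Column idx uses lmbdas[idx], mirroring the get_loc(col.name) lookup
    return np.column_stack([fn(values[:, idx], lmbdas[idx])
                            for idx in range(values.shape[1])])


values = np.array([[1.0, 10.0], [2.0, 20.0], [4.0, 40.0]])
lmbdas = [0.0, 0.5]
transformed = boxcox_columns(values, lmbdas)
restored = boxcox_columns(transformed, lmbdas, inverse=True)
print(np.allclose(restored, values))  # True
```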
diff --git a/darts/dataprocessing/transformers/scaler.py b/darts/dataprocessing/transformers/scaler.py
@@ -59,17 +59,17 @@ def __init__(self,
 
     @staticmethod
     def ts_transform(series: TimeSeries, transformer) -> TimeSeries:
-        return TimeSeries.from_times_and_values(series.time_index(),
-                                                transformer.transform(series.values().
-                                                                      reshape((-1, series.width))),
-                                                series.freq())
+        return TimeSeries.from_times_and_values(times=series.time_index,
+                                                values=transformer.transform(series.values().
+                                                                             reshape((-1, series.width))),
+                                                fill_missing_dates=False)
 
     @staticmethod
     def ts_inverse_transform(series: TimeSeries, transformer, *args, **kwargs) -> TimeSeries:
-        return TimeSeries.from_times_and_values(series.time_index(),
-                                                transformer.inverse_transform(series.values().
+        return TimeSeries.from_times_and_values(times=series.time_index,
+                                                values=transformer.inverse_transform(series.values().
                                                                               reshape((-1, series.width))),
-                                                series.freq())
+                                                fill_missing_dates=False)
 
     @staticmethod
     def ts_fit(series: TimeSeries, transformer, *args, **kwargs) -> Any:

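`Scaler` delegates to a scikit-learn-style transformer. It flattens the series values to a 2-D array with `reshape((-1, series.width))`, calls `transform()` or `inverse_transform()`, and rebuilds a `TimeSeries` on the same `time_index` without filling missing dates. The sketch below mimics that round trip with a small NumPy min-max scaler exposing the same `fit`/`transform`/`inverse_transform` interface. The `MinMax` class is a hypothetical stand-in for a scikit-learn scaler such as `MinMaxScaler`, and the array plays the role of `series.values()` for a deterministic series.

```python
import numpy as np


class MinMax:
    """Per-column min-max scaler to [0, 1] (hypothetical stand-in for a
    scikit-learn scaler)."""

    def fit(self, X):
        self.min_ = X.min(axis=0)
        span = X.max(axis=0) - self.min_
        # Use a span of 1 for constant columns to avoid division by zero
        self.range_ = np.where(span > 0, span, 1.0)
        return self

    def transform(self, X):
        return (X - self.min_) / self.range_

    def inverse_transform(self, X):
        return X * self.range_ + self.min_


# Plays the role of series.values() for a deterministic series with 2 components
width = 2
values = np.array([[1.0, 100.0], [3.0, 300.0], [5.0, 500.0]])

flat = values.reshape((-1, width))  # same reshape as in ts_transform
scaler = MinMax().fit(flat)
scaled = scaler.transform(flat)
restored = scaler.inverse_transform(scaled)
```

Each component is scaled independently, so components with very different magnitudes (here 1-5 vs. 100-500) end up on the same [0, 1] range.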
diff --git a/darts/datasets/dataset_loaders.py b/darts/datasets/dataset_loaders.py
@@ -145,5 +145,5 @@ def _load_from_disk(self, path_to_file: Path, metadata: DatasetLoaderMetadata) -
         df = pd.read_csv(path_to_file)
         if metadata.header_time is not None:
             df = self._format_time_column(df)
-            return TimeSeries.from_dataframe(df, metadata.header_time)
-        return TimeSeries(df, dummy_index=True)
+            return TimeSeries.from_dataframe(df=df, time_col=metadata.header_time)
+        return TimeSeries.from_dataframe(df)
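When a dataset has no time column, the loader now calls `TimeSeries.from_dataframe(df)`. This presumably falls back to the DataFrame's own index, which after `pd.read_csv` is a `pandas.RangeIndex`, in line with the changelog note that `RangeIndex`-based series are now supported. The pandas-only sketch below shows which index each branch would hand to `from_dataframe`. The CSV contents and the `load` helper are hypothetical. The `pd.to_datetime` step only approximates what `_format_time_column` does, and Darts itself is not called.

```python
import io

import pandas as pd

CSV_WITH_TIME = "Month,#Passengers\n1949-01-01,112\n1949-02-01,118\n1949-03-01,132\n"
CSV_WITHOUT_TIME = "value\n1.5\n2.5\n3.5\n"


def load(csv_text, header_time=None):
    """Return the DataFrame and the index a series would be built on."""
    df = pd.read_csv(io.StringIO(csv_text))
    if header_time is not None:
        # Rough stand-in for _format_time_column: parse the time column
        df[header_time] = pd.to_datetime(df[header_time])
        return df, pd.DatetimeIndex(df[header_time])
    # No time column: the default RangeIndex from read_csv is used as-is
    return df, df.index


_, idx_time = load(CSV_WITH_TIME, header_time="Month")
_, idx_range = load(CSV_WITHOUT_TIME)
print(type(idx_time).__name__, type(idx_range).__name__)  # DatetimeIndex RangeIndex
```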