Merge pull request #379 from unit8co/develop
pennfranc authored Jul 9, 2021
2 parents 39e0081 + 49302a8 commit cb3a602
Showing 75 changed files with 5,477 additions and 2,089 deletions.
30 changes: 29 additions & 1 deletion CHANGELOG.md
@@ -4,8 +4,36 @@
Darts is still in an early development phase and we cannot always guarantee backwards compatibility. Changes that may **break code which uses a previous release of Darts** are marked with a "🔴".

## [Unreleased](https://github.com/unit8co/darts/tree/develop)
[Full Changelog](https://github.com/unit8co/darts/compare/0.8.1...develop)
[Full Changelog](https://github.com/unit8co/darts/compare/0.9.0...develop)

## [0.9.0](https://github.com/unit8co/darts/tree/0.9.0) (2021-07-09)
### For users of the library:

**Added:**
- Multiple forecasting models can now produce probabilistic forecasts by specifying a `num_samples` parameter when calling `predict()`. Stochastic forecasts are stored by utilizing the new `samples` dimension in the refactored `TimeSeries` class (see 'Changed' section). Models supporting probabilistic predictions so far are `ARIMA`, `ExponentialSmoothing`, `RNNModel` and `TCNModel`.
- Introduced `LikelihoodModel` class which is used by probabilistic `TorchForecastingModel` classes in order to make predictions in the form of parametrized distributions of different types.
- Added new abstract class `TorchParametricProbabilisticForecastingModel` to serve as parent class for probabilistic models.
- Introduced new `FilteringModel` abstract class alongside `MovingAverage`, `KalmanFilter` and `GaussianProcessFilter` as concrete implementations.
- Future covariates are now utilized by `TorchForecastingModels` when the forecasting horizon exceeds the `output_chunk_length` of the model. Before, `TorchForecastingModel` instances could only predict beyond their `output_chunk_length` if they were not trained on covariates, i.e. if they predicted all the data they need as input. This restriction has now been lifted by letting a model not only consume its own output when producing long predictions, but also utilizing the covariates known in the future, if available.
- Added a new `RNNModel` class which utilizes an RNN module as both encoder and decoder. This new class natively supports the use of the most recent future covariates when making a forecast. See documentation for more details.
- Introduced optional `epochs` parameter to the `TorchForecastingModel.fit()` method which, if provided, overrides the `n_epochs` attribute in that particular model instance and training session.
- Added support for `TimeSeries` with a `pandas.RangeIndex` instead of just allowing `pandas.DatetimeIndex`.
- `ForecastingModel.gridsearch` now makes use of parallel computation.
- Introduced a new `force_reset` parameter to `TorchForecastingModel.__init__()` which, if left to False, will prevent the user from overriding model data with the same name and directory.


**Fixed:**
- Solved bug occurring when training `NBEATSModel` on a GPU.
- Fixed crash when running `NBEATSModel` with `log_tensorboard=True`
- Solved bug occurring when training a `TorchForecastingModel` instance with a `batch_size` bigger than the available number of training samples.
- Some fixes in the documentation, including adding more details
- Other minor bug fixes

**Changed:**
- 🔴 The `TimeSeries` class has been refactored to support stochastic time series representation by adding an additional dimension to a time series, namely `samples`. A time series is now based on a 3-dimensional `xarray.DataArray` with shape `(n_timesteps, n_components, n_samples)`. This overhaul also includes a change of the constructor which is incompatible with the old one. However, factory methods have been added to create a `TimeSeries` instance from a variety of data types, including `pd.DataFrame`. Please refer to the documentation of `TimeSeries` for more information.
- 🔴 The old version of `RNNModel` has been renamed to `BlockRNNModel`.
- The `historical_forecast()` and `backtest()` methods of `ForecastingModel` have been reorganized a bit by making use of new wrapper methods to fit and predict models.
- Updated `README.md` to reflect the new additions to the library.
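
The "Added" entry on future covariates describes forecasting beyond `output_chunk_length` by feeding a model's own output back in while also consuming covariates known in the future. The following is a minimal plain-Python sketch of that idea, not Darts' implementation: `toy_chunk_model` and `rollout` are hypothetical names, and the "model" is a stand-in that simply adds each covariate to the last known value.

```python
def toy_chunk_model(past_target, future_covs):
    """Hypothetical stand-in for a trained model.

    Predicts one chunk: each step is the previous value plus that step's covariate.
    """
    chunk = []
    last = past_target[-1]
    for cov in future_covs:
        last = last + cov
        chunk.append(last)
    return chunk


def rollout(model, history, future_covariates, n, output_chunk_length):
    """Forecast n steps by repeatedly predicting chunks of output_chunk_length.

    After each chunk, the predictions are appended to the history (the model
    consumes its own output) and the next slice of future covariates is used.
    """
    history = list(history)
    forecast = []
    while len(forecast) < n:
        start = len(forecast)
        covs = future_covariates[start:start + output_chunk_length]
        if len(covs) < output_chunk_length:
            raise ValueError("not enough future covariates for the requested horizon")
        chunk = model(history, covs)
        forecast.extend(chunk)
        history.extend(chunk)
    # The last chunk may overshoot the horizon, so trim it.
    return forecast[:n]


long_forecast = rollout(toy_chunk_model, [10], [1, 1, 1, 1, 1, 1],
                        n=5, output_chunk_length=2)
print(long_forecast)
```

In this sketch the horizon of 5 needs three chunks of length 2, so six future covariate values must be available even though only five predictions are returned.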

## [0.8.1](https://github.com/unit8co/darts/tree/0.8.1) (2021-05-22)
**Fixed:**
70 changes: 47 additions & 23 deletions README.md
@@ -16,8 +16,8 @@ It contains a variety of models, from classics such as ARIMA to deep neural netw
The models can all be used in the same way, using `fit()` and `predict()` functions,
similar to scikit-learn. The library also makes it easy to backtest models,
and combine the predictions of several models and external regressors. Darts supports both
univariate and multivariate time series and models, and the neural networks can be trained
on multiple time series.
univariate and multivariate time series and models. The neural networks can be trained
on multiple time series, and some of the models offer probabilistic forecasts.

## Documentation
* [Examples & Tutorials](https://unit8co.github.io/darts/examples.html)
@@ -44,29 +44,33 @@ Create a `TimeSeries` object from a Pandas DataFrame, and split it in train/vali
```python
import pandas as pd
from darts import TimeSeries

# Read a pandas DataFrame
df = pd.read_csv('AirPassengers.csv', delimiter=",")

# Create a TimeSeries, specifying the time and value columns
series = TimeSeries.from_dataframe(df, 'Month', '#Passengers')
train, val = series.split_after(pd.Timestamp('19580101'))
```

Fit an exponential smoothing model, and make a prediction over the validation series' duration:
# Set aside the last 36 months as a validation series
train, val = series[:-36], series[-36:]
```

Fit an exponential smoothing model, and make a (probabilistic) prediction over the validation series' duration:
```python
from darts.models import ExponentialSmoothing

model = ExponentialSmoothing()
model.fit(train)
prediction = model.predict(len(val))
prediction = model.predict(len(val), num_samples=1000)
```

Plot:
Plot the median, 5th and 95th percentiles:
```python
import matplotlib.pyplot as plt

series.plot(label='actual')
prediction.plot(label='forecast', lw=2)
series.plot()
prediction.plot(label='forecast', low_quantile=0.05, high_quantile=0.95)
plt.legend()
plt.xlabel('Year')
```
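
The quantile band in the plotting snippet can be read as a per-time-step summary over the forecast samples. The following is a minimal plain-Python sketch of that summary, not Darts' implementation: the `quantile` and `summarize` helpers are hypothetical, and the data is synthetic.

```python
import random


def quantile(sorted_values, q):
    """Linearly interpolated quantile of an already sorted list, with 0 <= q <= 1."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def summarize(samples_per_step, low_q=0.05, high_q=0.95):
    """Return a (low, median, high) tuple for each time step.

    samples_per_step[t] holds all sampled values for time step t, i.e. one
    univariate slice of a (n_timesteps, n_components, n_samples) array.
    """
    summary = []
    for samples in samples_per_step:
        ordered = sorted(samples)
        summary.append((quantile(ordered, low_q),
                        quantile(ordered, 0.5),
                        quantile(ordered, high_q)))
    return summary


# Synthetic stand-in for a probabilistic forecast: 3 time steps, 1000 samples each.
rng = random.Random(0)
forecast_samples = [[rng.gauss(100 + 10 * t, 5) for _ in range(1000)]
                    for t in range(3)]
bands = summarize(forecast_samples)
for t, (low, median, high) in enumerate(bands):
    print(f"t={t}: low={low:.1f} median={median:.1f} high={high:.1f}")
```

More samples give smoother quantile estimates, at the cost of a larger `samples` dimension to store.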

<div style="text-align:center;">
@@ -81,17 +85,8 @@ the [examples](https://github.com/unit8co/darts/tree/master/examples) directory.

Currently, the library contains the following features:

**Forecasting Models:**

* Exponential smoothing,
* ARIMA & auto-ARIMA,
* Facebook Prophet,
* Theta method,
* FFT (Fast Fourier Transform),
* Recurrent neural networks (vanilla RNNs, GRU, and LSTM variants),
* Temporal convolutional network.
* Transformer
* N-BEATS
**Forecasting Models:** A large collection of forecasting models; from statistical models (such as
ARIMA) to deep learning models (such as N-BEATS). See table of models below.

**Data processing:** Tools to easily apply (and revert) common transformations on time series data (scaling, boxcox, …)
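
To illustrate what "apply (and revert)" means for one of these transformations, here is a minimal plain-Python sketch of the standard Box-Cox formula and its inverse. It is not the Darts `BoxCox` transformer; the function names `boxcox_value` and `inv_boxcox_value` are hypothetical.

```python
import math


def boxcox_value(x, lmbda):
    """Box-Cox transform of a single positive value."""
    if x <= 0:
        raise ValueError("Box-Cox is only defined for positive values")
    if lmbda == 0:
        return math.log(x)
    return (x ** lmbda - 1) / lmbda


def inv_boxcox_value(y, lmbda):
    """Inverse of boxcox_value for the same lmbda."""
    if lmbda == 0:
        return math.exp(y)
    return (lmbda * y + 1) ** (1 / lmbda)


values = [1.0, 4.0, 9.0]
transformed = [boxcox_value(v, 0.5) for v in values]
restored = [inv_boxcox_value(t, 0.5) for t in transformed]
print(transformed)
print(restored)
```

The round trip only recovers the original values when the same `lmbda` is used in both directions, which is why a fitted transformer has to remember it.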

@@ -100,11 +95,40 @@ from R2-scores to Mean Absolute Scaled Error.

**Backtesting:** Utilities for simulating historical forecasts, using moving time windows.

**Regressive Models:** Possibility to predict a time series from several other time series
(e.g., external regressors), using arbitrary regressive models
**Regressive Models:** Possibility to predict a time series from lagged versions of itself
and of some external covariate series, using arbitrary regression models (e.g. scikit-learn models)

**Multivariate Support:** Tools to create, manipulate and forecast multivariate time series.

**Probabilistic Support:** `TimeSeries` objects can (optionally) represent stochastic
time series; this can for instance be used to get confidence intervals.

**Filtering Models:** Darts offers three filtering models: `KalmanFilter`, `GaussianProcessFilter`,
and `MovingAverage`, which allow filtering time series and, in some cases, obtaining probabilistic
inferences of the underlying states/values.
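
As a rough illustration of what filtering a series means, here is a minimal plain-Python sketch of a trailing moving average. It is an assumption-laden stand-in, not the Darts `MovingAverage` class, whose windowing may differ; the function name `moving_average` is hypothetical.

```python
def moving_average(values, window):
    """Trailing moving average.

    Each output is the mean of the current value and up to window - 1
    preceding values, so the first outputs use a shorter window.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    running_sum = 0.0
    for i, value in enumerate(values):
        running_sum += value
        if i >= window:
            # Drop the value that just left the window.
            running_sum -= values[i - window]
        smoothed.append(running_sum / min(i + 1, window))
    return smoothed


print(moving_average([1, 2, 3, 4, 5], window=3))
```

A larger window smooths more aggressively but reacts more slowly to changes in the series.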

## Forecasting Models
Here's a breakdown of the forecasting models currently implemented in Darts. We are constantly working
on bringing more models and features.

Model | Univariate | Multivariate | Probabilistic | Multiple-series training | Past-observed covariates support | Future-known covariates support
--- | --- | --- | --- | --- | --- | --- |
`ARIMA` | x | | x | | | |
`VARIMA` | x | x | | | | |
`AutoARIMA` | x | | | | | |
`ExponentialSmoothing` | x | | x | | | |
`Theta` and `FourTheta` | x | | | | | |
`Prophet` | x | | | | | |
`FFT` (Fast Fourier Transform) | x | | | | | |
Regression Models (incl `RandomForest` and `LinearRegressionModel`) | x | | | | | |
`RNNModel` (incl. LSTM and GRU); equivalent to DeepAR in its probabilistic version | x | x | x | x | x | x |
`BlockRNNModel` (incl. LSTM and GRU) | x | x | | x | x | (x) |
`NBEATSModel` | x | x | | x | x | (x) |
`TCNModel` | x | x | x | x | x | (x) |
`TransformerModel` | x | x | | x | x | (x) |
Naive Baselines | x | | | | | |


## Contribute

The development is ongoing, and there are many new features that we want to add.
14 changes: 8 additions & 6 deletions darts/dataprocessing/transformers/boxcox.py
@@ -15,6 +15,8 @@

logger = get_logger(__name__)

# TODO: extend to stochastic series


class BoxCox(FittableDataTransformer, InvertibleDataTransformer):

@@ -28,7 +30,7 @@ def __init__(self,
verbose: bool = False):
"""
Box-Cox data transformer.
See https://otexts.com/fpp2/transformations.html#mathematical-transformations for more information
See https://otexts.com/fpp2/transformations.html#mathematical-transformations for more information.
Parameters
----------
@@ -94,7 +96,7 @@ def ts_fit(series: TimeSeries,
if lmbda is None:
# Compute optimal lmbda for each dimension of the time series. In this case, the return type is
# a pd.core.series.Series, which does not inherit from collections.abc.Sequence
lmbda = series._df.apply(boxcox_normmax, method=method)
lmbda = series.pd_dataframe(copy=False).apply(boxcox_normmax, method=method)
elif isinstance(lmbda, Sequence):
raise_if(len(lmbda) != series.width,
"lmbda should have one value per dimension (ie. column or variable) of the time series",
@@ -109,19 +111,19 @@ def ts_fit(series: TimeSeries,
def ts_transform(series: TimeSeries, lmbda: Union[Sequence[float], pd.core.series.Series]) -> TimeSeries:

def _boxcox_wrapper(col):
idx = series._df.columns.get_loc(col.name) # get index from col name
idx = series.pd_dataframe(copy=False).columns.get_loc(col.name) # get index from col name
return boxcox(col, lmbda[idx])

return TimeSeries.from_dataframe(series._df.apply(_boxcox_wrapper))
return TimeSeries.from_dataframe(series.pd_dataframe(copy=False).apply(_boxcox_wrapper))

@staticmethod
def ts_inverse_transform(series: TimeSeries, lmbda: Union[Sequence[float], pd.core.series.Series]) -> TimeSeries:

def _inv_boxcox_wrapper(col):
idx = series._df.columns.get_loc(col.name) # get index from col name
idx = series.pd_dataframe(copy=False).columns.get_loc(col.name) # get index from col name
return inv_boxcox(col, lmbda[idx])

return TimeSeries.from_dataframe(series._df.apply(_inv_boxcox_wrapper))
return TimeSeries.from_dataframe(series.pd_dataframe(copy=False).apply(_inv_boxcox_wrapper))

def fit(self, series: Union[TimeSeries, Sequence[TimeSeries]]) -> 'FittableDataTransformer':
# adding lmbda and optim_method params
14 changes: 7 additions & 7 deletions darts/dataprocessing/transformers/scaler.py
@@ -59,17 +59,17 @@ def __init__(self,

@staticmethod
def ts_transform(series: TimeSeries, transformer) -> TimeSeries:
return TimeSeries.from_times_and_values(series.time_index(),
transformer.transform(series.values().
reshape((-1, series.width))),
series.freq())
return TimeSeries.from_times_and_values(times=series.time_index,
values=transformer.transform(series.values().
reshape((-1, series.width))),
fill_missing_dates=False)

@staticmethod
def ts_inverse_transform(series: TimeSeries, transformer, *args, **kwargs) -> TimeSeries:
return TimeSeries.from_times_and_values(series.time_index(),
transformer.inverse_transform(series.values().
return TimeSeries.from_times_and_values(times=series.time_index,
values=transformer.inverse_transform(series.values().
reshape((-1, series.width))),
series.freq())
fill_missing_dates=False)

@staticmethod
def ts_fit(series: TimeSeries, transformer, *args, **kwargs) -> Any:
4 changes: 2 additions & 2 deletions darts/datasets/dataset_loaders.py
@@ -145,5 +145,5 @@ def _load_from_disk(self, path_to_file: Path, metadata: DatasetLoaderMetadata) -
df = pd.read_csv(path_to_file)
if metadata.header_time is not None:
df = self._format_time_column(df)
return TimeSeries.from_dataframe(df, metadata.header_time)
return TimeSeries(df, dummy_index=True)
return TimeSeries.from_dataframe(df=df, time_col=metadata.header_time)
return TimeSeries.from_dataframe(df)