arturdaraujo changed the title from "Window and stride arguments are making it harder to use the package. feature_collection.reduce doesn't make sense" to "Window and stride arguments are making it harder to use the package. feature_collection.reduce example" on Sep 28, 2022
First of all, this package is awesome. The community that works with time series data needed to step up its game, and tsflex has everything it takes to become the main library.
However, here are a few specific suggestions:
Remove the "windows" and "strides" arguments altogether from feature extraction:
It does seem a bit excessive, but hear me out. They are good arguments, but they are not fundamental to feature extraction. They could be handled in data preparation instead: Alteryx has a library called "compose" (https://github.com/alteryx/compose) just for the purpose of creating multiple time frame windows. Once the "window" is ready, just select the functions. I propose that the main tsflex function (feature_collection.calculate) take only time series data and a list of functions for feature extraction, with no window or stride arguments.
Explaining further:
The way I see it, the essential implementation would be just this: feature_collection.calculate(time_series_df, functions).
If any column of the time series had a data type other than int or float, it could simply raise an error or ignore the column.
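To make the proposal concrete, here is a minimal sketch in plain pandas of what such a window-free call could look like. The names `calculate_whole_series` and `on_non_numeric` are hypothetical and are not part of tsflex; the `column__function` naming is also just an assumption for illustration.

```python
import numpy as np
import pandas as pd


def calculate_whole_series(df, functions, on_non_numeric="raise"):
    """Apply each function once to every numeric column of df.

    Hypothetical helper, NOT part of tsflex: it only illustrates the
    proposed calculate(time_series_df, functions) behaviour.
    """
    numeric_cols = df.select_dtypes(include="number").columns
    non_numeric = [c for c in df.columns if c not in numeric_cols]
    if non_numeric and on_non_numeric == "raise":
        raise TypeError(f"Non-numeric columns: {non_numeric}")
    # With any other value of on_non_numeric, those columns are ignored.
    features = {}
    for col in numeric_cols:
        for func in functions:
            # Each function sees the whole column as one NumPy array.
            features[f"{col}__{func.__name__}"] = func(df[col].to_numpy())
    return pd.DataFrame([features])


stock = pd.DataFrame({"Open": [1.0, 2.0, 3.0], "Ticker": ["A", "A", "A"]})
feats = calculate_whole_series(stock, [np.mean, np.std], on_non_numeric="ignore")
```

With `on_non_numeric="ignore"` the `Ticker` column is skipped and the result has one row with the columns `Open__mean` and `Open__std`; with the default, the same call raises a `TypeError`.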
Window and stride also make the feature_collection.reduce function hard to use:
After feature selection, once I have picked a few of the many columns created by tsflex, I use reduce, which gives me the functions needed for transformation/extraction. The problem is that the naming convention includes the window and stride (e.g. Open__mean__w=233500_s=233500), which means any new time series must have the same characteristics/size, which often isn't the case. I use the windows and strides arguments like the following:
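from tsflex.features import FeatureCollection, MultipleFeatureDescriptors
from tsflex.features.integrations import tsfresh_settings_wrapper

# `settings` (a tsfresh settings object) and the `stock_data` / `stock_full`
# DataFrames are defined elsewhere (not shown).
simple_feats = MultipleFeatureDescriptors(
    functions=tsfresh_settings_wrapper(settings),
    series_names="Open",
    # A single window and stride, both sized from the length of the data.
    windows=len(stock_data) - 1,
    strides=len(stock_data) - 1,
)
feature_collection = FeatureCollection(simple_feats)
features_df = feature_collection.calculate(
    stock_full, return_df=True, show_progress=True, approve_sparsity=True
)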
I set both values this way because I need to process the whole dataset as a single window.
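As a possible workaround on the naming side, the window and stride could be stripped from the column names so that selected features can be matched regardless of series length. The sketch below is only an illustration: the pattern is inferred from the single example name above (Open__mean__w=233500_s=233500), the helper names are hypothetical, and none of this is tsflex API. Names produced with other window types may look different.

```python
import re

# Assumed layout: <series>__<feature>__w=<window>_s=<stride>
# (inferred from one example only, not taken from the tsflex docs).
FEATURE_NAME = re.compile(
    r"^(?P<series>.+?)__(?P<feature>.+)__w=(?P<window>[^_]+)_s=(?P<stride>[^_]+)$"
)


def split_feature_name(name):
    """Split a feature column name into its four assumed parts."""
    match = FEATURE_NAME.match(name)
    if match is None:
        raise ValueError(f"Unexpected feature name: {name!r}")
    return match.groupdict()


def strip_window_stride(name):
    """Return the name without the window/stride suffix."""
    parts = split_feature_name(name)
    return f"{parts['series']}__{parts['feature']}"


parts = split_feature_name("Open__mean__w=233500_s=233500")
short_name = strip_window_stride("Open__mean__w=233500_s=233500")
```

Here `short_name` is `Open__mean`, which no longer depends on the size of the series the feature was computed on.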
Anyway, I hope this is helpful.