Add decomposition methods OnlineSVD, OnlinePCA, OnlineDMD/wC + Hankelizer #1509

MarekWadinger · 2024-03-06T08:29:44Z

Hello @MaxHalford, @hoanganhngo610, and everyone 👋,

In #1366, @MaxHalford showed interest in implementing OnlinePCA and OnlineSVD methods in river.

Given my current project involvement with online decomposition methods, I believe the community could benefit from having access to these methods and their maintenance over time. Additionally, I am particularly interested in DMD, which combines the advantages of PCA and FFT. Hence, I propose the introduction of three new methods as part of the new decomposition module:

decomposition.OnlineSVD, implemented based on Brand, M. (2006) (proposed by @MaxHalford in the issue), with some considerations on re-orthogonalization. Since re-orthogonalization is required quite often, which compromises computation speed, it could be interesting to align with Zhang, Y. (2022). (I made some effort to implement it, but I have yet to explore its validity and the possibility of implementing revert in a similar vein.)

decomposition.OnlinePCA, implemented based on Eftekhari, A. (2019) (proposed by @MaxHalford in the issue), as it is currently state of the art, with all the proofs and guarantees. I would be happy to validate together whether all considerations are handled in the proposed OnlinePCA.

decomposition.OnlineDMD, implemented based on Zhang, H. (2019). It can operate as a MiniBatchTransformer or a MiniBatchRegressor (sort of), and it works with Rolling, so I would need some help figuring out how we'd like to classify it (maybe a new base class, Decomposer).
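To make the OnlineDMD item more concrete, here is a minimal sketch of a rank-one least-squares update in the spirit of online DMD. It is not the code in this PR: the class name `RankOneDMD` and the toy data are hypothetical, and weighting, windowing, control inputs, and eigenvalue truncation are all left out. It assumes the initial snapshot matrix has full row rank.

```python
import numpy as np


class RankOneDMD:
    """Hypothetical helper: tracks A in y ~ A x with rank-one updates."""

    def __init__(self, X0, Y0):
        # X0, Y0: arrays of shape (n_features, n_snapshots).
        # Assumption: X0 has full row rank, so X0 @ X0.T is invertible.
        self.P = np.linalg.inv(X0 @ X0.T)
        self.A = Y0 @ X0.T @ self.P

    def update(self, x, y):
        # Sherman-Morrison update of P = (X X^T)^{-1} and of A = Y X^T P
        Px = self.P @ x
        gamma = 1.0 / (1.0 + x @ Px)
        self.A += gamma * np.outer(y - self.A @ x, Px)
        self.P -= gamma * np.outer(Px, Px)


rng = np.random.default_rng(0)
A_true = np.array([[0.9, -0.2], [0.2, 0.9]])
X = rng.normal(size=(2, 50))
Y = A_true @ X + 0.01 * rng.normal(size=(2, 50))

# Initialize on the first 5 snapshot pairs, then stream the rest one by one
model = RankOneDMD(X[:, :5], Y[:, :5])
for x, y in zip(X[:, 5:].T, Y[:, 5:].T):
    model.update(x, y)

# Batch least-squares solution on all snapshots, for comparison
A_batch = Y @ np.linalg.pinv(X)
```

After streaming all pairs, `model.A` should agree with `A_batch` up to floating-point error, which is a convenient sanity check for any incremental variant.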

Additionally, I propose preprocessing.Hankelizer, which could benefit various regressors and is particularly useful for enriching the feature space with time-delay embedding.
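As an illustration of what time-delay embedding does to a stream of feature dicts, here is a small standalone sketch. The function `hankelize`, its parameter `w`, and the `_lag` key naming are hypothetical and are not the proposed Hankelizer's interface; in particular, this sketch simply skips samples until the window is full, which the actual transformer may handle differently.

```python
from collections import deque


def hankelize(stream, w=2):
    """Yield dicts with the current sample and the w - 1 previous ones.

    Hypothetical helper for illustration only.
    """
    window = deque(maxlen=w)
    for x in stream:
        window.append(x)
        if len(window) < w:
            continue  # wait until the window is full
        out = {}
        # lag 0 is the newest sample, lag w - 1 the oldest one in the window
        for lag, past in enumerate(reversed(window)):
            for key, value in past.items():
                out[f"{key}_lag{lag}"] = value
        yield out


stream = [{"a": 1, "b": 10}, {"a": 2, "b": 20}, {"a": 3, "b": 30}]
embedded = list(hankelize(stream, w=2))
```

Each output dict has `w` times as many features as the input, so a downstream regressor sees a short history instead of a single snapshot.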

I've tried to include all necessary tests. However, I need to investigate why re-orthogonalization in OnlineSVD yields significantly different values when tested on various operating systems (locally, all tests pass).
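For context on the re-orthogonalization question, below is a minimal sketch of an append-one-column thin SVD update of the kind described by Brand, M. (2006), together with a simple orthogonality check. It is not the OnlineSVD code in this PR: the names `append_column` and `orthogonality_error` are hypothetical, no rank truncation is applied, and the new column is assumed not to lie in the span of the current left singular vectors.

```python
import numpy as np


def append_column(U, s, Vt, c):
    """Update the thin SVD U @ diag(s) @ Vt after appending column c."""
    r = s.size
    m_proj = U.T @ c                 # component of c inside span(U)
    p = c - U @ m_proj               # component of c orthogonal to span(U)
    p_norm = np.linalg.norm(p)       # assumption: p_norm > 0
    # Small (r + 1) x (r + 1) core matrix whose SVD rotates the old factors
    K = np.zeros((r + 1, r + 1))
    K[:r, :r] = np.diag(s)
    K[:r, r] = m_proj
    K[r, r] = p_norm
    Uk, sk, Vkt = np.linalg.svd(K)
    U_new = np.column_stack([U, p / p_norm]) @ Uk
    # Extend V with one extra row and column for the new sample
    V_ext = np.zeros((Vt.shape[1] + 1, r + 1))
    V_ext[:-1, :r] = Vt.T
    V_ext[-1, r] = 1.0
    Vt_new = (V_ext @ Vkt.T).T
    return U_new, sk, Vt_new


def orthogonality_error(U):
    """Frobenius norm of U^T U - I; grows as orthogonality is lost."""
    return np.linalg.norm(U.T @ U - np.eye(U.shape[1]))


rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
c = rng.normal(size=6)
U, s, Vt = np.linalg.svd(X, full_matrices=False)
U2, s2, Vt2 = append_column(U, s, Vt, c)
reconstruction = (U2 * s2) @ Vt2
```

Tracking something like `orthogonality_error(U2)` across many updates is one way to decide when re-orthogonalization is needed, and comparing it across platforms may help narrow down where results start to diverge.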

Looking forward to your comments and revisions. 😌


review-notebook-app · 2024-06-04T08:45:13Z

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

MarekWadinger · 2024-06-04T09:11:41Z

Hello @MaxHalford and @hoanganhngo610, 👋

I believe the methods are ready for benchmarking. The results are published in this notebook.

In the plot, I combine two checks: performance with respect to the number of features, and the delay imposed by converting pd.DataFrame (dict) inputs to the np.array used in the core.

The mean absolute number of samples processed per second is given here (for n features in range(3, 20), as it remains pretty stable):

| Method | np.array (samples/s) | pd.DataFrame (samples/s) |
|---|---|---|
| OnlineDMD | 3102 | 1267 |
| OnlinePCA | 19553 | 18012 |
| OnlineSVD | 631 | 683 |
| OnlineSVDZhang | 3503 | 1718 |

OnlineSVD will probably be completely replaced by the Zhang implementation (OnlineSVDZhang).

The results in the notebook indicate that using pd.DataFrame slows down OnlinePCA, which is the fastest decomposition implementation, by up to 14%. However, I believe your concerns are likely related to the fact that the core of the decomposition methods works with np.arrays, correct?
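For readers who want to reproduce this kind of comparison without the notebook, here is a rough sketch of how samples per second can be measured for array inputs versus dict inputs that are converted before reaching an array-based core. Everything in it is a stand-in: `samples_per_second` and `RunningMean` are hypothetical, and the toy learner is far cheaper than any of the decomposition methods, so its numbers are not comparable to the figures above.

```python
import time

import numpy as np


def samples_per_second(learn, samples):
    """Return how many samples per second `learn` processes."""
    start = time.perf_counter()
    for x in samples:
        learn(x)
    elapsed = time.perf_counter() - start
    return len(samples) / elapsed


class RunningMean:
    """Stand-in for a learner whose core works on np.ndarray."""

    def __init__(self, n_features):
        self.n = 0
        self.mean = np.zeros(n_features)

    def learn_array(self, x):
        self.n += 1
        self.mean += (x - self.mean) / self.n

    def learn_dict(self, x):
        # Dict input pays for a conversion before reaching the array core
        self.learn_array(np.fromiter(x.values(), dtype=float, count=len(x)))


rng = np.random.default_rng(0)
arrays = list(rng.normal(size=(5_000, 10)))
dicts = [dict(enumerate(a)) for a in arrays]

m_arr, m_dict = RunningMean(10), RunningMean(10)
rate_array = samples_per_second(m_arr.learn_array, arrays)
rate_dict = samples_per_second(m_dict.learn_dict, dicts)
slowdown = 1 - rate_dict / rate_array  # fraction of throughput lost to conversion
```

Timings like these are noisy, so averaging over several runs and feature counts, as done in the notebook, gives a more reliable picture.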

What are your thoughts on the performance and adequacy of the evaluation?

Thanks for your time 🙏

MarekWadinger and others added 30 commits February 13, 2024 17:16

Initial commit

42e77e0

ADD: Class DMM, OnlineDMD, with Control, Weighting and Windowing

ec86781

UPDATE: make r optional + REFACTOR: DMDC -> DMDwC

c922746

UPDATE: align inputs with river.MiniBatchRegressor

ddac257

UPDATE: align notation of DMDMD and ODMD

37fa925

FIX: missing _Y buffer for xi comp + REMOVE: cvxpy dependency of xi comp

c73362f

ADD: initial implementation of SubIDDriftDetector

68d1e44

UPDATE: remove cvxpy dep of DMD

95d87bf

ADD: input Y for compatibility + FIX: known B handling

a522752

ADD: r to control truncation of eig + REFACTOR: rename eigs_modes -> eig + FIX: exponential w in learn many + MINOR: robustness

4d25fe4

REFACTOR: train_size -> ref_size; _drift_detected -> drift_detected + ADD: score attribute

057e006

ADD: hankel function

17219d6

FORMAT: ruff

ccfd725

ADD: automations and dev tools

ae2f0b6

FIX: Py3.9 compatibility

9d7d460

UPDATE: actions versions

ad931c2

UPDATE: actions versions

bd1f372

ADD: OnlineDMD tests

2ab0d72

UPDATE: badge handling

4fcaaff

REMOVE: redundant arguments in action

db21046

ADD: tests + FIX: _update_many; _init_update

5a2ca4f

FORMAT: ruff

86f4ad4

FIX: numerical precison issue in tesst

1f15527

ADD: tranform_one and transform_many options

d669ec3

FIX: inputs compatibility issues

68676b4

UPDATE: standardize inputs shape (m, n) -> (n, m)

428a337

UPDATE: try to speed up eig computation

b7ed6e3

UPDATE: standardize inputs shape (m, n) -> (n, m) + speed up

54e6833

ADD: TODO item

4784d80

Merge pull request #1 from MarekWadinger/dev

02f28bd

Standardization of input shapes

MarekWadinger added 25 commits March 21, 2024 16:37

FIX: dims in OnlineDMD/wC + FIX: minor issues

193488d

FIX: minor issues with different attr. combinations + UPDATE: modes of OnlineDMDwC + TEST: OnlineDMDwC

6e5dd3b

ADD: OnlineSVD Zhang implementation for efficient reorthogonalization

8e2985f

FIX: eigvals sorting

f84ed46

UPDATE: use **params for learning in pipeline

d67ffbd

ADD: OnlineSVD revert using Zhang

18d76c2

FIX: mainly mypy alignment

fa1ebcc

FIX: notation; REFACTOR: extract funs + hierarchy; DOCUMENTATION

a8c9796

REFACTOR: _init_first_pass to reduce redundancy + minor comments improvement

7b79eb1

ADD: revert multisample support; FIX: svd._V -> svd._Vt

af3aea5

FIX: typo in U update

06c1932

FIX: osvd learn_many and doctests

54bb2ab

FIX: minor issues for specific scenarios

1d68d28

UPDATE: more control by user + FIX: minor issues

08472bb

FIX: hiden problem with revert method

cbac2c5

FIX: problems occuring on rare occasions

6a9204d

UPDATE: major changes in revert and new nomenclature in OnlineSVDZhang

1e7f804

DATE: major changes in revert and new nomenclature in OnlineSVDZhang

b8a2fa2

Merge branch 'main' of github.com:MarekWadinger/river

81cc850

MINOR: changes hard to categorize

e7870bf

MINOR: fixtures and refactoring

4dfd138

FIX: revert logic when y=None

f88ba4c

UPDATE: enable np.array for benchmarking

caaa432

UPDATE: drop warnings in initialization

9674783

ADD: benchmark decomposition methods np vs pd inputs

f5352c6

MarekWadinger added 2 commits August 9, 2024 11:25

Merge branch 'main' of github.com:online-ml/river into online-ml-main

018a614

Merge branch 'online-ml-main'

52e6ad4

gbolmier mentioned this pull request Oct 1, 2024

Incremental PCA implementation #3

Closed
