Caching #16

Open · pat-s opened this issue Nov 3, 2018 · 8 comments

Labels: Priority: Medium · Status: Needs Design (Needs some thought and design decisions.)

Comments

@pat-s (Member) commented Nov 3, 2018

After dealing with the idea of caching in mlr recently, I think this is an important topic for mlr3.
It would be a core feature and should be integrated right from the start.

While in mlr I only implemented caching of filter values for now, in mlr3 we should consider implementing it as a package option and making it available for all calls (resample, train, tuning, filtering, etc.).

Most calls (dataset, learner, hyperparameters) are unique, so caching won't have as much of an effect as it does for filtering (where the call that generates the filter values is always the same and the subsetting happens afterwards).

However, it can also have a positive effect on "normal" train/test calls:

  • If a run (resample, tuneParams, benchmark) errors and a seed is set, the user can simply rerun and benefit from the cached calls.
  • For tuning methods like grid search, settings are redundant more often, so the user can benefit from caching.
  • Most often it will apply to simple train/test calls without tuning.

I've added the functions delete_cache() and get_cache_dir() in my mlr PR to make cache handling more convenient. We could think about a dedicated Cache class for such things; a rough sketch of the helper functions is below.
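
A minimal sketch of what such helpers could look like, combined with the package-option idea from above. The option name "mlr3.cache_dir", the use of the digest package, and all function bodies are illustrative assumptions, not the code from the mlr PR:

```r
# Hypothetical cache helpers, loosely following the names from the mlr PR.
# The option name "mlr3.cache_dir" and all bodies are assumptions.

get_cache_dir = function() {
  # package option first, fall back to a per-user cache directory
  getOption("mlr3.cache_dir", tools::R_user_dir("mlr3", "cache"))
}

delete_cache = function() {
  unlink(get_cache_dir(), recursive = TRUE)
}

# Run an expensive call (e.g. computing filter values) through a disk
# cache, keyed by a hash of the inputs
cached = function(fun, ...) {
  key = digest::digest(list(...))
  path = file.path(get_cache_dir(), paste0(key, ".rds"))
  if (file.exists(path)) {
    return(readRDS(path))
  }
  res = fun(...)
  dir.create(get_cache_dir(), recursive = TRUE, showWarnings = FALSE)
  saveRDS(res, path)
  res
}
```

With something like this, a second call with identical inputs (e.g. the same task and filter) would be served from disk instead of being recomputed.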

Please share your opinions.

@mllg (Member) commented Nov 3, 2018

I think caching would be best implemented in mlr3pipelines. This way we could cache any preprocessing step.

@berndbischl (Member)

> After dealing with the idea of caching in mlr recently, I think this is an important topic for mlr3.

Agreed.

@berndbischl (Member)

> I think caching would be best implemented in mlr3pipelines. This way we could cache any preprocessing step.

I think so too. And this should really be properly defined in a design doc, as it is pretty hard to get right. For this we need a bit more progress on mlr3pipelines.

@berndbischl (Member)

Shall we move the issue?

pat-s transferred this issue from mlr-org/mlr3 on Nov 4, 2018
@berndbischl (Member)

> If a run (resample, tuneParams, benchmark) errors and a seed is set, the user can simply rerun and benefit from the cached calls.

Do you mean for debugging purposes? That seems potentially nice, but I would keep that separate, both as a use case and in terms of technical solutions.

@berndbischl (Member)

Like I said above: to move forward we need a formal definition / proposal of how caching would work in mlr3.

@mb706 (Collaborator) commented Jan 12, 2019

Some thoughts:

  • mlr3 makes caching potentially easier because it provides hashes for us, although see mlr3#132 (hashing might not work reliably for caching in pipelines).
  • Caching has a larger benefit for some PipeOps (e.g. filtering) than for others (e.g. scaling); what matters is the trade-off between how much memory the cached state takes and how long the computation takes. There should probably be a way for the user to change caching behaviour depending on the use case.
  • In some cases (e.g. filtering) it would be easiest to just cache the $state of a PipeOp: if a known input task is seen on $train(), one can restore the cached $state and call $predict() instead (see the sketch below). This doesn't work for all PipeOps; it doesn't work for PipeOpLearnerCV, for example.
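
A minimal sketch of the $state-caching idea, assuming a simplified wrapper around a PipeOp with $train() / $predict() methods and using the input task's $hash as the cache key. The CachedPipeOp class and its fields are hypothetical stand-ins, not mlr3pipelines API:

```r
library(R6)

# Hypothetical wrapper that caches a PipeOp's $state, keyed by the hash
# of the input task. Purely illustrative; not mlr3pipelines API.
CachedPipeOp = R6Class("CachedPipeOp",
  public = list(
    pipeop = NULL,
    cache = NULL,  # environment used as a map: task hash -> $state

    initialize = function(pipeop) {
      self$pipeop = pipeop
      self$cache = new.env(parent = emptyenv())
    },

    train = function(task) {
      key = task$hash  # mlr3 tasks carry a hash of their content
      if (!is.null(self$cache[[key]])) {
        # known input: restore the cached state and skip retraining;
        # invalid for PipeOps whose train output differs from their
        # predict output (e.g. PipeOpLearnerCV)
        self$pipeop$state = self$cache[[key]]
        return(self$pipeop$predict(list(task)))
      }
      result = self$pipeop$train(list(task))
      self$cache[[key]] = self$pipeop$state
      result
    }
  )
)
```

The memory-vs-compute trade-off from the second point would then come down to which PipeOps get wrapped like this, which could be left user-configurable.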

mb706 added this to the "far" milestone on Jan 30, 2019
mb706 added the "Status: Needs Discussion" label and removed the "Priority: Medium" label on Aug 6, 2019
mb706 removed this from the "far away" milestone on Aug 19, 2019
mb706 added this to the v0.2 milestone on Sep 17, 2019
mb706 modified the milestones: v0.2 → v0.3 on Feb 25, 2020
@pfistfl (Member) commented Apr 19, 2020

Tackled in #382.

mb706 added the "Status: Needs Design" label and removed the "Status: Needs Discussion" label on Sep 29, 2021