Caching #16

Open · pat-s opened this issue Nov 3, 2018 · 8 comments

Labels: Priority: Medium · Status: Needs Design (Needs some thought and design decisions.)

Comments

@pat-s (Member) commented Nov 3, 2018

After dealing with the idea of caching in mlr recently, I think this is an important topic for mlr3.
It would be a core feature and should be integrated right from the start.

While in mlr I only implemented caching of filter values for now, in mlr3 we should consider implementing it as a package option and making it available for all calls (resample, train, tuning, filtering, etc.).

Most calls (dataset, learner, hyperparameters) are unique, so caching won't have as much of an effect as it does for filtering (where the call that generates the filter values is always the same and the subsetting happens afterwards).

However, it can also have a positive effect on "normal" train/test calls:

  • If a run (resample, tuneParams, benchmark) errors and a seed is set, the user can simply rerun and benefit from the cached calls.
  • For tuning methods like grid search, settings are redundant more often, so the user can benefit from caching.
  • Most often it will apply to simple train/test calls without tuning.

I've added the functions delete_cache() and get_cache_dir() in my mlr PR to make cache handling more convenient. We could think about a dedicated Cache class for such things; a rough sketch of the helper functions is below.
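
A minimal sketch of what such helpers could look like, combined with the package-option idea from above. The option name "mlr3.cache_dir", the use of the digest package, and all function bodies are illustrative assumptions, not the code from the mlr PR:

```r
# Hypothetical cache helpers, loosely following the names from the mlr PR.
# The option name "mlr3.cache_dir" and all bodies are assumptions.

get_cache_dir = function() {
  # package option first, fall back to a per-user cache directory
  getOption("mlr3.cache_dir", tools::R_user_dir("mlr3", "cache"))
}

delete_cache = function() {
  unlink(get_cache_dir(), recursive = TRUE)
}

# Run an expensive call (e.g. computing filter values) through a disk
# cache, keyed by a hash of the inputs
cached = function(fun, ...) {
  key = digest::digest(list(...))
  path = file.path(get_cache_dir(), paste0(key, ".rds"))
  if (file.exists(path)) {
    return(readRDS(path))
  }
  res = fun(...)
  dir.create(get_cache_dir(), recursive = TRUE, showWarnings = FALSE)
  saveRDS(res, path)
  res
}
```

With something like this, a second call with identical inputs (e.g. the same task and filter) would be served from disk instead of being recomputed.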

Please share your opinions.

@mllg (Member) commented Nov 3, 2018

I think caching would be best implemented in mlr3pipelines. This way we could cache any preprocessing step.

@berndbischl (Member)

> After dealing with the idea of caching in mlr recently, I think this is an important topic for mlr3.

Agreed.

@berndbischl (Member)

> I think caching would be best implemented in mlr3pipelines. This way we could cache any preprocessing step.

I think so too. And this should really be properly defined in a design doc, as it is pretty hard to get right. For this we need a bit more progress on mlr3pipelines.

@berndbischl (Member)

Shall we move the issue?

pat-s transferred this issue from mlr-org/mlr3 on Nov 4, 2018
@berndbischl (Member)

> If a run (resample, tuneParams, benchmark) errors and a seed is set, the user can simply rerun and benefit from the cached calls.

Do you mean for debugging purposes? That seems potentially nice, but I would keep that separate, both as a use case and in terms of technical solutions.

@berndbischl (Member)

Like I said above: to move forward we need a formal definition / proposal of how caching would work in mlr3.

@mb706 (Collaborator) commented Jan 12, 2019

Some thoughts:

  • mlr3 makes caching potentially easier because it provides hashes for us, although see mlr3#132 (hashing might not work reliably for caching in pipelines).
  • Caching has a larger benefit for some PipeOps (e.g. filtering) than for others (e.g. scaling); what matters is the trade-off between how much memory the cached state takes and how long the computation takes. There should probably be a way for the user to change caching behaviour depending on the use case.
  • In some cases (e.g. filtering) it would be easiest to just cache the $state of a PipeOp: if a known input task is seen on $train(), one can restore the cached $state and call $predict() instead (see the sketch below). This doesn't work for all PipeOps; it doesn't work for PipeOpLearnerCV, for example.
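
A minimal sketch of the $state-caching idea, assuming a simplified wrapper around a PipeOp with $train() / $predict() methods and using the input task's $hash as the cache key. The CachedPipeOp class and its fields are hypothetical stand-ins, not mlr3pipelines API:

```r
library(R6)

# Hypothetical wrapper that caches a PipeOp's $state, keyed by the hash
# of the input task. Purely illustrative; not mlr3pipelines API.
CachedPipeOp = R6Class("CachedPipeOp",
  public = list(
    pipeop = NULL,
    cache = NULL,  # environment used as a map: task hash -> $state

    initialize = function(pipeop) {
      self$pipeop = pipeop
      self$cache = new.env(parent = emptyenv())
    },

    train = function(task) {
      key = task$hash  # mlr3 tasks carry a hash of their content
      if (!is.null(self$cache[[key]])) {
        # known input: restore the cached state and skip retraining;
        # invalid for PipeOps whose train output differs from their
        # predict output (e.g. PipeOpLearnerCV)
        self$pipeop$state = self$cache[[key]]
        return(self$pipeop$predict(list(task)))
      }
      result = self$pipeop$train(list(task))
      self$cache[[key]] = self$pipeop$state
      result
    }
  )
)
```

The memory-vs-compute trade-off from the second point would then come down to which PipeOps get wrapped like this, which could be left user-configurable.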

mb706 added this to the "far" milestone on Jan 30, 2019
mb706 added the "Status: Needs Discussion" label and removed the "Priority: Medium" label on Aug 6, 2019
mb706 removed this from the "far away" milestone on Aug 19, 2019
mb706 added this to the v0.2 milestone on Sep 17, 2019
mb706 modified the milestones: v0.2 → v0.3 on Feb 25, 2020
@pfistfl (Member) commented Apr 19, 2020

Tackled in #382.

mb706 added the "Status: Needs Design" label and removed the "Status: Needs Discussion" label on Sep 29, 2021