
[LRNRQ] Add gam from package mgcv #40

Closed
7 of 8 tasks
vviers opened this issue Nov 30, 2020 · 7 comments
Labels
Learner Status: Deployed (Learner is deployed in package)

Comments

@vviers

vviers commented Nov 30, 2020

Algorithm

Generalised Additive Model

Package

mgcv

Supported types

  • classif
  • regr
  • surv
  • dens

I have checked that this is not already implemented in

  • mlr3
  • mlr3learners
  • mlr3extralearners
  • Other core packages (e.g. mlr3proba)

Comments

We have started to implement this learner for our own use here: Signaux Faibles - mlr3extralearners - feat/gam_learner, and were wondering if you would be interested in getting a PR from us? We know that classif.gamboost essentially fits a GAM too (right?), but we really like the mgcv implementation.

@RaphaelS1
Contributor

We have started to implement this learner for our own use here: Signaux Faibles - mlr3extralearners - feat/gam_learner, and were wondering if you would be interested in getting a PR from us?

Hey, I just took a quick look at your fork; it looks like you've already done the hard work, so we're definitely interested in adding your learner!

We know that classif.gamboost essentially fits a GAM too (right?), but we really like the mgcv implementation.

Well, classif.gamboost is probably the boosted version of what you want to implement, but it is definitely a good idea to have straight GAMs as well, so this is a great addition.

You marked that the learner is available for regr and surv as well; would you be interested in implementing those two too? There's no requirement to do so, but it's always nice to have the 'complete set' when possible.

If you do want to go ahead and open a PR, please just make sure you've also got the other required files (unit tests, parameter tests, yml...) and that you're happy to continue maintaining the learner going forward.

@pierrecamilleri
Contributor

Hi, thank you for your hints. I will work on this soon and would be glad to maintain the learner afterwards.

I have a question concerning the implementation: mgcv::gam is mainly parameterized through a formula, which controls the smoothers used for each individual feature, possible interactions, etc.

What would be your recommendation for passing this information to mlr3? Is it OK to pass the formula as a ParamUty to the learner?
The problem is that the formula mixes information about the task (feature and target names) and the learner (smoothing parameters). I checked other learners and haven't found any that uses a formula (when available, feature matrices are used instead).

I have seen that there is a formula() method for a task, but it seems to do little more than reconstruct a formula from a list of features, so I don't think it is of any help here. It can even be confusing if the learner uses a different formula than the default one given by task$formula().

I have also found this open issue (which even links to an mlr issue about mgcv::gam), in which a limitation of including the formula in the learner is mentioned:
A benchmark with one or multiple learners and multiple tasks would fail because the data set in one task might not be compatible with the formula specified in the learner. Concerns are also raised about tuning the formula.

However, the final thoughts seem to point towards the integration of the formula as a parameter of the learner. Can I do it this way?
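A minimal sketch of what this could look like, assuming the formula is exposed as a ParamUty-type parameter (p_uty() in paradox) and falls back to task$formula() when unset; the class name LearnerRegrGam and the regr.gam id are placeholders here, not the final learner:

```r
# Sketch only: expose the mgcv formula as a ParamUty-type parameter and
# fall back to the task's default additive formula when it is not set.
library(R6)
library(mlr3)
library(paradox)

LearnerRegrGam = R6Class("LearnerRegrGam",
  inherit = LearnerRegr,
  public = list(
    initialize = function() {
      super$initialize(
        id = "regr.gam",
        param_set = ps(formula = p_uty(tags = "train")),  # user-supplied mgcv formula
        feature_types = c("integer", "numeric"),
        packages = "mgcv"
      )
    }
  ),
  private = list(
    .train = function(task) {
      pv = self$param_set$get_values(tags = "train")
      # use the user's formula if given, otherwise the plain additive default
      f = if (is.null(pv$formula)) task$formula() else pv$formula
      mgcv::gam(formula = f, data = task$data())
    },
    .predict = function(task) {
      newdata = task$data(cols = task$feature_names)
      p = predict(self$model, newdata = newdata, type = "response")
      list(response = as.numeric(p))
    }
  )
)
```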

@RaphaelS1
Contributor

RaphaelS1 commented Jan 13, 2021

Hi @JazzyPierrot, this is indeed a discussion we've been having internally, though I note that open issue is 1.5 years old and has seen no discussion.

Pinging @adibender and @berndbischl here as we discussed this recently, and I want to highlight this learner as another good example of when a separate 'formula' interface may be useful/required in a learner.

For now I can suggest one of the following options:

  1. As you suggested, use a ParamUty for the formula and give it the required property
  2. Use the usual mlr3 interface with a task and then add parameters to the learner for smoothing, e.g. for the smoothing function s, add a ParamUty containing the names of the variables to wrap in s; internally, if the parameter set is named ps, build mgcv::s(ps$values$s) and add it to the formula in .train (see the sketch after this list)
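A rough sketch of option 2, where s_vars is a hypothetical ParamUty naming the features to wrap in mgcv::s() and build_gam_formula is an illustrative helper, not an existing function:

```r
# Option 2 sketch: the learner keeps the usual task interface and builds the
# mgcv formula itself from a ParamUty ("s_vars", an illustrative name)
# holding the features to wrap in mgcv::s(); remaining features enter linearly.
build_gam_formula = function(task, s_vars = character()) {
  smooth = if (length(s_vars)) paste0("s(", s_vars, ")") else character()
  linear = setdiff(task$feature_names, s_vars)
  rhs = paste(c(smooth, linear), collapse = " + ")
  stats::as.formula(paste(task$target_names, "~", rhs))
}

# e.g. build_gam_formula(tsk("mtcars"), c("wt", "hp"))
# gives: mpg ~ s(wt) + s(hp) + am + carb + cyl + ...
#
# inside .train() this would look roughly like:
#   pv = self$param_set$get_values(tags = "train")
#   f  = build_gam_formula(task, pv$s_vars)
#   mgcv::gam(f, data = task$data())
```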

@berndbischl
Member

berndbischl commented Jan 13, 2021

Pinging @adibender and @berndbischl here as we discussed this recently, and I want to highlight this learner as another good example of when a separate 'formula' interface may be useful/required in a learner.

Thx. If we want this, we need to discuss it in the engineering call with @mllg.

(I meant: the formula interface in general.)

@pierrecamilleri
Contributor

Well noted, thank you for your feedback! And good luck with the formula interface.

@adibender

I think that if one cannot specify the model formula directly, this would be a major hindrance for usability. Imagine you want to specify y ~ s(x1, k = 20, bs = "ad") + s(x2, bs = "re") + te(x3, x4, k = 10) + x5 + x6 + x5*x6 + s(x7, by = x8) .... and this is just the tip of the iceberg. There are so many formula specials and options for specifying a model in mgcv.
I think specifying this via params and then pasting it together internally will create more overhead than necessary, and it is impossible to anticipate all possible specifications. For the user, specifying the model would become majorly annoying.
In practice, I usually don't want to tune the parameters of the GAM; I am mostly interested in whether my specification of the model (or a couple of them) performs better than other algorithms. I'm not interested in tuning it, outside of maybe method, optimizer, gamma, etc., which are all parameters outside of the formula.
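For illustration, a hypothetical usage sketch of this formula-as-parameter approach, assuming a regr.gam learner that exposes a formula parameter as proposed above (not a released interface at the time of this comment):

```r
# Hypothetical usage if the formula is a plain learner parameter: all mgcv
# specials (s(), te(), bs =, by =, ...) remain available to the user.
# Assumes the GAM learner is registered, e.g. via mlr3extralearners.
library(mlr3)

task = tsk("mtcars")
learner = lrn("regr.gam",
  formula = mpg ~ s(wt) + s(qsec, bs = "cr") + am  # kept small to fit mtcars
)
learner$train(task)
learner$predict(task)$score(msr("regr.mse"))
```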

@RaphaelS1 added the Learner Status: Request and Learner Status: Development labels and removed the new learner and Learner Status: Request labels on Jan 19, 2021
@RaphaelS1
Contributor

Classif and regr closed in #61. Surv in #63

@RaphaelS1 added the Learner Status: Deployed label and removed the Learner Status: Development label on Mar 18, 2021