Allow fitting arbitrary @formulas #13

Closed
rikhuijzer opened this issue Oct 13, 2021 · 2 comments

Comments

@rikhuijzer
Member

Currently, only a few models are available via this interface. I suggest also adding a FormulaRegressor for arbitrary formulas, via StatsModels.@formula(...).

@ablaom, what do you think? Would it make sense to add this to this interface?

@ablaom
Member

ablaom commented Oct 13, 2021

Something along these lines would be useful and might help us win over some R users 🙏🏾 . I would support this.

However, I think that even more useful would be a separate MLJ formula-based transformer that can be inserted anywhere in an MLJ pipeline (or other composite model). Here "formula" means "one-sided formula"; I don't think two-sided formulas make much sense in the MLJ context because the target and features are treated separately, like in sklearn.

This transformer would probably be a Static model with a one-sided StatsModels formula as a parameter. Ideally, and for consistency, it would perform a table-to-table transformation, rather than a table-to-matrix transformation, which is what StatsModels does. This does cause problems for very high-cardinality categorical features (which get one-hot encoded when you apply a StatsModels formula?) but does have the advantage that new columns would come with informative names for interpretation downstream of the transformer. Actually, it probably makes sense not to force one-hot encoding anyway, as not all supervised models need this and we already have transformers to do one-hot encoding which generate the new column names.
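
To make the table-to-table idea concrete, here is a minimal sketch of the transformation step only. It is an illustration under assumptions, not a proposed implementation: `formula_transform` is a hypothetical helper name, the formula is written as `0 ~ ...` to stand in for a one-sided formula, and only continuous columns are exercised, so the categorical encoding question raised above is left open.

```julia
using StatsModels, Tables

# Hypothetical helper (not part of StatsModels or MLJ): evaluate the
# right-hand side of a formula on a table and return a column table
# whose column names are taken from `coefnames`.
function formula_transform(f::FormulaTerm, X)
    cols = Tables.columntable(X)                 # NamedTuple of column vectors
    applied = apply_schema(f, schema(f, cols))   # attach column types to terms
    M = modelcols(applied.rhs, cols)             # StatsModels' table-to-matrix step
    # `vcat` guards against a single name being returned as a bare String
    names = Symbol.(vcat(coefnames(applied.rhs)))
    # Re-wrap the matrix columns as a named column table
    return NamedTuple{Tuple(names)}(Tuple(collect.(eachcol(M))))
end

X = (a = [1.0, 2.0, 3.0], b = [4.0, 5.0, 6.0])
Xnew = formula_transform(@formula(0 ~ a + b + a & b), X)
```

The returned table carries the interaction column under the name StatsModels assigns to it, which is the "informative names" advantage mentioned above. A `Static` MLJ model could presumably call such a function from its `transform` method, with the formula stored as a field.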

I recall slack discussions with @kleinschmidt about this (now lost to the ether). Perhaps he would care to chime in.

See also JuliaAI/MLJModels.jl#314.

@ablaom
Member

ablaom commented Oct 14, 2021

Okay, I've created a new issue here specific to my suggestion, which does not directly address the initial comment. Further comments on that suggestion should go there, thanks.
