Add YeoJohnson transformer #197

Open
tlienart opened this issue Feb 29, 2020 · 6 comments

@tlienart
Collaborator

tlienart commented Feb 29, 2020

Here: https://github.com/tk3369/YeoJohnsonTrans.jl, with credit to Tom Kwong.

It's in a similar vein to BoxCox.
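
For readers unfamiliar with the transform, here is a minimal Python sketch of the standard Yeo-Johnson formula for a fixed lambda. It is not the API of YeoJohnsonTrans.jl or of MLJ: the function name `yeo_johnson` is hypothetical, and the step of estimating lambda from data is left out.

```python
import math

def yeo_johnson(y: float, lam: float) -> float:
    """Yeo-Johnson power transform of one value for a fixed lambda (illustrative)."""
    if y >= 0:
        if abs(lam) < 1e-12:
            # lambda == 0: logarithmic branch for non-negative values
            return math.log1p(y)
        return ((y + 1.0) ** lam - 1.0) / lam
    if abs(lam - 2.0) < 1e-12:
        # lambda == 2: logarithmic branch for negative values
        return -math.log1p(-y)
    return -(((1.0 - y) ** (2.0 - lam)) - 1.0) / (2.0 - lam)

values = [-2.0, -0.5, 0.0, 0.5, 2.0]
transformed = [yeo_johnson(v, 0.5) for v in values]
```

Unlike Box-Cox, which requires strictly positive inputs, this definition also handles zero and negative values, and with lambda equal to 1 it leaves the data unchanged.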

@azev77
Contributor

azev77 commented Mar 1, 2020

@tlienart this is great!
Here are the pre-processing options I use in Caret:

"BoxCox", "YeoJohnson", "expoTrans", "center", "scale", "range", "nzv",
"knnImpute",  "bagImpute",  "medianImpute",  "pca",  "ica", "spatialSign"

dp= preProcess(d, method = c("center", "scale", "YeoJohnson", "nzv"))
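
As a rough illustration of what the "center" and "scale" steps in that call do to a single numeric column, here is a small Python sketch. It assumes scaling by the sample standard deviation; the function name `center_and_scale` is hypothetical, and the "YeoJohnson" and "nzv" steps are not reproduced here.

```python
from statistics import mean, stdev

def center_and_scale(column):
    """Subtract the column mean, then divide by the sample standard deviation."""
    mu = mean(column)
    sigma = stdev(column)  # sample standard deviation (n - 1 denominator)
    if sigma == 0:
        # A constant column cannot be scaled; a variance filter would drop it first.
        raise ValueError("constant column: cannot scale")
    return [(x - mu) / sigma for x in column]

standardized = center_and_scale([1.0, 2.0, 3.0, 4.0, 5.0])
```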

Recently, I've found bestNormalize helpful.

@tlienart
Collaborator Author

tlienart commented Mar 2, 2020

Cool! Could you explain what

  • nzv (non zero value?)
  • bagImpute
  • spatialSign

do?

(Also, could you link to the relevant docs from Caret?)

Finally, bestNormalize looks great. If someone felt up to coding something like that in Julia, it would be great to interface with it.

@azev77
Contributor

azev77 commented Mar 10, 2020

@tlienart by the way,
several of the preprocessing options above are for imputing missing data.

MLJ comes with the Ames housing data, which has already been cleaned and had its missing values imputed. Do you know which methods were used?

I think the most popular is:
https://github.com/stefvanbuuren/mice
(I wish it was faster...)

Julia has:
https://github.com/invenia/Impute.jl

You may have seen: https://discourse.julialang.org/t/how-to-do-multiple-imputation-on-julia/17713/14

@tlienart
Collaborator Author

We have imputation mechanisms here (https://github.com/alan-turing-institute/MLJModels.jl/blob/master/src/builtins/Transformers.jl), which currently allow you to do any column-based imputation (e.g. median or mean imputation).
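
To make "column-based" concrete, here is a minimal Python sketch of median imputation on one column. It does not reflect the MLJModels.jl code linked above: the name `impute_median` and the use of `None` as the missing-value marker are assumptions made for the illustration.

```python
from statistics import median

def impute_median(column):
    """Replace None entries with the median of the observed entries."""
    observed = [x for x in column if x is not None]
    if not observed:
        raise ValueError("column has no observed values")
    fill = median(observed)
    return [fill if x is None else x for x in column]

filled = impute_median([3.0, None, 1.0, 10.0, None])
```

The fill value depends only on the column itself, which is what separates this from multivariate approaches such as mice.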

More intricate imputation becomes a full-blown transformer, and we could just interface with any package that provides such a thing. Interfacing with Impute.jl would be nice. They don't offer much more than what we already do, but they do offer carry forward, for instance (which we could also easily implement here). Assuming Impute.jl will grow over time, it's probably still a good idea to interface with it.
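
For reference, "carry forward" (last observation carried forward) can be sketched in a few lines of Python. This is not Impute.jl's implementation: the name `carry_forward` and the `None` missing-value marker are assumptions for the example.

```python
def carry_forward(column):
    """Fill each None with the most recent observed value before it."""
    filled = []
    last = None  # no observation seen yet
    for x in column:
        if x is not None:
            last = x
        filled.append(last)
    return filled

series = carry_forward([None, 1.0, None, None, 4.0, None])
```

Leading missing values stay missing in this sketch because there is nothing earlier to carry forward.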

(maybe open another issue for discussion of imputers?)

@azev77
Contributor

azev77 commented Mar 10, 2020

The Impute.jl maintainers are discussing implementing something like mice:
invenia/Impute.jl#3
