asinh transformation #536
Sounds like a good idea. PR welcome 😉
Working on this ATM. Which optimization package do you recommend for optimizing a continuous parameter? I know there are a dozen implementations of gradient descent, Newton's method, etc., but I'm not sure which of these packages are already being used by MLJ.jl (I don't want to add too much compilation time).
The only optimization that is a core part of MLJ is MLJTuning, but that currently applies only to supervised models. I'm somewhat embarrassed to say that UnivariateBoxCoxTransformer uses a simple grid search over a fixed parameter range (the resolution is a hyperparameter). It seems to work fine, and we've not had any complaints about it. That said, I'd be happy for you to implement something better here (or in that case). If you need an extra dependency, I suggest putting the implementation in a separate package (we certainly don't want to add any AD dependency); see MLJTSVDInterface.jl for a template. Generally, going forward, I'm reluctant to add new built-in models to MLJModels, so this might be best in any case. What do you think?
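For illustration, here is a minimal sketch of the kind of grid search described above. The function names, the λ range, and the normality criterion (a crude squared-skewness proxy) are assumptions for the example, not the actual UnivariateBoxCoxTransformer code:

```julia
# Illustrative grid search for a Box-Cox exponent λ, maximizing a crude
# normality proxy (negative squared skewness). Names, range, and criterion
# are assumptions, not MLJModels code.
using Statistics

boxcox(x, λ) = abs(λ) < 1e-8 ? log(x) : (x^λ - 1) / λ

function normality_score(xs, λ)
    ys = boxcox.(xs, λ)
    μ, σ = mean(ys), std(ys)
    return -mean(((ys .- μ) ./ σ) .^ 3)^2   # penalize skewness in either direction
end

function grid_search_lambda(xs; λrange = (-2.0, 2.0), resolution = 81)
    λs = range(λrange[1], λrange[2]; length = resolution)
    return λs[argmax([normality_score(xs, λ) for λ in λs])]
end

xs = exp.(randn(1_000))          # strictly positive, roughly log-normal
λ̂ = grid_search_lambda(xs)      # should come out near 0 (i.e. a log transform)
```

The appeal of this approach is that it needs no derivatives and no extra dependency; the `resolution` keyword plays the role of the hyperparameter mentioned above.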
I see in other threads you have been considering using MLJTuning.jl for optimization. I think this is probably overkill, and it would require you to formulate the problem as a supervised learning problem. That's not impossible, I just think it's an unnecessarily complicated route. I see that the SciPy BoxCox implementation uses Brent's method (a derivative-free bracketing method), which is reliable and plenty fast for 99% of the use cases I can think of. It is provided by Optim.jl. I'll have a think about whether we want to add Optim.jl as a dep, but you can always put your implementation in a standalone package. It could then implement both the MLJModelInterface.jl and TableTransforms.jl interfaces, if you want. We recently did this at Imbalance.jl. (But note that those models are …)
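For concreteness, a minimal sketch of the Brent-based route with Optim.jl, maximizing the Box-Cox profile normal log-likelihood; the log-likelihood is written out by hand and the bracketing interval (-2, 2) is an assumption:

```julia
# Sketch: choose the Box-Cox λ by maximizing the profile normal log-likelihood,
# using Optim.jl's univariate Brent optimizer. The interval is an assumption.
using Optim, Statistics

boxcox(x, λ) = abs(λ) < 1e-8 ? log(x) : (x^λ - 1) / λ

# Profile log-likelihood that the transformed data are normal,
# including the Jacobian term (λ - 1) * Σ log x.
function boxcox_loglik(xs, λ)
    ys = boxcox.(xs, λ)
    n = length(xs)
    σ² = var(ys; corrected = false)
    return -n / 2 * log(σ²) + (λ - 1) * sum(log, xs)
end

xs = exp.(randn(500))                                   # positive, roughly log-normal
res = optimize(λ -> -boxcox_loglik(xs, λ), -2.0, 2.0, Brent())
λ̂ = Optim.minimizer(res)
```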
Hmm, that's surprising. I didn't know MLJTuning.jl only supported supervised models. You're 100% right that this is overkill for the use case, but I'm on a yak-shaving quest and nothing can stop me (except my limited attention span and time). What I started looking for (and really, the cleanest way to accomplish this) is some kind of generic interface that separates the optimizer or model-fitting procedure from the model itself. I basically just want to tell the model, "Use a Box-Cox transformation for the predictors, followed by a neural network/linear regression/whatever. We can optimize all the parameters with gradient descent at the same time." Or I might want to use a model, then apply a Box-Cox transformation to the output, with everything being autodiffed through.
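To make the "optimize everything at once" idea concrete, here is a rough, self-contained sketch of jointly fitting a transformation parameter and linear-regression weights by gradient descent. It uses plain Julia with numerical gradients standing in for autodiff; every name here is hypothetical and nothing is an MLJ interface:

```julia
# Hypothetical sketch: jointly tune an asinh scale on the predictor and
# linear-regression weights by gradient descent on mean squared error.
pseudolog(x, s) = s * asinh(x / (2s))

function loss(θ, xs, ys)                 # θ = [log_scale, weight, bias]
    s, w, b = exp(θ[1]), θ[2], θ[3]      # log-parameterize so the scale stays positive
    preds = w .* pseudolog.(xs, s) .+ b
    return sum(abs2, preds .- ys) / length(ys)
end

function numgrad(f, θ; h = 1e-6)         # central finite differences
    map(eachindex(θ)) do i
        e = zeros(length(θ)); e[i] = h
        (f(θ .+ e) - f(θ .- e)) / (2h)
    end
end

function fit_jointly(xs, ys; η = 0.05, iters = 2_000)
    θ = zeros(3)
    for _ in 1:iters
        θ .-= η .* numgrad(p -> loss(p, xs, ys), θ)
    end
    return θ
end

xs = 10 .* randn(300)
ys = 3 .* asinh.(xs ./ 4) .+ 1 .+ 0.1 .* randn(300)   # true scale is 2
θ̂ = fit_jointly(xs, ys)
```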
> Hmm, that's surprising. I didn't know MLJTuning.jl only supported supervised models.

Minor correction. MLJTuning also supports (possibly unsupervised) outlier detection models. I guess, in principle, it supports any model implementing `predict` (e.g., KMeans clustering), so long as you have a way of pairing the output of `predict` with something to measure performance against.

As a side note, the general consensus seems to be that MLJ's abstract model type hierarchy was not an optimal design decision, but it's considerably embedded in the ecosystem. For example, it means independently developed models, like those provided by TableTransforms.jl, cannot be integrated into MLJ without the use of a wrapper or a more complex "duplicate" interface. Of course, the 3rd-party package could instead buy into the type hierarchy, but they may have good reasons for not doing so.

Returning to your "exploration" of an …

One fly in the ointment is that …

I understand that I am not addressing the fact that …
I think what you're suggesting works, but for now I've implemented a hard-coded optimizer using Newton's method 😅, mostly just to separate this into two PRs, since overhauling the whole workflow for tuning unsupervised models seems like it should get a separate issue.
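For the record, a generic sketch of the kind of hand-rolled 1-D Newton iteration referred to above (not the code in the PR); the objective, tolerances, and starting point are placeholders:

```julia
# Generic Newton iteration for maximizing a smooth 1-D objective g(s):
# s ← s - g'(s) / g''(s). Finite differences are used for the derivatives
# here; the actual PR may compute them differently.
function newton_maximize(g, s0; h = 1e-4, tol = 1e-10, maxiter = 100)
    s = s0
    for _ in 1:maxiter
        g1 = (g(s + h) - g(s - h)) / (2h)                # ≈ g'(s)
        g2 = (g(s + h) - 2 * g(s) + g(s - h)) / h^2      # ≈ g''(s)
        step = g1 / g2
        s -= step
        abs(step) < tol && break
    end
    return s
end

ŝ = newton_maximize(s -> -(s - 3)^2, 1.0)   # converges to ≈ 3.0
```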
Similar to the Box-Cox transformation, the `asinh` or pseudolog transformation is a common transformation for reducing skewness and stabilizing variance. It's most often used for variables that are roughly log-normal but can take on both positive and negative values; for example, net worth is often well-modeled as log-normal for the majority of the population, but can be negative if debts exceed assets. `asinh(x/2)` is approximately equal to `sign(x) * ln(|x|)` for large values of `|x|`, but is approximately linear in `x` for values close to 0. The general form of the transformation is `x = scale * asinh(x / (2scale))`, with `scale` a parameter chosen to satisfy some criterion such as stable variance, minimum skewness, or maximizing the log-likelihood that the data come from a normal distribution.
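A minimal sketch of the transformation itself and one way to pick `scale` (maximizing a Gaussian log-likelihood over a candidate grid); the log-likelihood details and the grid are assumptions for illustration, not a spec of the eventual PR:

```julia
# Pseudolog / asinh transformation with a scale parameter, plus a simple
# log-likelihood criterion for choosing scale. Grid and criterion are
# illustrative, not the proposed implementation.
using Statistics

pseudolog(x, scale) = scale * asinh(x / (2scale))

# Gaussian log-likelihood of the transformed data, including the Jacobian
# term from the change of variables y = scale * asinh(x / (2scale)).
function asinh_loglik(xs, scale)
    ys = pseudolog.(xs, scale)
    n = length(xs)
    σ² = var(ys; corrected = false)
    jacobian = sum(x -> -0.5 * log(1 + (x / (2scale))^2), xs)
    return -n / 2 * log(σ²) + jacobian
end

function best_scale(xs; grid = exp.(range(log(1e-2), log(1e3); length = 200)))
    return grid[argmax([asinh_loglik(xs, s) for s in grid])]
end

xs = [exp.(randn(900)); -exp.(randn(100))]   # mostly positive, some negative values
ŝ = best_scale(xs)
ys = pseudolog.(xs, ŝ)
```

A log-spaced grid is used because `scale` only matters up to order of magnitude near the optimum; Brent's method or Newton's method could replace the grid once a bracketing interval is known.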