
Predictions in Log-Cost-Space #14

Open
mlindauer opened this issue Apr 4, 2017 · 3 comments

@mlindauer
Contributor

I hope I correctly remember our discussion yesterday about predictions in log-cost space, since I left my notes in my office. @frank-hutter, please correct me if anything is wrong.

@sfalkner Frank explained yesterday how he implemented the prediction in log(cost) space in SMAC, and I don't know whether this is currently possible with the new RF. I hope you can help us here.

  • Train the RF using log(cost) values.
  • To get a marginalized prediction over instances (a rough sketch follows below this list):
    1. For each tree, exponentiate all log(cost) values stored in the relevant leaves and average them -> one marginalized prediction per tree in the original cost space.
    2. Compute the mean and variance over log(pred_t) across all trees t (so the final prediction is again in log space).
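
To make the two steps concrete, here is a rough NumPy sketch of the computation I mean. The leaf values are made-up placeholders; in practice they would come from the fitted forest:

```python
import numpy as np

# Placeholder data: leaf_values[t] stands in for the log(cost) values stored
# in the leaves that a configuration falls into, across all instances, for
# tree t. (Made-up numbers for illustration only.)
rng = np.random.RandomState(0)
leaf_values = [rng.normal(loc=1.0, scale=0.5, size=20) for _ in range(10)]

# Step 1: per-tree marginalized prediction in the original cost space.
tree_preds = np.array([np.exp(vals).mean() for vals in leaf_values])

# Step 2: mean and variance over log(pred_t) -- back in log space.
log_preds = np.log(tree_preds)
mean, var = log_preds.mean(), log_preds.var()
print(mean, var)
```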

How can we compute this with the RF? Is it possible using the Python interface? Would it be inefficient to do it in Python? Could it be done in C++?

@sfalkner
Collaborator

sfalkner commented Apr 4, 2017

IIRC, you marginalize over instances yourself right now anyway. You can use the all_leaf_values method to get the actual values stored in the corresponding leaf of each tree, and iterate over them to compute anything you like. I don't quite understand why you want the final mean and variance prediction in log space again, though.
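
For example, something along these lines; this is only a sketch, and it assumes all_leaf_values(x) returns one list of stored leaf values (log costs) per tree for a feature vector x, as in tests/pyrfr_example.py:

```python
import numpy as np

def marginalized_log_prediction(forest, x):
    """Marginalized prediction for feature vector x, in log space.

    Assumes `forest` follows the pyrfr interface where all_leaf_values(x)
    returns one list of stored leaf values per tree, and that the forest
    was trained on log(cost) values.
    """
    per_tree_values = forest.all_leaf_values(x)

    # Step 1: average exp(value) within each tree (original cost space).
    tree_preds = [np.mean(np.exp(vals)) for vals in per_tree_values]

    # Step 2: mean and variance of the per-tree predictions in log space.
    log_preds = np.log(tree_preds)
    return np.mean(log_preds), np.var(log_preds)
```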
I don't think you would gain a lot by doing this in C++. If this is a major use case and you want to do many predictions this way, it would be more efficient to handle the transforms during fitting so that the marginalization is fast. That would require some C++ coding, but it could be done if it turns out to improve your model quality and is too slow in Python.
My concern is still the constant switching between log and non-log space and how that affects the RF predictions. You train on log data, so you assume a log-normal distribution, but then you marginalize the untransformed values and take the log of the result again. I don't know if that's what you actually want...
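
To illustrate this concern numerically (a made-up example, not from the model): by Jensen's inequality the exp -> average -> log round trip is biased upwards relative to averaging directly in log space, and for normally distributed log costs the gap is sigma^2 / 2:

```python
import numpy as np

rng = np.random.RandomState(1)
# Hypothetical leaf values in log space: mean 0, standard deviation 1.
log_costs = rng.normal(loc=0.0, scale=1.0, size=100000)

direct_mean = log_costs.mean()                 # average in log space: ~0.0
round_trip = np.log(np.exp(log_costs).mean())  # exp -> mean -> log:  ~0.5

print(direct_mean, round_trip)  # Jensen gap ~ sigma**2 / 2 for normal data
```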

@sfalkner
Collaborator

sfalkner commented Apr 4, 2017

I forgot to mention that I updated the tests/pyrfr_example.py file to show how the all_leaf_values method is used.

@mlindauer
Contributor Author

Thank you for the thorough explanation. Right now, my goal is to have a reimplementation of the old SMAC. At some point we should discuss with Frank whether this is the best way to do it, or how we can evaluate alternatives.
