
Predictions in Log-Cost-Space #14

Open
mlindauer opened this issue Apr 4, 2017 · 3 comments

@mlindauer
Contributor

I hope I correctly remember our discussion yesterday about predictions in log-cost space, since I left my notes in my office. @frank-hutter, please correct me if anything is wrong.

@sfalkner Frank explained yesterday how he implemented the prediction in log(cost) space in SMAC, and I don't know whether this is currently possible with the new RF. I hope you can help us here.

  • Train the RF using log(cost) values.
  • To get a marginalized prediction over instances (a rough sketch follows below this list):
    1. For each tree, exponentiate all log(cost) values stored in the relevant leaves and average them -> one marginalized prediction per tree in the original cost space.
    2. Compute the mean and variance over log(pred_t) across all trees t (so the final prediction is again in log space).
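
To make the two steps concrete, here is a rough NumPy sketch of the computation I mean. The leaf values are made-up placeholders; in practice they would come from the fitted forest:

```python
import numpy as np

# Placeholder data: leaf_values[t] stands in for the log(cost) values stored
# in the leaves that a configuration falls into, across all instances, for
# tree t. (Made-up numbers for illustration only.)
rng = np.random.RandomState(0)
leaf_values = [rng.normal(loc=1.0, scale=0.5, size=20) for _ in range(10)]

# Step 1: per-tree marginalized prediction in the original cost space.
tree_preds = np.array([np.exp(vals).mean() for vals in leaf_values])

# Step 2: mean and variance over log(pred_t) -- back in log space.
log_preds = np.log(tree_preds)
mean, var = log_preds.mean(), log_preds.var()
print(mean, var)
```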

How can we compute this with the RF? Is it possible using the Python interface? Would it be inefficient to do it in Python? Could it be done in C++?

@sfalkner
Collaborator

sfalkner commented Apr 4, 2017

IIRC, you marginalize over instances yourself right now anyway. You can use the all_leaf_values method to get the actual values stored in the corresponding leaf of each tree, and iterate over them to compute anything you like. I don't quite understand why you want the final mean and variance prediction in log space again, though.
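
For example, something along these lines; this is only a sketch, and it assumes all_leaf_values(x) returns one list of stored leaf values (log costs) per tree for a feature vector x, as in tests/pyrfr_example.py:

```python
import numpy as np

def marginalized_log_prediction(forest, x):
    """Marginalized prediction for feature vector x, in log space.

    Assumes `forest` follows the pyrfr interface where all_leaf_values(x)
    returns one list of stored leaf values per tree, and that the forest
    was trained on log(cost) values.
    """
    per_tree_values = forest.all_leaf_values(x)

    # Step 1: average exp(value) within each tree (original cost space).
    tree_preds = [np.mean(np.exp(vals)) for vals in per_tree_values]

    # Step 2: mean and variance of the per-tree predictions in log space.
    log_preds = np.log(tree_preds)
    return np.mean(log_preds), np.var(log_preds)
```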
I don't think you would gain a lot by doing this in C++. If this is a major use case and you want to do many predictions this way, it would be more efficient to handle the transforms during fitting so that the marginalization is fast. That would require some C++ coding, but it could be done if it turns out to improve your model quality and is too slow in Python.
My concern is still the constant switching between log and non-log space and how that affects the RF predictions. You train on log data, so you assume a log-normal distribution, but then you marginalize the untransformed values and take the log of the result again. I don't know if that's what you actually want...
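
To illustrate this concern numerically (a made-up example, not from the model): by Jensen's inequality the exp -> average -> log round trip is biased upwards relative to averaging directly in log space, and for normally distributed log costs the gap is sigma^2 / 2:

```python
import numpy as np

rng = np.random.RandomState(1)
# Hypothetical leaf values in log space: mean 0, standard deviation 1.
log_costs = rng.normal(loc=0.0, scale=1.0, size=100000)

direct_mean = log_costs.mean()                 # average in log space: ~0.0
round_trip = np.log(np.exp(log_costs).mean())  # exp -> mean -> log:  ~0.5

print(direct_mean, round_trip)  # Jensen gap ~ sigma**2 / 2 for normal data
```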

@sfalkner
Collaborator

sfalkner commented Apr 4, 2017

I forgot to mention that I updated the tests/pyrfr_example.py file to show how the all_leaf_values method is used.

@mlindauer
Contributor Author

Thank you for the thorough explanation. Right now, my goal is to have a reimplementation of the old SMAC. At some point we should discuss with Frank whether this is the best way to do it, or how we can evaluate alternatives.
