I hope that I correctly remember our discussion yesterday about the predictions in log-cost space since I forgot my notes in my office. @frank-hutter if anything is wrong, please correct me.
@sfalkner Frank explained yesterday how he implemented the prediction in log(cost) space in SMAC, and I don't know whether this is currently possible with the new RF. I hope you can help us here.
1. Train the RF using log(cost) values.
2. To get a marginalized prediction over instances:
   - Compute a marginalized prediction for each tree using exp(log(cost)) of all values in the leaves, which gives one prediction pred_t per tree t in the original cost space.
   - Compute the mean and variance over all log(pred_t), one per tree t (so, again in log space).
How can we compute this with the RF? Is it possible using the python interface? Would it be inefficient to do it in Python? Can it be done within C++?
IIRC, you marginalize over instances yourself right now anyway. You can use the `all_leaf_values` method to get the actual values stored in the corresponding leaf of each tree and iterate over that to compute anything you like. I don't quite understand why you want to have the final mean and variance prediction in log-space again, though.
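To make the iteration concrete, here is a minimal NumPy sketch of the scheme described in the issue. It does not call the RF library at all: the nested input `leaf_values[t][i]` (the log(cost) values in the leaf that tree `t` reaches for instance `i`) is a hypothetical layout, and the actual return format of `all_leaf_values` may differ. It also assumes that "marginalizing" means averaging within each leaf first and then across instances, which is one possible reading of the description.

```python
import numpy as np


def marginalized_log_prediction(leaf_values):
    """Mean and variance, in log space, of per-tree marginalized predictions.

    leaf_values[t][i] holds the log(cost) values stored in the leaf that
    tree t reaches for instance i (hypothetical layout, see lead-in).
    """
    log_tree_preds = []
    for per_instance_leaves in leaf_values:
        # Back-transform to the original cost space and average inside each leaf.
        leaf_means = [
            np.mean(np.exp(np.asarray(values, dtype=float)))
            for values in per_instance_leaves
        ]
        # Marginalize over instances: one prediction per tree in cost space.
        tree_pred = np.mean(leaf_means)
        # Go back to log space before aggregating over trees.
        log_tree_preds.append(np.log(tree_pred))

    log_tree_preds = np.asarray(log_tree_preds)
    # Population variance (ddof=0) across trees; an assumption of this sketch.
    return log_tree_preds.mean(), log_tree_preds.var()


# Toy example: two trees, two instances each.
example = [
    [[0.0, 0.0], [0.0]],             # tree 0: every leaf value is log(1)
    [[np.log(2.0)], [np.log(4.0)]],  # tree 1: leaf means 2 and 4 in cost space
]
mean_log, var_log = marginalized_log_prediction(example)
```

In the toy example, tree 0 predicts a cost of 1 and tree 1 predicts a cost of 3, so the aggregation runs over log(1) and log(3).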
I don't think you would gain a lot by doing this in C++. If that is a major use case and you want to do a lot of predictions with it, it would be more efficient to handle the transforms during fitting so that the marginalization is fast. That would require some C++ coding, but it could be done if this turns out to improve your model quality and is too slow in Python.
My concern still is the constant change from log to non-log space and how that affects the RF predictions. You still train it on log data, so you assume a log-normal distribution, but you want to marginalize the 'normal' values of which you will take the log again. I don't know if that's what you actually want...
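One way to make this concern concrete, as an illustration rather than a statement about what SMAC does internally: for log-space values $y_1, \dots, y_n$ that are averaged in cost space and then transformed back, Jensen's inequality gives

$$
\log\left(\frac{1}{n}\sum_{j=1}^{n} \exp(y_j)\right) \;\ge\; \frac{1}{n}\sum_{j=1}^{n} y_j ,
$$

with equality only when all $y_j$ are equal. So averaging in cost space and taking the log again never yields a smaller value than averaging the log values directly, and the gap grows with the spread of the $y_j$.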
Thank you for the thorough explanation. Right now, my goal is to have a reimplementation of the old SMAC. At some point, we should discuss with Frank whether this is the best way to do it and how we can evaluate alternatives.