Negative sWeights #51
Hi,

I am trying to use the BoostingToUniformity notebook, in particular the uBoost classifier, and I am getting the error message 'the weights should be non-negative'. I have tried removing this check from the source code and running uBoost without it, but then the `predict` function gives me an array of all zeros, and when I try to plot the ROC curve I get NaNs as the output. I am wondering if there is a way of dealing with negative weights?

Many thanks,
Martha

Comments
Hi Martha, @tlikhomanenko prepared an overview of strategies for dealing with negative weights some time ago, but the first thing you should try is to simply remove samples with negative weights from training (but not from testing, that's important).
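For illustration, a minimal sketch of that first suggestion (the toy arrays below are placeholders, not from this thread): negative-weight events are dropped only when fitting, while the test sample keeps every event and its weight for evaluation.

```python
import numpy
from sklearn.ensemble import GradientBoostingClassifier

# toy stand-ins for the real analysis arrays (an assumption, just to make the sketch run)
rng = numpy.random.RandomState(0)
X_train, X_test = rng.normal(size=(1000, 3)), rng.normal(size=(1000, 3))
y_train = rng.randint(2, size=1000)
w_train = rng.normal(loc=0.8, size=1000)      # sWeight-like: mostly positive, some negative

# drop negative-weight events from the training sample only
keep = w_train >= 0
clf = GradientBoostingClassifier()
clf.fit(X_train[keep], y_train[keep], sample_weight=w_train[keep])

# the test sample is left untouched: all events and their (possibly negative)
# weights are kept when computing efficiencies, ROC curves, etc.
proba_test = clf.predict_proba(X_test)[:, 1]
```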
Hi Martha, please have a look at this notebook, prepared for a summer school: https://github.com/yandexdataschool/mlhep2015/blob/master/day2/advanced_seminars/sPlot.ipynb. There is a part called "Training on sPlot data" where you can find several approaches to training your classifier on data with both negative and positive weights. Hope you'll find them useful.
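A rough, hedged sketch of the 'Add events two times in training' idea from that notebook (the toy arrays and the `sw_sig` / `sw_bkg` names are placeholders): each event enters the training set twice, once labelled signal with its signal sWeight and once labelled background with its background sWeight; negative-weight entries can then still be dropped, as suggested above.

```python
import numpy
from sklearn.ensemble import GradientBoostingClassifier

# toy stand-ins (assumption): a real analysis would use its data features X and
# the per-event signal / background sWeights from the sPlot fit
rng = numpy.random.RandomState(1)
X = rng.normal(size=(500, 3))
sw_sig = rng.uniform(-0.1, 1.1, size=500)     # sWeights can be slightly negative
sw_bkg = 1.0 - sw_sig

# every event appears twice: once as signal, once as background
X2 = numpy.concatenate([X, X])
y2 = numpy.concatenate([numpy.ones(len(X)), numpy.zeros(len(X))])
w2 = numpy.concatenate([sw_sig, sw_bkg])

# optionally drop the remaining negative-weight entries before fitting
keep = w2 >= 0
clf = GradientBoostingClassifier().fit(X2[keep], y2[keep], sample_weight=w2[keep])
```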
For classifiers that only compute statistics on ensembles of events whilst fitting, like decision trees, I would hope that an implementation would accept negative weights rather than rejecting them up front. Where it should fail is when the sum of weights in an ensemble currently under study is negative.
Thanks for your responses. I have tried removing the negative weights from my training sample, but `classifier.predict(X_train)` is giving me an array of all 1's. Do you know why this is happening? I am using a method similar to the 'Add events two times in training' section in the notebook above.
Hey Alex,

```python
import numpy
from sklearn.ensemble import GradientBoostingRegressor

reg = GradientBoostingRegressor(n_estimators=100, max_depth=1).fit(
    numpy.arange(2)[:, None], numpy.arange(2), sample_weight=[-0.9999999999, 1])
reg.predict(numpy.arange(2)[:, None])
# outputs: array([9.99999917e+09, 9.99999917e+09])
```

No idea, but try to use …
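For contrast, a small editorial variation on the snippet above (not part of the original comment): with both sample weights non-negative, the runaway predictions disappear.

```python
import numpy
from sklearn.ensemble import GradientBoostingRegressor

# same toy fit, but with the negative weight flipped to positive
reg = GradientBoostingRegressor(n_estimators=100, max_depth=1).fit(
    numpy.arange(2)[:, None], numpy.arange(2), sample_weight=[0.9999999999, 1])
print(reg.predict(numpy.arange(2)[:, None]))  # close to array([0., 1.])
```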
Yes, negative weights certainly can make things go bad, but in the case of very low sample sizes sWeights also don't make much sense; they only give 'reasonable' results with 'large' ensembles (all poorly defined terms, of course). That's why I was suggesting algorithms don't check immediately for negative weights, but only when actually computing quantities used in the fitting.
@alexpearce I can see potential complaints like "it just worked with two trees, what's the problem with the third one?" (in a huge ensemble like uBoost this check will almost surely be triggered at some point), but I don't mind if anyone decides to open a PR with such checks.
Yes, exactly. The check should be made at that point, rather than when the training data is first fed into the tree. And you're right, I should just open a PR if I think this is useful behaviour; I'll look into it. (You're also right, for the third time, that I might be underestimating how often an ensemble containing negative weights will have a negative sum, but I would leave that problem to the users, to be handled by tuning the hyperparameters.)
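A minimal sketch of the kind of deferred check being discussed (the helper name and where it would be called are hypothetical, not the library's actual API): individual negative weights are accepted, and an error is raised only if a weight sum that the algorithm actually uses turns out to be non-positive.

```python
import numpy

def ensemble_weight_sum(weights):
    """Hypothetical helper: sum the weights of the events currently under study,
    failing only when that sum is non-positive, rather than rejecting any single
    negative weight up front."""
    total = float(numpy.sum(weights))
    if total <= 0:
        raise ValueError("sum of sample weights in this ensemble is non-positive "
                         "(%g); statistics computed from it would be meaningless" % total)
    return total

ensemble_weight_sum([1.0, -0.2, 0.5])   # fine: returns 1.3
# ensemble_weight_sum([0.1, -0.5])      # would raise ValueError
```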