Negative sWeights #51
Hi,

I am trying to use the BoostingToUniformity notebook, in particular the uBoost classifier, and I am getting the error message 'the weights should be non-negative'. I have tried removing this check from the source code and running uBoost without it, but then the `predict` function gives me an array of all zeros, and when I try to plot the ROC curve I get NaNs as the output. I am wondering if there is a way of dealing with negative weights?

Many thanks,
Martha

Comments
Hi Martha, @tlikhomanenko prepared an overview of strategies for dealing with negative weights some time ago, but the first thing you should try is to simply remove samples with negative weights from training (but not from testing, that's important).
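For illustration, a minimal sketch of that first suggestion (the toy arrays below are placeholders, not from this thread): negative-weight events are dropped only when fitting, while the test sample keeps every event and its weight for evaluation.

```python
import numpy
from sklearn.ensemble import GradientBoostingClassifier

# toy stand-ins for the real analysis arrays (an assumption, just to make the sketch run)
rng = numpy.random.RandomState(0)
X_train, X_test = rng.normal(size=(1000, 3)), rng.normal(size=(1000, 3))
y_train = rng.randint(2, size=1000)
w_train = rng.normal(loc=0.8, size=1000)      # sWeight-like: mostly positive, some negative

# drop negative-weight events from the training sample only
keep = w_train >= 0
clf = GradientBoostingClassifier()
clf.fit(X_train[keep], y_train[keep], sample_weight=w_train[keep])

# the test sample is left untouched: all events and their (possibly negative)
# weights are kept when computing efficiencies, ROC curves, etc.
proba_test = clf.predict_proba(X_test)[:, 1]
```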
Hi Martha, please have a look at this notebook, prepared for a summer school: https://github.com/yandexdataschool/mlhep2015/blob/master/day2/advanced_seminars/sPlot.ipynb. There is a part called "Training on sPlot data" where you can find several approaches to training your classifier on data with both negative and positive weights. Hope you'll find them useful.
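A rough, hedged sketch of the 'Add events two times in training' idea from that notebook (the toy arrays and the `sw_sig` / `sw_bkg` names are placeholders): each event enters the training set twice, once labelled signal with its signal sWeight and once labelled background with its background sWeight; negative-weight entries can then still be dropped, as suggested above.

```python
import numpy
from sklearn.ensemble import GradientBoostingClassifier

# toy stand-ins (assumption): a real analysis would use its data features X and
# the per-event signal / background sWeights from the sPlot fit
rng = numpy.random.RandomState(1)
X = rng.normal(size=(500, 3))
sw_sig = rng.uniform(-0.1, 1.1, size=500)     # sWeights can be slightly negative
sw_bkg = 1.0 - sw_sig

# every event appears twice: once as signal, once as background
X2 = numpy.concatenate([X, X])
y2 = numpy.concatenate([numpy.ones(len(X)), numpy.zeros(len(X))])
w2 = numpy.concatenate([sw_sig, sw_bkg])

# optionally drop the remaining negative-weight entries before fitting
keep = w2 >= 0
clf = GradientBoostingClassifier().fit(X2[keep], y2[keep], sample_weight=w2[keep])
```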
For classifiers that only compute statistics on ensembles of events whilst fitting, like decision trees, I would hope that an implementation would accept negative weights rather than rejecting them up front. Where it should fail is when the sum of weights in an ensemble currently under study is negative.
Thanks for your responses. I have tried removing the negative weights from my training sample, but `classifier.predict(X_train)` is giving me an array of all 1's. Do you know why this is happening? I am using a method similar to the 'Add events two times in training' section in the notebook above.
Hey Alex,

```python
import numpy
from sklearn.ensemble import GradientBoostingRegressor

reg = GradientBoostingRegressor(n_estimators=100, max_depth=1).fit(
    numpy.arange(2)[:, None], numpy.arange(2), sample_weight=[-0.9999999999, 1])
reg.predict(numpy.arange(2)[:, None])
# outputs: array([9.99999917e+09, 9.99999917e+09])
```

No idea, but try to use …
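For contrast, a small editorial variation on the snippet above (not part of the original comment): with both sample weights non-negative, the runaway predictions disappear.

```python
import numpy
from sklearn.ensemble import GradientBoostingRegressor

# same toy fit, but with the negative weight flipped to positive
reg = GradientBoostingRegressor(n_estimators=100, max_depth=1).fit(
    numpy.arange(2)[:, None], numpy.arange(2), sample_weight=[0.9999999999, 1])
print(reg.predict(numpy.arange(2)[:, None]))  # close to array([0., 1.])
```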
Yes, negative weights certainly can make things go bad, but in the case of very low sample sizes sWeights also don't make much sense; they only give 'reasonable' results with 'large' ensembles (all poorly defined terms, of course). That's why I was suggesting algorithms don't check immediately for negative weights, but only when actually computing quantities used in the fitting.
@alexpearce I can see potential complaints like "it just worked with two trees, what's the problem with the third one?" (in a huge ensemble like uBoost this check will almost surely be triggered at some point), but I don't mind if anyone decides to open a PR with such checks.
Yes, exactly. The check should be made at that point, rather than when the training data is first fed into the tree. And you're right, I should just open a PR if I think this is useful behaviour; I'll look into it. (You're also right, for the third time, that I might be underestimating how often an ensemble containing negative weights will have a negative sum, but I would leave that problem to the users, to be handled by tuning the hyperparameters.)
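A minimal sketch of the kind of deferred check being discussed (the helper name and where it would be called are hypothetical, not the library's actual API): individual negative weights are accepted, and an error is raised only if a weight sum that the algorithm actually uses turns out to be non-positive.

```python
import numpy

def ensemble_weight_sum(weights):
    """Hypothetical helper: sum the weights of the events currently under study,
    failing only when that sum is non-positive, rather than rejecting any single
    negative weight up front."""
    total = float(numpy.sum(weights))
    if total <= 0:
        raise ValueError("sum of sample weights in this ensemble is non-positive "
                         "(%g); statistics computed from it would be meaningless" % total)
    return total

ensemble_weight_sum([1.0, -0.2, 0.5])   # fine: returns 1.3
# ensemble_weight_sum([0.1, -0.5])      # would raise ValueError
```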