This issue was observed and reported by Jack Wimberley.
If a region contains very few original samples, a decision tree can build a leaf that contains only samples from the target distribution (enough to satisfy min_samples_leaf) and exactly zero samples from the original distribution.
As a result, the 'correction' made by that tree does not change any training weights, but it blows up the weights of test samples that fall into that leaf.
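To make the failure mode concrete, here is a minimal sketch of how a single leaf's correction behaves. It assumes a simplified model in which the leaf multiplies original weights by the ratio of summed target weight to summed original weight, with an optional additive regularization term; the function name `leaf_log_correction` and this exact formula are illustrative assumptions, not necessarily what the library computes.

```python
import math

def leaf_log_correction(w_target, w_original, regularization=0.0):
    """Log of the factor a leaf applies to original-sample weights.

    Simplified, assumed model: log((w_target + reg) / (w_original + reg)).
    """
    numerator = w_target + regularization
    denominator = w_original + regularization
    if denominator == 0.0:
        # No original weight in the leaf and no regularization:
        # the correction is unbounded.
        return math.inf
    return math.log(numerator / denominator)

# A healthy leaf: both distributions are represented.
healthy = leaf_log_correction(w_target=30.0, w_original=20.0)

# A degenerate leaf: only target samples ended up in it during training.
degenerate = leaf_log_correction(w_target=30.0, w_original=0.0)

# The same degenerate leaf with some regularization stays finite.
regularized = leaf_log_correction(w_target=30.0, w_original=0.0, regularization=5.0)

print(healthy, degenerate, regularized)
```

No training sample from the original distribution sits in the degenerate leaf, so the training weights never see this factor. A test sample from the original distribution that lands there is multiplied by it.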
Workarounds
Almost any of the following helps:
- increase min_samples_leaf
- set subsample=0.5
- increase regularization (available in the develop version)
Each of these, or any combination of them, works well and resolves the problem in practice.
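A possible way to apply the first two workarounds is sketched below. It assumes the reweighter in question is `GBReweighter` from `hep_ml.reweight` and that `subsample` is passed through its `gb_args` dictionary. The numeric values are hypothetical and should be tuned for the data set at hand.

```python
# Hypothetical settings; the numbers are illustrative, not recommendations.
reweighter_kwargs = dict(
    n_estimators=50,
    learning_rate=0.1,
    max_depth=3,
    min_samples_leaf=1000,       # larger leaves are less likely to miss the original distribution
    gb_args={"subsample": 0.5},  # each tree is trained on a random half of the data
)

try:
    from hep_ml.reweight import GBReweighter
except ImportError:
    GBReweighter = None  # hep_ml is not installed; the settings above are still illustrative

reweighter = GBReweighter(**reweighter_kwargs) if GBReweighter is not None else None
```

With subsampling, a leaf that happens to contain no original samples in one tree is likely to be corrected by later trees trained on different subsets.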
Proper solution (not available now)
A proper solution would be to introduce a parameter for the 'minimal number of samples from the original distribution in a leaf', but this isn't supported by the decision trees of scikit-learn (or any other library).
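Until such a parameter exists, the condition can at least be checked after the fact. The sketch below assumes you already have a leaf index for every training sample (with scikit-learn trees, `tree.apply(X)` returns these) and a flag marking which samples come from the original distribution. The helper `leaves_lacking_original` is a hypothetical name, not part of any library.

```python
from collections import Counter

def leaves_lacking_original(leaf_ids, is_original, min_original=1):
    """Return the ids of leaves holding fewer than min_original original samples."""
    original_counts = Counter(
        leaf for leaf, original in zip(leaf_ids, is_original) if original
    )
    # Counter returns 0 for leaves that never saw an original sample.
    return sorted(
        leaf for leaf in set(leaf_ids) if original_counts[leaf] < min_original
    )

# Toy data: leaf 1 contains only target samples.
leaf_ids = [0, 0, 1, 1, 1, 2, 2]
is_original = [True, False, False, False, False, True, True]

bad_leaves = leaves_lacking_original(leaf_ids, is_original)
print(bad_leaves)  # [1]
```

This only diagnoses the problem. It does not prevent the tree from making such a split in the first place.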