Add support for HistGradientBoostingRegressor #105

Open
samuelefiorini opened this issue Apr 6, 2023 · 3 comments

@samuelefiorini

I use Greykite to forecast hourly time-series with years of historical data and fit_algorithm=gradient_boosting is very slow.

According to the sklearn.ensemble.HistGradientBoostingRegressor documentation:

"This estimator is much faster than GradientBoostingRegressor for big datasets (n_samples >= 10 000)."

Have you considered adding support for this estimator? It looks straightforward from here, but I may be wrong.

@amyfei2015

Thanks for the suggestion! We haven't planned for this yet, but we've made a note of it and will update you if we implement this feature. In the meantime, please feel free to submit a pull request for this change if you need it. Thanks!

@samuelefiorini
Author

Thanks, I did some experiments (here) and I've been able to make it run (it's far from being a PR though). In my case (hourly forecast with 2+ years of historical data), HistGradientBoostingRegressor is much faster than GradientBoostingRegressor (around 4x) while showing roughly the same performance in backtest.

However, there are also some points to discuss. For instance, due to its implementation, HistGradientBoostingRegressor does not offer a native feature importance measure, whereas both GradientBoostingRegressor and RandomForestRegressor do.

A possible approach would be to rely on something like sklearn.inspection.permutation_importance, but this of course comes with a higher computational cost, and it's probably not ideal. Otherwise, a dummy empty array may be used, perhaps with a warning to inform the user.

@samuelefiorini
Author

It’s been a while, but the issue about adding feature_importance to the HistGradientBoosting* estimators is still open on scikit-learn: 15132. I’m adding this here for future reference.
