Custom MAE Python Loss Function Does Not Match C++ Results #6744
Comments
Hey @OUStudent, thanks for using LightGBM.
Can you link them here? I'm pretty sure the answer in all of them is the
Here is one example of an issue similar to this one that was marked as completed, yet no definitive answer was given (#2128). You also mention the
For the MAE loss function, what values should I set the
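(Editor's note: for readers arriving here, LightGBM's scikit-learn interface expects a custom objective to be a callable returning per-sample gradient and hessian arrays, as the examples later in this thread do. A minimal sketch; the name `mae_objective` is illustrative:)

```python
import numpy as np

# Signature used by LightGBM's scikit-learn interface for custom objectives:
# objective(y_true, y_pred) -> (grad, hess), both arrays of shape (n_samples,).
def mae_objective(y_true, y_pred):
    grad = np.sign(y_pred - y_true)  # d|y_pred - y_true| / d y_pred
    hess = np.ones_like(y_true)      # surrogate hessian (the true second derivative is 0)
    return grad, hess

g, h = mae_objective(np.array([1.0, 2.0]), np.array([2.0, 0.0]))
print(g, h)
```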
Hi, @OUStudent, thanks for using LightGBM.
@shiyu1994 Thank you. I believe (1) the initial score being the median of the dataset should be better documented. However, for (2), I am struggling to understand how I can change that on my end on the Python side. Am I able to change this? If yes, can you please give a basic example of how to do so.
For (2), it is not trivial to implement with the Python interface alone. What I can suggest is using continual training for every iteration,
@shiyu1994 Are you able to give a basic example of how to do so? Is there any documentation on how to do so in general for custom loss functions? @jameslamb @jmoralez Why is there an advertisement of the ability to implement custom loss functions for LightGBM in Python when one cannot even implement the most basic of machine learning loss functions (MAE in this example) using the advertised tools without jumping through hoops? Why is there such a lack of documentation on how to actually replicate loss functions in Python?
We know that work needs to be done, it just hasn't been yet. It's tracked in #6440, you can subscribe to that if you'd like to be notified of changes.
It is not true that you "cannot implement" MAE through a custom objective function in Python. "Implement MAE" and "match the behavior of LightGBM's C++ implementation of MAE" are different tasks. Math that might seem "basic" symbolically has more than one possible "correct" representation in software like this library, with different tradeoffs for speed, memory-efficiency, and precision. The "hoops" you're referring to come from trying to imitate LightGBM's particular choices, made in the face of those tradeoffs, in your own code. Notice that despite you saying LightGBM's implementation is not "mathematically correct", it outperforms the Python one that you described as "mathematically correct". If you'd like to learn why, there's some more information on using the median in the ways @shiyu1994 described here: https://explained.ai/gradient-boosting/L1-loss.html#sec:1.3
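(Editor's note: the point in the linked article can be checked quickly with numpy: for absolute error, the best constant prediction is the median of the targets, not the mean; they coincide only for symmetric data.)

```python
import numpy as np

rng = np.random.default_rng(42)
y = rng.exponential(size=10_000)  # skewed, so mean and median differ

def mae(constant):
    # mean absolute error of predicting a single constant for every sample
    return np.mean(np.abs(y - constant))

mae_at_median = mae(np.median(y))
mae_at_mean = mae(np.mean(y))
print(mae_at_median, mae_at_mean)
```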
Your example code is being used to make exact comparisons on the result for a small dataset. That might be misleading: LightGBM's defaults for preventing overfitting, and some sources of nondeterminism (on by default in exchange for speed), mean those differences are driven by other factors besides the objective function. I adjusted your example to use these parameters:

```python
common_params = {
    # turn off multi-threading (it can lead to different results from numerical precision differences)
    "n_jobs": 1,
    # always use the same type of Dataset construction
    "force_row_wise": True,
    # prefer the slower but deterministic form of some other operations
    "deterministic": True,
    # allow leaf nodes that only match a single sample
    "min_data_in_leaf": 1,
    # use the same set of pseudorandom numbers for anything requiring randomness inside LightGBM
    "seed": 708,
}
```

And ran that with Python 3.11 and this code:

```python
from sklearn.datasets import make_regression
import lightgbm as lgb
import numpy as np

print(lgb.__version__)

x, y = make_regression(n_samples=1_000, n_features=3, random_state=42)

def mae_loss_v1(y_true, y_pred):
    # mathematically correct: hessian of zeros
    grad = -np.sign(y_true - y_pred)
    hess = np.zeros(y_true.shape)
    return grad, hess

def mae_loss_v2(y_true, y_pred):
    # same gradient, but a hessian of ones
    grad = -np.sign(y_true - y_pred)
    hess = np.ones(y_true.shape)
    return grad, hess

def mae_loss_v3(y_true, y_pred):
    # matches the C++ implementation (per the issue description)
    grad = np.sign(y_true - y_pred)
    hess = np.ones(y_true.shape)
    return grad, hess

def mae_metric(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

common_params = {
    # turn off multi-threading (it can lead to different results from numerical precision differences)
    "n_jobs": 1,
    # always use the same type of Dataset construction
    "force_row_wise": True,
    # prefer the slower but deterministic form of some other operations
    "deterministic": True,
    # allow leaf nodes that only match a single sample
    "min_data_in_leaf": 1,
    # use the same set of pseudorandom numbers for anything requiring randomness inside LightGBM
    "seed": 708,
}

model = lgb.LGBMRegressor(**common_params, objective=mae_loss_v1)
model.fit(x, y)
print(f"mae_loss_v1: {mae_metric(y, model.predict(x))}")

model = lgb.LGBMRegressor(**common_params, objective=mae_loss_v2)
model.fit(x, y)
print(f"mae_loss_v2: {mae_metric(y, model.predict(x))}")

model = lgb.LGBMRegressor(**common_params, objective=mae_loss_v3)
model.fit(x, y)
print(f"mae_loss_v3: {mae_metric(y, model.predict(x))}")

model = lgb.LGBMRegressor(**common_params, objective="mae")
model.fit(x, y)
print(f"built-in 'mae': {mae_metric(y, model.predict(x))}")
```

Saw these results:

```text
mae_loss_v1: 101.29960471190125
mae_loss_v2: 92.10924897938341
mae_loss_v3: 110.91077035541966
built-in 'mae': 3.140136718338369
```
There is also work planned to make
Description
The MAE loss function implemented in Python does not match results from the C++-implemented MAE, despite (1) being mathematically correct (mae_loss_v1), (2) adjusting for a hessian of ones (mae_loss_v2), and (3) matching the C++ implementation (mae_loss_v3).
Reproducible example
This Python code trains four regressors: one using mae_loss_v1 (which is mathematically correct, with a hessian of zeros), one using mae_loss_v2 (which returns a hessian of ones), one using mae_loss_v3 (which matches the C++ implementation), and one using the built-in "mae" C++ loss function.
Mathematically, for MAE ($\frac{1}{n}\sum |y-\hat y|$, where $\hat y$ is the model output and $y$ is the target), the per-sample gradient is $\frac{\partial}{\partial \hat y} |y-\hat y| = -\mathrm{sign}(y - \hat y)$, and the hessian is $\frac{\partial^2}{\partial \hat y^2} |y-\hat y| = 0$ (almost everywhere).
The negative sign in front of the sign operation can be removed by rearranging MAE to $\frac{1}{n}\sum |\hat y-y|$, since $\frac{\partial}{\partial \hat y} |\hat y-y| = \mathrm{sign}(\hat y-y)$.
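(Editor's note: the sign identity above is easy to spot-check numerically: $-\mathrm{sign}(y - \hat y)$ equals $\mathrm{sign}(\hat y - y)$ elementwise.)

```python
import numpy as np

y = np.array([1.0, -2.0, 3.0, 0.0])
y_hat = np.array([0.5, 0.0, 3.5, 0.0])
lhs = -np.sign(y - y_hat)   # gradient written as -sign(y - y_hat)
rhs = np.sign(y_hat - y)    # gradient written as sign(y_hat - y)
print(lhs, rhs)
```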
I based mae_loss_v3 on the C++ code below:
https://github.com/microsoft/LightGBM/blob/master/src/objective/regression_objective.hpp#L207-L234
Printed results are
Environment info
LightGBM version 4.5.0
sklearn version 1.5.1
Additional Comments
There has been a long-standing problem of Python-side loss function implementations not matching C++ results. Most issues have no clear answer. If one is unable to match C++ results using a Python-side implementation of a loss function due to some obscure internal calculation, that needs to be communicated much more clearly than it currently is.