Proposal
Now that we support JumpReLU training, the `l1_coefficient` parameter is confusing, since JumpReLU uses an L0 loss, not an L1 loss, for training. We should rename this parameter to `sparsity_coefficient`, since it is a coefficient used to promote sparsity in general. We should also rename `l1_warmup_steps` to `sparsity_warmup_steps`.
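For illustration, here is a minimal sketch of what the rename might look like in a dataclass-style config. The class name `TrainingConfig` and the defaults are hypothetical, not SAELens's actual API; only `l1_coefficient` / `l1_warmup_steps` and their proposed replacements come from the proposal.

```python
# Hypothetical sketch of the proposed rename; TrainingConfig and its
# defaults are illustrative, not SAELens's actual config class.
from dataclasses import dataclass


@dataclass
class TrainingConfig:
    # Renamed from l1_coefficient: scales whichever sparsity penalty the
    # architecture uses (L1 for standard SAEs, L0 for JumpReLU).
    sparsity_coefficient: float = 1.0
    # Renamed from l1_warmup_steps: number of steps over which the
    # sparsity coefficient is warmed up from zero.
    sparsity_warmup_steps: int = 0


# Same meaning for either architecture, without implying an L1 loss:
cfg = TrainingConfig(sparsity_coefficient=5.0, sparsity_warmup_steps=1000)
print(cfg)
```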
Motivation
It is confusing to see `l1_coefficient` used for JumpReLU training, which doesn't use an L1 loss.
Alternatives
Alternatively, we could add separate `l0_coefficient` / `l0_warmup_steps` parameters that are only used for JumpReLU training, and error if `l1_coefficient` is provided. This would also potentially allow training a JumpReLU SAE with both L0 and L1 losses if desired.
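A minimal sketch of this alternative, assuming a dataclass-style config; the class name, the `architecture` field, and the validation hook are illustrative assumptions, not SAELens's actual API.

```python
# Hypothetical sketch: separate L0 parameters plus a validation error
# when l1_coefficient is set for JumpReLU. Names are illustrative only.
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrainingConfig:
    architecture: str = "standard"  # assumed values: "standard", "jumprelu"
    l1_coefficient: Optional[float] = None
    l1_warmup_steps: int = 0
    l0_coefficient: Optional[float] = None  # only meaningful for JumpReLU
    l0_warmup_steps: int = 0

    def __post_init__(self) -> None:
        # Strict variant: reject l1_coefficient for JumpReLU. Relaxing this
        # check is what would allow combining L0 and L1 losses if desired.
        if self.architecture == "jumprelu" and self.l1_coefficient is not None:
            raise ValueError(
                "l1_coefficient is not used for JumpReLU training; "
                "set l0_coefficient instead."
            )


# A standard SAE with an L1 penalty works as before:
cfg = TrainingConfig(architecture="standard", l1_coefficient=5.0)

# A JumpReLU config with l1_coefficient would raise:
# TrainingConfig(architecture="jumprelu", l1_coefficient=5.0)  # ValueError
```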
Checklist
I have checked that there is no similar issue in the repo (required)
Hi @chanind!
I've created PR #376 that implements one of the alternatives you suggested: adding a separate `l0_lambda` parameter specifically for JumpReLU training.
The PR:
- Adds a dedicated `l0_lambda` parameter (default: 0.0)
- Requires explicit `l0_lambda` specification when using JumpReLU
- Maintains a clear separation between the L1 and L0 regularization terms
Would love to get your thoughts on this implementation!