add information on use of ADMB -hess_step #131
Comments
@Rick-Methot-NOAA I can do this if you point me to where to add it |
It should probably go under "running Stock Synthesis" https://nmfs-stock-synthesis.github.io/doc/SS330_User_Manual.html#sec:RunningSS, which can be modified by editing https://github.com/nmfs-stock-synthesis/doc/blob/main/12runningSS.tex and making a pull request or just posting the new text into this issue. I'm not sure if it fits well with an existing subsection or if it needs a new one. @chantelwetzel-noaa is the guru for all things User Manual so may have additional input. |
I'm not clear what kind of detail you want here. Could we pull some text from the original issue? |
Yes, draw from that material to the extent possible. |
How's this for a start? The optimizer is designed to run until the maximum absolute gradient (mgc) is small enough, e.g., 1E-05, and then quit and do the uncertainty calculations. Even if run for longer, it cannot appreciably decrease this mgc. In many cases it would be interesting or advisable to get closer to the mode to confirm convergence of the model. A new feature as of ADMB 12.3 called "hess_step" takes Newton steps to update the MLE using the information in the Hessian: MLE_new = MLE - (inverse Hessian)*(gradient), where the Hessian and gradient are calculated at the original MLE. If the mgc improves, this corroborates that the optimizer has converged and that the negative log-likelihood surface is approximately quadratic at the mode, as assumed in the asymptotic uncertainty calculations. The downside is the high computational cost of the extra matrix calculations. To use the feature, optimize as usual and then run the model again from the command line with the -hess_step option. Here is what it looks like on an example, with some SS output deleted for clarity:
Note that the final mgc is now numerically equivalent to 0 but the NLL is very similar. |
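To make the update rule in the draft text concrete, here is a minimal Python sketch of a single Newton step on a toy two-parameter quadratic negative log-likelihood. The matrix `A`, the vector `b`, and the starting point are made-up values chosen only for illustration, and none of the function names come from ADMB or Stock Synthesis.

```python
# Toy quadratic NLL: f(x) = 0.5 * x'Ax - b'x, so gradient = Ax - b and Hessian = A.
# A, b, and x_opt are hypothetical values, not output from any real model.

def gradient(x, A, b):
    """Gradient of the quadratic NLL for a 2x2 symmetric A."""
    return [A[0][0] * x[0] + A[0][1] * x[1] - b[0],
            A[1][0] * x[0] + A[1][1] * x[1] - b[1]]

def solve_2x2(A, g):
    """Solve A d = g with Cramer's rule, i.e. d = (inverse Hessian) * gradient."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(g[0] * A[1][1] - A[0][1] * g[1]) / det,
            (A[0][0] * g[1] - A[1][0] * g[0]) / det]

def newton_step(x, A, b):
    """MLE_new = MLE - (inverse Hessian) * (gradient), both evaluated at x."""
    d = solve_2x2(A, gradient(x, A, b))
    return [x[0] - d[0], x[1] - d[1]]

A = [[2.0, 1.8], [1.8, 2.0]]   # strongly correlated parameters
b = [1.0, -1.0]
x_opt = [5.001, -4.999]        # pretend the optimizer stopped here, near the mode

mag_before = max(abs(g) for g in gradient(x_opt, A, b))
x_new = newton_step(x_opt, A, b)
mag_after = max(abs(g) for g in gradient(x_new, A, b))
print(mag_before, mag_after)
```

Because this toy surface is exactly quadratic, one step lands on the mode up to floating-point error. On a real model the drop in the maximum gradient depends on how close to quadratic the surface is near the MLE.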
Looks good to me. Curiosity: Will FIMS with TMB have numerical cross-derivatives such that it can use hess_step throughout the estimation process? |
@Cole-Monnahan-NOAA, the new text looks good to me too and just watching this issue thread will help me remember to use this option more in the future. @Rick-Methot-NOAA, I'll have to let Cole or others speak to the equivalent of hess_step in TMB models, but I would guess that it could be added if not already there. |
Regarding -hess_step: I know that in his development of MAS, Matthew was working on getting analytical cross-derivatives. If you have these, then I think you could switch from steepest descent, which performs slowly with highly correlated parameters, to hess_step at a much earlier stage of estimation. You incur the cost of Hessian inversion, but make much more progress with each iteration. That's what I spent the 1990s trying to do with SS' numerical derivatives. I tried all kinds of tricks, such as updating the cross-derivatives and the Hessian inversion only infrequently, on the assumption that the shape of the surface was not changing much as the model approached the best fit. |
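The trade-off described above can be illustrated with a small Python sketch comparing fixed-step steepest descent with a single Newton step on a toy quadratic surface with two highly correlated parameters. This is only an illustration under made-up values; it is not how the ADMB optimizer or Stock Synthesis is implemented, and the step size (the inverse of the largest Hessian eigenvalue, 3.8) is an assumption chosen to keep the descent stable.

```python
# Toy surface: f(x) = 0.5 * x'Ax - b'x with A = [[2, 1.8], [1.8, 2]], b = [1, -1].
# The Hessian eigenvalues are 3.8 and 0.2, so the two parameters are strongly correlated.

def grad(x):
    return [2.0 * x[0] + 1.8 * x[1] - 1.0,
            1.8 * x[0] + 2.0 * x[1] + 1.0]

def mag(x):
    """Maximum absolute gradient component."""
    return max(abs(g) for g in grad(x))

def steepest_descent(x, step, tol=1e-5, max_iter=10000):
    """Fixed-step gradient descent; returns (iterations used, final point)."""
    iters = 0
    while mag(x) > tol and iters < max_iter:
        g = grad(x)
        x = [x[0] - step * g[0], x[1] - step * g[1]]
        iters += 1
    return iters, x

def newton_step(x):
    """One Newton step using the explicit inverse of the 2x2 Hessian."""
    g = grad(x)
    det = 2.0 * 2.0 - 1.8 * 1.8
    d0 = (2.0 * g[0] - 1.8 * g[1]) / det
    d1 = (2.0 * g[1] - 1.8 * g[0]) / det
    return [x[0] - d0, x[1] - d1]

start = [0.0, 0.0]
sd_iters, x_sd = steepest_descent(start, step=1.0 / 3.8)
x_newton = newton_step(start)
print("steepest descent iterations:", sd_iters)
print("gradient after one Newton step:", mag(x_newton))
```

On this surface steepest descent needs a few hundred iterations to reach the 1E-05 criterion, while the Newton step gets there in one, at the cost of forming and inverting the Hessian.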
Contours of the posterior surface are used by the "-hybrid" option in ADMB (described in detail in this doc led by Cole: https://www.admb-project.org/developers/mcmc/mcmc-guide-for-admb.pdf), but it does not seem to have been adopted much. Maybe it's not an improvement for models without bad correlations, but perhaps we should turn to it more for models that converge slowly because of correlated parameters with non-linear relationships. I wonder if the recent case of log(R0) and steepness being highly correlated and slowing convergence would have been helped by the "a" and "b" parameterization as discussed in nmfs-ost/ss3-source-code#191. |
R0 and h are highly correlated when h < ~0.8 because most of the recruitment values are < R0. A re-parameterized form could use Rbar and h, and then R0 would be a derived quantity. Perhaps the a,b parameterization accomplishes that, but I think not, as it has the same problem by relying on the slope at the origin. |
@Rick-Methot-NOAA: Yes, I believe we can use them in TMB, which has a way to get numerical Hessians. @iantaylor-NOAA: Yes, mgc is just what is common. I just looked and I use "mag" in the newest ADMB release for the console output, so we should use that to be consistent. -hybrid is HMC, which is MCMC and so a little different (and it has been completely replaced by NUTS). |
@Rick-Methot-NOAA I forgot to mention... could someone paste the above text into the SS3 manual? I'm not sure where to put it. Thanks! |
Chantel and/or Elizabeth will get the text into the manual. Thanks for producing it. for -hess_step: My proposal would be to: |
@Rick-Methot-NOAA, I assume step c would require a change to ADMB, but I think we could apply the rest of the steps with the model we have today by repeatedly running with -hess_step and a loose convergence criterion and then restarting from the .par file in the final phase. The convergence criterion could be tightened after some number of iterations if necessary, but maybe -hess_step would take care of that adequately. I can try to set this up for the slow-converging petrale model to see if it makes things more efficient. |
Thanks Ian. I will try for Max's pandalus model. |
I did the experiment with a 1300 parameter, slow-to-converge model. I set convergence criteria high: 1.00 |
One more note on the documentation: when I run the Hessian first and then run -hess_step, ADMB prompts me to run it with -binp ss.bar. |
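For anyone scripting this, here is a minimal Python sketch that assembles the command line for the second run. It uses only the options named in this thread (`-hess_step` and `-binp ss.bar`); the executable name `ss` and the helper `hess_step_command` are assumptions for illustration, and the sketch only builds the argument list rather than running the model.

```python
# Hypothetical helper: builds the argument list for a follow-up -hess_step run.
# The executable name "ss" is an assumption; it differs by platform and build.

def hess_step_command(exe="ss", binary_par=None):
    """Return the command as a list, optionally reading parameters from a binary file."""
    cmd = [exe, "-hess_step"]
    if binary_par is not None:
        # Start from the binary parameter file, as ADMB prompts in the note above.
        cmd += ["-binp", binary_par]
    return cmd

print(" ".join(hess_step_command()))
print(" ".join(hess_step_command(binary_par="ss.bar")))
```

The resulting list could be passed to `subprocess.run` from the model directory after the normal optimization and Hessian run have finished.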
@Rick-Methot-NOAA, @Cole-Monnahan-NOAA, and @iantaylor-NOAA - Does the attached pdf have the documentation for the -hess_step that you would like? Searching for hess_step should put you where you need to be and then there is a hyperlinked section that explains things further. |
Looks good. The title would be better as "additional" rather than "single". |
@Rick-Methot-NOAA I tried not updating the Hessian at each step and it failed for the models I tried. I suspect that for models that don't improve after a single step, the surface near the mode is not close enough to quadratic, and hence the curvature will change as you take steps. That's intuition though. It would be relatively easy to build in a new flag like -hess_step_update with inputs 0 or 1 to do the updates or not. That would just be added to this line. I didn't intend for hess_step to replace the optimizer, but to be used as a convergence check. But features evolve and this could be useful as Rick suggests. I don't really have the time to work on ADMB development now, so I would be happy for someone else to implement that. Regarding the warning to use the .bin file, I put that in there because there were some models where the initial gradient I calculated in the hess_step function was much larger than what was found at the end of optimization. Theoretically you should be able to put the MLE (read in from a binary file, so at full precision) through the model and get identical gradients. That was/is not the case, and I think in some cases it has to do with initialization for some reason. It probably also has something to do with dev_vectors. I've reopened this issue to help track that on the ADMB side. @e-gugliotti-NOAA Could you swap 'mgc' with 'mag' please? Thanks for implementing! |
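The intuition about changing curvature can be seen in a one-parameter toy problem. The Python sketch below takes five Newton steps on the non-quadratic function f(x) = exp(x) - 2x, once recomputing the second derivative at every step and once reusing the one from the starting point. The function, the starting value, and the `update_hessian` switch (loosely mirroring the hypothetical -hess_step_update idea) are all assumptions for illustration, not ADMB code.

```python
import math

# Toy non-quadratic objective f(x) = exp(x) - 2x, with its minimum at x = ln(2).

def grad(x):
    return math.exp(x) - 2.0

def hess(x):
    return math.exp(x)

def newton(x, n_steps, update_hessian):
    """Take n_steps Newton steps, optionally freezing the Hessian at the start point."""
    h = hess(x)
    for _ in range(n_steps):
        if update_hessian:
            h = hess(x)          # recompute curvature at the current point
        x = x - grad(x) / h      # x_new = x - (inverse Hessian) * gradient
    return x

x0 = 2.0
g_updated = abs(grad(newton(x0, 5, update_hessian=True)))
g_frozen = abs(grad(newton(x0, 5, update_hessian=False)))
print("gradient with updated Hessian:", g_updated)
print("gradient with frozen Hessian: ", g_frozen)
```

In this example the frozen-Hessian steps still move toward the minimum but leave a much larger gradient after the same number of steps, because the curvature at the starting point no longer matches the curvature near the mode.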
I just noticed that the User Manual is showing this as |
Wow, that was fast, thank you @kellijohnson-NOAA! |
It seems useful to help users understand this new ADMB feature. Can we add a paragraph to the manual?