add information on use of ADMB -hess_step #131

Closed
Rick-Methot-NOAA opened this issue Jan 6, 2023 · 24 comments
Labels
3.30.21 release documentation Improvements or additions to documentation

Comments

@Rick-Methot-NOAA
Collaborator

Rick-Methot-NOAA commented Jan 6, 2023

It seems useful to help users understand this new ADMB feature. Can we add a paragraph to the manual?

@Rick-Methot-NOAA added the documentation label (Improvements or additions to documentation) Jan 6, 2023
@Cole-Monnahan-NOAA

@Rick-Methot-NOAA I can do this if you point me to where to add it

@iantaylor-NOAA
Contributor

It should probably go under "running Stock Synthesis" https://nmfs-stock-synthesis.github.io/doc/SS330_User_Manual.html#sec:RunningSS, which can be modified by editing https://github.com/nmfs-stock-synthesis/doc/blob/main/12runningSS.tex and making a pull request or just posting the new text into this issue.

I'm not sure if it fits well with an existing subsection or if it needs a new one.

@chantelwetzel-noaa is the guru for all things User Manual so may have additional input.

@Cole-Monnahan-NOAA

I'm not clear what kind of detail you want here. Could we pull some text from the original issue?

admb-project/admb#160

@Rick-Methot-NOAA
Collaborator Author

Yes, draw from that material to the extent possible.
I think the manual could succinctly show the steps to using the feature and have a few sentences describing the concept and tips.

@Cole-Monnahan-NOAA

How's this for a start?

The optimizer is designed to run until the maximum absolute gradient (mgc) is small enough, e.g., 1E-05, and then quit and do the uncertainty calculations. But if run for longer, it cannot appreciably decrease this mgc. In many cases it would be interesting or advisable to get closer to the mode to confirm convergence of the model. A new feature as of ADMB 12.3 called "hess_step" takes Newton steps to update the MLE using the information in the Hessian, calculated as MLE_new = MLE - (inverse Hessian)*(gradient), where the Hessian and gradient are calculated from the original MLE. If the mgc improves, this corroborates that the optimizer has converged and that the negative log-likelihood surface is approximately quadratic at the mode, as assumed in the asymptotic uncertainty calculations. The downside is the high computational cost due to the extra matrix calculations.
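
To make the update rule concrete, here is a minimal Python sketch of a single Newton step on a toy two-parameter negative log-likelihood. Everything in it (`nll_grad`, `nll_hess`, the matrix `A`, the starting point) is made up for illustration and is not ADMB or SS3 code. It only shows why one step of MLE_new = MLE - (inverse Hessian)*(gradient) shrinks the maximum absolute gradient when the surface is close to quadratic.

```python
import numpy as np

# Toy NLL (an assumption, not an SS3 model):
# a correlated quadratic plus a small quartic term around MODE.
A = np.array([[2.0, 1.8], [1.8, 2.0]])   # curvature with strong parameter correlation
MODE = np.array([1.0, -2.0])             # true minimum of the toy surface
C = 0.1                                  # size of the non-quadratic part


def nll_grad(theta):
    """Analytic gradient of the toy NLL."""
    d = theta - MODE
    return A @ d + C * d**3


def nll_hess(theta):
    """Analytic Hessian of the toy NLL."""
    d = theta - MODE
    return A + 3.0 * C * np.diag(d**2)


def newton_step(theta):
    """One step of theta_new = theta - H^{-1} g, with H and g evaluated at theta."""
    g = nll_grad(theta)
    H = nll_hess(theta)
    # Solving H x = g is equivalent to multiplying g by the inverse Hessian.
    return theta - np.linalg.solve(H, g)


theta_opt = MODE + np.array([1e-3, -2e-3])   # where a hypothetical optimizer stopped
theta_new = newton_step(theta_opt)

mag_before = np.max(np.abs(nll_grad(theta_opt)))
mag_after = np.max(np.abs(nll_grad(theta_new)))
print(f"max |gradient| before: {mag_before:.3e}, after: {mag_after:.3e}")
```

On a surface this close to quadratic, a single step drops the maximum gradient by several orders of magnitude. A step that fails to improve it would suggest the quadratic assumption is poor near the reported MLE.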

The feature is used by optimizing as normally done, and then from the command line running -hess_step for defaults (recommended), or -hess_step N -hess_step_tol eps where N and eps are the maximum number of steps to take and the tolerance (i.e., a very small number like 1e-10) after which to stop.
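
As a rough illustration of the two invocation forms just described, the small Python helper below assembles the command-line arguments. The helper name `hess_step_args` and the executable name `ss` are assumptions for the example. Only the flags `-hess_step` and `-hess_step_tol` come from the text above, and only the two documented combinations are allowed.

```python
def hess_step_args(n_steps=None, tol=None):
    """Build the argument list for the two forms described in this thread.

    No arguments  -> ["-hess_step"] (defaults, recommended)
    Both supplied -> ["-hess_step", N, "-hess_step_tol", eps]
    """
    if n_steps is None and tol is None:
        return ["-hess_step"]
    if n_steps is None or tol is None:
        # Only the "defaults" and "N plus eps" forms are described above.
        raise ValueError("supply both n_steps and tol, or neither")
    return ["-hess_step", str(n_steps), "-hess_step_tol", str(tol)]


exe = "ss"  # hypothetical executable name; use whatever your SS3 binary is called
print(" ".join([exe] + hess_step_args()))
print(" ".join([exe] + hess_step_args(5, 1e-10)))
```

The resulting list can be passed to `subprocess.run` after the usual optimization run has finished.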

Here is what it looks like on an example, with some SS output deleted for clarity:

Hess step 0: Max gradient=0.00681329 (MGparm[1]) and min gradient= 1.52506e-09
Hess step 1: Max gradient=0.0237732 (SR_parm[1]) and min gradient= 1.25196e-05
             Updating Hessian for next step (output suppressed)...done
Hess step 2: Max gradient=0 (MGparm[1]) is below threshold of 1e-12 so exiting early

Redoing uncertainty calculations and updating files (output suppressed)... [deleted] done

The 2 Hessian step(s) reduced maxgrad from 0.00681329 to 0 and NLL by 5.5999e-06.
All output files should be updated, but confirm as this is experimental still.
The fact this was successful gives strong evidence of convergence to a mode
with quadratic log-likelihood surface.

Note that the final mgc is now numerically equivalent to 0 but the NLL is very similar.
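
If you want to check the outcome programmatically, a sketch like the following could pull the per-step maximum gradients out of the console output. It assumes the line format shown in the example above, which may change since the feature is still experimental. The names `STEP_RE` and `parse_hess_steps` are invented for this example.

```python
import re

# Console lines copied from the example output above.
LOG = """\
Hess step 0: Max gradient=0.00681329 (MGparm[1]) and min gradient= 1.52506e-09
Hess step 1: Max gradient=0.0237732 (SR_parm[1]) and min gradient= 1.25196e-05
Hess step 2: Max gradient=0 (MGparm[1]) is below threshold of 1e-12 so exiting early
"""

# Captures: step number, max gradient value, name of the parameter with that gradient.
STEP_RE = re.compile(r"Hess step (\d+): Max gradient=([0-9.eE+-]+) \(([^)]+)\)")


def parse_hess_steps(text):
    """Return a list of (step, max_gradient, parameter_name) tuples."""
    return [(int(s), float(g), p) for s, g, p in STEP_RE.findall(text)]


steps = parse_hess_steps(LOG)
for step, maxgrad, parm in steps:
    print(step, maxgrad, parm)

# Compare the last step with step 0 rather than requiring a monotone decrease:
# in this example the max gradient rises at step 1 before dropping to 0.
improved = steps[-1][1] < steps[0][1]
```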

@Rick-Methot-NOAA
Collaborator Author

Looks good to me.

Curiosity: Will FIMS with TMB have numerical cross-derivatives such that it can use hess_step throughout the estimation process?
Historical: 30 years ago, early SS was doing hessstep throughout estimation using numerical derivatives and a crude sparseness detector such that the probability that a cross-derivative was numerically updated depended on the magnitude of the previous value for that cross-derivative. It sort of worked.

@iantaylor-NOAA
Contributor

@Cole-Monnahan-NOAA, the new text looks good to me too and just watching this issue thread will help me remember to use this option more in the future.
My only question is with this bit: "maximum absolute gradient (mgc)". I'm assuming that "mgc" is just widely used as an abbreviation, so what about changing to "maximum (absolute) gradient component (mgc)"?

@Rick-Methot-NOAA, I'll have to let Cole or others speak to the equivalent of hess_step in TMB models, but I would guess that it could be added if not already there.

@Rick-Methot-NOAA
Collaborator Author

Regarding -hess_step: I know that in his development of MAS, Matthew was working on getting analytical cross-derivatives. So, if you have these, then I think you could switch from steepest descent, which performs slowly with highly correlated parameters, to hess_step at a much earlier stage of estimation. You incur the cost of Hessian inversion, but make much more progress with each iteration. That's what I spent the 1990s trying to do with SS's numerical derivatives. I tried all kinds of tricks, such as only updating the cross-derivatives and Hessian inversion infrequently, on the assumption that the shape of the surface was not changing that much as the model approached the best fit.

@iantaylor-NOAA
Contributor

Contours of the posterior surface are used by the "-hybrid" option in ADMB (described in detail in this doc led by Cole: https://www.admb-project.org/developers/mcmc/mcmc-guide-for-admb.pdf), but it does not seem to have been adopted much. Maybe for models without bad correlations it's not an improvement, but perhaps we should turn to it more for models with slow convergence because of correlated parameters with non-linear relationships.

I wonder if the recent case of log(R0) and steepness being highly correlated and slowing convergence would have been helped by the "a" and "b" parameterization as discussed in nmfs-ost/ss3-source-code#191.

@Rick-Methot-NOAA
Collaborator Author

R0 and h are highly correlated when h < ~0.8 because most of the recruitment values are < R0. A re-parameterized form could use Rbar and h; then R0 would be a derived quantity. Perhaps the a,b accomplishes that, but I think not, as it has the same problem by relying on slope at the origin.

@Cole-Monnahan-NOAA

@Rick-Methot-NOAA : Yes I believe we can use them in TMB which has a way to get numerical Hessians (obj$he()). So plug that into an optimizer that uses it and it should work. I can't imagine it's that helpful early on in optimization (unless crazy correlations) but I do think we'll have the option to explore it. I did explore this a bit w/ ADMB by setting maxfn to a smallish value and then running hess_step and it did work. My intuition is that a handful of extra hess_steps is too expensive and you might as well just do it at the end. Could be that setting the convergence criterion to something like 0.1 and then doing hess_step would be a good middle ground? Something to explore.

@iantaylor-NOAA Yes, mgc is just what is common. I just looked and I use "mag" in the newest ADMB release for the console output, so we should use that to be consistent. -hybrid is HMC, which is MCMC, so a little different (and it has been completely replaced by NUTS).

@Cole-Monnahan-NOAA

@Rick-Methot-NOAA I forgot to mention... could someone paste the above text into the SS3 manual? I'm not sure where to put it. Thanks!

@Rick-Methot-NOAA
Collaborator Author

Chantel and/or Elizabeth will get the text into the manual. Thanks for producing it.

for -hess_step: My proposal would be to:
a. set a loose convergence criterion, say 0.1 or higher. I have seen SS3 jobs go hundreds of iterations making 0.1 ish improvements in logL until converging.
b. calculate and invert the Hessian
c. do until converged: calc gradients, apply gradients to the inverse Hessian (Hess^-1), repeat. Keep this same inverse Hessian throughout these steps
d. when converged: one more calc of the Hessian, to get final variances
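
Steps a through d can be sketched in Python on a toy problem. This is only an illustration under stated assumptions: the two-parameter surface, the function names (`gradient`, `hessian`, `frozen_hessian_steps`), and the tolerances are all invented, and nothing here reflects how ADMB implements -hess_step.

```python
import numpy as np

# Toy NLL (an assumption): correlated quadratic plus a small quartic term around MODE.
A = np.array([[2.0, 1.8], [1.8, 2.0]])
MODE = np.array([1.0, -2.0])
C = 0.1


def gradient(theta):
    d = theta - MODE
    return A @ d + C * d**3


def hessian(theta):
    d = theta - MODE
    return A + 3.0 * C * np.diag(d**2)


def frozen_hessian_steps(theta, tol=1e-10, max_iter=50):
    """Iterate theta <- theta - H0^{-1} g(theta), reusing one inverse Hessian."""
    h_inv = np.linalg.inv(hessian(theta))      # step b: calculate and invert once
    g = gradient(theta)
    n_iter = 0
    while np.max(np.abs(g)) > tol and n_iter < max_iter:
        theta = theta - h_inv @ g              # step c: same inverse Hessian every time
        g = gradient(theta)
        n_iter += 1
    cov = np.linalg.inv(hessian(theta))        # step d: one final Hessian for variances
    return theta, n_iter, np.max(np.abs(g)), np.sqrt(np.diag(cov))


# Step a: pretend the optimizer stopped at a loose criterion (max gradient below 0.1).
theta_loose = MODE + np.array([0.05, -0.08])
theta_hat, n_iter, mag, se = frozen_hessian_steps(theta_loose)
print(f"iterations: {n_iter}, final max |gradient|: {mag:.2e}, std. errors: {se}")
```

With a frozen inverse Hessian the iteration converges linearly rather than quadratically, so how well it works depends on how little the curvature changes between the loose stopping point and the mode.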

@iantaylor-NOAA
Contributor

@Rick-Methot-NOAA, I assume step c would take a change to ADMB, but I think we could iteratively apply the rest of the steps with the model we have today by iteratively running with -hess_step and a loose convergence criterion and then restarting from the .par file in the final phase. The convergence criterion could be tightened after some number of iterations if necessary, but maybe the -hess_step would take care of that adequately.

I can try to set this up for the slow-converging petrale model to see if it makes things more efficient.

@Rick-Methot-NOAA
Collaborator Author

Thanks Ian. I will try for Max's pandalus model.

@Rick-Methot-NOAA
Collaborator Author

I did the experiment with a 1300 parameter, slow-to-converge model.

I set convergence criteria high: 1.00
With just two hess_steps, it took this down to 0.0. Very impressive.
The improvement I think we could use is to suppress some of the Hessian calculations. In this case there was a Hessian update after hess_step 1, then another update after convergence. I would really like to see if we could suppress those 2 additional Hessian updates because they are very expensive with 1300 parameters. We already are relying on a normal approximation assumption in the vicinity of the convergence. I doubt that the curvature of the surface changes appreciably when we go from very close to spot on the convergence.

image

@Rick-Methot-NOAA
Collaborator Author

One more note on the documentation: when I run the Hessian first and then run -hess_step, ADMB prompts me to run it with -binp ss.bar.

@e-perl-NOAA
Collaborator

@Rick-Methot-NOAA, @Cole-Monnahan-NOAA, and @iantaylor-NOAA - Does the attached pdf have the documentation for the -hess_step that you would like? Searching for hess_step should put you where you need to be and then there is a hyperlinked section that explains things further.
SS330_User_Manual.pdf

@Rick-Methot-NOAA
Collaborator Author

Looks good. The title would be better as "additional" rather than "single".

@Cole-Monnahan-NOAA

@Rick-Methot-NOAA I tried not updating the Hess at each step and it failed for the models I tried. I suspect that for models that don't improve after a single step the mode is not close enough to quadratic and hence the curvature will change as you take steps. That's intuition though. It would be relatively easy to build in a new flag like -hess_step_update with inputs 0 or 1 to do the updates or not. That would just be added to this line. I didn't intend for hess_step to replace the optimizer, but to be used as a convergence check. But features evolve and this could be useful as Rick suggests. I don't really have the time to work on ADMB development now so would be happy for someone else to implement that.

Regarding the warning to use the .bin file, I put that in there b/c there were some models where the initial gradient I calculated in the hess_step function was much larger than what was found at the end of optimization. Theoretically you should be able to put the MLE (read in from a binary file so full precision) through the model and get identical gradients. That was/is not the case and I think in some cases it has to do with initialization for some reason. It probably also has something to do with dev_vectors. I've reopened this issue to help track that on the ADMB side.

@e-gugliotti-NOAA Could you swap 'mgc' with 'mag' please? Thanks for implementing!

@iantaylor-NOAA
Contributor

I just noticed that the User Manual is showing this as -hesstep instead of -hess_step: https://nmfs-stock-synthesis.github.io/doc/SS330_User_Manual.html#using--hessstep-to-do-additional-newton-steps-using-the-inverse-hessian so we still have more work to do to get this right. The solution in place right now is "-hess\textunderscore step": https://github.com/nmfs-stock-synthesis/doc/blob/main/12runningSS.tex#L30, which was one of many options explored by @e-gugliotti-NOAA in working through this issue.

@kellijohnson-NOAA
Contributor

hess\textunderscore step works in the pdf but does not work in html. It appears that other underscores in the manual are defined by \_ and these latter ones compile properly both in the html and pdf. I will change them.

@kellijohnson-NOAA
Contributor

It is working in both now

image

@iantaylor-NOAA
Contributor

Wow, that was fast, thank you @kellijohnson-NOAA!
