"esel" tool not available #174
Comments
The most important parts of the error log (failing module, return code) are output to the console. Note that these are deliberately not line numbers, which makes them mostly build-independent. So for the vast majority of failures, that gets you to the exact point of the failure without any extra tooling, just the SOL console. The big exception here is crashes (i.e. segfaults), and those are problematic even with the full esels. The printk output is part of the log, and since it is plain ASCII it should be easy to read even in a raw (unparsed) log from the BMC. With the printk and the build artifacts you can usually walk the backtrace of the failure. Even internally we don't have any data beyond that for Hostboot crashes. However, your point is valid that having the error log parser would be helpful. There is a project out there to externalize that - https://github.com/open-power/errl . Unfortunately the person behind this work left us a while back, so I think the momentum may have slowed a bit... I'll try to figure out who has the ball now to get this fully integrated into op-build.
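As a rough illustration of reading the plain-ASCII printk text out of a raw, unparsed log, here is a minimal Python sketch that pulls printable ASCII runs out of a binary dump, much like the `strings` utility. It is not part of Hostboot, errl, or op-build; the names `ascii_runs` and `min_len` are hypothetical, and it assumes only that the printk text is stored as contiguous ASCII bytes somewhere in the dump, with no knowledge of the actual log layout.

```python
import string
import sys

# Bytes treated as text: letters, digits, punctuation, space and tab.
# Newlines are deliberately excluded so each text line becomes its own run.
PRINTABLE = frozenset(
    (string.ascii_letters + string.digits + string.punctuation + " \t").encode("ascii")
)


def ascii_runs(data: bytes, min_len: int = 8):
    """Yield each run of at least min_len printable ASCII bytes in data."""
    run = bytearray()
    for byte in data:
        if byte in PRINTABLE:
            run.append(byte)
            continue
        # A non-printable byte ends the current run.
        if len(run) >= min_len:
            yield run.decode("ascii")
        run.clear()
    # Flush a run that reaches the end of the data.
    if len(run) >= min_len:
        yield run.decode("ascii")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: pass the path of a raw dump saved from the BMC.
    with open(sys.argv[1], "rb") as fh:
        for text in ascii_runs(fh.read()):
            print(text)
```

The minimum run length is only a noise filter; lowering it shows more fragments, raising it hides short strings. The backtrace addresses found this way still have to be decoded manually against the build artifacts.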
Understood. As you mentioned, this is mainly useful in the context of crashes, which I agree with -- we only really needed this tool when part of hostboot was crashing. I've started some initial documentation on how to parse the records without errl here https://wiki.raptorcs.com/wiki/Hostboot_Debug_Howto but as you can see it's a labor-intensive process, and we're throwing away a lot of data that may or may not be incidentally helpful along the way.
The esel/errl parser doesn't provide a huge amount of value for crashes. You'll get things in a slightly more readable format, but the only useful content is pretty much the printk with the backtrace that you have to manually decode. That is what we do internally as well.
@sampmisr is now driving the errl work.
Good to know. We might work on tooling to make this process easier.
We had the same problem, so we wrote our own errl-like utility for decoding HBEL:
For the past couple of years, debugging hostboot faults has been made unnecessarily hard for OEMs because the errl tool is not available. This omission gives OEMs two choices:
1.) Resort to "shotgun debugging" (guess, modify code, insert debug printf()s, rebuild, test, repeat) -- very slow and expensive
2.) Put IBM engineers in the critical path for debugging crashes -- again, relatively slow
We need some way of analysing HBEL dumps to get origin source line numbers.