FD-862 After factomd restart, loading blocks stucks while committing entry #663
Comments
Bug reproduced in 10 out of 10 attempts. |
Here are the logs from factomd, nothing strange:
|
Experiment No. 2. Result: the problem exists only when factomd is scanning directory blocks after a restart and a new commit happens to arrive. |
Hmm, can you try reproducing this on v6.2.1-rc2, @ilzheev? There have been an incredible number of changes in this area between 6.2.0 and 6.2.1. There was a month-long delay going from 6.1.1 to 6.2.1 with getting the 2nd pass to download the blockchain. (this is what you had found using the [...]) When your factomd node is in ignore mode, it will not ask for things that are missing. This means it will not download new blocks. Ignore mode will only get messages that are received without prompting. This is enough to get federated servers booting, but isn't really good for getting follower nodes up and running quickly. Your main problem is likely that you are in ignore mode, which doesn't help much as a follower. There is another change from 6.2.0 to 6.2.1 which you will notice: factomd no longer shows you how far behind it is when catching up with the blockchain. This was a security thing, and was prompted by some weirdness that we saw on the testnet a while back. |
FD-820_release_candidate_butter...FD-824_release_candidate_kraft GitHub isn't even able to show all the changes between 6.2.0 and 6.2.1. |
Problem appears on 6.2.1-rc2 as well. --
-- {"jsonrpc":"2.0","id":0,"result":{"directoryblockheight":182551,"leaderheight":182569,"entryblockheight":182551,"entryheight":182551}} |
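The response above can be read as a lag between the leader height and the saved directory block height. Here is a minimal Python sketch, assuming the JSON is the result of factomd's `heights` API call (the thread does not say which call produced it). `height_lag` is a hypothetical helper name, not part of factomd.

```python
import json

# Response copied from the comment above.
RESPONSE = '{"jsonrpc":"2.0","id":0,"result":{"directoryblockheight":182551,"leaderheight":182569,"entryblockheight":182551,"entryheight":182551}}'


def height_lag(raw: str) -> int:
    """Return how many blocks the saved directory block height trails the leader height."""
    result = json.loads(raw)["result"]
    return result["leaderheight"] - result["directoryblockheight"]


print(height_lag(RESPONSE))  # 18
```

A lag that stays constant or grows across repeated calls would match the stall described in this issue.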
ok, this is interesting. This might be a thing. Veena is looking to see how to replicate this in a controlled environment. The tracking ticket number is FD-862. If there are any patches for this bug they would go against this branch: https://github.com/FactomProject/factomd/tree/FD-862_boot_stall_with_API_access |
@ilzheev I am trying to replicate the issue in my environment but haven't had any success yet. Meanwhile, it would help us debug further if you shared debug logs with us. You will have to start factomd with --debuglog=.* to collect these logs, so the command will look like this: factomd --debuglog=.* |
There is a set of really extensive logging that can be turned on. It saves hundreds of gigabytes of text files to the hard drive where you ran the program from. I don't know how well it works in a Docker environment, but it shouldn't be too hard to change the Dockerfile around to save these kinds of files. You can run factomd using the command-line flags that Veena mentioned. The parameter takes a regex to determine which logs to save. I run it like this: this gets all the log files. Here is a list of the file names created with one of my simple local tests.
To only show election processing and network inputs, network outputs, and api commands run factomd like this.
It will only save the subset of the files which match that regular expression. |
Please share how you mounted the Docker container to get these log files once you succeed. |
I rebuilt it & ran it manually:
I see in
But I cannot locate these files. |
Also tried with flags |
It looks like the logging directory can't be changed beyond the directory that it gets launched in. factomd/common/messages/messageTrace.go Line 280 in 5f787fc
Thank you for the logs though. They will be helpful. |
Hmm, Veena couldn't replicate this issue on her setup, neither with her mainnet setup nor on a local node. This sounds like a good opportunity for @ThomasMeier to get experience a) replicating the issue, b) looking at the logs that the issue made, and c) finding a good way to fix the issue. The first step is to get it replicated in a debugger. Please see what it takes to recreate it. Pull requests can go here: https://github.com/FactomProject/factomd/tree/FD-862_boot_stall_with_API_access I'm looking forward to you showing off your development chops to the community, @ThomasMeier. |
@carryforward @ThomasMeier I use a mainnet follower node with the default config. |
good news, @stackdump and @VeenaGondkar were able to reproduce this bug reliably. |
This is reproducible with this branch: https://github.com/FactomProject/factomd/commits/FD-862_ec_commit_on_restart |
@ilzheev Thanks for the key information. I am able to reproduce the issue when I make factom lib calls. |
Have:
fully synced factomd node, v6.2.0, mainnet, default config
Reproduce issue:
Restart factomd
Wait until the factomd API has started but the node is still in ignore mode (I waited 1-2 mins after restart)
Commit entry
Issue:
e.g. (DBlock=181339/182487, EntryBlock=181339/181339)
i.e. the leader block (182487) is shown as the latest factom block, but all other heights get stuck immediately after the commit action
After waiting about 40 minutes, the node is still in ignore mode and the heights have not progressed
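The symptom above (leader height ahead, other heights frozen) could be checked mechanically from periodic height samples. This is a rough sketch and not part of factomd: `is_stalled` and the sample tuples are hypothetical, and the numbers reuse the DBlock example from this report.

```python
from typing import List, Tuple

# Each sample is (directory_block_height, leader_height), taken at intervals.
Sample = Tuple[int, int]


def is_stalled(samples: List[Sample]) -> bool:
    """True if the directory block height never advanced and still trails the leader."""
    if len(samples) < 2:
        return False  # not enough data to judge progress
    dblock_heights = [d for d, _ in samples]
    no_progress = max(dblock_heights) == min(dblock_heights)
    behind = samples[-1][0] < samples[-1][1]
    return no_progress and behind


# Heights from the example above, sampled repeatedly with no change.
print(is_stalled([(181339, 182487)] * 3))
# A node that is still catching up is not reported as stalled.
print(is_stalled([(181339, 182487), (181900, 182487)]))
```

How long to wait between samples is a judgment call. The report here waited about 40 minutes before concluding the node was stuck.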
Additional information 1:
Tested without factomd.conf file
Tested with factomd.conf file (default from this repo)
Tested with startdelay=600 & startdelay=0 flags
Issue exists
Additional information 2:
Used FactomProject/factom lib to commit entry.
Code is correct & works as expected on fully synced factomd node — creates entry in the chain.
I don't know if it's important, but here are the factom lib functions used: