info.json is not saved #830
Hi, I assume you use the …
Yes, I use the …
Do you have a minimal example that reproduces this issue? It seems to work for me. The heartbeat events are processed in a background thread. It could be that this thread dies, for some reason, before it can perform the final write.
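The failure mode described above can be illustrated with a plain-Python stand-in for a heartbeat thread. This is a minimal sketch, not Sacred's observer code: `HeartbeatWriter`, its `update`/`stop` methods, and the 0.05 s interval are hypothetical. The background thread dumps `info` on every beat, and `stop()` wakes it for one final write; if the thread had died earlier, that final write would never happen.

```python
import json
import os
import tempfile
import threading


class HeartbeatWriter:
    """Hypothetical stand-in for an observer's heartbeat thread; not Sacred's code.

    Every `interval` seconds the background thread dumps `info` to `path`.
    stop() wakes the thread, which then performs one final write and exits.
    """

    def __init__(self, path, interval=0.05):
        self.path = path
        self.info = {}
        self.interval = interval
        self._lock = threading.Lock()  # guards self.info against concurrent dumps
        self._stopped = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def update(self, key, value):
        # The main (experiment) thread records results here.
        with self._lock:
            self.info[key] = value

    def _write(self):
        with self._lock:
            with open(self.path, "w") as f:
                json.dump(self.info, f)

    def _run(self):
        # Event.wait() returns False on timeout (time for a beat) and True once stop() is called.
        while not self._stopped.wait(self.interval):
            self._write()
        # Final write: if this thread had died earlier (e.g. an exception in _write),
        # this line would never run and info.json would keep a stale or missing snapshot.
        self._write()

    def start(self):
        self._thread.start()

    def stop(self):
        self._stopped.set()
        self._thread.join()


path = os.path.join(tempfile.mkdtemp(), "info.json")
writer = HeartbeatWriter(path)
writer.start()
for step in range(3):
    writer.update("step", step)
writer.stop()
with open(path) as f:
    print(json.load(f))  # {'step': 2}
```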
Thanks for your reply. Unfortunately, I can't provide a minimal example because I am working on a complex MARL (multi-agent reinforcement learning) project.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I have the same issue (Sacred 0.8.2). Is the info dict not saved on completion? That sounds like a bug.
The info dict is not saved on completion. It is not passed to the … @vnmabus, do you have a minimal example that reproduces the issue? Or does it only appear in larger experiments?
So far it has happened only a few times, in medium-to-large experiments on the cluster. I have put a …
This should be saved on completion. I have lost a great deal of human and computing time relaunching half-completed experiments because of this.
That's really unfortunate. Do you have extremely large data in your … (see Line 288 in 17c5306)
Saving this information on completion is not as easy as it sounds, because it is a breaking change and could create a race condition with the background thread (right?). But it could still be better than half-saved files.
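Independently of when the final write happens, half-saved files can be avoided by never writing the file in place. Below is a minimal sketch of the usual write-to-a-temporary-file-then-rename pattern in plain Python. `atomic_write_json` is a hypothetical helper, and this makes no claim about how Sacred's FileStorageObserver currently writes its files.

```python
import json
import os
import tempfile


def atomic_write_json(path, obj):
    """Write obj as JSON so that path never holds a partially written file.

    The data goes to a temporary file in the same directory first; os.replace()
    then swaps it in, which is atomic on POSIX when both paths are on the same filesystem.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(obj, f)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before the rename
        os.replace(tmp_path, path)
    except BaseException:
        os.remove(tmp_path)  # discard the partial temporary file, keep the old info.json
        raise


path = os.path.join(tempfile.mkdtemp(), "info.json")
atomic_write_json(path, {"train_scores": [0.8, 0.9]})
try:
    atomic_write_json(path, {"bad": object()})  # fails mid-dump: not JSON-serializable
except TypeError:
    pass
with open(path) as f:
    print(json.load(f))  # {'train_scores': [0.8, 0.9]}
```

A failure halfway through serialization only ever affects the temporary file; the previously saved info.json stays readable.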
Yes, I have large data in info: I store all train and test scores and times. My proposal was to join the heartbeat thread; I was not aware that this was done with a timeout. What is the reason for that? Can the heartbeat thread fail to stop?
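The timeout concern can be illustrated with plain `threading`. `Thread.join(timeout)` returns once the timeout expires whether or not the thread has finished, so only `is_alive()` tells the caller that the final write is still in progress. This is a minimal sketch. `slow_final_write` and the 0.05 s / 0.5 s values are hypothetical stand-ins for serializing a very large info dict.

```python
import threading
import time


def slow_final_write(done):
    # Hypothetical stand-in for dumping a very large info dict at the end of a run.
    time.sleep(0.5)
    done.append(True)


done = []
worker = threading.Thread(target=slow_final_write, args=(done,), daemon=True)
worker.start()

worker.join(timeout=0.05)  # returns after ~0.05 s even though the write is not finished
still_running = worker.is_alive()
print(still_running)  # True

# If the interpreter exited now, this daemon thread would be killed mid-write.
# An untimed join() waits for the write to complete instead.
worker.join()
print(done)  # [True]
```

The untimed `join()` corresponds to the proposal to join the heartbeat thread. The write is guaranteed to finish, but the caller blocks forever if the thread never returns.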
I don't know the reason. It was introduced in 95234cd, which seems to address #273. I believe there is no reason for the FileStorageObserver to hang on a heartbeat, but the MongoObserver seems to have issues where it sometimes doesn't exit. I only use the FileStorageObserver, though, so I can't confirm this. Even in that case, I would argue that a hanging experiment script is better than broken files: at least then it is obvious that something went wrong.
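One possible middle ground keeps the timeout but raises an error instead of continuing silently when it expires. This sketch assumes the heartbeat runs as a plain `threading.Thread`. `stop_heartbeat` and `HeartbeatStillRunning` are hypothetical names, not part of Sacred.

```python
import threading
import time


class HeartbeatStillRunning(RuntimeError):
    """Hypothetical error: the final heartbeat did not finish within the timeout."""


def stop_heartbeat(thread, timeout):
    """Wait up to `timeout` seconds, then fail loudly if the thread is still alive."""
    thread.join(timeout)
    if thread.is_alive():
        raise HeartbeatStillRunning(
            f"heartbeat thread still running after {timeout} s; info.json may be incomplete"
        )


# A heartbeat that finishes in time passes silently.
fast = threading.Thread(target=time.sleep, args=(0.01,), daemon=True)
fast.start()
stop_heartbeat(fast, timeout=1.0)

# One that does not finish is reported instead of being silently abandoned.
slow = threading.Thread(target=time.sleep, args=(0.5,), daemon=True)
slow.start()
try:
    stop_heartbeat(slow, timeout=0.05)
except HeartbeatStillRunning as err:
    print(err)
```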
Hi, I use Sacred for my AI experiments and it helps me a lot. Recently, however, I noticed a problem. I use the info dict to save some results of my experiments, and usually it works well. But sometimes info.json is not saved at all, or only half of it is saved. Is there any solution?
I am using Sacred 0.8.2 on Python 3.7.
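Until the underlying cause is fixed, truncated files can at least be detected automatically, which helps decide which runs to relaunch. This is a minimal sketch that assumes each run has its own subdirectory that should contain an `info.json`. `info_is_complete` and `find_incomplete_runs` are hypothetical helpers.

```python
import json
import os
import tempfile


def info_is_complete(path):
    """Return True if path exists and parses as JSON (hypothetical helper)."""
    try:
        with open(path) as f:
            json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return False
    return True


def find_incomplete_runs(basedir):
    """List run subdirectories whose info.json is missing or truncated.

    Assumes one subdirectory per run, each expected to hold an info.json.
    """
    return sorted(
        name
        for name in os.listdir(basedir)
        if os.path.isdir(os.path.join(basedir, name))
        and not info_is_complete(os.path.join(basedir, name, "info.json"))
    )


# Example layout: run 1 is complete, run 2 is half-saved, run 3 has no info.json.
basedir = tempfile.mkdtemp()
for run_id, content in [("1", '{"score": 0.9}'), ("2", '{"score": [0.8, 0.'), ("3", None)]:
    os.makedirs(os.path.join(basedir, run_id))
    if content is not None:
        with open(os.path.join(basedir, run_id, "info.json"), "w") as f:
            f.write(content)
print(find_incomplete_runs(basedir))  # ['2', '3']
```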