ALL + AutoPas exit error when ALL updateFrequency > total number of sim steps #244
Comments
Sounds like something that can be resolved via input validation in |
OK, I'll do that now |
I revisited this issue, checked on the current master, and tested it on my own laptop and on the ARM cluster at HSU (same hardware as Fugaku). On the ARM cluster we still get the same bug, but on the laptop we don't. On the ARM cluster we have the Fujitsu compiler in clang mode:
On my pc we also have clang:
Tried with AOCC on ARM (clang), no bug here either:
Technically speaking, the bug does make sense to me. Checking GeneralDomainDecomposition.cpp, the _loadbalancer object is only used four times in the file. One of those uses is the call to rebalance() (line 96), at the first rebalancing timestep. If we check ALLLoadBalancer.cpp:27, we see that this function calls _all.setup(type) for the first time (line 29). As far as I can see, setup() is not called before this point. Now if we check /build/_deps/allfetch-src/include/ALL.hpp:511, the variable load_balancer_type is only set once setup() has been called at least once. So it makes sense that if setup() is never called, the destructor at ALL.hpp:279 will not exit gracefully, since load_balancer_type remains undefined. But then why is this not a problem in the other environments, and only on ARM? Does it make sense to add a rebalancing at step 0 (GeneralDomainDecomposition.cpp:76)? Or to add an init() function to ALLLoadBalancer.hpp that just calls _all.setup()? |
After putting print statements in ALL.hpp, it seems that if setup() is never called, load_balancer_type ends up as 0 in the other environments, while ARM with the Fujitsu compiler leaves it at some undefined value (my runs gave me -1114371832). Therefore I guess this is undefined behaviour, and we should either have some init() function or call the setup function in the constructor in ALLLoadBalancer.cpp. The other band-aid solution would be to change Simulation.cpp:1290 to have |
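To make the failure mode above concrete, here is a minimal, self-contained sketch. It is not the ALL or ls1 code: `MockAll`, `MockLoadBalancer`, `exitsCleanly`, the type ids, and the modulo-based rebalancing schedule are all hypothetical stand-ins. A sentinel value models the "never set" state so that the sketch itself contains no undefined behaviour, and the teardown check is an ordinary method rather than a throwing destructor.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for the ALL object: the balancer type is only
// assigned inside setup(), mirroring the pattern described in the comment.
class MockAll {
public:
    // Unset models the indeterminate value seen on ARM; Tensor is a
    // made-up valid type id (assumption, not taken from ALL).
    enum Type : int { Unset = -1, Tensor = 0 };

    void setup(Type type) { _type = type; }

    // Models the check that fails at exit when the type is unknown.
    void teardown() const {
        if (_type != Tensor) {
            throw std::invalid_argument("Unknown type of loadbalancing passed.");
        }
    }

private:
    Type _type = Unset;
};

// Hypothetical wrapper in the spirit of ALLLoadBalancer.
class MockLoadBalancer {
public:
    // setupInConstructor == true models the proposed init()/constructor fix.
    explicit MockLoadBalancer(bool setupInConstructor) {
        if (setupInConstructor) _all.setup(MockAll::Tensor);
    }
    void rebalance() { _all.setup(MockAll::Tensor); }  // first setup() happens here otherwise
    void shutdown() const { _all.teardown(); }

private:
    MockAll _all;
};

// Runs a mock simulation and reports whether shutdown succeeds.
// Assumption: rebalancing happens whenever step % updateFrequency == 0.
bool exitsCleanly(int totalSteps, int updateFrequency, bool setupInConstructor) {
    assert(updateFrequency > 0);
    MockLoadBalancer lb(setupInConstructor);
    for (int step = 1; step <= totalSteps; ++step) {
        if (step % updateFrequency == 0) lb.rebalance();
    }
    try {
        lb.shutdown();
    } catch (const std::invalid_argument&) {
        return false;
    }
    return true;
}
```

Under these assumptions, `exitsCleanly(10, 100, false)` models the failing run (no rebalancing step is ever reached), while passing `true` as the last argument models calling setup up front, which makes the exit independent of the step count.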
Is |
On the latest commit of ALL (https://gitlab.jsc.fz-juelich.de/SLMS/loadbalancing), |
Ok, then I'd say try it and compare the behavior with and without 🤷 |
Describe the bug
When a simulation is run with ALL load balancing + AutoPas on more than one rank, and the total number of simulation steps is less than the update frequency of ALL, the simulation finishes but crashes on exit with the following error:
ALLInvalidArgumentException: Unknown type of loadbalancing passed.
Function: ~ALL
File: /path/to/build/folder/_deps/allfetch-src/include/ALL.hpp
Line: 302
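The condition described above could also be caught up front by the kind of input validation suggested in the comments. The following is only a sketch under stated assumptions: `checkRebalanceSchedule` and its parameters are hypothetical names, not functions from ls1 or ALL, and it assumes the first rebalancing happens once the step counter reaches the update frequency.

```cpp
#include <string>

// Hypothetical validation helper: returns an empty string when the
// configuration is fine, otherwise a human-readable message.
// Assumption: the first rebalance happens at step == updateFrequency.
std::string checkRebalanceSchedule(unsigned long totalSteps,
                                   unsigned long updateFrequency) {
    if (updateFrequency == 0) {
        return "ALL updateFrequency must be greater than zero.";
    }
    if (totalSteps < updateFrequency) {
        // No rebalancing step would ever be reached, so the balancer
        // would never be set up before the simulation exits.
        return "ALL updateFrequency (" + std::to_string(updateFrequency) +
               ") exceeds the total number of steps (" +
               std::to_string(totalSteps) + "); no rebalancing will occur.";
    }
    return "";
}
```

A caller could log the returned message as a warning or abort the run, depending on how strict the input handling should be.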
To Reproduce
I've attached the XML files I used to test this to this issue (renamed to .txt because that's what GitHub wants). Please rename them back to .xml if you want to try the files yourself.
components.txt
mixing_2c.txt
buggyALL.txt
Build environment:
Tested on a single Fugaku node with 4 ranks, with gcc and fcc compiled binaries.