When the CI produces the error message "Docker Action run completed with exit code 137", the GitHub runner has run out of memory. Out-of-memory errors have many possible causes. One of them is particularly bizarre and hard to debug.
When SAW fails, it sometimes produces a gigantic error message (possibly tens of gigabytes). When that happens, SAW's memory usage grows quickly, which we can do nothing about. At the same time, however, the subprocess.run call in parallel.py tries to capture the whole error message, so the Python process's memory usage grows quickly as well. Listing the processes with ps aux shows something like the following:
One can see that the Python process is also using an unusual amount of memory.
Fix the Python script parallel.py so that it discards error messages larger than a predefined size.
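One possible shape for such a fix is sketched below. It is not the actual parallel.py code: the function name run_capped and the parameters max_bytes and chunk_size are hypothetical, and how parallel.py really invokes SAW is not shown in this issue. The sketch replaces a capturing subprocess.run call with subprocess.Popen, reads the child's output in chunks, keeps only the first max_bytes, and keeps draining the pipe so the child never blocks on a full pipe. It merges stderr into stdout so only one pipe has to be read, and it keeps a truncated prefix rather than dropping the message entirely; either choice could be changed.

```python
import subprocess
import sys


def run_capped(cmd, max_bytes=1_000_000, chunk_size=65536):
    """Run cmd and keep at most max_bytes of its combined stdout/stderr.

    Hypothetical helper: the name and parameters are illustrative only.
    Returns (returncode, captured_bytes, truncated).
    """
    kept = bytearray()
    truncated = False
    # stderr is merged into stdout so a single pipe is read (no deadlock
    # from one pipe filling up while the other is being read).
    with subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT
    ) as proc:
        while True:
            chunk = proc.stdout.read(chunk_size)
            if not chunk:  # EOF: the child closed its output
                break
            room = max_bytes - len(kept)
            if room > 0:
                kept += chunk[:room]
            if len(chunk) > room:
                # Anything beyond the cap is read and thrown away.
                truncated = True
        returncode = proc.wait()
    return returncode, bytes(kept), truncated


if __name__ == "__main__":
    # Stand-in for a failing tool that writes a large error message.
    noisy = "import sys; sys.stderr.write('x' * 5000); sys.exit(3)"
    code, output, truncated = run_capped([sys.executable, "-c", noisy], max_bytes=1000)
    print(code, len(output), truncated)
```

With this approach the Python process holds at most max_bytes plus one chunk of output at a time, regardless of how much the child writes. It does not reduce SAW's own memory usage.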