Out of memory when SAW produces gigantic error message #135

Open
pennyannn opened this issue Dec 16, 2023 · 0 comments

When the CI fails with the error message "Docker Action run completed with exit code 137", it means the GitHub runner has run out of memory. Out-of-memory errors have many possible causes. This issue describes one that is particularly bizarre and hard to debug.

When SAW fails, it sometimes produces a gigantic error message (up to tens of gigabytes). When this happens, SAW's memory usage grows quickly, which we can do nothing about. At the same time, however, the subprocess.run call in parallel.py tries to capture the entire error message, so the Python process's memory usage grows quickly as well. Running ps aux at that point shows something like the following:

2023-12-16T04:31:22.3477406Z root        7300  2.6 52.4 34516632 34502404 ?   S    04:01   0:47 /usr/bin/python3 ./scripts/parallel.py --file ./scripts/x86_64/release_jobs.sh
2023-12-16T04:31:22.3489243Z root        7307 95.6 45.2 1074209060 29793672 ? Rl   04:01  28:24 saw proof/ECDH/verify-ECDH.saw

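The Python job (PID 7300) has a resident set of about 33 GiB (52.4% of memory), which is even more than SAW itself (PID 7307, about 28 GiB, 45.2%). Together the two processes consume nearly all of the runner's memory.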

Proposed fix: change the Python script parallel.py so that it discards any error message larger than a pre-defined size.
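
One way this could look is sketched below. It is only an illustration, not the actual code in parallel.py: the helper name `run_capped` and the parameters `max_bytes` and `chunk_size` are hypothetical, and it truncates the output at the limit rather than dropping it entirely. Instead of letting subprocess.run buffer everything, it reads the child's combined stdout/stderr in fixed-size chunks, keeps only the first `max_bytes` bytes, and keeps draining the pipe so the child never blocks on a full pipe.

```python
import subprocess


def run_capped(cmd, max_bytes=1_000_000, chunk_size=65536):
    """Run cmd and keep at most max_bytes of its combined stdout/stderr.

    Hypothetical helper: returns (returncode, kept_output, truncated).
    """
    kept = bytearray()
    truncated = False
    with subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT
    ) as proc:
        while True:
            # Read a bounded chunk; an empty result means end of output.
            chunk = proc.stdout.read(chunk_size)
            if not chunk:
                break
            room = max_bytes - len(kept)
            if room > 0:
                kept += chunk[:room]
            if len(chunk) > room:
                # Anything beyond the limit is read and thrown away,
                # so memory use stays bounded by max_bytes + chunk_size.
                truncated = True
        returncode = proc.wait()
    return returncode, bytes(kept), truncated
```

A caller could then log `kept_output` followed by a note such as "output truncated" when the flag is set. With this approach the Python side holds at most roughly `max_bytes + chunk_size` bytes of output per job, regardless of how much SAW prints.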
