multiprocessing.Event.set() can be deadlocked by .is_set when called by a sigterm handler #126434

Open
ivarref opened this issue Nov 5, 2024 · 5 comments
Labels
stdlib (Python modules in the Lib dir), topic-multiprocessing, type-bug (An unexpected behavior, bug, or error)

Comments

@ivarref

ivarref commented Nov 5, 2024

Bug report

Bug description:

multiprocessing.Event.set() acquires a lock when setting the internal flag. multiprocessing.Event.is_set() acquires the same lock when checking the flag. Thus, if a signal handler calls .set() while .is_set() is running in the same process, there will be a deadlock.
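The mechanism described above can be sketched with a toy model. This is not the multiprocessing source: `ToyEvent`, its `timeout` parameter, and `simulate_handler_during_is_set` are hypothetical names, and `threading.Lock` stands in for the lock that guards the real event. The sketch assumes the handler runs on the same thread that is currently inside `is_set()`, so the lock holder cannot make progress until the handler returns.

```python
import threading


class ToyEvent:
    """Hypothetical stand-in: one non-reentrant lock guards the flag."""

    def __init__(self):
        self._lock = threading.Lock()
        self._flag = False

    def is_set(self):
        with self._lock:
            return self._flag

    def set(self, timeout=-1):
        # The default (-1) waits forever, like a plain blocking acquire.
        # The timeout exists only so this demo can return instead of hanging.
        if not self._lock.acquire(timeout=timeout):
            return False
        try:
            self._flag = True
            return True
        finally:
            self._lock.release()


def simulate_handler_during_is_set():
    event = ToyEvent()
    with event._lock:  # pretend execution is paused inside is_set()
        # A signal handler would run here, on the same thread,
        # while the lock is still held by the interrupted code.
        return event.set(timeout=0.1)


print(simulate_handler_during_is_set())  # False: set() could not get the lock
```

Without the timeout, the call inside the `with` block would never return, which is the hang the issue reports.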

multiprocessing.Event uses a regular non-reentrant lock. This should be changed to a reentrant lock. Please see the pull request.
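For readers unfamiliar with the distinction, here is a small sketch of how a non-reentrant and a reentrant lock differ when the same thread tries to acquire twice. It uses `threading.Lock` and `threading.RLock` as stand-ins for the multiprocessing equivalents, and `can_reacquire` is a hypothetical helper written for this illustration.

```python
import threading


def can_reacquire(lock):
    """Return True if the thread already holding `lock` can acquire it again."""
    lock.acquire()
    try:
        # Non-blocking attempt, so a non-reentrant lock reports failure
        # instead of hanging.
        got_it = lock.acquire(blocking=False)
        if got_it:
            lock.release()
        return got_it
    finally:
        lock.release()


print(can_reacquire(threading.Lock()))   # False
print(can_reacquire(threading.RLock()))  # True
```

With a blocking acquire, the `False` case is where the thread would wait forever on itself.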

Thanks for all the work on the Python programming language. I appreciate all your efforts highly.

Kind regards.

The example program below (sometimes) deadlocks. On my machine I typically need to run it fewer than 10 times before a deadlock occurs. Also included in the code block is a sample stacktrace.

import faulthandler
import multiprocessing
import os
import signal


def run_buggy():
    shutdown_event = multiprocessing.Event()

    def sigterm_handler(_signo, _stack_frame):
        try:
            print(f'sigterm_handler running')
            shutdown_event.set()
        finally:
            print(f'sigterm_handler done')

    signal.signal(signal.SIGTERM, sigterm_handler)
    signal.signal(signal.SIGINT, sigterm_handler)
    faulthandler.register(signal.SIGUSR1)

    print(f'Running process with PID {os.getpid()}')
    print(f'Dump the stack by executing:')
    print(f'kill -SIGUSR1 {os.getpid()}')
    print(f'Try to kill this process with CTRL-C or kill {os.getpid()}')
    while not shutdown_event.is_set():
        pass

if __name__ == '__main__':
    run_buggy()
# Sample stacktrace:
# Current thread 0x00000001ed74cf40 (most recent call first):
#   File "/Users/ire/code/cpython/Lib/multiprocessing/synchronize.py", line 95 in __enter__
#   File "/Users/ire/code/cpython/Lib/multiprocessing/synchronize.py", line 237 in __enter__
#   File "/Users/ire/code/cpython/Lib/multiprocessing/synchronize.py", line 342 in set
#   File "/Users/ire/code/cpython/./bug.py", line 13 in sigterm_handler
#   File "/Users/ire/code/cpython/Lib/multiprocessing/synchronize.py", line 95 in __enter__
#   File "/Users/ire/code/cpython/Lib/multiprocessing/synchronize.py", line 237 in __enter__
#   File "/Users/ire/code/cpython/Lib/multiprocessing/synchronize.py", line 335 in is_set
#   File "/Users/ire/code/cpython/./bug.py", line 25 in run_buggy
#   File "/Users/ire/code/cpython/./bug.py", line 29 in <module>

CPython versions tested on:

3.11, 3.12, CPython main branch

Operating systems tested on:

Linux, macOS

Linked PRs

ivarref added the type-bug label Nov 5, 2024
@Zheaoli
Contributor

Zheaoli commented Nov 5, 2024

Bug confirmed on main branch

@Zheaoli
Contributor

Zheaoli commented Nov 5, 2024

@Eclips4 We need a tag for this issue; I think topic-multiprocessing would be OK.

BTW @ivarref are you already working on a PR?

@ivarref
Author

ivarref commented Nov 5, 2024

Hi @Zheaoli

Thanks for the quick response. I've pushed a pull request now. What do you think?

Thanks and kind regards.

@ivarref
Author

ivarref commented Nov 5, 2024

Update: I've added a blurb entry.

@colesbury
Contributor

I commented on the PR, but I don't think multiprocessing.Event is safe to call from a Python signal handler, even with an RLock. I suspect that's true of a lot of Python's concurrency primitives.

The general problem is that Python can execute signal handling code asynchronously at many places that don't expect to support reentrancy.
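A rough illustration of that point, assuming a Unix system (it uses `SIGUSR1`, as the reproducer does) and a script started from the main thread. `handler_thread_demo` is a hypothetical helper; it only shows that a Python-level handler runs on the main thread, on top of whatever code that thread was executing.

```python
import signal
import threading


def handler_thread_demo():
    seen = []

    def handler(signo, frame):
        # Record whether the handler is running on the main thread.
        seen.append(threading.current_thread() is threading.main_thread())

    previous = signal.signal(signal.SIGUSR1, handler)
    try:
        signal.raise_signal(signal.SIGUSR1)
        # Python-level handlers run between bytecode instructions, so give
        # the interpreter a few iterations in case it has not run yet.
        for _ in range(1000):
            if seen:
                break
    finally:
        # Restore whatever was installed before (SIG_DFL if unknown).
        signal.signal(
            signal.SIGUSR1,
            previous if previous is not None else signal.SIG_DFL,
        )
    return seen


print(handler_thread_demo())
```

Because the handler borrows the main thread, any lock that thread holds at that moment stays held until the handler returns.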

RLocks don't solve this problem because the things they protect are often not safe to call reentrantly. In other words, calling them reentrantly breaks invariants that a (non-reentrant) lock would otherwise protect.
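One way to picture the invariant problem is with a deliberately simple, hypothetical class (`Ledger` is not from multiprocessing). The `interrupt` callback is an assumption of this sketch: it stands in for a signal handler firing in the middle of a critical section on the same thread.

```python
import threading


class Ledger:
    """Hypothetical example. Invariant: total == sum(items)."""

    def __init__(self):
        self._lock = threading.RLock()
        self.items = []
        self.total = 0

    def consistent(self):
        with self._lock:
            return self.total == sum(self.items)

    def add(self, amount, interrupt=None):
        with self._lock:
            self.items.append(amount)
            if interrupt is not None:
                # Stand-in for a signal handler running at this point.
                interrupt()
            self.total += amount


ledger = Ledger()
observed = []
# The "handler" re-enters the RLock without blocking and sees a half-done update.
ledger.add(5, interrupt=lambda: observed.append(ledger.consistent()))
print(observed)             # [False]
print(ledger.consistent())  # True
```

The RLock removes the hang, but the reentrant caller gets in while the state is mid-update, which is exactly what the lock was supposed to prevent.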

ZeroIntensity added the stdlib label Nov 5, 2024
@gpshead gpshead assigned gpshead and unassigned gpshead Nov 7, 2024