Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

grp is not thread safe #126316

Open
devdanzin opened this issue Nov 1, 2024 · 17 comments
Open

grp is not thread safe #126316

devdanzin opened this issue Nov 1, 2024 · 17 comments
Labels
extension-modules C modules in the Modules dir topic-free-threading topic-subinterpreters type-crash A hard crash of the interpreter, possibly with a core dump

Comments

@devdanzin
Copy link
Contributor

devdanzin commented Nov 1, 2024

Crash report

What happened?

It's possible to crash grp in a no-gil build by repeatedly calling functions in threads, it only happens with PYTHON_GIL=0.

from threading import Thread
import grp

for x in range(5000):
    alive = [
        Thread(target=grp.getgrgid, args=(1,)),
        Thread(target=grp.getgrall),
        Thread(target=grp.getgrnam, args=('root',)),
    ]

    for obj in alive:
        print('START', obj)
        obj.start()

Backtrace is:

(gdb) bt
#0  __strlen_avx2 ()
    at ../sysdeps/x86_64/multiarch/strlen-avx2.S:74
#1  0x00005555557ad3cb in PyUnicode_DecodeFSDefault (
    s=s@entry=0x3838323432353a78 <error: Cannot access memory at address 0x3838323432353a78>) at Objects/unicodeobject.c:4058
#2  0x00007ffff7c3f952 in mkgrent (
    module=module@entry=0x20000795b80, p=<optimized out>)
    at ./Modules/grpmodule.c:83
#3  0x00007ffff7c3fc41 in grp_getgrall_impl (
    module=0x20000795b80) at ./Modules/grpmodule.c:291
#4  0x00007ffff7c3fcad in grp_getgrall (
    module=<optimized out>, _unused_ignored=<optimized out>)
    at ./Modules/clinic/grpmodule.c.h:83
#5  0x00005555556fca83 in cfunction_vectorcall_NOARGS (
    func=0x2000079f820, args=<optimized out>,
    nargsf=<optimized out>, kwnames=<optimized out>)
    at Objects/methodobject.c:495
#6  0x000055555567b4bc in _PyVectorcall_Call (
    tstate=tstate@entry=0x555555d2ac70,
    func=0x5555556fc8b5 <cfunction_vectorcall_NOARGS>,
    callable=callable@entry=0x2000079f820,
    tuple=tuple@entry=0x555555c5d1d8 <_PyRuntime+128984>,
    kwargs=kwargs@entry=0x2001a0400d0) at Objects/call.c:273
#7  0x000055555567b894 in _PyObject_Call (
    tstate=0x555555d2ac70,
    callable=callable@entry=0x2000079f820,
    args=args@entry=0x555555c5d1d8 <_PyRuntime+128984>,
    kwargs=kwargs@entry=0x2001a0400d0) at Objects/call.c:348
#8  0x000055555567b8f1 in PyObject_Call (
    callable=callable@entry=0x2000079f820,
    args=args@entry=0x555555c5d1d8 <_PyRuntime+128984>,
    kwargs=kwargs@entry=0x2001a0400d0) at Objects/call.c:373

Found using fusil by @vstinner.

CPython versions tested on:

CPython main branch

Operating systems tested on:

Linux

Output from running 'python -VV' on the command line:

Python 3.14.0a1+ experimental free-threading build (heads/main:d467d9246c, Nov 1 2024, 09:05:56) [GCC 11.4.0]

Linked PRs

@devdanzin devdanzin added the type-crash A hard crash of the interpreter, possibly with a core dump label Nov 1, 2024
@picnixz picnixz added extension-modules C modules in the Modules dir topic-free-threading labels Nov 2, 2024
@colesbury
Copy link
Contributor

Thanks for the bug reports. What is fusil?

@devdanzin
Copy link
Contributor Author

What is fusil?

fusil is a fuzzer written in Python by Victor Stinner around 2007 I think. Somehow it has been memory holed, but some references can still be found in the WayBack Machine:
RTD
Home
List of Python bugs found (way back when)
Articles by Victor in French

I used it in the old days to find a couple of issues and was curious whether it could still find some. The source is still available even if not so easy to find. I got it, fixed some issues, added a bit of (very rough) threading, and let it search for crashes.

Not sure what to do with the code, first I'd like to know why it disappeared. But I'd be happy to share privately.

@colesbury
Copy link
Contributor

From a quick glance, it looks like getgrall uses setgrent(), getgrent(), endgrent(), which are not thread-safe on Linux.

The macOS implementations appear to be thread-safe

Linux with glibc 2.19+ has fgetgrent_r, which is thread-safe.

@kumaraditya303
Copy link
Contributor

As a quick fix, I think the module should be marked with Py_MOD_GIL_USED

vstinner added a commit to vstinner/cpython that referenced this issue Nov 6, 2024
vstinner added a commit to vstinner/cpython that referenced this issue Nov 6, 2024
vstinner added a commit to vstinner/cpython that referenced this issue Nov 6, 2024
@vstinner
Copy link
Member

vstinner commented Nov 6, 2024

As a quick fix, I think the module should be marked with Py_MOD_GIL_USED

I wrote a fix #126488 using @critical_section.

@kumaraditya303
Copy link
Contributor

I don't think that would fix the issue, there can be two instances of module which can race with each other. The correct fix would be to have a global lock which would serialize calls across modules.

@ZeroIntensity
Copy link
Member

Yeah, this will still race if there are subinterpreters using grp at the same time. Kumar's suggestion is probably the way to go.

@ZeroIntensity ZeroIntensity changed the title grp called from multiple threads in no-gil build segfaults grp is not thread safe Nov 6, 2024
vstinner added a commit to vstinner/cpython that referenced this issue Nov 6, 2024
@vstinner
Copy link
Member

vstinner commented Nov 6, 2024

I don't think that would fix the issue, there can be two instances of module which can race with each other.

Oh, I forgot that we can have more than one extension instance, you're right.

The correct fix would be to have a global lock which would serialize calls across modules.

I wrote PR gh-126504 which adds one mutex per function. IMO it's safe to call the 3 functions in parallel, but not to call the same function in parallel. At least, it works on Linux.

@vstinner
Copy link
Member

vstinner commented Nov 6, 2024

Recently, the ssl module was fixed by using @critical_section. When @critical_section is used on a method, Py_BEGIN_CRITICAL_SECTION(self); is used which sounds correct. But when @critical_section is used on a module function, Py_BEGIN_CRITICAL_SECTION(module); is used which seems to have the issue that @kumaraditya303 described before. What if there are more than one _ssl module instance?

@ZeroIntensity
Copy link
Member

ZeroIntensity commented Nov 6, 2024

Critical sections only apply to the current interpreter (or really, they only apply to one object, but objects can't be shared directly anyway). The SSL problem was that OpenSSL doesn't support concurrent use on the same socket, but since those sockets can't be shared between interpreters, there wasn't any issue there.

@colesbury
Copy link
Contributor

I think we should use the thread-safe function, fgetgrent_r, when available, and not mess with locking here.

@kumaraditya303
Copy link
Contributor

is used which seems to have the issue that kumaraditya303 described before. What if there are more than one _ssl module instance?

Yes, that isn't thread safe, adding critical section makes it thread safe if you only have one instance of such module, if there are multiples of them then a global lock is necessary.

@colesbury
Copy link
Contributor

I am not a fan of locking around these sorts of libc functions:

  • It assumes that all calls to these libc functions go through our library code, but CPython is used with lots of C and C++ libraries that make their own libc calls (unprotected by our locks).
  • Our mkgrent function calls a into the Python API. I don't think the API calls are likely to be re-entrant, but it's hard to tell for sure. (Not triggering GC on object allocation helps.)

@vstinner
Copy link
Member

vstinner commented Nov 6, 2024

I think we should use the thread-safe function, fgetgrent_r, when available, and not mess with locking here.

I don't know how to use this function, and I'm not sure that it's available on all (non-Windows) platforms. For example, it's not available on FreeBSD.

@colesbury
Copy link
Contributor

FreeBSD's version of getgrent_r is thread-safe and setgrent and endgrent appear to use thread-local storage on FreeBSD:

vstinner added a commit to vstinner/cpython that referenced this issue Nov 6, 2024
If available, use getgrent_r() in grp.getgrall() to make the
function reentrant.
vstinner added a commit to vstinner/cpython that referenced this issue Nov 6, 2024
If available, use getgrent_r() in grp.getgrall() to make the
function reentrant.
vstinner added a commit to vstinner/cpython that referenced this issue Nov 6, 2024
@vstinner
Copy link
Member

vstinner commented Nov 6, 2024

Ok, so apparently, only grp.getgrall() is not thread-safe, and getgrent_r() can be used instead of getgrent() to make the function reentrant. I didn't want to use fgetgrent_r() to not have to open a file (which file?).

In short, I wrote PR gh-126506 to fix the issue using getgrent_r().

@colesbury
Copy link
Contributor

Ugh, getgrent_r is not thread-safe on Linux with glibc. Yuck. And fgetgrent_r isn't a good substitute if groups use LDAP or anything that's not /etc/groups.

I guess locks it is... at least for Linux.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
extension-modules C modules in the Modules dir topic-free-threading topic-subinterpreters type-crash A hard crash of the interpreter, possibly with a core dump
Projects
Status: Todo
Development

No branches or pull requests

6 participants