Bug 1561170
Summary: | Segfault while creating thread state | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Lon Hohberger <lhh> | ||||
Component: | python-cffi | Assignee: | Christian Heimes <cheimes> | ||||
Status: | CLOSED WONTFIX | QA Contact: | ipa-qe <ipa-qe> | ||||
Severity: | unspecified | Docs Contact: | |||||
Priority: | unspecified | ||||||
Version: | 7.5 | CC: | apevec, cheimes, cstratak, hhorak, jjoyce, jschluet, lhh, mkosek, nsoffer, pviktori, python-maint, salmy, vstinner | ||||
Target Milestone: | rc | Keywords: | Regression | ||||
Target Release: | --- | ||||||
Hardware: | Unspecified | ||||||
OS: | Unspecified | ||||||
Whiteboard: | |||||||
Fixed In Version: | Doc Type: | If docs needed, set a value | |||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2019-03-14 15:27:34 UTC | Type: | Bug | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Attachments: |
|
Description
Lon Hohberger
2018-03-27 19:51:12 UTC
Note: Downgrading the mock environment from python-2.7.5-68 to 2.7.5-58 causes everything to work correctly. Created attachment 1414884 [details]
CFFI thread test case 3 (tarball w/ code & makefile)
This is one of the failing tests. Downgrading to python-2.7.5-58.el7 (or running this on F26/F27 builds) works.
Lon, did you get a moment to test the builds? Also my initial comment said that it worked on F26 with python2-2.7.14-7.fc26; that's inaccurate, it crashes there. On F28, the code also fails with python2-2.7.14-15.fc28. (In reply to Lon Hohberger from comment #17) > Also my initial comment said that it worked on F26 with > python2-2.7.14-7.fc26; that's inaccurate, it crashes there. > > On F28, the code also fails with python2-2.7.14-15.fc28. It will fail there as well, since the fix included in the RHEL build is also included in Fedora. cc'ing the author of the patch. Also here is the commit causing the issue: https://github.com/python/cpython/commit/830daae1c82ed33deef0086b7b6323e5be0b0cc8 (In reply to Charalampos Stratakis from comment #19) > Also here is the commit causing the issue: > https://github.com/python/cpython/commit/ > 830daae1c82ed33deef0086b7b6323e5be0b0cc8 Hi Charalampos, why do you think that releasing the GIL in fileio is related to cffi? (In reply to Nir Soffer from comment #20) > (In reply to Charalampos Stratakis from comment #19) > > Also here is the commit causing the issue: > > https://github.com/python/cpython/commit/ > > 830daae1c82ed33deef0086b7b6323e5be0b0cc8 > > Hi Charalampos, why do you think that releasing the GIL in fileio is related > to > cffi? I haven't dived into the code of this patch, however from testing of the aforementioned builds, it suggests that the test fails with that patch applied. I looked briefly in the attached test case, it is very complex code, so there is good chance that the test code is doing something wrong, and the fix for fileio just expose the issue. - Is this upstream test from cffi package? - Does this happen with upstream python? It's the test code lifted directly from cffi 1.11.2 test runs. The code is somewhat complicated since it's generated code. I did write the Makefile, however, to make it easier to run (e.g. so one doesn't have to build all of CFFI). It also fails in the same location against Python 2.7.15, built from source. So, my expectation is that portions of CFFI will no longer work with 2.7.15: Thread 3 "thread3-test" received signal SIGSEGV, Segmentation fault. [Switching to Thread 0x7ffff68c4700 (LWP 18472)] new_threadstate (init=1, interp=0x0) at Python/pystate.c:202 202 tstate->next = interp->tstate_head; (gdb) bt #0 new_threadstate (init=1, interp=0x0) at Python/pystate.c:202 #1 PyThreadState_New (interp=0x0) at Python/pystate.c:213 #2 PyGILState_Ensure () at Python/pystate.c:604 #3 0x0000000000406336 in _cffi_initialize_python () #4 0x00000000004067e3 in _cffi_start_python () #5 0x0000000000406857 in _cffi_start_and_call_python () #6 0x0000000000406952 in add3 () #7 0x0000000000406b34 in start_routine_3 () #8 0x00007ffff7bbf36d in start_thread () from /lib64/libpthread.so.0 #9 0x00007ffff71d6b4f in clone () from /lib64/libc.so.6 (gdb) up #1 PyThreadState_New (interp=0x0) at Python/pystate.c:213 213 return new_threadstate(interp, 1); (gdb) list 208 } 209 210 PyThreadState * 211 PyThreadState_New(PyInterpreterState *interp) 212 { 213 return new_threadstate(interp, 1); 214 } 215 216 PyThreadState * 217 _PyThreadState_Prealloc(PyInterpreterState *interp) (gdb) p interp $1 = (PyInterpreterState *) 0x0 (gdb) up #2 PyGILState_Ensure () at Python/pystate.c:604 604 tcur = PyThreadState_New(autoInterpreterState); (gdb) p autoInterpreterState $2 = (PyInterpreterState *) 0x7ffff00008f0 (In reply to Lon Hohberger from comment #23) > It's the test code lifted directly from cffi 1.11.2 test runs. > > The code is somewhat complicated since it's generated code. > > I did write the Makefile, however, to make it easier to run (e.g. so one > doesn't have to build all of CFFI). > > It also fails in the same location against Python 2.7.15, built from source. > So, my expectation is that portions of CFFI will no longer work with 2.7.15: > > Thread 3 "thread3-test" received signal SIGSEGV, Segmentation fault. > [Switching to Thread 0x7ffff68c4700 (LWP 18472)] > new_threadstate (init=1, interp=0x0) at Python/pystate.c:202 > 202 tstate->next = interp->tstate_head; > (gdb) bt > #0 new_threadstate (init=1, interp=0x0) at Python/pystate.c:202 > #1 PyThreadState_New (interp=0x0) at Python/pystate.c:213 > #2 PyGILState_Ensure () at Python/pystate.c:604 > #3 0x0000000000406336 in _cffi_initialize_python () > #4 0x00000000004067e3 in _cffi_start_python () > #5 0x0000000000406857 in _cffi_start_and_call_python () > #6 0x0000000000406952 in add3 () > #7 0x0000000000406b34 in start_routine_3 () > #8 0x00007ffff7bbf36d in start_thread () from /lib64/libpthread.so.0 > #9 0x00007ffff71d6b4f in clone () from /lib64/libc.so.6 > (gdb) up > #1 PyThreadState_New (interp=0x0) at Python/pystate.c:213 > 213 return new_threadstate(interp, 1); > (gdb) list > 208 } > 209 > 210 PyThreadState * > 211 PyThreadState_New(PyInterpreterState *interp) > 212 { > 213 return new_threadstate(interp, 1); > 214 } > 215 > 216 PyThreadState * > 217 _PyThreadState_Prealloc(PyInterpreterState *interp) > (gdb) p interp > $1 = (PyInterpreterState *) 0x0 > (gdb) up > #2 PyGILState_Ensure () at Python/pystate.c:604 > 604 tcur = PyThreadState_New(autoInterpreterState); > (gdb) p autoInterpreterState > $2 = (PyInterpreterState *) 0x7ffff00008f0 Ideally this should be resolved upstream first. Could you create an issue at bugs.python.org ? Victor, if you get some time, could you investigate this one? I'm not sure that the bug is related, but Python 2.7.15 contains the following fix: * "PyGILState_Ensure on non-Python thread causes fatal error" * https://bugs.python.org/issue20891 * https://github.com/python/cpython/commit/be6b74c0795b709c7a04e2187a7e32d08f5155f6 Ok, I'm able to reproduce the bug on Fedora 28 using attached "CFFI thread test case 3". The bug is a race condition in _cffi_initialize_python() of the test case. The first thread calls Py_InitializeEx(), but then other threads call PyGILState_Ensure(). If PyGILState_Ensure() is called before Py_InitializeEx() completes, you get the crash. Example of crash. Thread A is still running Py_InitializeEx() while Thread B calls PyGILState_Ensure(). Thread A: #0 0x00007ffff77d82f0 in PyDict_GetItem (op=0x7ffff7f74060, key=0x7ffff7f65ee0) at Objects/dictobject.c:714 #1 0x00007ffff77dd5fe in PyDict_GetItemString (v=0x7ffff7f74060, key=0x7ffff790de80 "__doc__") at Objects/dictobject.c:2509 #2 0x00007ffff7819276 in PyType_Ready (type=0x7ffff7b5a8e0 <_PyExc_LookupError>) at Objects/typeobject.c:4260 #3 0x00007ffff77a89b5 in _PyExc_Init () at Objects/exceptions.c:2078 #4 0x00007ffff78bc60e in Py_InitializeEx (install_sigs=0) at Python/pythonrun.c:265 #5 0x000000000040182c in _cffi_py_initialize () at _add2_cffi.c:782 #6 0x0000000000401851 in _cffi_initialize_python () at _add2_cffi.c:805 #7 0x0000000000401e6a in _cffi_start_python () at _add2_cffi.c:1103 #8 0x0000000000401ede in _cffi_start_and_call_python (externpy=0x604200 <_cffi_externpy__add2>, args=0x7ffff6bceea0 "(") at _add2_cffi.c:1143 #9 0x0000000000401fc9 in add2 (a0=40, a1=2, a2=100) at _add2_cffi.c:1209 #10 0x0000000000402c15 in start_routine_2 (arg=0x0) at thread3-test.c:14 #11 0x00007ffff7bbf594 in start_thread () from /lib64/libpthread.so.0 #12 0x00007ffff746500f in clone () from /lib64/libc.so.6 Thread B: #0 0x00007ffff73a1f2b in raise () from /lib64/libc.so.6 #1 0x00007ffff738c561 in abort () from /lib64/libc.so.6 #2 0x00007ffff738c431 in __assert_fail_base.cold.0 () from /lib64/libc.so.6 #3 0x00007ffff739a692 in __assert_fail () from /lib64/libc.so.6 #4 0x00007ffff78bbe7b in PyGILState_Ensure () at Python/pystate.c:600 #5 0x000000000040233b in _cffi_initialize_python () at _add3_cffi.c:815 #6 0x0000000000402940 in _cffi_start_python () at _add3_cffi.c:1098 #7 0x00000000004029b4 in _cffi_start_and_call_python (externpy=0x604280 <_cffi_externpy__add3>, args=0x7ffff63cde90 "\350\003") at _add3_cffi.c:1138 #8 0x0000000000402aaf in add3 (a0=1000, a1=200, a2=30, a3=4) at _add3_cffi.c:1206 #9 0x0000000000402c91 in start_routine_3 (arg=0x0) at thread3-test.c:26 #10 0x00007ffff7bbf594 in start_thread () from /lib64/libpthread.so.0 #11 0x00007ffff746500f in clone () from /lib64/libc.so.6 Code of _cffi_carefully_make_gil(): https://bitbucket.org/cffi/cffi/src/af3a1e8a51e01affd0542727bb9e90fc3b90dafb/cffi/_embedding.h?at=default&fileviewer=file-view-default#_embedding.h-248 It seems like the bug has been fixed in cffi 1.11.5 by the following commit: https://bitbucket.org/cffi/cffi/commits/7b46e415e208 I reassign the bug to python-cffi. Upstream bug report: https://bitbucket.org/cffi/cffi/issues/358/ I'm considering to close this BZ as WONTFIX. The crash is just a test artefact. Supported production cases are not affected by the issue. RHEL 8 contains a newer version of CFFI that does not observe the crash, too. I don't want to risk updating CFFI just to silence a test problem. As I saw no concerns with Christian's plan and I agree with him, I am closing the bug. |