Bug 1561170
| Summary: | Segfault while creating thread state | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Lon Hohberger <lhh> | ||||
| Component: | python-cffi | Assignee: | Christian Heimes <cheimes> | ||||
| Status: | CLOSED WONTFIX | QA Contact: | ipa-qe <ipa-qe> | ||||
| Severity: | unspecified | Docs Contact: | |||||
| Priority: | unspecified | ||||||
| Version: | 7.5 | CC: | apevec, cheimes, cstratak, hhorak, jjoyce, jschluet, lhh, mkosek, nsoffer, pviktori, python-maint, salmy, vstinner | ||||
| Target Milestone: | rc | Keywords: | Regression | ||||
| Target Release: | --- | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2019-03-14 15:27:34 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Attachments: |
|
||||||
Note: Downgrading the mock environment from python-2.7.5-68 to 2.7.5-58 causes everything to work correctly. Created attachment 1414884 [details]
CFFI thread test case 3 (tarball w/ code & makefile)
This is one of the failing tests. Downgrading to python-2.7.5-58.el7 (or running this on F26/F27 builds) works.
Lon, did you get a moment to test the builds? Also my initial comment said that it worked on F26 with python2-2.7.14-7.fc26; that's inaccurate, it crashes there. On F28, the code also fails with python2-2.7.14-15.fc28. (In reply to Lon Hohberger from comment #17) > Also my initial comment said that it worked on F26 with > python2-2.7.14-7.fc26; that's inaccurate, it crashes there. > > On F28, the code also fails with python2-2.7.14-15.fc28. It will fail there as well, since the fix included in the RHEL build is also included in Fedora. cc'ing the author of the patch. Also here is the commit causing the issue: https://github.com/python/cpython/commit/830daae1c82ed33deef0086b7b6323e5be0b0cc8 (In reply to Charalampos Stratakis from comment #19) > Also here is the commit causing the issue: > https://github.com/python/cpython/commit/ > 830daae1c82ed33deef0086b7b6323e5be0b0cc8 Hi Charalampos, why do you think that releasing the GIL in fileio is related to cffi? (In reply to Nir Soffer from comment #20) > (In reply to Charalampos Stratakis from comment #19) > > Also here is the commit causing the issue: > > https://github.com/python/cpython/commit/ > > 830daae1c82ed33deef0086b7b6323e5be0b0cc8 > > Hi Charalampos, why do you think that releasing the GIL in fileio is related > to > cffi? I haven't dived into the code of this patch, however from testing of the aforementioned builds, it suggests that the test fails with that patch applied. I looked briefly in the attached test case, it is very complex code, so there is good chance that the test code is doing something wrong, and the fix for fileio just expose the issue. - Is this upstream test from cffi package? - Does this happen with upstream python? It's the test code lifted directly from cffi 1.11.2 test runs.
The code is somewhat complicated since it's generated code.
I did write the Makefile, however, to make it easier to run (e.g. so one doesn't have to build all of CFFI).
It also fails in the same location against Python 2.7.15, built from source. So, my expectation is that portions of CFFI will no longer work with 2.7.15:
Thread 3 "thread3-test" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff68c4700 (LWP 18472)]
new_threadstate (init=1, interp=0x0) at Python/pystate.c:202
202 tstate->next = interp->tstate_head;
(gdb) bt
#0 new_threadstate (init=1, interp=0x0) at Python/pystate.c:202
#1 PyThreadState_New (interp=0x0) at Python/pystate.c:213
#2 PyGILState_Ensure () at Python/pystate.c:604
#3 0x0000000000406336 in _cffi_initialize_python ()
#4 0x00000000004067e3 in _cffi_start_python ()
#5 0x0000000000406857 in _cffi_start_and_call_python ()
#6 0x0000000000406952 in add3 ()
#7 0x0000000000406b34 in start_routine_3 ()
#8 0x00007ffff7bbf36d in start_thread () from /lib64/libpthread.so.0
#9 0x00007ffff71d6b4f in clone () from /lib64/libc.so.6
(gdb) up
#1 PyThreadState_New (interp=0x0) at Python/pystate.c:213
213 return new_threadstate(interp, 1);
(gdb) list
208 }
209
210 PyThreadState *
211 PyThreadState_New(PyInterpreterState *interp)
212 {
213 return new_threadstate(interp, 1);
214 }
215
216 PyThreadState *
217 _PyThreadState_Prealloc(PyInterpreterState *interp)
(gdb) p interp
$1 = (PyInterpreterState *) 0x0
(gdb) up
#2 PyGILState_Ensure () at Python/pystate.c:604
604 tcur = PyThreadState_New(autoInterpreterState);
(gdb) p autoInterpreterState
$2 = (PyInterpreterState *) 0x7ffff00008f0
(In reply to Lon Hohberger from comment #23) > It's the test code lifted directly from cffi 1.11.2 test runs. > > The code is somewhat complicated since it's generated code. > > I did write the Makefile, however, to make it easier to run (e.g. so one > doesn't have to build all of CFFI). > > It also fails in the same location against Python 2.7.15, built from source. > So, my expectation is that portions of CFFI will no longer work with 2.7.15: > > Thread 3 "thread3-test" received signal SIGSEGV, Segmentation fault. > [Switching to Thread 0x7ffff68c4700 (LWP 18472)] > new_threadstate (init=1, interp=0x0) at Python/pystate.c:202 > 202 tstate->next = interp->tstate_head; > (gdb) bt > #0 new_threadstate (init=1, interp=0x0) at Python/pystate.c:202 > #1 PyThreadState_New (interp=0x0) at Python/pystate.c:213 > #2 PyGILState_Ensure () at Python/pystate.c:604 > #3 0x0000000000406336 in _cffi_initialize_python () > #4 0x00000000004067e3 in _cffi_start_python () > #5 0x0000000000406857 in _cffi_start_and_call_python () > #6 0x0000000000406952 in add3 () > #7 0x0000000000406b34 in start_routine_3 () > #8 0x00007ffff7bbf36d in start_thread () from /lib64/libpthread.so.0 > #9 0x00007ffff71d6b4f in clone () from /lib64/libc.so.6 > (gdb) up > #1 PyThreadState_New (interp=0x0) at Python/pystate.c:213 > 213 return new_threadstate(interp, 1); > (gdb) list > 208 } > 209 > 210 PyThreadState * > 211 PyThreadState_New(PyInterpreterState *interp) > 212 { > 213 return new_threadstate(interp, 1); > 214 } > 215 > 216 PyThreadState * > 217 _PyThreadState_Prealloc(PyInterpreterState *interp) > (gdb) p interp > $1 = (PyInterpreterState *) 0x0 > (gdb) up > #2 PyGILState_Ensure () at Python/pystate.c:604 > 604 tcur = PyThreadState_New(autoInterpreterState); > (gdb) p autoInterpreterState > $2 = (PyInterpreterState *) 0x7ffff00008f0 Ideally this should be resolved upstream first. Could you create an issue at bugs.python.org ? Victor, if you get some time, could you investigate this one? I'm not sure that the bug is related, but Python 2.7.15 contains the following fix: * "PyGILState_Ensure on non-Python thread causes fatal error" * https://bugs.python.org/issue20891 * https://github.com/python/cpython/commit/be6b74c0795b709c7a04e2187a7e32d08f5155f6 Ok, I'm able to reproduce the bug on Fedora 28 using attached "CFFI thread test case 3".
The bug is a race condition in _cffi_initialize_python() of the test case. The first thread calls Py_InitializeEx(), but then other threads call PyGILState_Ensure(). If PyGILState_Ensure() is called before Py_InitializeEx() completes, you get the crash.
Example of crash. Thread A is still running Py_InitializeEx() while Thread B calls PyGILState_Ensure().
Thread A:
#0 0x00007ffff77d82f0 in PyDict_GetItem (op=0x7ffff7f74060, key=0x7ffff7f65ee0) at Objects/dictobject.c:714
#1 0x00007ffff77dd5fe in PyDict_GetItemString (v=0x7ffff7f74060, key=0x7ffff790de80 "__doc__") at Objects/dictobject.c:2509
#2 0x00007ffff7819276 in PyType_Ready (type=0x7ffff7b5a8e0 <_PyExc_LookupError>) at Objects/typeobject.c:4260
#3 0x00007ffff77a89b5 in _PyExc_Init () at Objects/exceptions.c:2078
#4 0x00007ffff78bc60e in Py_InitializeEx (install_sigs=0) at Python/pythonrun.c:265
#5 0x000000000040182c in _cffi_py_initialize () at _add2_cffi.c:782
#6 0x0000000000401851 in _cffi_initialize_python () at _add2_cffi.c:805
#7 0x0000000000401e6a in _cffi_start_python () at _add2_cffi.c:1103
#8 0x0000000000401ede in _cffi_start_and_call_python (externpy=0x604200 <_cffi_externpy__add2>, args=0x7ffff6bceea0 "(") at _add2_cffi.c:1143
#9 0x0000000000401fc9 in add2 (a0=40, a1=2, a2=100) at _add2_cffi.c:1209
#10 0x0000000000402c15 in start_routine_2 (arg=0x0) at thread3-test.c:14
#11 0x00007ffff7bbf594 in start_thread () from /lib64/libpthread.so.0
#12 0x00007ffff746500f in clone () from /lib64/libc.so.6
Thread B:
#0 0x00007ffff73a1f2b in raise () from /lib64/libc.so.6
#1 0x00007ffff738c561 in abort () from /lib64/libc.so.6
#2 0x00007ffff738c431 in __assert_fail_base.cold.0 () from /lib64/libc.so.6
#3 0x00007ffff739a692 in __assert_fail () from /lib64/libc.so.6
#4 0x00007ffff78bbe7b in PyGILState_Ensure () at Python/pystate.c:600
#5 0x000000000040233b in _cffi_initialize_python () at _add3_cffi.c:815
#6 0x0000000000402940 in _cffi_start_python () at _add3_cffi.c:1098
#7 0x00000000004029b4 in _cffi_start_and_call_python (externpy=0x604280 <_cffi_externpy__add3>, args=0x7ffff63cde90 "\350\003") at _add3_cffi.c:1138
#8 0x0000000000402aaf in add3 (a0=1000, a1=200, a2=30, a3=4) at _add3_cffi.c:1206
#9 0x0000000000402c91 in start_routine_3 (arg=0x0) at thread3-test.c:26
#10 0x00007ffff7bbf594 in start_thread () from /lib64/libpthread.so.0
#11 0x00007ffff746500f in clone () from /lib64/libc.so.6
Code of _cffi_carefully_make_gil():
https://bitbucket.org/cffi/cffi/src/af3a1e8a51e01affd0542727bb9e90fc3b90dafb/cffi/_embedding.h?at=default&fileviewer=file-view-default#_embedding.h-248
It seems like the bug has been fixed in cffi 1.11.5 by the following commit: https://bitbucket.org/cffi/cffi/commits/7b46e415e208 I reassign the bug to python-cffi. Upstream bug report: https://bitbucket.org/cffi/cffi/issues/358/ I'm considering to close this BZ as WONTFIX. The crash is just a test artefact. Supported production cases are not affected by the issue. RHEL 8 contains a newer version of CFFI that does not observe the crash, too. I don't want to risk updating CFFI just to silence a test problem. As I saw no concerns with Christian's plan and I agree with him, I am closing the bug. |
Description of problem: The python-cffi package 1.11.2 has a test which spawns 20 threads at once, doing a bunch of adds. Nothing too serious. The problem seems to be that, when spawning 20 threads at once, Python itself crashes: #0 new_threadstate (interp=0x0, init=init@entry=1) at /usr/src/debug/Python-2.7.5/Python/pystate.c:202 #1 0x00007ffff78f2c1a in PyThreadState_New (interp=<optimized out>) at /usr/src/debug/Python-2.7.5/Python/pystate.c:213 #2 0x00007ffff78f347c in PyGILState_Ensure () at /usr/src/debug/Python-2.7.5/Python/pystate.c:600 #3 0x0000000000401b8b in _cffi_initialize_python () at _add3_cffi.c:815 #4 0x0000000000402035 in _cffi_start_python () at _add3_cffi.c:1098 #5 0x00000000004020a9 in _cffi_start_and_call_python (externpy=0x604200 <_cffi_externpy__add3>, args=0x7ffff651aeb0 "\350\003") at _add3_cffi.c:1138 #6 0x00000000004021a0 in add3 (a0=1000, a1=200, a2=30, a3=4) at _add3_cffi.c:1206 #7 0x000000000040174c in start_routine_3 (arg=0x0) at thread3-test.c:26 #8 0x00007ffff7bc6dd5 in start_thread () from /lib64/libpthread.so.0 #9 0x00007ffff7524b3d in clone () from /lib64/libc.so.6 Version-Release number of selected component (if applicable): python-2.7.5-68.el7 How reproducible: 50% Steps to Reproduce: 1. Run checks for python-cffi-1.11.2 Actual results: Segmentation fault. Additional info: Inspecting the stack is peculiar: At frame 0: interp = NULL, causing segfault on line 202 in pystate.c At frame 1: #1 0x00007ffff78f2c1a in PyThreadState_New (interp=<optimized out>) at /usr/src/debug/Python-2.7.5/Python/pystate.c:213 213 return new_threadstate(interp, 1); At frame 2: #2 0x00007ffff78f347c in PyGILState_Ensure () at /usr/src/debug/Python-2.7.5/Python/pystate.c:600 600 tcur = PyThreadState_New(autoInterpreterState); (gdb) p autoInterpreterState $16 = (PyInterpreterState *) 0x7ffff00008f0 It looks like something's wiping it out on the stack (compiler optimizing too much?). Placing explicit thread locks around bits to serialize calls into PyGILState_Ensure() to ensure threads didn't enter that code simultaneously didn't seem to help at all. The same code works correctly on F26 with 2.7.14-7.fc26. I'm willing to patch up python-cffi as a workaround, but the code isn't terribly familiar yet.