1561170 – Segfault while creating thread state

RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.

Bug 1561170 - Segfault while creating thread state

Summary: Segfault while creating thread state

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat Enterprise Linux 7
Classification:	Red Hat
Component:	python-cffi
Sub Component:
Version:	7.5
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	rc
Target Release:	---
Assignee:	Christian Heimes
QA Contact:	ipa-qe
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2018-03-27 19:51 UTC by Lon Hohberger
Modified:	2019-03-14 15:27 UTC (History)
CC List:	13 users (show)
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2019-03-14 15:27:34 UTC
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)
CFFI thread test case 3 (tarball w/ code & makefile) (22.49 KB, application/x-gzip) 2018-03-29 19:38 UTC, Lon Hohberger	no flags	Details
View All

Description Lon Hohberger 2018-03-27 19:51:12 UTC

Description of problem:

The python-cffi package 1.11.2 has a test which spawns 20 threads at once, doing a bunch of adds. Nothing too serious.

The problem seems to be that, when spawning 20 threads at once, Python itself crashes:

#0  new_threadstate (interp=0x0, init=init@entry=1) at /usr/src/debug/Python-2.7.5/Python/pystate.c:202
#1  0x00007ffff78f2c1a in PyThreadState_New (interp=<optimized out>) at /usr/src/debug/Python-2.7.5/Python/pystate.c:213
#2  0x00007ffff78f347c in PyGILState_Ensure () at /usr/src/debug/Python-2.7.5/Python/pystate.c:600
#3  0x0000000000401b8b in _cffi_initialize_python () at _add3_cffi.c:815
#4  0x0000000000402035 in _cffi_start_python () at _add3_cffi.c:1098
#5  0x00000000004020a9 in _cffi_start_and_call_python (externpy=0x604200 <_cffi_externpy__add3>, 
    args=0x7ffff651aeb0 "\350\003") at _add3_cffi.c:1138
#6  0x00000000004021a0 in add3 (a0=1000, a1=200, a2=30, a3=4) at _add3_cffi.c:1206
#7  0x000000000040174c in start_routine_3 (arg=0x0) at thread3-test.c:26
#8  0x00007ffff7bc6dd5 in start_thread () from /lib64/libpthread.so.0
#9  0x00007ffff7524b3d in clone () from /lib64/libc.so.6

Version-Release number of selected component (if applicable): python-2.7.5-68.el7

How reproducible: 50%

Steps to Reproduce:
1. Run checks for python-cffi-1.11.2

Actual results: Segmentation fault.

Additional info:

Inspecting the stack is peculiar:

At frame 0:

interp = NULL, causing segfault on line 202 in pystate.c

At frame 1: 

#1  0x00007ffff78f2c1a in PyThreadState_New (interp=<optimized out>) at /usr/src/debug/Python-2.7.5/Python/pystate.c:213
213	    return new_threadstate(interp, 1);

At frame 2:

#2  0x00007ffff78f347c in PyGILState_Ensure () at /usr/src/debug/Python-2.7.5/Python/pystate.c:600
600	        tcur = PyThreadState_New(autoInterpreterState);
(gdb) p autoInterpreterState
$16 = (PyInterpreterState *) 0x7ffff00008f0


It looks like something's wiping it out on the stack (compiler optimizing too much?).  Placing explicit thread locks around bits to serialize calls into PyGILState_Ensure() to ensure threads didn't enter that code simultaneously didn't seem to help at all.

The same code works correctly on F26 with 2.7.14-7.fc26.

I'm willing to patch up python-cffi as a workaround, but the code isn't terribly familiar yet.

Comment 2 Lon Hohberger 2018-03-27 20:07:00 UTC

Note: Downgrading the mock environment from python-2.7.5-68 to 2.7.5-58 causes everything to work correctly.

Comment 9 Lon Hohberger 2018-03-29 19:38:40 UTC

Created attachment 1414884 [details]
CFFI thread test case 3 (tarball w/ code & makefile)

This is one of the failing tests.  Downgrading to python-2.7.5-58.el7 (or running this on F26/F27 builds) works.

Comment 11 Petr Viktorin (pviktori) 2018-04-11 10:12:15 UTC

Lon, did you get a moment to test the builds?

Comment 17 Lon Hohberger 2018-05-02 18:23:50 UTC

Also my initial comment said that it worked on F26 with python2-2.7.14-7.fc26; that's inaccurate, it crashes there.

On F28, the code also fails with python2-2.7.14-15.fc28.

Comment 18 Charalampos Stratakis 2018-05-03 10:06:24 UTC

(In reply to Lon Hohberger from comment #17)
> Also my initial comment said that it worked on F26 with
> python2-2.7.14-7.fc26; that's inaccurate, it crashes there.
> 
> On F28, the code also fails with python2-2.7.14-15.fc28.

It will fail there as well, since the fix included in the RHEL build is also included in Fedora.

cc'ing the author of the patch.

Comment 19 Charalampos Stratakis 2018-05-03 10:43:41 UTC

Also here is the commit causing the issue: https://github.com/python/cpython/commit/830daae1c82ed33deef0086b7b6323e5be0b0cc8

Comment 20 Nir Soffer 2018-05-03 11:22:03 UTC

(In reply to Charalampos Stratakis from comment #19)
> Also here is the commit causing the issue:
> https://github.com/python/cpython/commit/
> 830daae1c82ed33deef0086b7b6323e5be0b0cc8

Hi Charalampos, why do you think that releasing the GIL in fileio is related to 
cffi?

Comment 21 Charalampos Stratakis 2018-05-03 11:27:15 UTC

(In reply to Nir Soffer from comment #20)
> (In reply to Charalampos Stratakis from comment #19)
> > Also here is the commit causing the issue:
> > https://github.com/python/cpython/commit/
> > 830daae1c82ed33deef0086b7b6323e5be0b0cc8
> 
> Hi Charalampos, why do you think that releasing the GIL in fileio is related
> to 
> cffi?

I haven't dived into the code of this patch, however from testing of the aforementioned builds, it suggests that the test fails with that patch applied.

Comment 22 Nir Soffer 2018-05-03 11:41:35 UTC

I looked briefly in the attached test case, it is very complex code, so there is 
good chance that the test code is doing something wrong, and the fix for fileio
just expose the issue.

- Is this upstream test from cffi package?
- Does this happen with upstream python?

Comment 23 Lon Hohberger 2018-05-29 19:41:20 UTC

It's the test code lifted directly from cffi 1.11.2 test runs.

The code is somewhat complicated since it's generated code.

I did write the Makefile, however, to make it easier to run (e.g. so one doesn't have to build all of CFFI).

It also fails in the same location against Python 2.7.15, built from source. So, my expectation is that portions of CFFI will no longer work with 2.7.15:

Thread 3 "thread3-test" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff68c4700 (LWP 18472)]
new_threadstate (init=1, interp=0x0) at Python/pystate.c:202
202	        tstate->next = interp->tstate_head;
(gdb) bt
#0  new_threadstate (init=1, interp=0x0) at Python/pystate.c:202
#1  PyThreadState_New (interp=0x0) at Python/pystate.c:213
#2  PyGILState_Ensure () at Python/pystate.c:604
#3  0x0000000000406336 in _cffi_initialize_python ()
#4  0x00000000004067e3 in _cffi_start_python ()
#5  0x0000000000406857 in _cffi_start_and_call_python ()
#6  0x0000000000406952 in add3 ()
#7  0x0000000000406b34 in start_routine_3 ()
#8  0x00007ffff7bbf36d in start_thread () from /lib64/libpthread.so.0
#9  0x00007ffff71d6b4f in clone () from /lib64/libc.so.6
(gdb) up
#1  PyThreadState_New (interp=0x0) at Python/pystate.c:213
213	    return new_threadstate(interp, 1);
(gdb) list
208	}
209	
210	PyThreadState *
211	PyThreadState_New(PyInterpreterState *interp)
212	{
213	    return new_threadstate(interp, 1);
214	}
215	
216	PyThreadState *
217	_PyThreadState_Prealloc(PyInterpreterState *interp)
(gdb) p interp
$1 = (PyInterpreterState *) 0x0
(gdb) up
#2  PyGILState_Ensure () at Python/pystate.c:604
604	        tcur = PyThreadState_New(autoInterpreterState);
(gdb) p autoInterpreterState
$2 = (PyInterpreterState *) 0x7ffff00008f0

Comment 24 Charalampos Stratakis 2018-06-06 11:53:45 UTC

(In reply to Lon Hohberger from comment #23)
> It's the test code lifted directly from cffi 1.11.2 test runs.
> 
> The code is somewhat complicated since it's generated code.
> 
> I did write the Makefile, however, to make it easier to run (e.g. so one
> doesn't have to build all of CFFI).
> 
> It also fails in the same location against Python 2.7.15, built from source.
> So, my expectation is that portions of CFFI will no longer work with 2.7.15:
> 
> Thread 3 "thread3-test" received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7ffff68c4700 (LWP 18472)]
> new_threadstate (init=1, interp=0x0) at Python/pystate.c:202
> 202	        tstate->next = interp->tstate_head;
> (gdb) bt
> #0  new_threadstate (init=1, interp=0x0) at Python/pystate.c:202
> #1  PyThreadState_New (interp=0x0) at Python/pystate.c:213
> #2  PyGILState_Ensure () at Python/pystate.c:604
> #3  0x0000000000406336 in _cffi_initialize_python ()
> #4  0x00000000004067e3 in _cffi_start_python ()
> #5  0x0000000000406857 in _cffi_start_and_call_python ()
> #6  0x0000000000406952 in add3 ()
> #7  0x0000000000406b34 in start_routine_3 ()
> #8  0x00007ffff7bbf36d in start_thread () from /lib64/libpthread.so.0
> #9  0x00007ffff71d6b4f in clone () from /lib64/libc.so.6
> (gdb) up
> #1  PyThreadState_New (interp=0x0) at Python/pystate.c:213
> 213	    return new_threadstate(interp, 1);
> (gdb) list
> 208	}
> 209	
> 210	PyThreadState *
> 211	PyThreadState_New(PyInterpreterState *interp)
> 212	{
> 213	    return new_threadstate(interp, 1);
> 214	}
> 215	
> 216	PyThreadState *
> 217	_PyThreadState_Prealloc(PyInterpreterState *interp)
> (gdb) p interp
> $1 = (PyInterpreterState *) 0x0
> (gdb) up
> #2  PyGILState_Ensure () at Python/pystate.c:604
> 604	        tcur = PyThreadState_New(autoInterpreterState);
> (gdb) p autoInterpreterState
> $2 = (PyInterpreterState *) 0x7ffff00008f0

Ideally this should be resolved upstream first. Could you create an issue at bugs.python.org ?

Comment 25 Petr Viktorin (pviktori) 2018-06-07 14:27:55 UTC

Victor, if you get some time, could you investigate this one?

Comment 26 Victor Stinner 2018-06-13 16:12:12 UTC

I'm not sure that the bug is related, but Python 2.7.15 contains the following fix:
* "PyGILState_Ensure on non-Python thread causes fatal error"
* https://bugs.python.org/issue20891
* https://github.com/python/cpython/commit/be6b74c0795b709c7a04e2187a7e32d08f5155f6

Comment 27 Victor Stinner 2018-06-15 10:16:02 UTC

Ok, I'm able to reproduce the bug on Fedora 28 using attached "CFFI thread test case 3".

The bug is a race condition in _cffi_initialize_python() of the test case. The first thread calls Py_InitializeEx(), but then other threads call PyGILState_Ensure(). If PyGILState_Ensure() is called before Py_InitializeEx() completes, you get the crash.

Example of crash. Thread A is still running Py_InitializeEx() while Thread B calls PyGILState_Ensure().

Thread A:

#0  0x00007ffff77d82f0 in PyDict_GetItem (op=0x7ffff7f74060, key=0x7ffff7f65ee0) at Objects/dictobject.c:714
#1  0x00007ffff77dd5fe in PyDict_GetItemString (v=0x7ffff7f74060, key=0x7ffff790de80 "__doc__") at Objects/dictobject.c:2509
#2  0x00007ffff7819276 in PyType_Ready (type=0x7ffff7b5a8e0 <_PyExc_LookupError>) at Objects/typeobject.c:4260
#3  0x00007ffff77a89b5 in _PyExc_Init () at Objects/exceptions.c:2078
#4  0x00007ffff78bc60e in Py_InitializeEx (install_sigs=0) at Python/pythonrun.c:265
#5  0x000000000040182c in _cffi_py_initialize () at _add2_cffi.c:782
#6  0x0000000000401851 in _cffi_initialize_python () at _add2_cffi.c:805
#7  0x0000000000401e6a in _cffi_start_python () at _add2_cffi.c:1103
#8  0x0000000000401ede in _cffi_start_and_call_python (externpy=0x604200 <_cffi_externpy__add2>, args=0x7ffff6bceea0 "(") at _add2_cffi.c:1143
#9  0x0000000000401fc9 in add2 (a0=40, a1=2, a2=100) at _add2_cffi.c:1209
#10 0x0000000000402c15 in start_routine_2 (arg=0x0) at thread3-test.c:14
#11 0x00007ffff7bbf594 in start_thread () from /lib64/libpthread.so.0
#12 0x00007ffff746500f in clone () from /lib64/libc.so.6

Thread B:

#0  0x00007ffff73a1f2b in raise () from /lib64/libc.so.6
#1  0x00007ffff738c561 in abort () from /lib64/libc.so.6
#2  0x00007ffff738c431 in __assert_fail_base.cold.0 () from /lib64/libc.so.6
#3  0x00007ffff739a692 in __assert_fail () from /lib64/libc.so.6
#4  0x00007ffff78bbe7b in PyGILState_Ensure () at Python/pystate.c:600
#5  0x000000000040233b in _cffi_initialize_python () at _add3_cffi.c:815
#6  0x0000000000402940 in _cffi_start_python () at _add3_cffi.c:1098
#7  0x00000000004029b4 in _cffi_start_and_call_python (externpy=0x604280 <_cffi_externpy__add3>, args=0x7ffff63cde90 "\350\003") at _add3_cffi.c:1138
#8  0x0000000000402aaf in add3 (a0=1000, a1=200, a2=30, a3=4) at _add3_cffi.c:1206
#9  0x0000000000402c91 in start_routine_3 (arg=0x0) at thread3-test.c:26
#10 0x00007ffff7bbf594 in start_thread () from /lib64/libpthread.so.0
#11 0x00007ffff746500f in clone () from /lib64/libc.so.6

Code of _cffi_carefully_make_gil():

https://bitbucket.org/cffi/cffi/src/af3a1e8a51e01affd0542727bb9e90fc3b90dafb/cffi/_embedding.h?at=default&fileviewer=file-view-default#_embedding.h-248

Comment 28 Victor Stinner 2018-06-15 10:35:30 UTC

It seems like the bug has been fixed in cffi 1.11.5 by the following commit:
https://bitbucket.org/cffi/cffi/commits/7b46e415e208

I reassign the bug to python-cffi.

Comment 29 Christian Heimes 2018-06-15 10:51:48 UTC

Upstream bug report: https://bitbucket.org/cffi/cffi/issues/358/

Comment 33 Christian Heimes 2019-02-27 10:54:51 UTC

I'm considering to close this BZ as WONTFIX. The crash is just a test artefact. Supported production cases are not affected by the issue. RHEL 8 contains a newer version of CFFI that does not observe the crash, too.

I don't want to risk updating CFFI just to silence a test problem.

Comment 34 Martin Kosek 2019-03-14 15:27:34 UTC

As I saw no concerns with Christian's plan and I agree with him, I am closing the bug.

Note You need to log in before you can comment on or make changes to this bug.