1972293 – Python36 crashes with libgcc_s.so.1 must be installed for pthread_cancel to work

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 8
Classification:	Red Hat
Component:	python3
Sub Component:
Version:	8.4
Hardware:	Unspecified
OS:	Unspecified
Priority:	low
Severity:	unspecified
Target Milestone:	beta
Target Release:	---
Assignee:	Python Maintainers
QA Contact:	Lukáš Zachar
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2021-06-15 15:34 UTC by Jiri Danek
Modified:	2021-11-10 08:06 UTC
CC List:	4 users
Fixed In Version:	python3-3.6.8-39.el8
Doc Type:	No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed:	2021-11-09 19:39:37 UTC
Type:	Bug
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments
threads.py (353 bytes, text/plain) 2021-06-15 15:34 UTC, Jiri Danek

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Python	44434	0	None	None	None	2021-07-09 22:13:53 UTC
Red Hat Product Errata	RHSA-2021:4399	0	None	None	None	2021-11-09 19:39:54 UTC

Description Jiri Danek 2021-06-15 15:34:39 UTC

Created attachment 1791284
threads.py

Description of problem:
***********************

Python crashes with a core dump when I run the attached reproducer program after setting a low limit on the number of open file descriptors.

In practice, this issue was originally encountered as

https://issues.redhat.com/browse/ENTMQCL-1699
https://issues.redhat.com/browse/ENTMQCL-2787

This issue manifests both on RHEL 7 and RHEL 8 with Python version 3.6. It does not manifest with Python version 2.7 (on RHEL 7).

Steps to reproduce:
*******************

# bash
# prlimit --pid $$ --nofile=5:5
# python threads.py

Exception in thread Thread-5:
Traceback (most recent call last):
  File "/usr/lib64/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib64/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "loop.py", line 6, in run_in_thread
Exception: aaaaaaaaaa

libgcc_s.so.1 must be installed for pthread_cancel to work
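
The contents of the attached threads.py are not reproduced in this report. The following is only a hypothetical sketch of such a reproducer, inferred from the traceback above (a thread target named `run_in_thread` that raises `Exception: aaaaaaaaaa`) and from the later mention of 1000 threads. The helper name `spawn_threads` is an assumption made for illustration and does not come from the attachment.

```python
import threading


def run_in_thread():
    # Each thread fails immediately, mirroring the traceback in the report.
    raise Exception("aaaaaaaaaa")


def spawn_threads(count):
    """Start `count` short-lived threads and wait for all of them to exit.

    Hypothetical helper: the real attachment may be structured differently.
    """
    threads = [threading.Thread(target=run_in_thread) for _ in range(count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return len(threads)
```

Calling `spawn_threads(1000)` from a shell whose limit was lowered with `prlimit --pid $$ --nofile=5:5` would be the closest match to the steps above. Whether the crash appears on a given run depends on timing.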

Workaround:
***********

# export LD_PRELOAD=/usr/lib64/libgcc_s.so.1
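
A possible in-process variant of this workaround, not mentioned in the report, is to load the library explicitly at program startup. The sketch below assumes that loading libgcc_s through `ctypes` before the file descriptor limit is reached has the same effect as `LD_PRELOAD`; that equivalence is an assumption and has not been verified here. The function name `preload_libgcc` is hypothetical.

```python
import ctypes

# Keep a module-level reference so the handle stays reachable for the
# lifetime of the process.
_LIBGCC_HANDLE = None


def preload_libgcc(name="libgcc_s.so.1"):
    """Try to load libgcc_s into this process early.

    Returns True if the dynamic loader opened the library, False otherwise.
    Assumption: this must run before file descriptors are exhausted,
    because opening the shared object itself needs a descriptor.
    """
    global _LIBGCC_HANDLE
    try:
        _LIBGCC_HANDLE = ctypes.CDLL(name)
    except OSError:
        return False
    return True
```

The function reports failure instead of raising, so a caller can decide whether to continue without the preload.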

Version-Release number of selected component (if applicable):
*************************************************************

Latest ubi8/ubi image, started with `docker run --rm -it registry.access.redhat.com/ubi8/ubi`
python36 3.6.8-2.module+el8.1.0+3334+5cb623d7 from ubi-8-appstream

How reproducible:
*****************

Intermittently, but happens fairly frequently using the attached reproducer. About 10 attempts at running the reproduction steps should be sufficient to reproduce.

Stacktrace:
***********

There is a stacktrace in the comments for https://issues.redhat.com/browse/ENTMQCL-1699. I was not able to get a core file this time, when reproducing the issue in docker: no core file is created and coredumpctl does not report any cores.

Comment 1 Jiri Danek 2021-06-15 15:38:50 UTC

When searching Bugzilla, I found two similar issues. Neither seems to be a duplicate, and neither gave me any hints.

https://bugzilla.redhat.com/show_bug.cgi?id=767094
https://bugzilla.redhat.com/show_bug.cgi?id=104173

Comment 2 Miro Hrončok 2021-06-15 20:29:51 UTC

For the record, I see this in Fedora Rawhide container as well with Python 3.6 as well as Python 3.9 but not with Python 3.10.

Comment 3 Miro Hrončok 2021-06-15 20:33:37 UTC

https://bugs.python.org/issue18748 might be relevant

Comment 4 Miro Hrončok 2021-06-15 20:40:56 UTC

(In reply to Miro Hrončok from comment #2)
> For the record, I see this in Fedora Rawhide container as well with Python
> 3.6 as well as Python 3.9 but not with Python 3.10.

Actually, I was piping the output to `more` and when I don't do that, I cannot reproduce this with Python 3.9.

I can reproduce this on Fedora Rawhide with Python 3.6 and 3.7, but not in 3.8+.

That kinda supports the idea that this was fixed via https://bugs.python.org/issue18748

Comment 5 Victor Stinner 2021-06-16 15:05:26 UTC

> This issue manifests both on RHEL 7 and RHEL 8 with Python version 3.6. It does not manifest with Python version 2.7 (on RHEL 7).

Oh right, in Python 2.7, _thread.start_new_thread() doesn't call pthread_cancel() at the thread exit. It does in Python 3.6.

The pthread_cancel() call is redundant and can be removed. Removing the call fixes this race condition.

I proposed exactly that in Python upstream:

* https://bugs.python.org/issue44434
* https://github.com/python/cpython/pull/26758


"How reproducible: Intermittently, but happens fairly frequently using the attached reproducer. About 10 attempts at running the reproduction steps should be sufficient to reproduce."

Right, even if the file descriptor limit is very low (5), it remains hard to trigger the issue with 1000 threads. The race condition is hard to trigger. I attached 2 different reproducer scripts to https://bugs.python.org/issue44434 which make the race condition more likely.

It seems like sometimes the libgcc_s library is loaded early during Python startup. Sometimes, it is only loaded when the first thread exits. Sometimes, it goes fine. Sometimes, I get the abort() call with the error message.
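
One way to observe when the library gets loaded is to look at the memory mappings of the running process. The sketch below is Linux-specific and assumes `/proc/self/maps` is available; the function name `libgcc_is_mapped` is hypothetical and not part of any tool discussed in this report.

```python
def libgcc_is_mapped(maps_path="/proc/self/maps"):
    """Return True if a libgcc_s mapping appears in the process's maps file.

    Returns False when the file cannot be read (for example on non-Linux
    systems, or when no file descriptor is available to open it).
    """
    try:
        with open(maps_path) as maps:
            return any("libgcc_s" in line for line in maps)
    except OSError:
        return False
```

Calling it once at startup and once after the first thread has exited would show which of the cases described above applies to a given run.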


"Workaround: export LD_PRELOAD=/usr/lib64/libgcc_s.so.1"

Another is to use a larger file descriptor limit, but that only makes the race condition less likely; it doesn't fully fix it.
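
As a rough illustration of the second workaround, a process can raise its own soft limit up to the hard limit with the standard `resource` module. This is a minimal sketch: the function name `raise_nofile_limit` is hypothetical, and it cannot help when the hard limit itself is low, as with `prlimit --nofile=5:5`.

```python
import resource


def raise_nofile_limit():
    """Raise the soft RLIMIT_NOFILE to the hard limit, if permitted.

    Returns the (soft, hard) pair in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft != hard:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
        except (ValueError, OSError):
            # Some platforms reject the request; keep the current limit.
            pass
    return resource.getrlimit(resource.RLIMIT_NOFILE)
```

As noted above, this only reduces the chance of hitting the race and does not remove it.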

Comment 6 Victor Stinner 2021-06-21 12:32:25 UTC

Ok, the issue is now fixed in Python upstream in 3.9, 3.10 and main branches: https://bugs.python.org/issue44434

> This issue manifests both on RHEL 7 and RHEL 8 with Python version 3.6. It does not manifest with Python version 2.7 (on RHEL 7).

Jiri Danek: Do you need a backport to Python 3.6 of RHEL7 and RHEL8, or is the "export LD_PRELOAD=/usr/lib64/libgcc_s.so.1" workaround acceptable for your use case?

Comment 7 Jiri Danek 2021-06-22 14:53:40 UTC

> Jiri Danek: Do you need a backport to Python 3.6 of RHEL7 and RHEL8 [...]?

TBH, I don't know. We only hit this issue during testing; it does not have an associated customer case. For testing the EMFILE error handling in the Qpid Proton Python library, I feel that the `export LD_PRELOAD=/usr/lib64/libgcc_s.so.1` workaround is perfectly satisfactory, now that we understand what's actually happening. Whether there is sufficient value in fixing the CPython interpreter itself, I can't tell. Proton in general was never all that good at handling resource exhaustion cases and, given that prior experience, no one really expects it to excel in this area, meaning this sort of resiliency is not a crucial feature of the product. I will ask around the team and update here.

Comment 8 Jiri Danek 2021-06-30 12:29:23 UTC

We discussed this at the AMQ Clients project meeting. We think this issue should be fixed as part of the regular RHEL bugfix errata, since it potentially affects all Python 3.6 users.

Comment 16 errata-xmlrpc 2021-11-09 19:39:37 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: python3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:4399
