Bug 1972293
Summary: | Python36 crashes with "libgcc_s.so.1 must be installed for pthread_cancel to work"
---|---
Product: | Red Hat Enterprise Linux 8
Component: | python3
Version: | 8.4
Status: | CLOSED ERRATA
Fixed In Version: | python3-3.6.8-39.el8
Last Closed: | 2021-11-09 19:39:37 UTC
Type: | Bug
Severity: | unspecified
Priority: | low
Target Milestone: | beta
Keywords: | Triaged
Hardware: | Unspecified
OS: | Unspecified
Doc Type: | No Doc Update
Reporter: | Jiri Danek <jdanek>
Assignee: | Python Maintainers <python-maint>
QA Contact: | Lukáš Zachar <lzachar>
CC: | cstratak, pematous, pviktori, vstinner
Description
Jiri Danek
2021-06-15 15:34:39 UTC
When searching Bugzilla, I found two similar issues. Neither seems to be a duplicate, and neither gives me any hints:
https://bugzilla.redhat.com/show_bug.cgi?id=767094
https://bugzilla.redhat.com/show_bug.cgi?id=104173

For the record, I see this in Fedora Rawhide container as well with Python 3.6 as well as Python 3.9 but not with Python 3.10.

https://bugs.python.org/issue18748 might be relevant.

(In reply to Miro Hrončok from comment #2)
> For the record, I see this in Fedora Rawhide container as well with Python 3.6 as well as Python 3.9 but not with Python 3.10.

Actually, I was piping the output to `more`. When I don't do that, I cannot reproduce this with Python 3.9. I can reproduce this on Fedora Rawhide with Python 3.6 and 3.7, but not with 3.8+. That kinda supports the idea that this was fixed via https://bugs.python.org/issue18748.

> This issue manifests both on RHEL 7 and RHEL 8 with Python version 3.6. It does not manifest with Python version 2.7 (on RHEL 7).

Oh right, in Python 2.7, _thread.start_new_thread() doesn't call pthread_cancel() at thread exit. It does in Python 3.6. The pthread_cancel() call is redundant and can be removed, and removing it fixes this race condition. I proposed exactly that in Python upstream:
* https://bugs.python.org/issue44434
* https://github.com/python/cpython/pull/26758

> How reproducible: Intermittently, but happens fairly frequently using the attached reproducer. About 10 attempts at running the reproduction steps should be sufficient to reproduce.

Right, even with a very low file descriptor limit (5), the race condition remains hard to trigger with 1000 threads. I attached 2 different reproducer scripts to https://bugs.python.org/issue44434 which make the race condition more likely.

It seems that sometimes the libgcc_s library is loaded early during Python startup. Sometimes, it is only loaded when the first thread exits. Sometimes, it goes fine. Sometimes, I get the abort() call with the error message.

> Workaround: export LD_PRELOAD=/usr/lib64/libgcc_s.so.1

Another workaround is to use a larger file descriptor limit, but that only makes the race condition less likely. It doesn't fully fix it.

Ok, the issue is now fixed in Python upstream in the 3.9, 3.10 and main branches: https://bugs.python.org/issue44434

> This issue manifests both on RHEL 7 and RHEL 8 with Python version 3.6. It does not manifest with Python version 2.7 (on RHEL 7).

Jiri Danek: Do you need a backport to Python 3.6 of RHEL7 and RHEL8, or is the "export LD_PRELOAD=/usr/lib64/libgcc_s.so.1" workaround acceptable for your use case?

> Jiri Danek: Do you need a backport to Python 3.6 of RHEL7 and RHEL8 [...]?
TBH, I don't know. We only hit this issue during testing, and it does not have an associated customer case. For testing the EMFILE error handling in the Qpid Proton Python library, I feel that the `export LD_PRELOAD=/usr/lib64/libgcc_s.so.1` workaround is perfectly satisfactory, now that we understand what's actually happening. Whether there is sufficient value in fixing the CPython interpreter itself, I can't tell. Proton in general was never all that good at handling resource exhaustion, and given that prior experience, no one really expects it to excel in this area. In other words, this sort of resiliency is not a crucial feature of the product. I will ask around the team and update here.
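Below is a minimal sketch of how an EMFILE test like this might guard itself against the abort without setting `LD_PRELOAD` in the environment. It makes several assumptions that the comments above do not state. It assumes Linux with glibc and a `/proc` filesystem. It also assumes that loading `libgcc_s.so.1` into the process with `ctypes` before the descriptors run out has the same effect as the `LD_PRELOAD` workaround: once the library is mapped, glibc's later `dlopen()` of the same soname reuses it and does not need to open the file. The helper names (`preload_libgcc_s`, `low_fd_limit`, `exhaust_fds`, `run_threads_while_exhausted`) are hypothetical and are not part of Qpid Proton or CPython.

```python
import contextlib
import ctypes
import errno
import os
import resource
import threading
from typing import List

# Hypothetical test helpers, not part of Qpid Proton or CPython.
# Assumes Linux + glibc (resource module, /proc/self/maps).


def libgcc_s_loaded() -> bool:
    """Return True if libgcc_s is already mapped into this process."""
    with open("/proc/self/maps") as maps:
        return any("libgcc_s" in line for line in maps)


def preload_libgcc_s() -> bool:
    """In-process stand-in for LD_PRELOAD=/usr/lib64/libgcc_s.so.1.

    Assumption: once the library is mapped, glibc's own dlopen() of the
    same soname reuses it and needs no new file descriptor.
    Returns False if the library cannot be loaded (e.g. non-glibc systems).
    """
    try:
        ctypes.CDLL("libgcc_s.so.1")
    except OSError:
        return False
    return True


@contextlib.contextmanager
def low_fd_limit(soft_limit: int):
    """Temporarily lower the soft RLIMIT_NOFILE so new opens fail with EMFILE."""
    old_soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, hard))
    try:
        yield
    finally:
        # Restore the original soft limit even if the test body fails.
        resource.setrlimit(resource.RLIMIT_NOFILE, (old_soft, hard))


def exhaust_fds() -> List[int]:
    """Open /dev/null until EMFILE. Only call this under low_fd_limit()."""
    fds = []
    while True:
        try:
            fds.append(os.open(os.devnull, os.O_RDONLY))
        except OSError as exc:
            if exc.errno != errno.EMFILE:
                raise
            return fds


def run_threads_while_exhausted(n_threads: int = 20, limit: int = 64) -> int:
    """Start and join threads while no file descriptors are left.

    On an affected python3 without the preload, a thread exit here may
    abort with "libgcc_s.so.1 must be installed for pthread_cancel to work".
    Returns the number of worker threads that ran to completion.
    """
    preload_libgcc_s()  # must happen while descriptors are still available
    finished = 0
    lock = threading.Lock()

    def worker() -> None:
        nonlocal finished
        with lock:
            finished += 1

    with low_fd_limit(limit):
        fds = exhaust_fds()
        try:
            for _ in range(n_threads):
                thread = threading.Thread(target=worker)
                thread.start()
                thread.join()
        finally:
            for fd in fds:
                os.close(fd)
    return finished


if __name__ == "__main__":
    print(run_threads_while_exhausted())
```

Run as a script, this prints the number of threads that completed. The lowered limit is scoped to a context manager, so the rest of a test suite gets its descriptors back even if a test fails. As noted above, a larger limit alone only makes the race less likely and does not remove it.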
We discussed this at the AMQ Clients project meeting. We think this issue should be fixed as part of the regular RHEL bug fix errata, since it potentially affects all Python 3.6 users.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: python3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:4399