Bug 1979990

Summary: glibc: pthread_cond_wait missed wakeup (swbz#25847)
Product: Red Hat Enterprise Linux 8 Reporter: Andrew Mike <amike>
Component: glibc Assignee: Carlos O'Donell <codonell>
Status: CLOSED MIGRATED QA Contact: qe-baseos-tools-bugs
Severity: urgent Docs Contact:
Priority: high    
Version: 8.4 CC: aoliva, ashankar, cdeardor, codonell, dj, edwin+bugs, extras-qa, fweimer, jwright, law, mfabian, michael.bacarella, pfrankli, rth, schwerin, sipoyare
Target Milestone: beta Keywords: Bugfix, MigratedToJIRA, Triaged
Target Release: ---   
Hardware: Unspecified   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1889892 Environment:
Last Closed: 2023-09-23 22:23:38 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1889892    
Bug Blocks:    

Description Andrew Mike 2021-07-07 14:44:13 UTC
+++ This bug was initially created as a clone of Bug #1889892 +++

Description of problem:

This bug was submitted by Qin Li to glibc bugzilla earlier this year, with a one-line patch, though it hasn't been merged into glibc yet:

https://sourceware.org/bugzilla/show_bug.cgi?id=25847

Version-Release number of selected component: glibc-2.27 onwards

How reproducible: reliably; try the repro from the sourceware URL above

Actual results: deadlocks after 30-120 minutes on a 4-core Fedora 32 box

Expected results: should never deadlock

Additional info:

This bug in pthread conditions will deadlock the OCaml runtime, as well as Python and .NET applications.

The bug was introduced in glibc 2.27 and is still present in glibc 2.31.

I confirm that the repro from the above URL deadlocks on Fedora 32. It takes about 30-180 minutes on a 4-core server.

I further confirm that the one-line fix to glibc from the above URL applies cleanly to Fedora 32's glibc source rpm, and that the patched glibc does not deadlock after running the repro for more than 30 hours.

Please kindly consider merging the one-line fix into Fedora glibc.

More background about this bug, for the sake of future internet searchers:
* https://discuss.ocaml.org/t/is-there-a-known-recent-linux-locking-bug-that-affects-the-ocaml-runtime

--- Additional comment from Michael Bacarella on 2020-10-20 20:34:52 UTC ---

will deadlock

--- Additional comment from Michael Bacarella on 2020-10-20 20:35:47 UTC ---



--- Additional comment from Carlos O'Donell on 2020-10-27 13:21:56 UTC ---

We are looking to fix this for Fedora and Red Hat Enterprise Linux 8, as it impacts users on both platforms.

--- Additional comment from Török Edwin on 2020-11-01 17:59:31 UTC ---

Small modification to upstream testcase that abort()s when the loop is stuck for several iterations.
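
For readers without access to the attachment, the idea of aborting when the loop stops making progress can be sketched roughly as follows. This is NOT the upstream test case and is not expected to trigger swbz#25847 by itself; every name in it (watchdog_sample, run_stress, the 64-item queue bound, the one-second sampling interval) is a hypothetical choice made for illustration. It only shows a progress counter sampled by a watchdog that calls abort() after several consecutive samples without progress, so that a hang leaves a core dump instead of a silently stuck process.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <unistd.h>

/* All names here are hypothetical; this is not the upstream test case. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static int pending;            /* queued items, protected by lock */
static int stop;               /* shutdown flag, protected by lock */
static atomic_ulong progress;  /* bumped each time an item is consumed */

struct watchdog {
    unsigned long last;  /* counter value seen at the previous sample */
    int stalled;         /* consecutive samples without progress */
    int limit;           /* stalled samples tolerated before giving up */
};

/* Returns 1 once the counter has stayed unchanged for `limit` samples in a row. */
int watchdog_sample(struct watchdog *w, unsigned long now)
{
    if (now != w->last) {
        w->last = now;
        w->stalled = 0;
        return 0;
    }
    return ++w->stalled >= w->limit;
}

static void *consumer(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    for (;;) {
        while (pending == 0 && !stop)
            pthread_cond_wait(&cond, &lock);  /* the call being exercised */
        if (pending == 0)
            break;  /* stop requested and queue drained */
        pending--;
        atomic_fetch_add(&progress, 1);
    }
    pthread_mutex_unlock(&lock);
    return NULL;
}

static void *producer(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&lock);
        int done = stop;
        if (!done && pending < 64) {  /* arbitrary queue bound */
            pending++;
            pthread_cond_signal(&cond);
        }
        pthread_mutex_unlock(&lock);
        if (done)
            return NULL;
    }
}

/* Runs the loop for `seconds`, sampling once per second; abort()s on a stall.
   Returns the number of items consumed. */
unsigned long run_stress(int seconds, int stall_limit)
{
    pthread_t prod, cons;
    struct watchdog w = { 0, 0, stall_limit };

    assert(seconds > 0 && stall_limit > 0);
    pending = 0;
    stop = 0;
    atomic_store(&progress, 0);
    if (pthread_create(&cons, NULL, consumer, NULL) != 0)
        abort();
    if (pthread_create(&prod, NULL, producer, NULL) != 0)
        abort();

    for (int i = 0; i < seconds; i++) {
        sleep(1);
        if (watchdog_sample(&w, atomic_load(&progress)))
            abort();  /* stuck for stall_limit samples: dump core for debugging */
    }

    pthread_mutex_lock(&lock);
    stop = 1;  /* set under the lock so the waiter cannot miss it */
    pthread_cond_broadcast(&cond);
    pthread_mutex_unlock(&lock);
    pthread_join(prod, NULL);
    pthread_join(cons, NULL);
    return atomic_load(&progress);
}
```

A driver would call something like run_stress(3600, 5) from main and build with "cc -pthread". The stall check is kept in its own function so it can be exercised without starting any threads.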

--- Additional comment from Carlos O'Donell on 2020-11-10 14:25:18 UTC ---

Delaying the review of this until the end of November when we have more time to review upstream patches.

--- Additional comment from Fedora Program Management on 2021-04-29 17:06:51 UTC ---

This message is a reminder that Fedora 32 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora 32 on 2021-05-25.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
Fedora 'version' of '32'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 32 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

--- Additional comment from Carlos O'Donell on 2021-04-29 20:14:38 UTC ---

Still a bug, and still in Rawhide.

Comment 1 Carlos O'Donell 2021-07-09 13:36:04 UTC
The goal is to review and improve the situation with pthread condvar wakeup in upstream glibc 2.34 (releasing August 2021).

Any fixes that go upstream can then be considered for inclusion into RHEL8 from that point onwards.

I'm working upstream with a colleague at IBM to review the correctness of the fixes being proposed.

Comment 7 RHEL Program Management 2023-09-23 22:22:24 UTC
Issue migration from Bugzilla to Jira is in process at this time. This will be the last message in Jira copied from the Bugzilla bug.

Comment 8 RHEL Program Management 2023-09-23 22:23:38 UTC
This BZ has been automatically migrated to the issues.redhat.com Red Hat Issue Tracker. All future work related to this report will be managed there.

Due to differences in account names between systems, some fields were not replicated.  Be sure to add yourself to Jira issue's "Watchers" field to continue receiving updates and add others to the "Need Info From" field to continue requesting information.

To find the migrated issue, look in the "Links" section for a direct link to the new issue location. The issue key will have an icon of 2 footprints next to it, and begin with "RHEL-" followed by an integer.  You can also find this issue by visiting https://issues.redhat.com/issues/?jql= and searching the "Bugzilla Bug" field for this BZ's number, e.g. a search like:

"Bugzilla Bug" = 1234567

In the event you have trouble locating or viewing this issue, you can file an issue by sending mail to rh-issues. You can also visit https://access.redhat.com/articles/7032570 for general account information.