Description of problem:
This bug was submitted by Qin Li to glibc bugzilla earlier this year, with a one-line patch, though it hasn't been merged into glibc yet:
Version-Release number of selected component: glibc-2.27 onwards
How reproducible: reliably; try the repro from the sourceware URL above
Actual results: deadlocks after 30-120 minutes on a 4-core Fedora 32 box
Expected results: should never deadlock
This bug in pthread conditions will deadlock the OCaml runtime, as well as Python and .NET applications.
The bug was introduced in glibc 2.27 and is still present in glibc 2.31.
I confirm that the repro from the above deadlocks on Fedora 32. It takes about 30-180 minutes on a 4-core server.
I further confirm that the one-line fix to glibc at the above applies cleanly to Fedora 32's glibc source rpm, and does not deadlock after running the repro for more than 30 hours.
Please kindly consider merging the one-line fix into Fedora glibc.
More background about this bug, for the sake of future internet searchers:
Created attachment 1722977
test case repro from sourceware entry
Created attachment 1722978
one-line patch to glibc that fixes the deadlock
We are looking to fix this for Fedora and Red Hat Enterprise Linux 8, as it affects users on both platforms.
Created attachment 1725573
testcase with abort() on stuck
Small modification to upstream testcase that abort()s when the loop is stuck for several iterations.
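For readers without access to the attachment, the idea of aborting on a stuck loop can be sketched roughly as below. This is not the attached testcase: the struct, the function names, and the threshold are all hypothetical, and it only assumes that the test loop maintains some progress counter that a checker samples periodically.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stall detector; names and fields are illustrative only
   and are not taken from the attached testcase. */
struct stall_detector {
    uint64_t last_seen;    /* progress counter value at the previous check */
    unsigned stuck_checks; /* consecutive checks that saw no progress */
    unsigned limit;        /* number of stuck checks treated as a hang */
};

void stall_detector_init(struct stall_detector *d, unsigned limit)
{
    assert(limit > 0);
    d->last_seen = 0;
    d->stuck_checks = 0;
    d->limit = limit;
}

/* Returns true once `limit` consecutive checks observed the same
   progress value; any change in the value resets the count. */
bool stall_detector_check(struct stall_detector *d, uint64_t progress)
{
    if (progress != d->last_seen) {
        d->last_seen = progress;
        d->stuck_checks = 0;
        return false;
    }
    d->stuck_checks++;
    return d->stuck_checks >= d->limit;
}

/* Convenience wrapper: abort() raises SIGABRT so the stuck process
   stops immediately instead of hanging silently. */
void stall_detector_check_or_abort(struct stall_detector *d, uint64_t progress)
{
    if (stall_detector_check(d, progress))
        abort();
}
```

A monitoring thread would call stall_detector_check_or_abort() at a fixed interval with the current iteration count. Aborting rather than hanging means an unattended multi-hour run ends with SIGABRT, and where core dumps are enabled the state of the blocked threads can be inspected afterwards.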