Bug 737387 - strange pthread/fork deadlock
Summary: strange pthread/fork deadlock
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: glibc
Version: 16
Hardware: All
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Andreas Schwab
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2011-09-11 17:56 UTC by Allison Karlitskaya
Modified: 2017-02-09 11:19 UTC (History)
CC List: 4 users

Fixed In Version: glibc-2.14.90-10
Clone Of:
: 738665
Environment:
Last Closed: 2011-10-02 18:13:15 UTC
Type: ---
Embargoed:


Attachments
test case demonstrating the issue (810 bytes, text/x-csrc)
2011-09-11 17:56 UTC, Allison Karlitskaya


Links
System          ID      Private  Priority  Status  Summary  Last Updated
GNOME Bugzilla  657891  0        None      None    None     Never
Sourceware      13699   0        None      None    None     2017-02-09 11:19:18 UTC

Description Allison Karlitskaya 2011-09-11 17:56:01 UTC
Created attachment 522617
test case demonstrating the issue

There appears to be a strange bug in glibc that causes deadlocks when calling fork() from threads. We had a testcase in GLib failing from time to time because of this.

I've attached a minimal testcase that uses only pure pthreads + libc. Compile it with -pthread and run it. It should fill your screen with dots for a while, then hang when it hits the bug (which happens randomly anywhere between 1 dot and hundreds). I've already received independent verification that this testcase hangs on several people's computers.
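
The attachment itself is not reproduced in this report. As a rough sketch of the pattern described above (several threads calling fork() at the same time, with a dot printed per completed batch), something like the following could be used. The names run_fork_stress, worker, worker_arg and MAX_THREADS are hypothetical, and the child here calls _exit() rather than execv(), so this is an approximation under those assumptions and not the attached test case.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_THREADS 64  /* arbitrary cap for this sketch */

struct worker_arg {
    int rounds;    /* fork/wait rounds this thread performs */
    int failures;  /* rounds in which fork() or waitpid() went wrong */
};

/* Each thread forks repeatedly; the child exits at once and the
 * parent thread reaps it before starting the next round. */
static void *worker(void *p)
{
    struct worker_arg *arg = p;

    for (int i = 0; i < arg->rounds; i++) {
        pid_t pid = fork();
        if (pid == 0)
            _exit(0);  /* child: only async-signal-safe calls here */

        int status;
        if (pid < 0 || waitpid(pid, &status, 0) != pid ||
            !WIFEXITED(status) || WEXITSTATUS(status) != 0)
            arg->failures++;
    }
    return NULL;
}

/* Hypothetical driver: runs one batch and returns the total number of
 * failed rounds, or -1 if a thread could not be created. */
int run_fork_stress(int nthreads, int rounds)
{
    pthread_t tid[MAX_THREADS];
    struct worker_arg args[MAX_THREADS];
    int started = 0, failures = 0;

    assert(nthreads > 0 && nthreads <= MAX_THREADS);

    for (; started < nthreads; started++) {
        args[started].rounds = rounds;
        args[started].failures = 0;
        if (pthread_create(&tid[started], NULL, worker, &args[started]) != 0)
            break;
    }
    for (int i = 0; i < started; i++) {
        pthread_join(tid[i], NULL);
        failures += args[i].failures;
    }
    if (started < nthreads)
        return -1;

    fputc('.', stdout);  /* one dot per completed batch */
    fflush(stdout);
    return failures;
}
```

Calling run_fork_stress(4, 1000) in a loop from main() and building with -pthread should approximate the behaviour described: a stream of dots, followed by a hang if the installed glibc is affected.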

I believe this to be an upstream issue since this bug is visible on Ubuntu as well, but the glibc website says I should file bugs against distributions first. I also believe the issue to be a regression since older Fedora and RHEL releases are unaffected. The problem appears to affect both 32-bit and 64-bit systems.
Some notes:

 - compiling the testcase with -static has the side-effect of causing the
   bug to go away

 - compiling the testcase with -DFORK_DIRECTLY also appears to solve the
   problem

 - replacing the execv() with a direct exit(0) doesn't solve the problem
   but causes the frequency to change

The fact that both static linking and making the fork() syscall directly cause the problem to disappear leads me to believe that this is a libc bug rather than a kernel bug (which is the only other possibility). I'm not 100% sure of that, though, since libc actually uses the clone() syscall to implement fork(), so there could be a difference inside the kernel because of that.
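
To make that comparison concrete, here is a minimal sketch of a fork wrapper that can bypass the libc fork() function. It assumes that -DFORK_DIRECTLY in the attached test case selects a raw system call, which the notes above imply but do not show. The names do_fork and fork_and_reap are hypothetical.

```c
#include <assert.h>
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical wrapper: with -DFORK_DIRECTLY it enters the kernel
 * directly, skipping whatever libc's fork() does around the call. */
pid_t do_fork(void)
{
#ifdef FORK_DIRECTLY
# ifdef SYS_fork
    return (pid_t) syscall(SYS_fork);
# else
    /* Some architectures have no fork syscall. clone with only
     * SIGCHLD as flags asks the kernel for fork-like behaviour. */
    return (pid_t) syscall(SYS_clone, SIGCHLD, 0, 0, 0, 0);
# endif
#else
    return fork();
#endif
}

/* Forks once, has the child exit with `code`, and returns the exit
 * status seen by the parent, or -1 on any failure. */
int fork_and_reap(int code)
{
    assert(code >= 0 && code <= 255);

    pid_t pid = do_fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);  /* child: leave immediately */

    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Building a threaded reproducer around do_fork() once with and once without -DFORK_DIRECTLY is the kind of experiment the notes describe: if only the libc path hangs, the work libc does around the system call becomes the prime suspect.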

Comment 1 Fedora Update System 2011-09-16 14:28:37 UTC
glibc-2.14.90-9 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/glibc-2.14.90-9

Comment 2 Allison Karlitskaya 2011-09-16 21:11:20 UTC
Thanks for the awesome turnaround.  I installed the update from testing on my F16 system and it appears to fix the problem.

Comment 3 Fedora Update System 2011-09-17 19:34:50 UTC
Package glibc-2.14.90-9:
* should fix your issue,
* was pushed to the Fedora 16 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing glibc-2.14.90-9'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/glibc-2.14.90-9
then log in and leave karma (feedback).

Comment 4 Fedora Update System 2011-09-28 18:52:27 UTC
Package glibc-2.14.90-10:
* should fix your issue,
* was pushed to the Fedora 16 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing glibc-2.14.90-10'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/glibc-2.14.90-10
then log in and leave karma (feedback).

Comment 5 Fedora Update System 2011-10-02 18:12:47 UTC
glibc-2.14.90-10 has been pushed to the Fedora 16 stable repository.  If problems still persist, please make note of it in this bug report.

