Bug 90036
Summary: | race/deadlock in fork() with signal handler. | ||||||
---|---|---|---|---|---|---|---|
Product: | [Retired] Red Hat Linux | Reporter: | Jay Fenlason <fenlason> | ||||
Component: | glibc | Assignee: | Roland McGrath <roland> | ||||
Status: | CLOSED CURRENTRELEASE | QA Contact: | |||||
Severity: | medium | Docs Contact: | |||||
Priority: | medium | ||||||
Version: | 9 | CC: | a.keusch, alberto, astrand, benm, brett.porter, bugzilla, chaos, djh, drepper, esimmonds, fweimer, g-man, gneeki, grenoml, ivo, j1, jerry, jfeeney, jung, lamont_gilbert, list, lists, mail, matt, mcauleyt, mikeraz, mitr, myoung, nixuser, redhat_bugzilla, rpm, schlegel, seth.fischer, shishz, t8m, target, tkokchi | ||||
Target Milestone: | --- | ||||||
Target Release: | --- | ||||||
Hardware: | i686 | ||||||
OS: | Linux | ||||||
Whiteboard: | |||||||
Fixed In Version: | 2.3.2-27.9.4 | Doc Type: | Bug Fix | ||||
Last Closed: | 2003-11-20 17:32:01 UTC | Type: | --- | ||||
Description
Jay Fenlason
2003-05-01 14:57:33 UTC
*** Bug 89643 has been marked as a duplicate of this bug. ***
*** Bug 88841 has been marked as a duplicate of this bug. ***
*** Bug 82820 has been marked as a duplicate of this bug. ***
*** Bug 89197 has been marked as a duplicate of this bug. ***

The fork function is not async-signal-safe, which is a bug. In the smbmount case, the forked child runs before the parent and sends its parent a signal. The parent's signal handler calls exit(), which deadlocks on an internal lock still held by the interrupted fork().

I'm attaching a trivial test program. When it is linked with a library that has a destructor, it hangs, e.g. "gcc -o forkloser -g forkloser.c -lanl".

Created attachment 91467
test case for glibc/nptl fork bug.
Link with some library that has destructors to demonstrate the bug.
e.g. "gcc -o forkloser -g forkloser.c -lm" is what I tried.
The hang may depend on the child running first, but I saw it on an SMP kernel as well.
I've checked a patch into the nptl CVS archive that removes the lock held while fork calls the registered handlers. It will be in the next binary RPMs we publish.

Sorry, just trying to add myself to the CC list :~)

This is a serious problem for me. Any idea when the fix will be released?

We are experiencing smbmount hangs about 80% of the time on RH9. This is a serious issue for us. Is there an estimated timeframe for a fix?

Can we recompile using LD_ASSUME_KERNEL=2.2.5 to avoid this problem? Which specific packages should be recompiled? This problem is killing me.

I have had good luck rolling back to the 2.4.18 kernel from Red Hat 8.0, so I would recommend trying that. The RPM is easy to find (https://rhn.redhat.com/errata/RHSA-2003-098.html).

Changing the kernel version may change the scheduler behavior (and thus the chance that the child process runs before the parent), but it will not address the actual bug. Only fixing glibc will prevent this hang 100% of the time.

*** Bug 97325 has been marked as a duplicate of this bug. ***

I've used the following on my RH9:

    mv /usr/bin/smbmount /usr/bin/_smbmount
    cat <<'EOT' > /usr/bin/smbmount
    #!/bin/sh
    # Start the real smbmount in the background, wait a second, then send
    # it SIGQUIT so a run stuck in the fork() deadlock cannot hang the
    # caller. The error from kill (if smbmount already exited) is discarded.
    /usr/bin/_smbmount "$@" &
    sleep 1
    kill -QUIT $! 2>/dev/null
    EOT
    chmod 0755 /usr/bin/smbmount

The following commands apparently fix the problem for me:

    mv /usr/bin/smbmount /usr/bin/smbmount.orig
    # Quote the delimiter so "$@" is written literally into the wrapper
    # instead of being expanded when the wrapper is created.
    cat <<'EOF' >/usr/bin/smbmount
    #!/bin/bash
    # Make the loader pick the older non-NPTL (LinuxThreads) libraries.
    export LD_ASSUME_KERNEL=2.2.5
    exec /usr/bin/smbmount.orig "$@"
    EOF
    chmod 755 /usr/bin/smbmount

-Yenya

Is this related to Bug 88599?

What is the ETA on the new binary RPMs (per comment #7)? None of the posted workarounds work for me, and this is becoming a serious problem.

HA! Forget it. This bug is so old I have had time to install Gentoo. Wait for RedHat X. Seriously though, roll back to the 2.4.18 kernel from 8.0. It doesn't fix the problem (as noted above), it just stops you from EVER seeing it again (on the 3 boxen I have tried it on). You could try a custom kernel too, I guess.

*** Bug 97743 has been marked as a duplicate of this bug. ***

The workaround from KAS (comment #16) worked in our environment: RH9 (Shrike) as repackaged by KRUD, Sept. 2003 edition.

*** Bug 103202 has been marked as a duplicate of this bug. ***

Is this bug unfixable or something? It's obviously not obscure. So many other bugs have been marked as duplicates that lots of folks must be running into it. (None of the fixes work for me, so I'm ranting a bit.) Seriously though, what gives?

100% reproducible for me on a Dell Inspiron 4550 with RH 9.0. I mount from /etc/fstab, which means my system doesn't boot.

Bug #89589 is also a dupe of this one.

For those who are still suffering under this: the workaround in comment #11 does work.

Sorry, I was terribly unclear. I meant "the workaround in comment #11 of bug #89589 does work".

Give the code at ftp://people.redhat.com/jakub/glibc/errata/2.3.2-27.9.4/ a try. It should have the fix for this bug (among others).

Beautiful, works great so far! Thanks for the pointer to the new RPMs (comment #27).

Closing as fixed in current version.

What exactly is the current version? And which package are you speaking of: Samba, glibc, or the kernel? I just installed RH9 over the weekend (12/20/03) and upgraded all the packages RHN suggested. I am having oplock issues with my shares. The problem is not with mounting but, judging from smbd.log, with file locking. I'm hoping to get the fix so I can leave oplocks on. Thanks.

Looks like these aren't the drones... My issue matches bug 98861 better.

The link ftp://people.redhat.com/jakub/glibc/errata/2.3.2-27.9.4/ is no longer available. I presume the closest equivalent is now: ftp://people.redhat.com/jakub/glibc/errata/2.3.2-95.11/