Bug 90036

Summary: race/deadlock in fork() with signal handler.
Product: [Retired] Red Hat Linux
Reporter: Jay Fenlason <fenlason>
Component: glibc
Assignee: Roland McGrath <roland>
Status: CLOSED CURRENTRELEASE
QA Contact:
Severity: medium
Docs Contact:
Priority: medium
Version: 9
CC: a.keusch, alberto, astrand, benm, brett.porter, bugzilla, chaos, djh, drepper, esimmonds, fweimer, g-man, gneeki, grenoml, ivo, j1, jerry, jfeeney, jung, lamont_gilbert, list, lists, mail, matt, mcauleyt, mikeraz, mitr, myoung, nixuser, redhat_bugzilla, rpm, schlegel, seth.fischer, shishz, t8m, target, tkokchi
Target Milestone: ---   
Target Release: ---   
Hardware: i686   
OS: Linux   
Whiteboard:
Fixed In Version: 2.3.2-27.9.4
Doc Type: Bug Fix
Last Closed: 2003-11-20 17:32:01 UTC
Attachments:
Description Flags
test case for glibc/nptl fork bug. none

Description Jay Fenlason 2003-05-01 14:57:33 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20021003

Description of problem:
Random hangs in smbmount.  roland says:
It looks like the loser case is the parent doing fork, and getting the SIGTERM
as it returns from the syscall (because the child is scheduled first).  Then the
signal handler calling exit deadlocks with a lock that fork holds.


Version-Release number of selected component (if applicable):


How reproducible:
Sometimes

Steps to Reproduce:
1. Add an entry to /etc/fstab like:
   //192.168.48.120/samba1 /mnt/smb/rhl-8-0/samba1 smbfs uid=3616,gid=3616,password=samba1,username=samba1 0 0
2. mount -a -t smbfs
3. If the mount succeeds, do umount -a -t smbfs and repeat steps 2 and 3 until
it hangs
    

Actual Results:  Sometimes the mount hangs.  On some boxes it hangs often (like
rhl-9.lab.boston.redhat.com).  On others it (almost?) never does.

Expected Results:  The mount should never hang.

Additional info:

Comment 1 Jay Fenlason 2003-05-01 15:15:02 UTC
*** Bug 89643 has been marked as a duplicate of this bug. ***

Comment 2 Jay Fenlason 2003-05-01 15:16:26 UTC
*** Bug 88841 has been marked as a duplicate of this bug. ***

Comment 3 Jay Fenlason 2003-05-01 15:25:00 UTC
*** Bug 82820 has been marked as a duplicate of this bug. ***

Comment 4 Jay Fenlason 2003-05-01 15:26:24 UTC
*** Bug 89197 has been marked as a duplicate of this bug. ***

Comment 5 Roland McGrath 2003-05-02 00:50:10 UTC
The fork function is not signal-safe, which is a bug.
In the smbmount case what happens is that the fork child runs 
before the parent and sends its parent a signal.  The parent's signal
handler calls exit, which deadlocks with an internal lock held by 
the interrupted fork.  I'm attaching a trivial test program.
When that's linked with a library that has a destructor, it hangs.
e.g. "gcc -o forkloser -g forkloser.c -lanl"



Comment 6 Roland McGrath 2003-05-02 00:53:36 UTC
Created attachment 91467
test case for glibc/nptl fork bug.  

Link with some library that has destructors to demonstrate the bug.
e.g. "gcc -o forkloser -g forkloser.c -lm" is what I tried.
Hang may depend on child-runs-first, but I saw it on an smp kernel as well.

Comment 7 Ulrich Drepper 2003-05-09 03:17:42 UTC
I've checked into the nptl cvs archive a patch which removes the lock for
calling the registered handlers in fork.  It'll be in the next binary RPMs we'll
publish.
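
As background, the "registered handlers" are presumably the ones a program or library installs with pthread_atfork(); that reading is an assumption, since the patch itself is not quoted here. A minimal sketch of how such handlers are registered and when fork() calls them (the counter and function names are hypothetical):

```c
/* Illustration of fork handlers; names are made up for this sketch. */
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile int prepare_calls, parent_calls, child_calls;

static void on_prepare(void) { prepare_calls++; }  /* before fork, in the parent */
static void on_parent(void)  { parent_calls++; }   /* after fork, in the parent  */
static void on_child(void)   { child_calls++; }    /* after fork, in the child   */

/* Register the three handlers, fork once, and check that each ran in
   the expected process.  Call it once; returns 0 on success, -1 otherwise. */
int atfork_demo(void)
{
    if (pthread_atfork(on_prepare, on_parent, on_child) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)   /* child reports what it saw through its exit status */
        _exit(prepare_calls == 1 && child_calls == 1 ? 0 : 1);

    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return (prepare_calls == 1 && parent_calls == 1) ? 0 : -1;
}
```

fork() has to walk this list of handlers on every call, which is where a lock around the list can collide with a signal handler that interrupts fork() and then calls exit().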

Comment 8 vagabond 2003-05-16 13:43:13 UTC
Sorry - trying to add myself to cc list :~)


Comment 9 Edward Simmonds 2003-05-16 20:52:28 UTC
This is a serious problem for me.   Any ideas on when the fix will be released?


Comment 10 G. Reno 2003-05-24 16:53:48 UTC
We are experiencing smbmount hangs about 80% of the time using RH9.  This is a
serious issue for us.  Is there an estimated timeframe for a fix on this?



Comment 11 Edward Simmonds 2003-05-30 15:42:31 UTC
Can we recompile using LD_ASSUME_KERNEL=2.2.5 to avoid this problem?  Which
specific packages should be recompiled?  This problem is killing me.


Comment 12 Derek Anderson 2003-05-30 16:16:47 UTC
I have had good luck rolling back to kernel 2.4.18 from RedHat 8.0, so I would
recommend trying that out. The RPM is easy to find (
https://rhn.redhat.com/errata/RHSA-2003-098.html ).

Comment 13 Jay Fenlason 2003-06-03 16:13:28 UTC
Changing the kernel version may change the scheduler behavior (and thus the
chance that the child process will run before the parent), but will not address
the actual bug.  Only fixing glibc will 100% prevent this hang.

Comment 14 Jay Fenlason 2003-06-13 19:14:20 UTC
*** Bug 97325 has been marked as a duplicate of this bug. ***

Comment 15 Ivan Wilks 2003-06-14 13:26:28 UTC
I've used the following on my rh9 :-

 mv /usr/bin/smbmount  /usr/bin/_smbmount
 cat <<'EOT' > /usr/bin/smbmount
 #!/bin/sh
 # Run the real smbmount in the background, then kill the (possibly hung) parent.
 /usr/bin/_smbmount "$@" &
 sleep 1
 kill -QUIT $! 2> /dev/null
 EOT
 chmod 0755 /usr/bin/smbmount


Comment 16 Jan "Yenya" Kasprzak 2003-06-16 19:56:56 UTC
The following commands apparently fix the problem for me:

mv /usr/bin/smbmount /usr/bin/smbmount.orig
cat <<'EOF' >/usr/bin/smbmount
#!/bin/bash
export LD_ASSUME_KERNEL=2.2.5
exec /usr/bin/smbmount.orig "$@"
EOF
chmod 755 /usr/bin/smbmount

-Yenya

Comment 17 Steven Weigand 2003-07-23 13:51:39 UTC
Is this related to Bug 88599?

Comment 18 matt 2003-08-06 17:42:04 UTC
What is the ETA on the new binary RPMs?  (per comment #7)  None of the posted 
workarounds work for me, and this is becoming a serious problem.

Comment 19 Derek Anderson 2003-08-08 22:59:25 UTC
HA! Forget it. This bug is so old I have had time to install Gentoo. Wait for
RedHat X.

Seriously though, roll back to the 2.4.18 kernel from 8.0. It doesn't fix the
problem (as noted above), just stops you from EVER seeing it again (on the 3
boxen I have tried it on).

You could try a custom kernel too I guess.

Comment 20 Jay Fenlason 2003-09-09 14:50:03 UTC
*** Bug 97743 has been marked as a duplicate of this bug. ***

Comment 21 Need Real Name 2003-09-24 19:38:06 UTC
The workaround from KAS, comment #16, worked in our environment.  RH9, Shrike, 
as repackaged by KRUD, Sept. 2003 edition. 

Comment 22 Jay Fenlason 2003-10-06 21:46:32 UTC
*** Bug 103202 has been marked as a duplicate of this bug. ***

Comment 23 matt 2003-10-14 19:13:55 UTC
Is this bug un-fixable or something?  It's obviously not obscure since so many 
other bugs have been marked as duplicates, so lots of folks are running into 
problems with it.  (none of the fixes work for me so I'm ranting a bit)  
Seriously though, what gives?

Comment 24 Andrey Jivsov 2003-10-18 22:39:40 UTC
100% reproducible for me on a Dell Inspiron 4550 with RH 9.0. I mount from 
/etc/fstab, which means that my system doesn't boot.

Comment 25 Zenon Panoussis 2003-11-01 11:51:27 UTC
Bug #89589 is also a dupe of this one. For those who are still
suffering under this: the workaround in comment #11 does work. 

Comment 26 Zenon Panoussis 2003-11-01 11:54:20 UTC
Sorry, I was terribly unclear. I meant "the workaround in comment #11
of bug #89589 does work". 

Comment 27 Ulrich Drepper 2003-11-04 21:26:43 UTC
Give the code at

  ftp://people.redhat.com/jakub/glibc/errata/2.3.2-27.9.4/

a try.  It should have the fix for this bug (among others).

Comment 28 matt 2003-11-05 02:56:52 UTC
Beautiful - works great so far!  Thanks for the pointer to the new 
rpms.  (comment #27)


Comment 29 Ulrich Drepper 2003-11-20 17:32:01 UTC
Closing as fixed in current version.

Comment 30 dnoyeb 2003-12-20 23:17:46 UTC
What exactly is the current version?  And which package are you
speaking of? Samba, glibc, or the kernel?

I Just installed RH9 over the weekend (12/20/03) and upgraded all
packages RHN suggested.  I am having oplock issues with my shares, not
with mounting, but with file locking it would seem from smbd.log.

Trying to get the fix so I can leave oplocks on hopefully.

Thanks.

Comment 31 dnoyeb 2003-12-21 03:18:06 UTC
Looks like these aren't the drones...
My issue matches bug 98861 better.

Comment 32 Alex Scott 2004-02-20 14:49:34 UTC
The link is no longer available:
ftp://people.redhat.com/jakub/glibc/errata/2.3.2-27.9.4/
I presume the closest replacement is now:
ftp://people.redhat.com/jakub/glibc/errata/2.3.2-95.11/