Bug 90902

Summary: starting amd randomly hangs (possibly futex bug)
Product: [Retired] Red Hat Linux
Reporter: Nicolas Turro <nicolas.turro>
Component: am-utils
Assignee: Peter Vrabec <pvrabec>
Status: CLOSED CURRENTRELEASE
QA Contact: Jay Turner <jturner>
Severity: medium
Priority: medium
Version: 9
CC: karlamrhein, srevivo, tim
Hardware: i386
OS: Linux
Doc Type: Bug Fix
Last Closed: 2005-04-13 08:24:42 UTC
Attachments:
  Version 9 kernel causes amd to hang on boot. (Flags: none)

Description Nicolas Turro 2003-05-15 08:31:12 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.3) Gecko/20030314

Description of problem:
I am running Red Hat 9.0 (kernel 2.4.20) and am-utils (am-utils-6.0.9-2),
because I need the browsing feature that automount doesn't support.

Unfortunately, amd sometimes hangs at boot time during its
initialization (/etc/rc.d/init.d/amd).
I can reproduce this with repeated /etc/rc.d/init.d/amd start/stop
sequences: sometimes the start hangs, sometimes it works.
This bug occurs on ALL Red Hat 9.0 boxes we have (7 PCs with completely
different hardware).

While it hangs, I can observe the following processes:

root      2444  1911  0 17:14 pts/0    00:00:00 /bin/bash /etc/rc.d/init.d/amd start
root      2453  2444  0 17:14 pts/0    00:00:00 initlog -q -c /usr/sbin/amd -F /etc/amd.conf
root      2454  2453  0 17:14 pts/0    00:00:00 /usr/sbin/amd -F /etc/amd.conf
root      2455  2454  0 17:14 ?        00:00:00 /usr/sbin/amd -F /etc/amd.conf

with the following traces:

[root@redhat-serv root]# strace -p 2453
wait4(2454, 0xbfffd9cc, WNOHANG, NULL)  = 0
nanosleep({0, 500000}, NULL)            = 0
poll([{fd=3, events=POLLIN|POLLPRI}, {fd=5, events=POLLIN|POLLPRI}], 2, 500) = 0
wait4(2454, 0xbfffd9cc, WNOHANG, NULL)  = 0
nanosleep({0, 500000}, NULL)            = 0
poll([{fd=3, events=POLLIN|POLLPRI}, {fd=5, events=POLLIN|POLLPRI}], 2, 500) = 0
wait4(2454, 0xbfffd9cc, WNOHANG, NULL)  = 0
nanosleep({0, 500000}, NULL)            = 0
poll([{fd=3, events=POLLIN|POLLPRI}, {fd=5, events=POLLIN|POLLPRI}], 2, 500) = 0
wait4(2454, 0xbfffd9cc, WNOHANG, NULL)  = 0
nanosleep({0, 500000}, NULL)            = 0
poll( <unfinished ...>

[root@redhat-serv root]# strace -p 2454
futex(0x4212e1c8, FUTEX_WAIT, -2, NULL <unfinished ...>

[root@redhat-serv root]# strace -p 2455
select(1024, [4 5 6 7], NULL, NULL, {932, 980000} <unfinished ...>
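A rougher version of the same information can be read from /proc without attaching strace. The sketch below is a hypothetical helper, proc_state (not part of am-utils or initscripts), assuming a Linux /proc that provides /proc/<pid>/status and /proc/<pid>/wchan. For each PID it prints the parent PID, the scheduler state and the kernel wait channel, e.g. proc_state 2453 2454 2455. The exact wchan name shown for a futex wait depends on the kernel version.

```shell
#!/bin/bash
# Hypothetical helper (not part of am-utils): report parent PID, scheduler
# state and kernel wait channel for each PID, read straight from /proc.
# Assumes Linux /proc/<pid>/status and /proc/<pid>/wchan; no strace needed.
proc_state() {
  local pid key val rest state ppid wchan
  for pid in "$@"; do
    if [ ! -r "/proc/$pid/status" ]; then
      echo "$pid gone"   # process has already exited (or the PID is invalid)
      continue
    fi
    state='?' ppid='?'
    # Status lines look like "State:<TAB>S (sleeping)" and "PPid:<TAB>2453".
    while read -r key val rest; do
      case $key in
        State:) state=$val ;;
        PPid:) ppid=$val ;;
      esac
    done < "/proc/$pid/status"
    # wchan names the kernel function the task is sleeping in; it may read
    # as "0" when the task is running or the kernel withholds the symbol.
    wchan=$(cat "/proc/$pid/wchan" 2>/dev/null) || wchan=''
    echo "$pid ppid=$ppid state=$state wchan=${wchan:-?}"
  done
}
```

A father that stays in state S with a futex-related wait channel while its child keeps running would be consistent with the strace output above.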

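Although the "father" process (PID 2454) hangs, the child amd (PID 2455)
works and the amd service runs perfectly, BUT at boot time the boot
sequence blocks in /etc/rc.d/init.d/amd: initlog (PID 2453) keeps calling
wait4() on PID 2454 with WNOHANG, waiting for the father to exit, and the
father never exits because it is stuck in FUTEX_WAIT.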

The SAME configuration worked perfectly with RH 8.0, am-utils-6.0.7-9 and
kernel 2.4.18.

I wasn't able to reproduce the bug with an RH 8.0 kernel (2.4.18-14)
on top of my RH 9.0 install. Could this be a kernel bug, or a misuse of futexes?

Version-Release number of selected component (if applicable):
am-utils-6.0.9-2

How reproducible:
Sometimes

Steps to Reproduce:
1. service amd stop
2. service amd start
3. Go to step 1 until amd hangs.
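The reproduction loop above can be scripted. The sketch below is a hypothetical helper, repro_hang, which treats a start that does not return within a chosen limit as a hang. It assumes GNU coreutils timeout(1), which exits with status 124 when it had to kill the command (this tool is newer than what Red Hat 9 ships, so it may need to be run elsewhere or installed separately). The commands default to the service calls from the steps above; the iteration count and time limit are arbitrary choices.

```shell
#!/bin/bash
# Hypothetical helper: repeat "stop, then start" until a start does not return
# within LIMIT seconds. Assumes GNU coreutils timeout(1), which exits with
# status 124 when it had to kill the command.
repro_hang() {
  local start_cmd="${1:-service amd start}"
  local stop_cmd="${2:-service amd stop}"
  local max_iter="${3:-50}"   # arbitrary upper bound on attempts
  local limit="${4:-60}"      # seconds a start may take before we call it a hang
  local i=1 status
  while [ "$i" -le "$max_iter" ]; do
    sh -c "$stop_cmd" >/dev/null 2>&1 || true
    # Output goes to /dev/null so leftover processes cannot hold our stdout open.
    status=0
    timeout "$limit" sh -c "$start_cmd" >/dev/null 2>&1 || status=$?
    if [ "$status" -eq 124 ]; then
      echo "start hung on iteration $i"
      return 0
    fi
    i=$((i + 1))
  done
  echo "no hang in $max_iter iterations"
  return 1
}
```

Run as root, repro_hang with no arguments cycles the real amd service. timeout only kills the shell it started, so after a reported hang the stuck initlog and amd processes are likely still running; they can be inspected (ps, strace) and then killed by hand.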
    

Actual Results: amd forks, but the father process never exits.

Expected Results: amd should go into the background and the boot process should
continue.

Additional info:

This seems to be a futex/kernel/libc bug, since reverting to an older kernel
fixes it.

Comment 1 Jim Waldram 2003-05-15 21:18:32 UTC
Created attachment 91705
Version 9 kernel causes amd to hang on boot.

Comment 2 Karl Amrhein 2003-05-27 21:12:55 UTC
ksa

Comment 3 Karl Amrhein 2003-05-27 21:14:52 UTC
We (at SLAC) confirm the problem described by Nicolas.

Comment 4 Tim Tregubov 2003-06-11 22:31:08 UTC
We at Dartmouth College also confirm this problem; it happens on every one of our
RH9 machines. Any ETA on a fix?

Thanks much!

Cheers,
Tim

Comment 5 diego.santacruz 2003-07-16 08:30:10 UTC
The following workaround apparently solves the problem (by disabling futex use).

Add the following line to the top of /etc/init.d/amd:

export LD_ASSUME_KERNEL=2.4.1


Could this (temporary) fix be included in an updated version of the am-utils
package until the correct fix is found?
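Applying the workaround from comment 5 is a one-line edit. As a minimal sketch, the hypothetical helper below, add_ld_assume_kernel, inserts the line right after the script's first ("#!") line so the interpreter line stays first, and does nothing if the line is already present. It assumes GNU sed (for -i and the one-line "1a text" form), keeps a copy of the original as <script>.orig, and defaults to /etc/init.d/amd.

```shell
#!/bin/bash
# Hypothetical helper: apply the LD_ASSUME_KERNEL workaround from comment 5
# to an init script. Assumes GNU sed ("-i" and the one-line "1a text" form).
add_ld_assume_kernel() {
  local script="${1:-/etc/init.d/amd}"
  local line='export LD_ASSUME_KERNEL=2.4.1'
  # Already applied: leave the file alone so the helper is safe to rerun.
  if grep -qxF "$line" "$script"; then
    return 0
  fi
  cp -p "$script" "$script.orig"   # keep an untouched copy for rollback
  # Insert after line 1 so the "#!" interpreter line stays first.
  sed -i "1a $line" "$script"
}
```

Placing the export after the "#!" line rather than literally on line 1 matters: a script whose first line is not the interpreter line may no longer run as intended.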