Bug 191453

Summary: race condition with getkey in rc.sysinit appears to hang machine
Product: [Fedora] Fedora Reporter: Toshio Kuratomi <toshio>
Component: initscripts Assignee: Bill Nottingham <notting>
Status: CLOSED RAWHIDE QA Contact: Brock Organ <borgan>
Severity: medium Docs Contact:
Priority: medium    
Version: 5 CC: mitr, rvokal
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: 8.38-1 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2006-08-03 00:35:11 UTC Type: ---

Description Toshio Kuratomi 2006-05-12 04:48:01 UTC
Description of problem:
In some circumstances, rc.sysinit attempts to kill getkey before getkey has
started.  When getkey does start later, nothing is left to kill it, so it waits
for a keypress and the computer appears to hang.
Version-Release number of selected component (if applicable):
initscripts-8.31.1-1
kernel-2.6.16-1.2111_FC5
How reproducible:
100% on selected machines.  A Dell GX150 desktop and a Dell GX240 small form
factor (which has a laptop CD-ROM drive) have the problem.

Steps to Reproduce:
1. Create a livecd with kadischi targeted at FC5 + updates
2. Boot the computer with the livecd
  
Actual results:
After printing "Enabling swap [OK]", the computer appears to hang.  Pressing "I"
causes the boot scripts to continue.

Expected results:
Boot will continue to a login prompt without user intervention.

Additional info:

I have been generating livecds with kadischi for a kiosk project.  On some
machines, these livecds run fine.  On others, the livecd freezes during the boot
after printing "Enabling swap [OK]".  Unplugging the network cable from the
network card allowed one to boot.  Removing the PCI riser card from another got
things to work.  Reverting to the original FC5 kernel also worked for one of
these machines but not the other.  A third (our prototype kiosk) didn't boot at
all under these conditions.

After some troubleshooting, I found that the computer was not actually hung. 
Instead, it was waiting on getkey.  Pressing "I" at this point would make getkey
happy and boot would continue.

Removing the /dev/null redirection from the kill -TERM `/sbin/pidof getkey` line
showed that kill was failing because pidof had not found a pid for getkey.  So
there is apparently a race between getkey being started and kill being invoked
to terminate it, and a livecd on these machines triggers it.
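
For illustration only, here is a minimal sketch of one way to avoid this kind
of race.  It is not the rc.sysinit code and not necessarily the fix that went
into CVS.  It assumes the background job can be started directly so that its
pid is recorded at launch instead of being looked up by name later.  The names
start_prompt, stop_prompt and prompt_pid are hypothetical, and sleep stands in
for getkey.

```shell
# Stand-in for the key prompt: "sleep 30" plays the role of /sbin/getkey.
# Recording $! at launch means the pid is known even if the child has not
# been scheduled yet, so a later kill cannot miss it.
start_prompt() {
    sleep 30 &
    prompt_pid=$!
}

# Terminate the recorded pid and reap it.  Errors are ignored because the
# prompt may already have exited on its own (for example, a key was pressed).
stop_prompt() {
    kill -TERM "$prompt_pid" 2>/dev/null || true
    wait "$prompt_pid" 2>/dev/null || true
}

start_prompt
# ... the boot steps that overlap with the prompt would run here ...
stop_prompt
echo "prompt stopped"
```

Looking the process up by name (as pidof does) only works once the process
exists under that name, which is the window the report describes.  If getkey
must be started inside a subshell, an alternative under the same assumptions
is to poll until either the pid shows up or the subshell leaves a marker
saying it has finished.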

I'm working on a different part of the project right now but will get back to
this later this month.  I can also generate and test new livecds at any time if
you have a new initscripts package you want me to test.

Comment 1 Miloslav Trmač 2006-07-30 01:39:37 UTC
Fixed in CVS.  Thanks for your report.