Bug 495107

Summary: boot hangs between starting udev and initializing hardware
Product: Red Hat Enterprise Linux 4
Reporter: Nate Straz <nstraz>
Component: kernel
Assignee: Red Hat Kernel Manager <kernel-mgr>
Status: CLOSED DUPLICATE
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: medium
Priority: low
Version: 4.8
Keywords: Regression, TestBlocker
Target Milestone: rc
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2009-04-09 20:01:27 UTC

Description Nate Straz 2009-04-09 19:03:02 UTC
Description of problem:

Our x86 test systems no longer boot with kernel-2.6.9-86.EL.  They did boot with kernel-2.6.9-85.EL.

The serial console output looks like:

                Welcome to Red Hat Enterprise Linux AS
                Press 'I' to enter interactive startup.
Setting clock  (utc): Thu Apr  9 13:23:23 CDT 2009 [  OK  ]
Starting udev:  [  OK  ]
*HANG*

Next expected output:

Initializing hardware...  storage network audio done[  OK  ]

I tried to collect SysRq-T output and reset the system with SysRq-B, but it did not work.  I was able to see "SysRq : Show State" but nothing after that.  SysRq-B prints "SysRq: Resetting" but the system does not reboot.

Version-Release number of selected component (if applicable):
kernel-2.6.9-86.EL

How reproducible:
every time on my hardware

Steps to Reproduce:
1. Install RHEL4-U8-re20090401.0 on an i386 system with a serial console?

Comment 1 Nate Straz 2009-04-09 19:23:31 UTC
I added some tracing to rc.sysinit, then to start_udev, and finally ran strace on udevsettle, which showed this:

nanosleep({0, 500000000}, NULL)         = 0
access("/dev/.udevsettle", F_OK)        = 0
nanosleep({0, 500000000}, NULL)         = 0
access("/dev/.udevsettle", F_OK)        = 0
nanosleep({0, 500000000}, NULL)         = 0
access("/dev/.udevsettle", F_OK)        = 0
nanosleep({0, 500000000},
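
The trace above looks like a simple poll loop: sleep half a second, then check whether /dev/.udevsettle still exists. A minimal sketch of that pattern follows, for illustration only. It is reconstructed from the strace output, not taken from the udevsettle source, and the function name, the `max_polls` limit and the injectable `sleep` parameter are all hypothetical.

```python
import os
import time


def wait_for_marker_gone(path, interval=0.5, max_polls=None, sleep=time.sleep):
    """Poll until `path` no longer exists and return the number of polls made.

    Returns None if `max_polls` checks were made and the marker is still there.
    With max_polls=None the loop never gives up, which matches what the
    strace output suggests (an assumption, not confirmed from the source).
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        sleep(interval)                     # seen as nanosleep({0, 500000000}, NULL)
        polls += 1
        if not os.access(path, os.F_OK):    # seen as access(path, F_OK) returning 0 while present
            return polls
    return None
```

If the loop has no upper bound on polls, then anything that keeps the marker file in place (here, a udev helper stuck in the kernel) would stall the boot at exactly this point.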

The check for .udevsettle repeated 23 times, then the nanosleep hung.  I was able to get backtraces this time, but the SysRq-P (showPc) output is more interesting:

SysRq : Show Regs

Pid: 2982, comm:                 udev
EIP: 0060:[<022d457a>] CPU: 0
EIP is at _spin_lock_irqsave+0x38/0x45
 EFLAGS: 00000286    Not tainted  (2.6.9-86.ELhugemem)
EAX: c1f7c2d8 EBX: 00000246 ECX: 00000015 EDX: 00000001
ESI: c1f7c2d8 EDI: c1f7c258 EBP: c1f7c0f8 DS: 007b ES: 007b
CR0: 8005003b CR2: 08549054 CR3: 003ce000 CR4: 000006f0
 [<c2894450>] qla2x00_sysfs_read_nvram+0x4a/0x7a [qla2xxx]
 [<0219275a>] fill_read+0x1e/0x22
 [<021927f8>] read+0x9a/0xd6
 [<0215c7f1>] vfs_read+0xb6/0xe2
 [<0215ca06>] sys_read+0x3c/0x62

This system does have a QLogic HBA.
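
To pick out which loadable modules appear in a dump like the one above, something along these lines can help. This is a rough sketch that assumes the 2.6-era frame format shown here, ` [<addr>] symbol+0xoff/0xsize [module]`. The helper names are hypothetical, and the sample text is copied from the Show Regs output in this comment.

```python
import re

# One stack frame: " [<c2894450>] qla2x00_sysfs_read_nvram+0x4a/0x7a [qla2xxx]"
# The trailing "[module]" is only present for symbols from loadable modules.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<symbol>[A-Za-z_][\w.]*)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>[^\]]+)\])?"
)

SAMPLE = """\
EIP: 0060:[<022d457a>] CPU: 0
EIP is at _spin_lock_irqsave+0x38/0x45
 [<c2894450>] qla2x00_sysfs_read_nvram+0x4a/0x7a [qla2xxx]
 [<0219275a>] fill_read+0x1e/0x22
 [<021927f8>] read+0x9a/0xd6
 [<0215c7f1>] vfs_read+0xb6/0xe2
 [<0215ca06>] sys_read+0x3c/0x62
"""


def parse_frames(text):
    """Return (symbol, module) pairs; module is None for built-in symbols."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            frames.append((match.group("symbol"), match.group("module")))
    return frames


def modules_in_trace(text):
    """Return the sorted set of module names that appear in the trace."""
    return sorted({module for _, module in parse_frames(text) if module})


if __name__ == "__main__":
    print(modules_in_trace(SAMPLE))  # ['qla2xxx']
```

For this dump the only module in the call chain is qla2xxx, with the read of its sysfs NVRAM attribute sitting directly above the spinning `_spin_lock_irqsave`.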

Comment 2 Nate Straz 2009-04-09 19:45:16 UTC
I tried kernel-2.6.9-87.EL and that boots on my systems.  Perhaps this was fixed by bug 476704.

Comment 3 Nate Straz 2009-04-09 20:01:27 UTC
Marking this as a duplicate of bug 476704, since kernel-2.6.9-87.EL boots where kernel-2.6.9-86.EL would not, and 476704 is the only qla2xxx-related fix in -87.

*** This bug has been marked as a duplicate of bug 476704 ***