Bug 130561

Summary: udev doesn't create raid devices (early enough?)
Product: Fedora
Component: mkinitrd
Version: 3
Hardware: All
OS: Linux
Status: CLOSED RAWHIDE
Severity: high
Priority: medium
Reporter: Alexandre Oliva <oliva>
Assignee: Jeremy Katz <katzj>
QA Contact: David Lawrence <dkl>
CC: ellenshull
Doc Type: Bug Fix
Last Closed: 2004-08-31 14:42:02 UTC
Bug Blocks: 123268

Description Alexandre Oliva 2004-08-21 21:38:31 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.2)
Gecko/20040809

Description of problem:
On systems whose root device is on (a logical volume on) a raid 1
device, when the initrd is generated with UDEV_INITRD="yes", the raid
devices are not available at the time linuxrc attempts to raidautorun them.

Version-Release number of selected component (if applicable):
mkinitrd-4.1.1-1

How reproducible:
Always

Steps to Reproduce:
1. Run mkinitrd with any raid device active and UDEV_INITRD=yes
2. Reboot

Actual Results:  raidautorun says the raid device is not available

Expected Results:  it should be available

Additional info:

Comment 1 Ellen Shull 2004-08-22 08:34:02 UTC
I'm seeing this too.  Gets as far as "Loading ext3.ko module" 
happily, then 
 
raidautorun: failed to open /dev/md0: 2 
Creating root device 
Mounting root filesystem 
mount: error 6 mounting ext3 
mount: error 2 mounting none 
Switching to new root 
switchroot: mount failed: 22 
umount /initrd/dev failed: 2 
kernel panic: Attempted to kill init! 
 
My root is raid 5 instead of raid 1 as above, but same idea.  Didn't 
try turning off UDEV_INITRD (didn't know about udev.conf), but 
reverting to 4.0.6 and remaking the initrd works. 
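For reference, that revert-and-rebuild workaround boils down to something
like the following; the package file name and kernel version here are
illustrative only, not taken from this machine:

rpm -Uvh --oldpackage mkinitrd-4.0.6-1.i386.rpm
mkinitrd -f /boot/initrd-2.6.8-1.521.img 2.6.8-1.521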

Comment 2 Jeremy Katz 2004-08-24 20:15:57 UTC
Fixed in mkinitrd 4.1.4.

Now we check whether the device exists and, if it doesn't, nash creates
it before trying to do the raidautorun.
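
For illustration, the relevant part of the generated nash init script
after this fix might look something like the lines below; the device
name and the mknod arguments (block major 9, minor 0 for the first md
device) are assumptions for a typical single-array setup, not the
literal mkinitrd output:

echo Creating md device node
mknod /dev/md0 b 9 0
raidautorun /dev/md0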

Comment 3 Ellen Shull 2004-08-26 08:28:43 UTC
Tested 4.1.4, and it does indeed fix the problem for me.

Comment 4 Aleksandar Milivojevic 2004-12-16 15:47:09 UTC
I've just had a problem with this on an FC3 machine
(mkinitrd-4.1.18-2).  Actually, two problems.

The first was that SCSI devices are detected asynchronously, the second
was that udev seems to work asynchronously.  A quick and dirty
workaround was adding "sleep 10" after the "insmod /lib/sym53c8xx.ko"
line, and another "sleep 10" after "/sbin/udevstart", in the init
script (inside the initrd image).
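
For reference, a sketch of where those workaround lines sit in the nash
init script inside the initrd; the surrounding lines are illustrative,
modeled on typical mkinitrd output of that era rather than copied from
this machine:

echo "Loading sym53c8xx.ko module"
insmod /lib/sym53c8xx.ko
sleep 10
echo Starting udev
/sbin/udevstart
sleep 10
raidautorun /dev/md0

The first sleep gives the SCSI bus scan time to finish before udevstart
enumerates /sys; the second gives udev time to create the device nodes
before raidautorun runs.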

Without the "sleep" lines, init script would simply load the modules,
and execute udevstart withoug waiting for things be initialized
properly.  When raidautorun is called, the devices are not yet
detected, so it fails to initialize RAID1 device that holds physical
volume (under LVM) with root partition on it.

I've asked about this on the Linux kernel mailing list (LKML).  I got
the answer that this should be dealt with in the init script that is
generated by mkinitrd.  In short, PCI is hot-pluggable, and everything
on it is asynchronous.  When a driver loads, it starts asynchronously
searching for devices, which can sometimes take a long time.  Assuming
that things are synchronous (as mkinitrd seems to do) is simply wrong.
It takes about 5 seconds for the sym53c8xx driver to detect the disk
devices on my dual SCSI card (integrated into the motherboard) after it
is loaded.  One person replied that FreeBSD waits 15 seconds for SCSI
buses to initialize.  Linux doesn't waste time waiting, and neither
does the init script.

I guess that with most devices and drivers this race condition in the
mkinitrd-generated init script won't be hit (the drivers are fast
enough at detecting the hardware).  But the race condition exists, and
it can be a problem on certain hardware configurations.

The fix made in 4.1.4 doesn't seem to affect this case.  The problem
here is not a missing device node; the problem is that the SCSI driver
is slow in detecting disks.  Should this bug be reopened?

Comment 5 Alexandre Oliva 2004-12-16 17:09:36 UTC
From your description, it doesn't look like the same bug at all; it
just happens to have a similar symptom.  So please open a new bug
instead of hijacking this one ;-)