Bug 520828

Summary: failure to start up root on raid 0.
Product: [Fedora] Fedora
Reporter: Dave Jones <davej>
Component: dracut
Assignee: Harald Hoyer <harald>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: low
Version: 12
CC: harald, pfrields, Sascha.Zorn
Keywords: Triaged
Hardware: All
OS: Linux
Fixed In Version: 004-4.fc12
Doc Type: Bug Fix
Last Closed: 2010-01-28 00:53:25 UTC
Attachments:
screen capture of failure to mount root (flags: none)

Description Dave Jones 2009-09-02 15:21:43 UTC
I did a kickstart install which sets up a series of raid 0 stripes on two disks.
This worked fine with mkinitrd, but it seems to confuse dracut.

The kickstart file can be found at http://davej.fedorapeople.org/nwo.ks
Just point any two-disk machine at that ks and change the NFS mountpoint to a different package repo.

A JPG of the on-screen output from the failed mount is attached.

Side note: why doesn't dracut automatically drop me to a shell instead of waiting uselessly forever?

Comment 1 Dave Jones 2009-09-02 15:22:27 UTC
Created attachment 359547 [details]
screen capture of failure to mount root

Comment 2 Harald Hoyer 2009-09-02 15:29:57 UTC
> sidenote: why doesn't dracut automatically drop me to a shell instead of
> uselessly waiting forever?  

For security reasons. Just add "rdshell" to the kernel command line.

Oh, and please retry with dracut-001, which was built today.

Comment 3 Harald Hoyer 2009-09-02 15:30:32 UTC
Oh, and you might want to add "rdinfo" or "rdinitdebug" as well.

Comment 4 Dave Jones 2009-09-03 14:14:03 UTC
With dracut-001 it fails similarly, but I now also see this a dozen or so times:

/initqueue/mdraid_start.sh:23: grep: not found

Comment 5 Harald Hoyer 2009-09-03 15:09:02 UTC
Gah, sorry. Please build with

# dracut -a debug <....same options as before...>

Comment 6 Harald Hoyer 2009-09-03 15:49:14 UTC
Or use dracut-001-2.

Comment 7 Dave Jones 2009-09-11 18:14:05 UTC
With today's tree, it actually mounts and fully boots the installed system.
Looking at dmesg, though, it seems we're doing this in a suboptimal manner.

http://davej.fedorapeople.org/nwo-dmesg.txt

Note the 'already has disks' messages around 5 seconds in.
Also, at 7 seconds, we seem to be tearing everything down.

Then at 14 seconds in, the real initscripts reactivate them again.

Comment 8 Dave Jones 2009-09-11 20:58:25 UTC
Ugh, it also isn't reliable. Every few boots I see this:

[    6.457314] raid0: too few disks (1 of 2) - aborting!
[    6.462445] md: pers->run() failed ...
[    6.484136] raid0: too few disks (1 of 2) - aborting!
[    6.489488] md: pers->run() failed ...
[    6.499121] raid0: too few disks (1 of 2) - aborting!
[    6.504337] md: pers->run() failed ...

No root device found

Comment 9 Dave Jones 2009-09-12 16:07:56 UTC
I figured out the reliability thing.  If the box crashes, and uncleanly shuts down, the next boot will fail as in comment #8.  Rebooting again makes it work.

dracut seems to not handle unclean arrays very well right now.

Comment 10 Harald Hoyer 2009-09-14 06:58:26 UTC
(In reply to comment #7)
> with todays tree, it does actually mount, and fully boot the installed system.
> Looking at dmesg though, it looks like we're doing this in a suboptimal manner.
> 
> http://davej.fedorapeople.org/nwo-dmesg.txt
> 
> Note the 'already has disks' messages around 5 seconds in.

Yes, we build the arrays incrementally.

> Also, at 7 seconds, we seem to be tearing everything down

Right, because we don't have mdadm.conf at that point, and the arrays can be assembled correctly with mdadm.conf later on.

> 
> Then at 14 seconds in, the real initscripts reactivate them again.

Comment 11 Harald Hoyer 2009-09-14 06:59:00 UTC
(In reply to comment #9)
> I figured out the reliability thing.  If the box crashes, and uncleanly shuts
> down, the next boot will fail as in comment #8.  Rebooting again makes it work.
> 
> dracut seems to not handle unclean arrays very well right now.  

And why are they clean on a reboot?

Comment 12 Dave Jones 2009-09-14 16:55:10 UTC
No idea. My theory is that whatever dracut does when it fails marks them as clean again.

Comment 13 Harald Hoyer 2009-09-15 14:25:30 UTC
For the advanced user, here is a scratch version to test:

# rpm -e '*dracut*' --nodeps
# rpm -ivh 'http://koji.fedoraproject.org/koji/getfile?taskID=1680533&name=dracut-001-10.git4d924752.fc12.noarch.rpm'

Comment 14 Harald Hoyer 2009-09-17 11:09:59 UTC
Please test dracut-001-12.git0f7e10ce.fc12.
Either wait for it to appear in rawhide or do:
# yum install koji
# cd $(mktemp -d)
# koji download-build 132403
# rpm -Fvh *.rpm

and recreate the image with

# dracut /boot/<image> <kernel version>

Note: in recent installs the <image> is named initramfs-<kernel version>.img

Comment 15 Harald Hoyer 2009-11-05 12:34:01 UTC
Any updates?

Comment 16 Bug Zapper 2009-11-16 11:54:57 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 12 development cycle.
Changing version to '12'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 17 Harald Hoyer 2009-11-26 09:51:13 UTC
(In reply to comment #15)
> any updates?

Comment 18 Fedora Update System 2010-01-26 10:48:35 UTC
dracut-004-4.fc12 has been submitted as an update for Fedora 12.
http://admin.fedoraproject.org/updates/dracut-004-4.fc12

Comment 19 Fedora Update System 2010-01-27 01:05:53 UTC
dracut-004-4.fc12 has been pushed to the Fedora 12 testing repository. If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
su -c 'yum --enablerepo=updates-testing update dracut'.
You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F12/FEDORA-2010-1088

Comment 20 Fedora Update System 2010-01-28 00:51:12 UTC
dracut-004-4.fc12 has been pushed to the Fedora 12 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 21 Sascha Zorn 2012-11-26 17:07:16 UTC
First, I'm not sure whether I should have opened a new bug instead, so please excuse me if so.

I'm experiencing exactly the same problem on a Fedora 17 system with kernel 3.6.6-1.fc17 and dracut-018-105.git20120927.fc17.

I have two RAID 0 arrays built from partitions on the first two disks: / and /home. / (md126) gets assembled every time, but every now and then (mostly after unclean shutdowns, though that could be a coincidence) /home (md127) fails to start.

[    2.092624] md: bind<sdb5>
[    2.205798] md: bind<sda5>
[    2.212655] md: raid0 personality registered for level 0
[    2.213694] bio: create slab <bio-1> at 1
[    2.213703] md/raid0:md126: md_size is 225275904 sectors.
[    2.213706] md: RAID0 configuration for md126 - 1 zone
[    2.213707] md: zone0=[sda5/sdb5]
[    2.213712]       zone-offset=         0KB, device-offset=         0KB, size= 112637952KB
[    2.213720] md126: detected capacity change from 0 to 115341262848

Much later:
[    9.081491] md: bind<sda6>
[    9.103518] Adding 3071996k swap on /dev/sdb1.  Priority:1 extents:1 across:3071996k
[    9.739603] md/raid0:md127: too few disks (1 of 2) - aborting!
[    9.739604] md: pers->run() failed ...
[    9.742785] md127: ADD_NEW_DISK not supported

So sdb6 fails to bind and md127 can't be started. I'm wondering about the ADD_NEW_DISK failure. In the emergency shell I tried "mdadm /dev/md127 --re-add /dev/sdb6", which resulted in exactly the same log entry:
[  185.978615] md127: ADD_NEW_DISK not supported

I guess dracut is trying something similar.

After running mdadm --stop /dev/md127 followed by mdadm --assemble /dev/md127, my RAID works perfectly.

[  225.468147] md: md127 stopped.
[  225.468155] md: unbind<sda6>
[  225.475257] md: export_rdev(sda6)
[  230.335420] md: md127 stopped.
[  230.356038] md: bind<sdb6>
[  230.356224] md: bind<sda6>
[  230.357923] md/raid0:md127: md_size is 1474555904 sectors.
[  230.357926] md: RAID0 configuration for md127 - 1 zone
[  230.357927] md: zone0=[sda6/sdb6]
[  230.357932]       zone-offset=         0KB, device-offset=         0KB, size= 737277952KB
[  230.357933] 
[  230.357939] md127: detected capacity change from 0 to 754972622848

I already tried rebuilding the initrd with dracut --force. The content of /etc/mdadm.conf looks like this:
# mdadm.conf written out by anaconda
MAILADDR root
AUTO +imsm +1.x -all
ARRAY /dev/md126 level=raid0 num-devices=2 UUID=0821bb9f:b0d66882:3780e2e1:d8445e47
ARRAY /dev/md127 level=raid0 num-devices=2 UUID=def4ecd3:73a543e3:6afefdf7:39f608c7

The output of mdadm --detail --scan is:
ARRAY /dev/md126 metadata=1.2 name=localhost.localdomain:126 UUID=0821bb9f:b0d66882:3780e2e1:d8445e47
ARRAY /dev/md127 metadata=1.2 name=localhost.localdomain:127 UUID=def4ecd3:73a543e3:6afefdf7:39f608c7
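A quick consistency check between the two listings above can be sketched in POSIX shell: it prints every UUID that mdadm --detail --scan reports but that has no matching ARRAY line in mdadm.conf. This is a minimal sketch, not a dracut or mdadm tool; the function names extract_uuids and uuids_missing_from_conf are hypothetical, and it assumes one UUID= field per ARRAY line, as in the output shown here.

```shell
#!/bin/sh
# extract_uuids (hypothetical helper): read mdadm-style text on stdin and
# print the UUID= value of every line starting with "ARRAY".
# Comment lines ("# ..."), MAILADDR and AUTO lines are ignored.
extract_uuids() {
    sed -n 's/^ARRAY .*UUID=\([0-9a-fA-F:]*\).*/\1/p'
}

# uuids_missing_from_conf CONF (hypothetical helper): read
# `mdadm --detail --scan` output on stdin and print each array UUID
# that does not appear in any ARRAY line of the given mdadm.conf.
uuids_missing_from_conf() {
    conf_uuids=$(extract_uuids < "$1")
    extract_uuids | while read -r uuid; do
        case "$conf_uuids" in
            *"$uuid"*) ;;           # present in mdadm.conf: nothing to report
            *) echo "$uuid" ;;      # reported by mdadm but missing from the conf
        esac
    done
}
```

Usage: mdadm --detail --scan | uuids_missing_from_conf /etc/mdadm.conf. For the two listings above it prints nothing, since both UUIDs appear in both.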

Comment 22 Sascha Zorn 2012-11-26 17:16:28 UTC
After reading some dracut RAID-related mails, I guess this could be related to the "mdadm -A" mentioned here: Bug 751667 Comment 14 (sorry, I don't know the Bugzilla syntax for comment links).

I guess dracut should really start RAIDs in incremental mode.

Comment 23 Sascha Zorn 2012-11-26 17:39:57 UTC
I mean it should start RAID 0 arrays in incremental mode. But I don't know whether you can read out the md metadata before assembling the device.

I've seen that I already have rd.md.uuid=0821bb9f:b0d66882:3780e2e1:d8445e47 in my kernelopts. Can I add multiple devices that should get assembled incrementally to my kernel line?

Comment 24 Harald Hoyer 2012-11-28 10:47:56 UTC
(In reply to comment #23)
> I mean it should start RAID 0's in incremental mode. But I don't know if you
> can read out the md metadata before assembling the device.
> 
> I've seen that I already have rd.md.uuid=0821bb9f:b0d66882:3780e2e1:d8445e47
> in my kernelopts. Can I add multiple devices that should get assembled
> incrementally to my kernel line?

Yes, you can, and if there are rd.md.uuid, then dracut does not (should not) touch any other raid arrays.

Are you sure that dracut is assembling /home, and not the mdadm udev rules from the real system?
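Since rd.md.uuid may be given more than once, the fragment to append to the kernel command line can be generated from the scan output instead of being typed by hand. A minimal sketch; md_uuid_cmdline is a hypothetical helper name, and it assumes the "ARRAY ... UUID=..." line format shown in comment 21.

```shell
#!/bin/sh
# md_uuid_cmdline (hypothetical helper): read `mdadm --detail --scan`
# output on stdin and print one space-separated line of rd.md.uuid=<uuid>
# arguments, one per ARRAY line, in the order the arrays are listed.
md_uuid_cmdline() {
    sed -n 's/^ARRAY .*UUID=\([0-9a-fA-F:]*\).*/rd.md.uuid=\1/p' |
        paste -s -d ' ' -
}
```

Usage: mdadm --detail --scan | md_uuid_cmdline. For the arrays in comment 21 this prints rd.md.uuid=0821bb9f:b0d66882:3780e2e1:d8445e47 rd.md.uuid=def4ecd3:73a543e3:6afefdf7:39f608c7, which would then be added to the kernel line in the bootloader configuration. Per the answer above, dracut should then leave any array not listed alone.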

Comment 25 Sascha Zorn 2012-11-28 15:36:44 UTC
To be honest, I'm not sure. How can I see who is assembling the RAID? In my current log I can't see. Would rd.debug help?

Also, shouldn't there be some udev rules that wait for all devices to be present before assembling the RAID? What bothers me most is that the disk must already be available, otherwise / (with /dev/sdb5) wouldn't start. Only /home complains about a missing /dev/sdb6 and has to be stopped and reassembled manually.

Comment 26 Harald Hoyer 2012-11-29 09:53:17 UTC
(In reply to comment #25)
> To be honest, I'm not sure. How can I see who is assembling the RAID? In my
> current log I can't see. Would rd.debug help?
> 
> Also shouldn't there be some udev rules that wait for all devices to be
> there before assembling the RAID. What bothers me the most is that the
> device must be available already, otherwise / with /dev/sdb5 wouldn't start.
> Just /home complains about a missing /dev/sdb6 and has to be stopped and
> reassembled manually.

Well, I would say: open a new bug for this, then.