Bug 520828
Summary:          failure to start up root on RAID 0
Product:          Fedora
Component:        dracut
Version:          12
Hardware:         All
OS:               Linux
Status:           CLOSED ERRATA
Severity:         medium
Priority:         low
Reporter:         Dave Jones <davej>
Assignee:         Harald Hoyer <harald>
QA Contact:       Fedora Extras Quality Assurance <extras-qa>
CC:               harald, pfrields, Sascha.Zorn
Keywords:         Triaged
Fixed In Version: 004-4.fc12
Doc Type:         Bug Fix
Last Closed:      2010-01-28 00:53:25 UTC
Description (Dave Jones, 2009-09-02 15:21:43 UTC)
Created attachment 359547 [details]
screen capture of failure to mount root
> sidenote: why doesn't dracut automatically drop me to a shell instead of
> uselessly waiting forever?
security.. just add "rdshell" to the kernel command line
oh, and please retry with dracut-001 which was built today.
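
For anyone following along, the "rdshell" suggestion means adding that word to the kernel line in the bootloader config. Here is a minimal, hypothetical sketch in GRUB legacy syntax; the sample config is invented for illustration, and on a real Fedora 12 system you would edit /boot/grub/grub.conf instead:

```shell
# Work on a throwaway copy of a GRUB legacy config (contents made up here).
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
title Fedora (2.6.31)
        root (hd0,0)
        kernel /vmlinuz-2.6.31 ro root=/dev/md0 quiet
        initrd /initrd-2.6.31.img
EOF

# Append "rdshell" to every kernel line so dracut drops to a shell on
# failure instead of waiting forever.
sed -i 's/^\([[:space:]]*kernel[[:space:]].*\)$/\1 rdshell/' "$cfg"
grep kernel "$cfg"
```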
oh, and you might want to add "rdinfo" or "rdinitdebug".

with dracut-001, it fails similarly, but I now also see

    /initqueue/mdraid_start.sh:23: grep: not found

a dozen or so times.

gah.. sry, please rebuild with

    # dracut -a debug <....same options as before...>

or use dracut-001-2.

with todays tree, it does actually mount and fully boots the installed system. Looking at dmesg, though, it looks like we're doing this in a suboptimal manner:

http://davej.fedorapeople.org/nwo-dmesg.txt

Note the 'already has disks' messages around 5 seconds in. Also, at 7 seconds, we seem to be tearing everything down. Then, at 14 seconds in, the real initscripts reactivate them again.

ugh, it also isn't reliable. every few boots I see this:

    [    6.457314] raid0: too few disks (1 of 2) - aborting!
    [    6.462445] md: pers->run() failed ...
    [    6.484136] raid0: too few disks (1 of 2) - aborting!
    [    6.489488] md: pers->run() failed ...
    [    6.499121] raid0: too few disks (1 of 2) - aborting!
    [    6.504337] md: pers->run() failed ...
    No root device found

I figured out the reliability thing. If the box crashes and shuts down uncleanly, the next boot will fail as in comment #8. Rebooting again makes it work. dracut seems to not handle unclean arrays very well right now.

(In reply to comment #7)
> with todays tree, it does actually mount, and fully boot the installed system.
> Looking at dmesg though, it looks like we're doing this in a suboptimal manner.
>
> http://davej.fedorapeople.org/nwo-dmesg.txt
>
> Note the 'already has disks' messages around 5 seconds in.

yes, we incrementally build the arrays

> Also, at 7 seconds, we seem to be tearing everything down

right, because we don't have mdadm.conf, and can build them correctly with mdadm.conf later on.

> Then at 14 seconds in, the real initscripts reactivate them again.

(In reply to comment #9)
> I figured out the reliability thing. If the box crashes, and uncleanly shuts
> down, the next boot will fail as in comment #8. Rebooting again makes it work.
> > dracut seems to not handle unclean arrays very well right now.

and why are they clean on a reboot?

no idea. my theory is that whatever dracut does when it fails is marking them as clean again.

For the advanced user, here is a scratch version to test:

    # rpm -e '*dracut*' --nodeps
    # rpm -ivh 'http://koji.fedoraproject.org/koji/getfile?taskID=1680533&name=dracut-001-10.git4d924752.fc12.noarch.rpm'

Please test dracut-001-12.git0f7e10ce.fc12. Either wait for it to appear in rawhide or do:

    # yum install koji
    # cd $(mktemp -d)
    # koji download-build 132403
    # rpm -Fvh *.rpm

and recreate the image with

    # dracut /boot/<image> <kernel version>

Note: in recent installs the <image> is named initramfs-<kernel version>.img

any updates?

This bug appears to have been reported against 'rawhide' during the Fedora 12 development cycle. Changing version to '12'. More information and the reason for this action are here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping

(In reply to comment #15)
> any updates?

dracut-004-4.fc12 has been submitted as an update for Fedora 12. http://admin.fedoraproject.org/updates/dracut-004-4.fc12

dracut-004-4.fc12 has been pushed to the Fedora 12 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with su -c 'yum --enablerepo=updates-testing update dracut'. You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F12/FEDORA-2010-1088

dracut-004-4.fc12 has been pushed to the Fedora 12 stable repository. If problems still persist, please make note of it in this bug report.

First, I'm not sure whether I should have opened a new bug instead, so please forgive me if I should have. I'm experiencing exactly the same problem on a Fedora 17 system with kernel 3.6.6-1.fc17 and dracut-018-105.git20120927.fc17. I have two RAID0 partitions on the first two disks: / and /home.
/ (md126) gets assembled every time, but every now and then (mostly after unclean shutdowns, though that could be coincidence) /home (md127) fails to start.

    [    2.092624] md: bind<sdb5>
    [    2.205798] md: bind<sda5>
    [    2.212655] md: raid0 personality registered for level 0
    [    2.213694] bio: create slab <bio-1> at 1
    [    2.213703] md/raid0:md126: md_size is 225275904 sectors.
    [    2.213706] md: RAID0 configuration for md126 - 1 zone
    [    2.213707] md: zone0=[sda5/sdb5]
    [    2.213712]       zone-offset=         0KB, device-offset=         0KB, size= 112637952KB
    [    2.213720] md126: detected capacity change from 0 to 115341262848

much later:

    [    9.081491] md: bind<sda6>
    [    9.103518] Adding 3071996k swap on /dev/sdb1.  Priority:1 extents:1 across:3071996k
    [    9.739603] md/raid0:md127: too few disks (1 of 2) - aborting!
    [    9.739604] md: pers->run() failed ...
    [    9.742785] md127: ADD_NEW_DISK not supported

So sdb6 fails to bind and md127 can't be started. I'm wondering about the ADD_NEW_DISK failure. In the emergency shell I tried "mdadm /dev/md127 --re-add /dev/sdb6", which resulted in exactly the same log entry:

    [  185.978615] md127: ADD_NEW_DISK not supported

I guess dracut is trying something similar. After mdadm --stop; mdadm --assemble /dev/md127, my RAID works perfectly:

    [  225.468147] md: md127 stopped.
    [  225.468155] md: unbind<sda6>
    [  225.475257] md: export_rdev(sda6)
    [  230.335420] md: md127 stopped.
    [  230.356038] md: bind<sdb6>
    [  230.356224] md: bind<sda6>
    [  230.357923] md/raid0:md127: md_size is 1474555904 sectors.
    [  230.357926] md: RAID0 configuration for md127 - 1 zone
    [  230.357927] md: zone0=[sda6/sdb6]
    [  230.357932]       zone-offset=         0KB, device-offset=         0KB, size= 737277952KB
    [  230.357933]
    [  230.357939] md127: detected capacity change from 0 to 754972622848

I have already tried rebuilding the initrd with dracut --force.
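
The stop-and-reassemble recovery described above can be wrapped in a small script. This is an untested sketch, not dracut's actual logic; the DRY_RUN switch, the run helper, and the device path are my own illustration:

```shell
# Sketch of the manual recovery: stop the half-assembled array, then
# assemble it again from scratch.  With DRY_RUN=1 the commands are only
# printed; set DRY_RUN=0 on a real system (device name is an example).
DRY_RUN=1
md=/dev/md127

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"          # show what would be executed
    else
        "$@"
    fi
}

run mdadm --stop "$md"
run mdadm --assemble "$md"
```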
The content of /etc/mdadm.conf looks like this:

    # mdadm.conf written out by anaconda
    MAILADDR root
    AUTO +imsm +1.x -all
    ARRAY /dev/md126 level=raid0 num-devices=2 UUID=0821bb9f:b0d66882:3780e2e1:d8445e47
    ARRAY /dev/md127 level=raid0 num-devices=2 UUID=def4ecd3:73a543e3:6afefdf7:39f608c7

mdadm --detail --scan output is:

    ARRAY /dev/md126 metadata=1.2 name=localhost.localdomain:126 UUID=0821bb9f:b0d66882:3780e2e1:d8445e47
    ARRAY /dev/md127 metadata=1.2 name=localhost.localdomain:127 UUID=def4ecd3:73a543e3:6afefdf7:39f608c7

After reading a bit in dracut RAID-related mails, I guess this could be related to the "mdadm -A" mentioned in bug 751667 comment 14 (sorry, I don't know the Bugzilla syntax for comment links). I guess dracut should really start RAIDs in incremental mode. I mean it should start RAID 0s in incremental mode. But I don't know if you can read out the md metadata before assembling the device.

I've seen that I already have rd.md.uuid=0821bb9f:b0d66882:3780e2e1:d8445e47 in my kernel options. Can I add multiple devices that should get assembled incrementally to my kernel line?

(In reply to comment #23)
> I mean it should start RAID 0's in incremental mode. But I don't know if you
> can read out the md metadata before assembling the device.
>
> I've seen that I already have rd.md.uuid=0821bb9f:b0d66882:3780e2e1:d8445e47
> in my kernelopts. Can I add multiple devices that should get assembled
> incrementally to my kernel line?

Yes, you can, and if rd.md.uuid arguments are present, then dracut does not (should not) touch any other RAID arrays. Are you sure that dracut is assembling /home, and not the mdadm udev rules from the real system?

To be honest, I'm not sure. How can I tell who is assembling the RAID? I can't see it in my current log. Would rd.debug help?

Also, shouldn't there be some udev rules that wait for all devices to be present before assembling the RAID? What bothers me the most is that the device must be available already; otherwise / with /dev/sdb5 wouldn't start.
Just /home complains about a missing /dev/sdb6 and has to be stopped and reassembled manually.

(In reply to comment #25)
> To be honest, I'm not sure. How can I see who is assembling the RAID? In my
> current log I can't see. Would rd.debug help?
>
> Also shouldn't there be some udev rules that wait for all devices to be
> there before assembling the RAID. What bothers me the most is that the
> device must be available already, otherwise / with /dev/sdb5 wouldn't start.
> Just /home complains about a missing /dev/sdb6 and has to be stopped and
> reassembled manually.

Well, I would say, open a new bugzilla for this, then.
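
As a footnote on the rd.md.uuid question above: the UUID arguments for the kernel line can be generated from mdadm.conf. A sketch, using the config pasted earlier in this bug; the awk one-liner is my own and has only been checked against this exact file format:

```shell
# Build rd.md.uuid= kernel arguments from an mdadm.conf so dracut
# assembles only the listed arrays (config copied from this bug report).
conf=$(mktemp)
cat > "$conf" <<'EOF'
# mdadm.conf written out by anaconda
MAILADDR root
AUTO +imsm +1.x -all
ARRAY /dev/md126 level=raid0 num-devices=2 UUID=0821bb9f:b0d66882:3780e2e1:d8445e47
ARRAY /dev/md127 level=raid0 num-devices=2 UUID=def4ecd3:73a543e3:6afefdf7:39f608c7
EOF

# Pull the UUID= token out of every ARRAY line and prefix it.
args=$(awk '/^ARRAY/ {
    for (i = 1; i <= NF; i++)
        if ($i ~ /^UUID=/) { sub(/^UUID=/, "", $i); printf "rd.md.uuid=%s ", $i }
}' "$conf")
echo "$args"
```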