Cloning for rawhide

+++ This bug was initially created as a clone of Bug #736387 +++
+++ This bug was initially created as a clone of Bug #729205 +++

--- Additional comment from michael.wuersch on 2011-09-02 10:54:14 EDT ---

I have exactly the same problem, which occurred after updating from fc14 to fc15 and thus getting a new kernel. However, my kernel is 2.6.40.3-0.fc15.x86_64. I followed the advice above and executed:

su -c 'yum update --enablerepo=updates-testing mdadm-3.2.2-9.fc15'

Then I rebuilt the initramfs image with:

sudo dracut initramfs-2.6.40.3-0.fc15.x86_64.img 2.6.40.3-0.fc15.x86_64 --force

The error persists after a reboot.

--- Additional comment from michael.wuersch on 2011-09-02 11:05:02 EDT ---

Sorry, I just noticed that the output of dmesg differs slightly:

dracut: Autoassembling MD Raid
dracut Warning: No root device "block:/dev/disk/by-uuid/812eb062-d765-4065-be34-4a2cf4160064" found

--- Additional comment from dledford on 2011-09-02 13:44:29 EDT ---

--- Additional comment from michael.wuersch on 2011-09-05 11:18:03 EDT ---

Thanks, Doug, for your time. Below is the output when I remove the rhgb and quiet options:

...
dracut: dracut-009-12.fc15
udev[164]: starting version 167
dracut: Starting plymouth daemon
pata_jmicron 0000:05:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
scsi6: pata_jmicron
scsi7: pata_jmicron
ata7: PATA max UDMA/100 cmd 0xr400 ctl 0xec400 bdma 0xe480 irq 16
ata8: PATA max UDMA/100 cmd 0xr400 ctl 0xec880 bdma 0xe488 irq 16
firewire_ohci 0000:06:05.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
firewire_ohci: Added fw-ohci device 0000:06:05.0, OHCI v1.10, 4 IRQ +9 IT contexts, quirks 0x2
firewire_core: created device fw0 GUID 0030480000206d38, S400
dracut: Autoassembling MD Raid
dracut Warning: No root device "block:/dev/disk/by-uuid/812eb062-d756-4065-be34-4a2cf4160064" found
Dropping to debug shell.

sh: can't access tty; job control turned off
dracut:/#

Kernel 2.6.35.14-95.fc14.x86_64 boots perfectly with the same kernel parameters. Let me know if I can provide any other helpful information.

Michael

--- Additional comment from michael.wuersch on 2011-09-07 05:57:35 EDT ---

Same problem with kernel 2.6.40.4-5.fc15.x86_64.

Regards,
Michael

--- Additional comment from dledford on 2011-09-07 10:56:21 EDT ---

OK, this bug is getting overly confusing because we are getting different problems reported under the same bug.

First, Rodney, your original bug was this:

dracut: mdadm: Container /dev/md127 has been assembled with 2 drives
dracut: mdadm (IMSM): Unsupported attributes: 40000000
dracut: mdadm IMSM metadata load not allowed due to attribute incompatibility

In response to that specific bug (about the unsupported attributes) I built a new mdadm with a patch to fix the issue. Your system still doesn't boot now, so the question is why. You then posted these messages:

md: raid1 personality registered for level 1
bio: create slab <bio-1> at 1
[Not sure this is relevant, but it's here in the middle of the others.]
dracut: mdadm: array /dev/md126 now has 2 devices
dracut Warning: No root device "block:/dev/mapper/vg_hostname-lv_root" found
dracut Warning: LVM vg_host/lv_root not found
dracut Warning: LVM vg_host/lv_swap not found

The important thing to note here is that mdadm is no longer rejecting your array; in fact, it started your raid device. Now, what's happening is that the lvm PV on top of your raid device isn't getting started.
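A minimal sketch of how that LVM layer could be poked at by hand from the dracut debug shell. The volume group name is taken from the warning messages above, and the presence of the lvm tool in the initramfs is an assumption:

# look for the PV sitting on top of the assembled raid device
lvm pvscan
lvm vgscan
# try to activate the volume group named in the root= warning above
lvm vgchange -ay vg_hostname
ls -l /dev/mapper/

If pvscan can see the PV on the raid device, vgchange -ay should create the /dev/mapper/vg_hostname-lv_root node that dracut is waiting for.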
Regardless of the fact that your system isn't up and running yet, the original bug in this report *has* been fixed and verified. So this bug is no longer appropriate for any other problem reports, because the specific issue in this bug is resolved. Of course, that doesn't get your system or any of the other posters' systems running, so we need to open new bugs for tracking the remaining issues.

I've not heard back from Charlweed on what his problem is.

Rodney, your new problem appears to be that the raid device is started, but the lvm PV on top of your raid device is not.

Michael, unless you edited lines out of the debug messages you posted, I can't see where your hard drives are being detected and can't see where the raid array is even attempting to start. Dracut is starting md autoassembly, but it's not finding anything to assemble, and so it does nothing.

So I'll clone this twice to track the two different issues. This bug, however, is now verified and ready to be closed out when the package is pushed live.

--- Additional comment from michael.wuersch on 2011-09-07 13:46:55 EDT ---

Thanks for cloning the bug - I am not familiar with the internals of the early Linux boot process and therefore, up to now, I was not aware that the bugs weren't related. I did not edit out any lines after the first line (i.e., the line 'dracut: dracut-009-12.fc15').

Can I contribute anything else to help in resolving this issue?

Michael

--- Additional comment from dledford on 2011-09-08 20:45:39 EDT ---

In the other bug I cloned from this one, a fact came up that might be relevant here. Can you try grabbing the dracut package from your install media and downgrading your copy of dracut to what was shipped with f15, then rebuild the initramfs that fails to boot with the old dracut and try booting again?

--- Additional comment from michael.wuersch on 2011-09-09 02:51:17 EDT ---

I did not use any media but instead relied on PreUpgrade to get to fc15. But I will download an ISO quickly and try as advised.

--- Additional comment from michael.wuersch on 2011-09-09 03:32:06 EDT ---

No luck so far. I checked the version of dracut on the DVD: dracut-009-10.fc15.noarch.rpm, whereas I had 009-12.fc15 installed. Since I can boot with 2.6.35.14-95.fc14.x86_64, I booted into it and ran:

sudo yum downgrade dracut

Output:

...
Running Transaction
  Installing : dracut-009-10.fc15.noarch
  Cleanup    : dracut-009-12.fc15.noarch

Removed:
  dracut.noarch 0:009-12.fc15

Installed:
  dracut.noarch 0:009-10.fc15

Then I ran:

sudo dracut initramfs-2.6.40.4-5.fc15.x86_64.img 2.6.40.4-5.fc15.x86_64 --force

and rebooted. Same error message as before.

Michael

--- Additional comment from dledford on 2011-09-09 12:02:12 EDT ---

For some reason, on your system, the hard drives are not being found. Can you boot into the working kernel, then run dmesg and post its output into this bug, please?

--- Additional comment from michael.wuersch on 2011-09-09 13:29:28 EDT ---

Created attachment 522371 [details]
dmesg output

I have attached the log.

--- Additional comment from dledford on 2011-09-09 14:24:36 EDT ---

OK, so when the machine boots up successfully, it is starting drives sda and sdb as an imsm raid array, so when you try to boot the new kernel, it drops you to a debug shell. From that debug shell, I need you to do a few things.

First, verify that /dev/sda and /dev/sdb exist.
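A minimal sketch of that first check from the debug shell, assuming the usual tools are available in the initramfs:

ls -l /dev/sda /dev/sdb
cat /proc/partitions

If the kernel detected the disks, both device nodes should be listed and sda/sdb should appear in /proc/partitions.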
Next, if they exist, try to assemble them using mdadm via the following commands:

/sbin/mdadm -I /dev/sda
/sbin/mdadm -I /dev/sdb

If those commands work, then you should now have a new md device. Try running this command on that new device:

/sbin/mdadm -I /dev/md<device_number>

If that gets your raid array up and running, then the question becomes "Why isn't this happening automatically like it's supposed to?" To try to answer that, make sure that the files /lib/udev/rules.d/64-md-raid.rules and /lib/udev/rules.d/65-md-incremental.rules exist.

Let me know what you find out.

--- Additional comment from pb on 2011-09-11 11:54:57 EDT ---

Perhaps my note https://bugzilla.redhat.com/show_bug.cgi?id=729205#c15 helps. At least in my case, downgrading to mdadm-3.1.5-2 and recreating the initramfs files results in a properly working newer kernel. An initramfs containing the mdadm binary from mdadm-3.2.2-6 or 3.2.2-9 does not work in my case and results in a broken boot.

Any hints on how to debug the mdadm problem in the dracut shell?

--- Additional comment from michael.wuersch on 2011-09-12 03:49:00 EDT ---

I booted into the dracut debug shell and entered:

/sbin/mdadm -I /dev/sda
/sbin/mdadm -I /dev/sdb

The output was:

mdadm: no RAID superblock on /dev/sda

and /dev/sdb, respectively.

/lib/udev/rules.d/64-md-raid.rules does exist, whereas /lib/udev/rules.d/65-md-incremental.rules does not.

Here is the raid info from the "good" kernel:

---
[user ~]$ sudo mdadm --detail /dev/md0
/dev/md0:
        Version : imsm
     Raid Level : container
  Total Devices : 2
Working Devices : 2

  Member Arrays : /dev/md127

    Number   Major   Minor   RaidDevice
       0       8        0        -        /dev/sda
       1       8       16        -        /dev/sdb
---
sudo mdadm --detail /dev/md127
/dev/md127:
      Container : /dev/md0, member 0
     Raid Level : raid1
     Array Size : 1953511424 (1863.01 GiB 2000.40 GB)
  Used Dev Size : 1953511556 (1863.01 GiB 2000.40 GB)
   Raid Devices : 2
  Total Devices : 2

    Update Time : Mon Sep 12 09:16:20 2011
          State : active
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

    Number   Major   Minor   RaidDevice   State
       1       8        0        0        active sync   /dev/sda
       0       8       16        1        active sync   /dev/sdb
---
cat /proc/mdstat
Personalities : [raid1]
md127 : active raid1 sda[1] sdb[0]
      1953511424 blocks super external:/md0/0 [2/2] [UU]

md0 : inactive sdb[1](S) sda[0](S)
      4514 blocks super external:imsm

unused devices: <none>
---

--- Additional comment from dledford on 2011-09-12 11:19:48 EDT ---

Peter, Michael: if you boot into an initramfs that does not work, then what do you get when you run mdadm -E /dev/sda? Does it simply say there is no superblock at all, or does it say it finds one but it's invalid, and if it does say it's invalid, does it say why?

--- Additional comment from pb on 2011-09-12 15:25:47 EDT ---

mdadm -E /dev/sda shows proper output, like:

/dev/sda
Magic: Intel Raid ISM Cfg. Sig.
...
Attributes: All supported
...
[OS] (name of the RAID1 set configured in the BIOS)
...
Migrate State: repair (because of all these failed boots...)
...

cat /proc/mdstat shows:

md127 : inactive sda[1] sdb[0]
      ... blocks super external:/md0/0

md0 : inactive sdb[1](S) sda[0](S)
      .. blocks super external:imsm

To me it looks like the new version of mdadm simply forgets to activate the RAID, while the old version does.

--- Additional comment from michael.wuersch on 2011-09-16 03:13:19 EDT ---

Sorry for the delay; here is the output of mdadm:

dracut:/# /sbin/mdadm -E /dev/sd?
/dev/sda:
          Magic : Intel Raid ISM Cfg Sig.
        Version : 1.1.00
    Orig Family : 0932e0b0
         Family : 0932e0b0
     Generation : 00261fa8
     Attributes : All supported
           UUID : ...:...:...
       Checksum : 045764af correct
    MPB Sectors : 1
          Disks : 2
   RAID Devices : 1

  Disk00 Serial : JK11A8B9JL8X5F
          State : active
             Id : 00000000
    Usable Size : 3907023112 (1863.01 GiB 2000.40 GB)

[System]:
           UUID : ...:...:...
     RAID Level : 1
        Members : 2
          Slots : [UU]
    Failed disk : none
      This Slot : 0
     Array Size : 3907022848 (1863.01 GiB 2000.40 GB)
   Per Dev Size : 3907023112 (1863.01 GiB 2000.40 GB)
  Sector Offset : 0
    Num Stripes : 15261808
     Chunk Size : 64 KiB
       Reserved : 0
  Migrate State : idle
      Map State : normal
    Dirty State : dirty

  Disk00 Serial : JK11A8B9JL8X5F
          State : active
             Id : 00000000
    Usable Size : 3907023112 (1863.01 GiB 2000.40 GB)

... (pretty much the same for /dev/sdb as above)

--- Additional comment from dledford on 2011-09-16 14:01:17 EDT ---

Peter, you left out part of the contents of /proc/mdstat; what does the personality line read on a failed boot? (And I would like the same info from you, Michael, i.e. the full contents of /proc/mdstat on a failed boot.)

--- Additional comment from pb on 2011-09-16 14:18:54 EDT ---

Next notes:

1. Always a successful boot with the old mdadm:

Personalities : [raid1]

2. I have now also done a successful boot of a "NORMAL" (BIOS) array with the new mdadm, but here the resync starts immediately:

[    3.260537] md: md0 stopped.
[    3.263234] md: bind<sda>
[    3.263338] md: bind<sdb>
[    3.263490] dracut: mdadm: Container /dev/md0 has been assembled with 2 drives
[    3.272304] md: md127 stopped.
[    3.272514] md: bind<sdb>
[    3.272653] md: bind<sda>
[    3.273900] md: raid1 personality registered for level 1
[    3.274490] md/raid1:md127: not clean -- starting background reconstruction
               ^^^^ BIOS said "NORMAL"!
[    3.274564] md/raid1:md127: active with 2 out of 2 mirrors
[    3.274643] md127: detected capacity change from 0 to 160038912000
[    3.282507] md: md127 switched to read-write mode.
[    3.282761] md: resync of RAID array md127
[    3.282790] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
[    3.282826] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for resync.
[    3.282882] md: using 128k window, over a total of 156288132k.
[    3.292149] dracut: mdadm: Started /dev/md127 with 2 devices
[    3.401892] md127: p1 p2 p3 p4 < p5 p6 p7 p8 >
[    3.724356] md: md1 stopped.
[    3.727722] md: bind<sdc1>
[    3.730487] md: bind<sdd1>
[    3.734248] md/raid1:md1: active with 2 out of 2 mirrors
[    3.736817] md1: detected capacity change from 0 to 160039174144
[    3.739379] dracut: mdadm: /dev/md1 has been started with 2 drives.
[    3.743099] md1: unknown partition table

Just a note here: I run two RAID1 arrays across four drives. /dev/sd{a,b} is IMSM (dual boot with Windows), and /dev/sd{c,d} is a Linux-only software RAID.

3. Rebooting now, during this running resync, results in the BIOS reporting "VERIFY" (note that during shutdown something like a store of the current sync position is shown). Booting with the new mdadm now results in a broken boot, where

Personalities : [raid1]

and md1 (the Linux software RAID) is active, while md127 is inactive.

So, as others have already seen: if the IMSM RAID is in "VERIFY" mode, mdadm will not start the RAID.

--- Additional comment from michael.wuersch on 2011-09-20 05:18:15 EDT ---

cat /proc/mdstat does not list anything when dropped to the dracut debug shell, i.e.:

dracut:/# cat /proc/mdstat
Personalities :
unused devices: <none>

The output for the old kernel (the one that is able to boot) is:

Personalities : [raid1]
md127 : active raid1 sda[1] sdb[0]
      1953511424 blocks super external:/md0/0 [2/2] [UU]
      [>....................]
      resync =  0.0% (1727872/1953511556) finish=5236.1min speed=6212K/sec

md0 : inactive sdb[1](S) sda[0](S)
      4514 blocks super external:imsm

unused devices: <none>

--- Additional comment from dledford on 2011-09-20 15:54:44 EDT ---

OK, I've got enough info to try and reproduce it here. I'll see if I can work up a fix for this. It seems that the mdadm-3.2.2 binary is misinterpreting some of the bits in the imsm superblock, so that it doesn't assemble arrays in the VERIFY state, and when the BIOS thinks an array is clean, mdadm thinks it is dirty and starts a rebuild.

--- Additional comment from pb on 2011-10-06 14:07:08 EDT ---

I ran additional tests because, even after downgrading to mdadm-3.1.5-2.fc15.i686, the rebuild starts on a clean array, which keeps my system very busy for 90 minutes after each reboot. Cross-downgrading to mdadm-3.1.3-0.git20100804.3.fc14 from F14 and creating a new ramdisk with it finally solves the issue. Please check all changes from 3.1.3 to 3.1.5/3.2.2.
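For anyone wanting to try the same workaround, a minimal sketch of the cross-downgrade and initramfs rebuild described above. The exact package file name, architecture, and kernel version are taken from this report as examples and depend on your system:

# install the F14 mdadm package over the newer one
sudo rpm -Uvh --oldpackage mdadm-3.1.3-0.git20100804.3.fc14.x86_64.rpm

# rebuild the initramfs for the failing kernel so it picks up the older mdadm binary
# (run from /boot, as in the earlier comments)
sudo dracut initramfs-2.6.40.4-5.fc15.x86_64.img 2.6.40.4-5.fc15.x86_64 --force

After rebooting into the rebuilt initramfs, /proc/mdstat should show md127 as active rather than inactive if the workaround applies.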