Bug 729205

Summary: kernel-2.6.40-4.fc15.x86_64 fails to boot due to RAID
Product: Fedora
Version: 15
Component: mdadm
Hardware: x86_64
OS: Linux
Status: CLOSED ERRATA
Severity: high
Priority: unspecified
Reporter: Rodney Barnett <rhbugzilla>
Assignee: Doug Ledford <dledford>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: agk, dev, dledford, mbroz, michael.wuersch, msmsms10079, pb, serge
Fixed In Version: mdadm-3.2.2-9.fc15
Doc Type: Bug Fix
Clones: 736386, 736387 (view as bug list)
Last Closed: 2011-09-10 19:59:11 EDT

Description Rodney Barnett 2011-08-08 23:13:20 EDT
Description of problem:

After installing the kernel-2.6.40-4.fc15.x86_64 package, the system fails during boot with the message "Dropping into debug shell".

The dmesg command shows the following:
dracut: Autoassembling MD Raid
md: md127 stopped
md: bind <sda>
md: bind <sdb>
dracut: mdadm: Container /dev/md127 has been assembled with 2 drives
dracut: mdadm (IMSM): Unsupported attributes: 40000000
dracut: mdadm IMSM metadata load not allowed due to attribute incompatibility
dracut Warning: No root device "block:/dev/mapper/vg_hostname-lv_root" found

Version-Release number of selected component (if applicable):


How reproducible:

Fails to boot every time.

Steps to Reproduce:
1. Install kernel-2.6.40-4.fc15.x86_64
2. Reboot
  
Actual results:

Failure during boot.

Expected results:

A running system.

Additional info:

The system contains only one pair of disks in a RAID 1 configuration with an Intel Matrix Storage "controller".
Comment 1 Doug Ledford 2011-08-09 12:58:20 EDT
This has been reported upstream and will likely have a final fix soon.  In the meantime, you could remake the arrays to clear the bit that mdadm doesn't expect to be set.  Doing so is risky, so don't undertake this workaround if you aren't comfortable with it.  You would need to boot a live CD or rescue image, use mdadm -E to get the exact info for your array, then recreate the array with exactly the same options it currently uses while passing in the --assume-clean option, then reboot with the trouble-causing bit cleared.  Or you can wait for the next mdadm build.
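For reference, the workaround above can be sketched as a shell sequence. This is a hedged sketch, not an exact recipe: the device names (/dev/sda, /dev/sdb, /dev/md126, /dev/md127) and the device count are hypothetical placeholders and must be replaced with the layout that mdadm -E reports for your own IMSM container and volume.

```shell
# Run from a live CD / rescue image. Everything below is an example
# layout -- substitute the values reported by mdadm -E on your disks.

# 1. Record the exact current geometry of the container and the volume.
mdadm -E /dev/sda
mdadm -E /dev/md127

# 2. Stop the assembled volume and its container.
mdadm --stop /dev/md126
mdadm --stop /dev/md127

# 3. Recreate the IMSM container and the RAID1 volume with identical
#    options, using --assume-clean so the data on disk is left as-is.
mdadm --create /dev/md127 -e imsm -n 2 /dev/sda /dev/sdb
mdadm --create /dev/md126 -l 1 -n 2 --assume-clean /dev/md127

# 4. Reboot; the rewritten metadata no longer carries the unexpected bit.
```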
Comment 2 Charlweed Hymerfan 2011-08-11 23:19:15 EDT
I also fail to boot.
systemd sends me into emergency mode.
When I try "mdadm -D -s" I get a kernel error:


md: could not open unknown-block(8,81).
md: could not open unknown-block(8,97).
Comment 3 Doug Ledford 2011-08-31 15:58:28 EDT
I modified the proposed patch according to comments made upstream and included it in the 3.2.2-9 build.
Comment 4 Fedora Update System 2011-08-31 16:03:12 EDT
mdadm-3.2.2-9.fc15 has been submitted as an update for Fedora 15.
https://admin.fedoraproject.org/updates/mdadm-3.2.2-9.fc15
Comment 5 Fedora Update System 2011-09-02 01:24:28 EDT
Package mdadm-3.2.2-9.fc15:
* should fix your issue,
* was pushed to the Fedora 15 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing mdadm-3.2.2-9.fc15'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/mdadm-3.2.2-9.fc15
then log in and leave karma (feedback).
Comment 6 Michael Würsch 2011-09-02 10:54:14 EDT
I have exactly the same problem, which occurred after updating from fc14 to fc15 and thus getting a new kernel. However, my kernel is 2.6.40.3-0.fc15.x86_64.

I followed the advice above and executed:

su -c 'yum update --enablerepo=updates-testing mdadm-3.2.2-9.fc15'

Then I rebuilt the initramfs image with:

sudo dracut initramfs-2.6.40.3-0.fc15.x86_64.img 2.6.40.3-0.fc15.x86_64 --force

The error persists after reboot.
Comment 7 Michael Würsch 2011-09-02 11:05:02 EDT
Sorry, just noticed that the output of dmesg differs slightly:

dracut: Autoassembling MD Raid
dracut Warning: No root device "block:/dev/disk/by-uuid/812eb062-d765-4065-be34-4a2cf4160064" found
Comment 8 Doug Ledford 2011-09-02 13:44:29 EDT
Michael, Charlweed: in order to help either of you, you will need to remove the rhgb and quiet options from the kernel boot command line and try to boot your system.  With those options removed, we should be able to see some messages that are helpful for debugging.
Comment 9 Rodney Barnett 2011-09-03 01:02:27 EDT
I installed the mdadm-3.2.2-9.fc15 package and rebooted as directed.  It made no difference.  Looking further, it seems that the initramfs needs to be rebuilt, so I did that and copied it into /boot.
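The rebuild step mentioned above can be sketched as follows; the kernel version and the /boot image name follow the usual Fedora 15 convention and are assumptions, so substitute the exact version of the kernel that fails to boot.

```shell
# Rebuild the initramfs so it picks up the updated /sbin/mdadm.
# The version shown is an example -- use your failing kernel's version.
KVER=2.6.40-4.fc15.x86_64
dracut --force /boot/initramfs-${KVER}.img ${KVER}
```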

Now, booting still fails and the following messages from dmesg seem relevant...

dracut: Autoassembling MD Raid
md: md127 stopped.
md: bind <sda>
md: bind <sdb>
dracut: mdadm: Container /dev/md127 has been assembled with 2 devices
md: md126 stopped
md: bind <sda>
md: bind <sdb>
md: raid1 personality registered for level 1
bio: create slab <bio-1> at 1 [Not sure this is relevant, but it's here in the middle of the others.]
dracut: mdadm: array /dev/md126 now has 2 devices
dracut Warning: No root device "block:/dev/mapper/vg_hostname-lv_root" found
dracut Warning: LVM vg_host/lv_root not found
dracut Warning: LVM vg_host/lv_swap not found

Assuming the numbers that precede the lines are seconds, there's about a 23 second lag between the first "dracut Warning:" line and the previous line.

Rodney
Comment 10 Michael Würsch 2011-09-05 11:18:03 EDT
Thanks, Doug, for your time. Below is the output when I remove the rhgb and quiet options:

...
dracut: dracut-009-12.fc15
udev[164]: starting version 167
dracut: Starting plymouth daemon
pata_jmicron 0000:0500.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
scsi6: pata_jmicron
scsi7: pata_jmicron
ata7: PATA max UDMA/100 cmd 0xr400 ctl 0xec400 bdma 0xe480 irq 16
ata8: PATA max UDMA/100 cmd 0xr400 ctl 0xec880 bdma 0xe488 irq 16
firewire_ohci 0000:06:05.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
firewire_ohci: Added fw-ohci device 0000:06:05.0, OHCI v1.10, 4 IRQ +9 IT contexts, quirks 0x2
firewire_core: created device fw0 GUID 0030480000206d38, S400
dracut: Autoassembling MD Raid
dracut Warning: No root device "block:/dev/disk/by-uuid/812eb062-d756-4065-be34-4a2cf4160064" found


Dropping to debug shell.

sh: can't access tty; job control turned off
dracut:/#


Kernel 2.6.35.14-95.fc14.x86_64 boots perfectly with the same kernel parameters. Let me know if I can provide any other helpful information.

Michael
Comment 11 Michael Würsch 2011-09-07 05:57:35 EDT
Same problem with Kernel 2.6.40.4-5.fc15.x86_64.

Regards,

Michael
Comment 12 Doug Ledford 2011-09-07 10:56:21 EDT
OK, this bug is getting overly confusing because we are getting different problems reported under the same bug.

First, Rodney, your original bug was this:
dracut: mdadm: Container /dev/md127 has been assembled with 2 drives
dracut: mdadm (IMSM): Unsupported attributes: 40000000
dracut: mdadm IMSM metadata load not allowed due to attribute incompatibility

In response to that specific bug (about the unsupported attributes) I built a new mdadm with a patch to fix the issue.  Your system still doesn't boot now, so the question is why.  You then posted these messages:
md: raid1 personality registered for level 1
bio: create slab <bio-1> at 1 [Not sure this is relevant, but it's here in the
middle of the others.]
dracut: mdadm: array /dev/md126 now has 2 devices
dracut Warning: No root device "block:/dev/mapper/vg_hostname-lv_root" found
dracut Warning: LVM vg_host/lv_root not found
dracut Warning: LVM vg_host/lv_swap not found

The important thing to note here is that mdadm is no longer rejecting your array, and in fact it started your raid device.  Now, what's happening is that the lvm PV on top of your raid device isn't getting started.  Regardless of the fact that your system isn't up and running yet, the original bug in the bug report *has* been fixed and verified.  So, this bug is no longer appropriate for any other problem reports because the specific issue in this bug is resolved.

Of course, that doesn't get your system or any of the other posters' systems running, so we need to open new bugs to track the remaining issues.

I've not heard back from Charlweed on what his problem is.  Rodney, your new problem appears to be that the raid device is started, but the lvm PV on top of your raid device is not.  Michael, unless you edited lines out of your debug messages you posted, I can't see where your hard drives are being detected and can't see where the raid array is even attempting to start.  Dracut is starting md autoassembly, but it's not finding anything to assemble and so it does nothing.  So I'll clone this twice to track the two different issues.  This bug, however, is now verified and ready to be closed out when the package is pushed live.
Comment 13 Fedora Update System 2011-09-10 19:59:00 EDT
mdadm-3.2.2-9.fc15 has been pushed to the Fedora 15 stable repository.  If problems still persist, please make note of it in this bug report.
Comment 14 Serge Droz 2011-09-11 07:12:48 EDT
Just a quick comment. I had the same problem as Michael, even after updating to mdadm-3.2.2-9.fc15. In between I booted into WinXP, where Intel's utility did what it did. I now boot fine into 2.6.40.
Before that, after a boot into an older kernel, the RAID system would resync. So I presume that the RAID has to be in order for the boot process to go through.

Also note, the issue in this bug is probably the same as in bug 727696.
Comment 15 Peter Bieringer 2011-09-11 11:46:28 EDT
I ran into the same problem here after upgrading from F14 to F15.

While the initial kernel boots, newer kernels suddenly fail with the error shown below.

Digging through this, I found that the upgrade from mdadm-3.1.5-2 to mdadm-3.2.2-6 breaks the boot.

Downgrading to mdadm-3.1.5-2 and recreating the ramdisk results in a successful boot.

Upgrading then to mdadm-3.2.2-9 and creating a test ramdisk results in a broken boot.

Analyzing the contents of the initramfs from a good boot and a broken boot shows that the mdadm binary itself must be buggy; nothing else is different.

diff -urN initramfs-2.6.40.4-5.fc15.i686.PAE.good/etc/ld.so.conf.d/kernel-2.6.40.3-0.fc15.i686.PAE.conf initramfs-2.6.40.4-5.fc15.i686.PAE.broken/etc/ld.so.conf.d/kernel-2.6.40.3-0.fc15.i686.PAE.conf
--- initramfs-2.6.40.4-5.fc15.i686.PAE.good/etc/ld.so.conf.d/kernel-2.6.40.3-0.fc15.i686.PAE.conf	2011-09-11 17:43:13.258433278 +0200
+++ initramfs-2.6.40.4-5.fc15.i686.PAE.broken/etc/ld.so.conf.d/kernel-2.6.40.3-0.fc15.i686.PAE.conf	1970-01-01 01:00:00.000000000 +0100
@@ -1 +0,0 @@
-# Placeholder file, no vDSO hwcap entries used in this kernel.
diff -urN initramfs-2.6.40.4-5.fc15.i686.PAE.good/lib/udev/rules.d/64-md-raid.rules initramfs-2.6.40.4-5.fc15.i686.PAE.broken/lib/udev/rules.d/64-md-raid.rules
--- initramfs-2.6.40.4-5.fc15.i686.PAE.good/lib/udev/rules.d/64-md-raid.rules	2011-09-11 17:43:13.434433274 +0200
+++ initramfs-2.6.40.4-5.fc15.i686.PAE.broken/lib/udev/rules.d/64-md-raid.rules	2011-09-11 17:43:19.497433424 +0200
@@ -2,11 +2,13 @@
 
 SUBSYSTEM!="block", GOTO="md_end"
 
+# In Fedora we handle the raid components in 65-md-incremental.rules so that
+# we can do things like honor anaconda command line options and such
 # handle potential components of arrays
-# Note: in Fedora we handle incremental assembly in 65-incremental.rules so
-#       we can do things like honor anaconda install options
-#ENV{ID_FS_TYPE}=="linux_raid_member", ACTION=="remove", RUN+="/sbin/mdadm -If $name"
+#ENV{ID_FS_TYPE}=="linux_raid_member", ACTION=="remove", RUN+="/sbin/mdadm -If $name --path $env{ID_PATH}"
 #ENV{ID_FS_TYPE}=="linux_raid_member", ACTION=="add", RUN+="/sbin/mdadm --incremental $env{DEVNAME}"
+#ENV{ID_FS_TYPE}=="isw_raid_member", ACTION=="remove", RUN+="/sbin/mdadm -If $name --path $env{ID_PATH}"
+#ENV{ID_FS_TYPE}=="isw_raid_member", ACTION=="add", RUN+="/sbin/mdadm --incremental $env{DEVNAME}"
 
 # handle md arrays
 ACTION!="add|change", GOTO="md_end"
Binary files initramfs-2.6.40.4-5.fc15.i686.PAE.good/sbin/mdadm and initramfs-2.6.40.4-5.fc15.i686.PAE.broken/sbin/mdadm differ
Binary files initramfs-2.6.40.4-5.fc15.i686.PAE.good/sbin/mdmon and initramfs-2.6.40.4-5.fc15.i686.PAE.broken/sbin/mdmon differ
Comment 16 Serge Droz 2011-09-12 16:51:56 EDT
Update:

If the RAID controller indicates a problem (it displays "Verify"), booting 2.6.40 fails. With rhgb it just hangs.
Without quiet and rhgb I get:

[    5.946126] dracut: Starting plymouth daemon
[    5.970392] dracut: rd.dm=0: removing DM RAID activation
[    6.233912] dracut: Autoassembling MD Raid
[    6.264176] md: md0 stopped.
[    6.316583] md: bind<sdb>
[    6.326476] md: bind<sda>
[    6.336194] dracut: mdadm: Container /dev/md0 has been assembled with 2 drives
[    6.357909] md: md127 stopped.
[    6.369001] md: bind<sdb>
[    6.378693] md: bind<sda>
[    6.389563] md: raid1 personality registered for level 1
[    6.398963] bio: create slab <bio-1> at 1
[    6.407940] dracut: mdadm: array /dev/md127 now has 2 devices

... stuff removed that has to do with card-readers and external drive

[   29.471672] dracut Warning: No root device "block:/dev/disk/by-uuid/0705af6a-8279-48f5-ba54-1f9dccfd8cd1" found


If the controller says OK, the system boots fine.