Bug 649038 - mdadm crashes with default rd_NO_MD rd_NO_DM options when RAID present; and system hangs on boot
Status: CLOSED CURRENTRELEASE
Product: Fedora
Classification: Fedora
Component: mdadm
Version: 14
Hardware: x86_64 Linux
Priority: low
Severity: urgent
Assigned To: Doug Ledford
QA Contact: Fedora Extras Quality Assurance
Reported: 2010-11-02 15:52 EDT by Alex G.
Modified: 2011-07-14 19:34 EDT
CC: 12 users

Doc Type: Bug Fix
Last Closed: 2011-07-14 19:34:16 EDT

Attachments
strace of mdadm -S /dev/md5 (6.63 KB, application/octet-stream)
2010-11-10 07:41 EST, Marco Colombo

Description Alex G. 2010-11-02 15:52:55 EDT
Description of problem:

After installation of Fedora 14 on a system containing a previously created MD array, the system hangs on boot, showing stack traces from mdadm.

Version-Release number of selected component (if applicable):


How reproducible:

Always

Steps to Reproduce:
1. Obtain a system with several hard drives and multiple controllers.
2. Create an MD array.
3. Install Fedora on a drive not on the MD array
(I used a drive on an HP Smart Array controller - cciss driver).
4. Boot the newly installed system.
  
Actual results:
Console shows numerous stack traces. It attempts to boot, but gets stuck at different places in the boot process, most often at
Starting Avahi Daemon
or
Starting HAL Daemon

Expected results:
System boots normally.

Additional info:
Same issue when upgrading from F13

My storage setup:
ICH10R
    - 4 x 500GB HDD in MD RAID10

HP Smart Array E200
  - 36GB logical drive (2 physical drives in RAID 1)
    - 600MB /boot ext4
    - 35GB / ext4
  - 100GB logical drive (4 phy in RAID 5)
    - 30GB /home ext4
    - 30GB unused ext4
    - free space

Removing the rd_NO_MD and rd_NO_DM options fixes the issue.

I am setting this to urgent severity as it has the potential to affect all users of mdadm RAID.
Comment 1 Marco Colombo 2010-11-03 09:10:37 EDT
Same here, although removing rd_NO_DM does not solve it. My current cmdline is: ro root=/dev/mapper/vg_f14-root.

The only workaround, so far, is to comment out lines in /etc/mdadm.conf. Here's mine:

MAILADDR root
AUTO +imsm +1.x -all
ARRAY /dev/md0 level=raid1 num-devices=2 UUID=4a1d4b02:b2e3f20d:149097a3:8678bd12
ARRAY /dev/md1 level=raid1 num-devices=2 UUID=26774216:c8a4b2a9:fdee8349:10d37933
#ARRAY /dev/md2 level=raid1 num-devices=2 UUID=651e1894:7b52b825:cb43bc40:1bd8181c
#ARRAY /dev/md3 level=raid1 num-devices=2 UUID=96056171:cead6563:ed436f7e:2f5ccc2a
#ARRAY /dev/md4 level=raid1 num-devices=2 UUID=bea792bf:ad631cf7:848c3025:45788489
#ARRAY /dev/md5 level=raid1 num-devices=2 UUID=01c80031:bbbfaa4f:8b3ea10d:9aa80dc7
ARRAY /dev/md6 level=raid1 num-devices=2 UUID=8025e0ca:271fcea2:4a4104b3:0244a4c3
#ARRAY /dev/md7 level=raid1 num-devices=2 UUID=784926e7:b9024dd5:57d48fe0:61902bc4
#ARRAY /dev/md8 level=raid1 num-devices=2 UUID=a3e444a7:15c2bcef:e2d94411:a488ba98
#ARRAY /dev/md9 level=raid1 num-devices=2 UUID=89ef50a0:2a6655d8:cbe895a6:85481c56
#ARRAY /dev/md10 level=raid1 num-devices=2 UUID=144f0b00:cc7abb47:253e53bd:c0494f83
#ARRAY /dev/md11 level=raid1 num-devices=2 UUID=c976cdc9:7bf615e2:f213e8dc:32bc2286

Luckily enough, those 3 devices are all I need to get a working system. I have another config file, /etc/mdadm-full.conf, with all entries uncommented.

In order to get the system to boot, I had to disable fsck in /etc/fstab for devices (LVs) which are now not available at boot time. I also added 'noauto' to their mount options.

After boot, I do:
# mdadm --assemble --config /etc/mdadm-full.conf --scan
# vgchange -a y
# mount -a

It's kind of weird that mdadm segfaults at boot (_after_ switching root, so it's the same mdadm I use), but not later. Must be something "environmental".
Comment 2 Phil Smith 2010-11-03 13:42:11 EDT
Same here.
Installed F14 (not upgraded) on one disk, and the new installation crashes with fsck problems and mdadm problems when trying to use the previously set up RAID 10 on the other 4 disks (created with F12).
Booting from DVD and doing rescue can mount the RAID OK.

Error during boot is:
fsck.ext4: No such file or directory while trying to open /dev/mapper/luks-...

I enter the emergency shell; fsck can't find the RAID; when I exit the shell, Linux crashes.

Did a yum update as of 2010-11-03T15:30:00, and it's no better.

Does boot OK after commenting out the RAID from /etc/fstab, but with errors:
Starting udev: udevd-work[550]:'/sbin/mdadm -I /dev/sdc1' unexpected exit with status 0x000b
mdadm: failed to start array /dev/md0: Input/output error

Fixed by removing rd_NO_MD from the kernel line in /boot/grub/grub.conf.

This option should not really be in grub.conf, because md0 was added in the installation setup, so the installer "knew" there would be a raid array there.
Comment 3 Alex G. 2010-11-03 14:04:08 EDT
> This option should not really be in grub.conf because md0 was added in the
> installation setup so the installer "knew" there would be a raid array there.

Talking about QA... If it ain't broken, don't fix it.
Comment 4 Marco Colombo 2010-11-04 05:41:02 EDT
Well, anyway, here rd_NO_MD wasn't there. The original cmdline (the one set up by anaconda) was:

ro root=/dev/mapper/vg_f14-root rd_MD_UUID=26774216:c8a4b2a9:fdee8349:10d37933 rd_LVM_LV=vg_f14/root rd_LVM_LV=vg_f14/swap rd_NO_LUKS rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us
(note there is no rd_NO_MD)

current is:
ro root=/dev/mapper/vg_f14-root rd_NO_LUKS

and it makes no difference. Tried w/o rd_NO_LUKS as well, BTW.

It seems the number of md devices it has to start plays a role. It is possible that with rd_NO_MD, in your cases, mdadm has to start more md devices.

Actually, looking at dmesg, my system activates /dev/md0, /dev/md1 and /dev/md6 early in the boot process, before "dracut: Switching root". Must be in the initramfs image. Later, mdadm simply has no md devices to activate.

As far as I can see, md devices are either activated in early (pre root switch) boot stages, or they can't be activated later (/etc/rc.d/init.d/netfs, I think). In short, mdadm crashes in that script.

For both of you, maybe removing rd_NO_MD causes all md devices to be started by the initramfs image. I have md devices that are not involved at all during the boot process.
Comment 5 Marco Colombo 2010-11-05 12:41:24 EDT
Ok, I think I kinda narrowed it down.

Problems arise if 3 or more md devices are activated in rc.sysinit. Actually, it's udevd: everything happens during /sbin/start_udev, invoked by rc.sysinit.

mdadm crashes with segfault:

[   15.398378] mdadm[1330]: segfault at 0 ip 0000003f3c867314 sp 00007fffc488d920 error 4 in libc-2.12.90.so[3f3c800000+199000]
[   15.433963] mdadm[1346]: segfault at 0 ip 0000003f3c867314 sp 00007fffc1591b30 error 4 in libc-2.12.90.so[3f3c800000+199000]
[   15.595522] mdadm[1360]: segfault at 0 ip 0000003f3c867314 sp 00007fffd3b51060 error 4 in libc-2.12.90.so[3f3c800000+199000]

Looking at the times, I think they are run concurrently. Maybe that's what makes them crash. Anyway, I've found a way to 'fix' it. Edited /sbin/start_udev and changed:

/sbin/udevd -d

into

/sbin/udevd -d --children-max=1

This way everything works; I guess mdadm invocations get serialized. I still have no idea why it crashes (should never happen) at startup time and not later. Maybe it normally has some kind of locking which doesn't work at boot time.

I should stress this is a workaround only. Someone who knows mdadm and udevd better than me should look at the mdadm source and have it print a nice error message instead of crashing, and either instruct udevd not to invoke many mdadm instances at the same time at boot, or fix mdadm so it can be run concurrently at boot time.

I've experienced no ill effects so far, but maybe --children-max=1 makes your system slower under certain conditions.
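For what it's worth, the edit described above can be scripted. This is a hypothetical sketch (the function name and backup path are mine, not from this bug); as noted later in this thread, the change is lost whenever the udev package is updated:

```shell
#!/bin/sh
# Sketch of the workaround from this comment: patch /sbin/start_udev so
# udevd handles events serially. Back up first; a udev package update
# will silently replace the patched file.
patch_start_udev() {
  f=${1:-/sbin/start_udev}
  cp "$f" "$f.orig"
  sed -i 's|/sbin/udevd -d$|/sbin/udevd -d --children-max=1|' "$f"
}
```

Run it as root and verify afterwards with `grep children-max /sbin/start_udev`.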


In short, to reproduce this, I think you should add some (at least 3, the more the better) md devices to /etc/mdadm.conf, and make sure that dracut doesn't start them before switching to the real root (NOT rebuilding initramfs should do, as mdadm.conf gets copied when you run mkinitrd). You should see mdadm crash when udevd is run by rc.sysinit.
Comment 6 Marco Colombo 2010-11-10 07:41:43 EST
Created attachment 459434 [details]
strace of mdadm -S /dev/md5
Comment 7 Marco Colombo 2010-11-10 07:46:15 EST
Comment on attachment 459434 [details]
strace of mdadm -S /dev/md5

Just added an attachment with an strace of mdadm. I found myself dropped to the FS recovery shell this morning and was trying to stop some mds. While the situation is different, this is the same segmentation fault, so I think it's related.

BTW, it happened for all the mds that I stopped.
Comment 8 David Jansen 2010-11-12 06:45:50 EST
Something similar here, although in my case, there is only one md device, and I see no crash happening (but also no raid):

I recently installed Fedora 14 on a couple of computers that were previously running Fedora 12. These computers have 3 disks, sda holding /, /usr, swap and a partition called /data1, and sdb and sdc were configured as a software RAID1 (/dev/md0).

F14 anaconda correctly recognizes the setup, and I could specify /data2 as the mountpoint for /dev/md0, and installation proceeded as expected.

However, after reboot, the raid is broken, and /dev/md0 only has one member (in this case /dev/sdb1 but in another case it was /dev/sdc1).

I first saw this on an installation using a kickstart file, but retrying an interactive install gave the same results.

In /var/log/messages:

Nov 12 10:00:01 zegerplas kernel: [   12.396148] md: array md0 already has disks!
Nov 12 10:00:01 zegerplas kernel: [   13.287474] md/raid1:md0: active with 1 out of 2 mirrors
Nov 12 10:00:01 zegerplas kernel: [   13.287494] md0: detected capacity change from 0 to 500105150464
Nov 12 10:00:01 zegerplas kernel: [   13.289101]  md0: unknown partition table
Nov 12 10:00:01 zegerplas kernel: [   14.567086] EXT4-fs (md0): mounted filesystem with ordered data mode. Opts: (null)

# cat /proc/mdstat
Personalities : [raid1] 
md0 : active raid1 sdb1[0]
      488383936 blocks [2/1] [U_]
      
Fixable by re-adding /dev/sdb1 to the raid, but after the next reboot, the problem was back.
Now I noticed that /boot/grub/grub.conf included the options rd_NO_MD (and rd_NO_LUKS and a few more). The problem seems to be fixed if I remove those options, re-add /dev/sdb1 to the raid, and then reboot. So there is a workaround, but it's not an optimal solution.

Two questions: 
1. is this a correct workaround, or was the rd_NO_MD boot option there for a reason and will removing it cause problems?
2. Is there a way to remove this option during installation, to ensure the raid never gets broken? I think that is a lot safer than allowing the mirror to break, and then forcing a re-add. It's also more convenient and there is no risk of forgetting to rebuild the mirror.
Can this be done from a kickstart file? I know the 'bootloader' command in a kickstart file can add boot options, but how do you get rid of unwanted boot options that seem to be added by default? (The same is true when doing an interactive install: one can add boot options, but that's not what I seem to need here.)
Comment 9 Marco Colombo 2010-11-18 05:16:44 EST
Regarding my 'fix' (adding --children-max=1): of course it only works until someone releases a new update for udev. Since /sbin/start_udev is NOT marked as %config, it simply (and silently) gets replaced by the stock one, and the fix is gone. BTW, the update was udev-161-6.fc14. It hadn't occurred to me yesterday when I ran 'yum update', but it did hit me this morning when 9 out of 12 md devices didn't start. It even took me a while to realize what had happened (well, you know, low-caffeine mode is no good for debugging).

Anyway, new strategy: adding udev.children-max=1 to the kernel command line in grub.conf yields the same result, and it's even update-proof.
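For reference, a grub.conf kernel line with the option appended would look something like this (illustrative only: the kernel version is taken from comment 14 and the rest of the cmdline from comment 4, so adjust for your own system):

```
kernel /vmlinuz-2.6.35.9-64.fc14.x86_64 ro root=/dev/mapper/vg_f14-root rd_NO_LUKS udev.children-max=1
```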
Comment 10 Alex G. 2010-11-18 05:22:22 EST
Anyone from QA care to comment?
Comment 11 Doug Ledford 2010-11-18 11:11:05 EST
I'll take a look at this Monday when I get back to my test environment.
Comment 12 Doug Ledford 2010-11-23 12:59:04 EST
Can you install the mdadm package from updates-testing and rebuild your initramfs images and see if the problem goes away?
Comment 13 Marco Colombo 2010-11-24 06:34:56 EST
It seems to work.

Now, my initrd image (dracut) starts all md devices (and all LVs) before switching root. This leaves nothing for udev to do (well, as far as mdadm is concerned). I don't know if it's the intended behaviour, but it works.

Anyway, in order to trigger my problem, I fed mkinitrd a fake /etc/mdadm.conf with only 3 devices defined (the 3 I need for the system to boot). As I expected, dracut activates only those 3, and the rest are activated later by udev, with no errors. I tried twice, just to be sure.

In short, I tried everything but I wasn't able to make mdadm crash anymore.
Comment 14 David Lai 2010-12-08 14:03:21 EST
Summary: evidence that udev.children-max=1 fixes the RAID problem.

I tested a system with 8 SATA hard drives and 2 software raids: one software raid1 with 4 devices, and one software raid5 with 8 devices. The system has 4 onboard SATA ports and an add-in Adaptec card with 4 additional ports.

I ran a script which reboots the computer every 5 minutes and records the status of the hard drives and the status of the RAID. I ran the test with 90 reboots, both with and without the "udev.children-max=1" kernel boot option. The results are very consistent: with the udev.children-max=1 option the system comes up clean every time; without it, the system came up clean only one time out of 90 reboots!

I am using the Fedora 14 kernel: 2.6.35.9-64.fc14.x86_64 #1 SMP Fri Dec 3 12:19:41 UTC 2010 x86_64 x86_64 x86_64 GNU/Linux

The boot options are:

ro root=UUID=e04b8008-b938-445a-a09e-b999265466af rd_NO_LUKS rd_NO_LVM rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYTABLE=us udev.children-max=1

and

ro root=UUID=e04b8008-b938-445a-a09e-b999265466af rd_NO_LUKS rd_NO_LVM rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYTABLE=us

I am testing 2 things on each reboot:

 1) Do the disks /dev/sda through /dev/sdh come up consistently mapped to the
    same physical drives? Summary: with the udev option the drives are always
    consistently mapped. Without the udev option the drives map in
    a somewhat random order.
 2) Do the 2 RAID devices come up clean? Summary: with the udev option it's always clean; without the udev option it almost never comes up clean.
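The second check can be sketched as a small shell helper. This is a hypothetical reconstruction (the original test script was not attached to this bug), assuming only the standard /proc/mdstat layout:

```shell
#!/bin/sh
# Extract per-array member-status strings (e.g. [UUUU], [U_]) from
# /proc/mdstat-format input on stdin; '_' marks a missing member.
mdstat_status() {
  grep -o '\[[U_][U_]*\]'
}

# Succeed (exit 0) only when no array reports a missing member.
mdstat_clean() {
  ! mdstat_status | grep -q '_'
}
```

Used as `mdstat_clean < /proc/mdstat && echo OK || echo Error`, it matches the OK/Error column in the results below.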

I am expecting the disk map to be M0 M1 M2 M3 A0 A1 A2 A3; meaning
M0 = the hard drive attached to motherboard sata port 0
...
A3 = the hard drive attached to adaptec card port 3

I am expecting the 2 raid devices to come up clean, i.e. [UUUU] and [UUUUUUUU].


Here are the raw results:

without the udev.children-max=1:

Error out.001: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [__UU]
Error out.002: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [UUU_] [UUUUUUUU]
Error out.003: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [UU_U] [UUUUUUU_]
Error out.004: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: 
Error out.005: MAP: A0 A1 A2 A3 M0 M1 M2 M3  RAID: [UUUU]
Error out.006: MAP: A0 A1 A2 A3 M0 M1 M2 M3  RAID: [___U]
Error out.007: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [UUUU]
Error out.008: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [_U_U] [UU_UUUUU]
Error out.009: MAP: A0 A1 A2 A3 M0 M1 M2 M3  RAID: [_UUU]
Error out.010: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [_UUU]
Error out.011: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [_U_U]
Error out.012: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [__U_]
Error out.013: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [U__U] [UUUUU_UU]
Error out.014: MAP: A0 A1 A2 A3 M0 M1 M2 M3  RAID: [__U_]
Error out.015: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [_UUU]
Error out.016: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [_UUU]
Error out.017: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [_UU_]
Error out.018: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [_U__]
Error out.019: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: 
Error out.020: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [UUUU] [UUUUU_UU]
Error out.021: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [__U_]
Error out.022: MAP: A0 A1 A2 A3 M0 M1 M2 M3  RAID: [_U_U]
Error out.023: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [UUUU]
Error out.024: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [U___] [UUUUUUUU]
Error out.025: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [U___] [UUUUU_UU]
Error out.026: MAP: A0 A1 A2 A3 M0 M1 M2 M3  RAID: [U_UU] [UUUU_UUU]
Error out.027: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [_UU_] [UUUUUUU_]
Error out.028: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [U__U] [UUUUUUUU]
Error out.029: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [_UU_] [UUUUU_UU]
Error out.030: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [_U_U]
Error out.031: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [UU_U]
Error out.032: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [U___]
Error out.033: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [_U__]
Error out.034: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: 
Error out.035: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [U_U_]
Error out.036: MAP: A0 A1 A2 A3 M0 M1 M2 M3  RAID: [__U_] [UUUUUUUU]
Error out.037: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [__UU]
Error out.038: MAP: A0 A1 A2 A3 M0 M1 M2 M3  RAID: 
Error out.039: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [___U]
Error out.040: MAP: A0 A1 A2 A3 M0 M1 M2 M3  RAID: [_UUU]
Error out.041: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [___U]
Error out.042: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [U_U_]
Error out.043: MAP: A0 A1 A2 A3 M0 M1 M2 M3  RAID: [___U] [UUUUUUUU]
Error out.044: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [U__U]
Error out.045: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [__U_]
Error out.046: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [_UUU]
Error out.047: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [U_UU]
Error out.048: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [U___] [UU_UUUUU]
Error out.049: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [UUUU] [UUUUUUUU]
Error out.050: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [___U]
Error out.051: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [UUUU]
Error out.052: MAP: A0 A1 A2 A3 M0 M1 M2 M3  RAID: 
Error out.053: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [UUUU]
Error out.054: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [UUUU]
Error out.055: MAP: A0 A1 A2 A3 M0 M1 M2 M3  RAID: [U_UU] [UUUUUU_U]
Error out.056: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU]
Error out.057: MAP: A0 A1 A2 A3 M0 M1 M2 M3  RAID: [__UU]
Error out.058: MAP: A0 A1 A2 A3 M0 M1 M2 M3  RAID: [UUU_]
Error out.059: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [_U_U] [UUUUUUUU]
Error out.060: MAP: A0 A1 A2 A3 M0 M1 M2 M3  RAID: [UUUU] [UUUUUUU_]
Error out.061: MAP: A0 A1 A2 A3 M0 M1 M2 M3  RAID: [U__U]
Error out.062: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU]
Error out.063: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [_U__]
Error out.064: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [_U__]
Error out.065: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [UUUU]
Error out.066: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [UUUU]
Error out.067: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [_UUU] [UU_UUUUU]
OK    out.068: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
Error out.069: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [U___]
Error out.070: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [U__U] [U_UUUUUU]
Error out.071: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [_U_U]
Error out.072: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: 
Error out.073: MAP: A0 A1 A2 A3 M0 M1 M2 M3  RAID: [___U]
Error out.074: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [___U] [UUUUUUU_]
Error out.075: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [_UUU]
Error out.076: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: 
Error out.077: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [_UU_]
Error out.078: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU]
Error out.079: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [_U__]
Error out.080: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [_UUU]
Error out.081: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [UUUU] [UUUUUUUU]
Error out.082: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [_U__]
Error out.083: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [__U_]
Error out.084: MAP: A0 A1 A2 A3 M0 M1 M2 M3  RAID: [U__U]
Error out.085: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [_U__]
Error out.086: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU]
Error out.087: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [U__U]
Error out.088: MAP: A0 A1 A2 A3 M0 M1 M2 M3  RAID: [UUUU]
Error out.089: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU]
Error out.090: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [UUUU]


and with the udev option:

OK    out.001: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.002: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.003: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.004: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.005: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.006: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.007: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.008: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.009: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.010: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.011: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.012: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.013: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.014: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.015: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.016: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.017: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.018: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.019: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.020: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.021: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.022: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.023: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.024: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.025: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.026: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.027: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.028: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.029: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.030: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.031: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.032: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.033: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.034: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.035: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.036: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.037: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.038: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.039: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.040: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.041: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.042: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.043: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.044: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.045: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.046: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.047: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.048: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.049: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.050: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.051: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.052: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.053: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.054: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.055: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.056: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.057: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.058: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.059: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.060: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.061: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.062: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.063: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.064: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.065: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.066: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.067: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.068: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.069: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.070: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.071: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.072: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.073: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.074: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.075: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.076: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.077: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.078: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.079: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.080: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.081: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.082: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.083: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.084: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.085: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.086: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.087: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.088: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.089: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.090: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
Comment 15 Marco Colombo 2010-12-08 18:15:28 EST
@David
Is that with the updates-testing mdadm package as per Doug's suggestion or with the standard one?
Comment 16 David Lai 2010-12-08 21:57:39 EST
The mdadm is standard, the kernel is standard. I just wanted to test the udev flag workaround, and... it does work.

Oh - and my mdadm.conf uses UUIDs, not device names. So it appears that reordered device names cause problems with the standard mdadm even if UUIDs are used in the conf file.
Comment 17 Marco Colombo 2010-12-09 04:48:32 EST
Well, thanks. :) I use UUIDs in my mdadm.conf, too.

Do try the updates-testing package. It solved the issue completely here. Impressive testing job, BTW.
Comment 18 David Lai 2010-12-09 12:49:52 EST
Summary: updates-testing mdadm fixes raid assembly problem.

After I updated to the updates-testing mdadm:

  Updating       : mdadm-3.1.3-0.git20100804.2.fc14.x86_64                  1/2 
  Cleanup        : mdadm-3.1.3-0.git20100722.2.fc14.x86_64                  2/2 

I reran the test without the udev flag, and the new mdadm assembled the raids OK
each time. Here are the test results:

OK    out.001: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
Error out.002: MAP: A0 A1 A2 A3 M0 M1 M2 M3  RAID: [UUUU] [UUUUUUUU]
OK    out.003: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
Error out.004: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [UUUU] [UUUUUUUU]
Error out.005: MAP: A0 A1 A2 A3 M0 M1 M2 M3  RAID: [UUUU] [UUUUUUUU]
Error out.006: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [UUUU] [UUUUUUUU]
Error out.007: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [UUUU] [UUUUUUUU]
Error out.008: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [UUUU] [UUUUUUUU]
Error out.009: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [UUUU] [UUUUUUUU]
Error out.010: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [UUUU] [UUUUUUUU]
Error out.011: MAP: A0 A1 A2 A3 M0 M1 M2 M3  RAID: [UUUU] [UUUUUUUU]
Error out.012: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [UUUU] [UUUUUUUU]
Error out.013: MAP: A0 A1 A2 A3 M0 M1 M2 M3  RAID: [UUUU] [UUUUUUUU]
Error out.014: MAP: A0 A1 A2 A3 M0 M1 M2 M3  RAID: [UUUU] [UUUUUUUU]
Error out.015: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [UUUU] [UUUUUUUU]
Error out.016: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [UUUU] [UUUUUUUU]
OK    out.017: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
Error out.018: MAP: A0 A1 A2 A3 M0 M1 M2 M3  RAID: [UUUU] [UUUUUUUU]
Error out.019: MAP: A0 A1 A2 A3 M0 M1 M2 M3  RAID: [UUUU] [UUUUUUUU]
Error out.020: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [UUUU] [UUUUUUUU]
Error out.021: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [UUUU] [UUUUUUUU]
OK    out.022: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.023: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.024: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
Error out.025: MAP: A0 A1 A2 A3 M0 M1 M2 M3  RAID: [UUUU] [UUUUUUUU]
Error out.026: MAP: A0 A1 A2 A3 M0 M1 M2 M3  RAID: [UUUU] [UUUUUUUU]
Error out.027: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [UUUU] [UUUUUUUU]
Error out.028: MAP: A0 A1 A2 A3 M0 M1 M2 M3  RAID: [UUUU] [UUUUUUUU]
OK    out.029: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
Error out.030: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [UUUU] [UUUUUUUU]
Error out.031: MAP: A0 A1 A2 A3 M0 M1 M2 M3  RAID: [UUUU] [UUUUUUUU]
OK    out.032: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.033: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
Error out.034: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [UUUU] [UUUUUUUU]
Error out.035: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [UUUU] [UUUUUUUU]
Error out.036: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [UUUU] [UUUUUUUU]
Error out.037: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [UUUU] [UUUUUUUU]
OK    out.038: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
Error out.039: MAP: A0 A1 A2 A3 M0 M1 M2 M3  RAID: [UUUU] [UUUUUUUU]
OK    out.040: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
Error out.041: MAP: A0 A1 A2 A3 M0 M1 M2 M3  RAID: [UUUU] [UUUUUUUU]
Error out.042: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [UUUU] [UUUUUUUU]
OK    out.043: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
Error out.044: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [UUUU] [UUUUUUUU]
Error out.045: MAP: A0 A1 A2 A3 M0 M1 M2 M3  RAID: [UUUU] [UUUUUUUU]
Error out.046: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [UUUU] [UUUUUUUU]
Error out.047: MAP: A0 A1 A2 A3 M0 M1 M2 M3  RAID: [UUUU] [UUUUUUUU]
Error out.048: MAP: A0 A1 A2 A3 M0 M1 M2 M3  RAID: [UUUU] [UUUUUUUU]
Error out.049: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [UUUU] [UUUUUUUU]
Error out.050: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [UUUU] [UUUUUUUU]
OK    out.051: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
Error out.052: MAP: A0 A1 A2 A3 M0 M1 M2 M3  RAID: [UUUU] [UUUUUUUU]
Error out.053: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [UUUU] [UUUUUUUU]
OK    out.054: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
Error out.055: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [UUUU] [UUUUUUUU]
OK    out.056: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.057: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
Error out.058: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [UUUU] [UUUUUUUU]
Error out.059: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [UUUU] [UUUUUUUU]
Error out.060: MAP: A0 A1 A2 A3 M0 M1 M2 M3  RAID: [UUUU] [UUUUUUUU]
OK    out.061: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
Error out.062: MAP: A0 A1 A2 A3 M0 M1 M2 M3  RAID: [UUUU] [UUUUUUUU]
Error out.063: MAP: A0 A1 A2 A3 M0 M1 M2 M3  RAID: [UUUU] [UUUUUUUU]
Error out.064: MAP: A0 A1 A2 A3 M0 M1 M2 M3  RAID: [UUUU] [UUUUUUUU]
Error out.065: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [UUUU] [UUUUUUUU]
Error out.066: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [UUUU] [UUUUUUUU]
Error out.067: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [UUUU] [UUUUUUUU]
Error out.068: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [UUUU] [UUUUUUUU]
Error out.069: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [UUUU] [UUUUUUUU]
Error out.070: MAP: A0 A1 A2 A3 M0 M1 M2 M3  RAID: [UUUU] [UUUUUUUU]
Error out.071: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [UUUU] [UUUUUUUU]
Error out.072: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [UUUU] [UUUUUUUU]
Error out.073: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [UUUU] [UUUUUUUU]
OK    out.074: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
Error out.075: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [UUUU] [UUUUUUUU]
Error out.076: MAP: A0 A1 A2 A3 M0 M1 M2 M3  RAID: [UUUU] [UUUUUUUU]
OK    out.077: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.078: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
Error out.079: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [UUUU] [UUUUUUUU]
Error out.080: MAP: A0 A1 A2 A3 M0 M1 M2 M3  RAID: [UUUU] [UUUUUUUU]
OK    out.081: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.082: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
Error out.083: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [UUUU] [UUUUUUUU]
Error out.084: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [UUUU] [UUUUUUUU]
OK    out.085: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.086: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
OK    out.087: MAP: M0 M1 M2 M3 A0 A1 A2 A3  RAID: [UUUU] [UUUUUUUU]
Error out.088: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [UUUU] [UUUUUUUU]
Error out.089: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [UUUU] [UUUUUUUU]
Error out.090: MAP: M0 M1 A0 A1 A2 A3 M2 M3  RAID: [UUUU] [UUUUUUUU]

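For reference, the pass/fail lines above are easy to tally with a small helper; this is just a sketch (the `tally_results` name is hypothetical):

```shell
# tally_results: count how many boots enumerated the drives in the
# expected order, given the captured OK/Error lines on stdin.
tally_results() {
    awk '/^OK/ {ok++} /^Error/ {err++} END {printf "OK=%d Error=%d\n", ok, err}'
}

# Usage, assuming the lines above were captured to a file:
#   tally_results < results.log
```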
It doesn't fix the mapping of hard drives to device names (I didn't expect it to), but it does fix the RAID problems!

My recommendations: update to the new mdadm.

In addition, if you're also concerned about having a consistent hard-drive-to-device mapping, then also use the udev flag.
Comment 19 Marco Colombo 2010-12-09 13:42:28 EST
Wait, now I'm afraid I misread something in your earlier post.

> does the disks /dev/sda thru /dev/sdh come up consistently mapped to the same physical drives?

What exactly do you mean by that? Are you referring to the fact that /dev/sda "points" to a different disk across reboots? In other words, is the output of the following command different from one boot to the next?

$ hal-get-property --key storage.serial --udi `hal-find-by-property --key block.device --string /dev/sda`

Or are you referring to the order in which disks appear in /proc/mdstat? AFAIK, that order is meaningless; it just depends on which devices get activated first, and since udev tries to activate all of them in parallel, the outcome is pretty much random. With max children set to 1, of course, you get no parallel activation. I bet that if you time it, it is slightly slower.

I'm no expert, and I may be wrong, but udev activation of RAID devices via detection and incremental activation of RAID members (mdadm -I) is one way of doing it. In rc.sysinit there's still the code that activates RAID devices using mdadm -As; it's just run later than udev, so at that point there should be nothing left to do. Try disabling the rules in /lib/udev/rules.d/65-md-incremental.rules, or move the file elsewhere, and see what happens. The RAID devices should then be activated later by mdadm -As and, _I think_, they may come up in a consistent order.
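For context, the core of /lib/udev/rules.d/65-md-incremental.rules amounts to a rule roughly like the following (a simplified sketch from memory; the real file has extra conditions for containers and blacklisted devices):

```
# Simplified sketch: when udev sees a new block device whose filesystem
# type identifies it as a Linux RAID member, hand it to mdadm for
# incremental assembly.
SUBSYSTEM=="block", ACTION=="add", ENV{ID_FS_TYPE}=="linux_raid_member", RUN+="/sbin/mdadm -I $env{DEVNAME}"
```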

Not that there's much point in having a consistent order in /proc/mdstat; I'm just curious. :)
Comment 20 David Lai 2010-12-09 14:09:47 EST
I meant: does the physical disk connected to SATA port 0 always come up as /dev/sda, etc.?

Without the udev flag, the drives come up in a different order on each reboot; i.e., /dev/sda may point to a different hard drive.

With the udev flag, the drives come up in exactly the same expected order on each reboot; i.e., /dev/sda is ALWAYS the disk at motherboard SATA port 0.

This has nothing to do with mdadm or RAID per se, but I suspect the order in which the hard drives come up affects RAID assembly. I know that if RAID is working correctly the order of the hard drives shouldn't matter; mdadm should sort that all out. But the default mdadm in Fedora 14 appears to have a bug and can't assemble the RAID properly when the devices come up in random order.

In my testing I use a script that compares each hard drive's serial number, as reported by SMART, against a list of the drives attached to the server's physical SATA ports, to determine how each device sda through sdh is mapped. The results show that without the udev flag these devices come up in different orders across reboots, which in my opinion is a problem in itself and can have additional side effects, like RAID not coming up correctly.

The good news, though, is that inconsistent device mapping is not a problem for RAID, as long as you update mdadm to the updates-testing version.

The bad news is that inconsistent device mapping may cause other problems not related to RAID. For example, if the system reports that /dev/sda went bad, I don't want to have to do research to figure out which disk needs to be replaced. I *SHOULD* replace the hard drive attached to SATA port 0, because it *SHOULD* always be /dev/sda.
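The serial-number check described above can be sketched like this (hypothetical helper name; on real hardware it needs smartmontools installed, so the parsing is kept as a pure text filter):

```shell
# parse_serial: read `smartctl -i` output on stdin and print the drive's
# serial number.
parse_serial() {
    awk -F': *' '/^Serial [Nn]umber/ { print $2 }'
}

# On a real system, one would map each node to its serial, e.g.:
#   for dev in /dev/sd[a-h]; do
#       echo "$dev $(smartctl -i "$dev" | parse_serial)"
#   done
```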
Comment 21 Marco Colombo 2010-12-09 15:31:32 EST
Well, I've never seen that happen, and I have been running with 4 disks installed for ages. Although, admittedly, they are all on the same SATA controller.

Since it seems to be udev related (and not BIOS or initrd related) my suggestion is to look at /etc/udev/rules.d/70-persistent-cd.rules and try to do the same for disks.
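Something analogous for disks might look like this (a hypothetical rule; the serial number is made up, and ID_SERIAL_SHORT assumes udev has imported the drive's ID data):

```
# Hypothetical /etc/udev/rules.d/70-persistent-disk.rules entry: pin the
# drive with this serial number to a stable symlink, independent of the
# order in which the controllers happen to be probed.
SUBSYSTEM=="block", KERNEL=="sd?", ENV{ID_SERIAL_SHORT}=="9VM4HYPO", SYMLINK+="disk-sata0"
```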

Again, I'm no expert, but my feeling is that on my system the initrd takes care of deciding which disk is sda and which is sdd, and it does that serially and thus consistently. On your system it's udev doing it, and unless instructed otherwise, it does that in parallel.

You should open a bug against udev, maybe.

Note that even with max children=1, the probing order merely happens to be consistent; it is not 100% guaranteed, so you can't really rely on it.
Comment 22 David Lai 2010-12-09 18:07:54 EST
Yes: if you have only one controller, it will come up consistently. As you can see in my tests, the motherboard controller ports M0 M1 M2 M3 always come up in that order, and the add-in card ports A0 A1 A2 A3 also come up in order. In my case, since I have two controllers, udev's parallel probing may intermix them, so you get things like:

M0 M1 A0 A1 A2 A3 M2 M3

In summary: if you have only one controller, you will always get the expected order. Problems only happen when you have more than one controller.
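The pass/fail criterion used in my tests boils down to an exact-order comparison, something like this sketch (the `classify_map` name is hypothetical):

```shell
# classify_map: print OK when all four motherboard ports (M0-M3) come up
# ahead of the add-in card ports (A0-A3), i.e. the enumeration matches
# the expected single-controller order exactly; otherwise print Error.
classify_map() {
    if [ "$1" = "M0 M1 M2 M3 A0 A1 A2 A3" ]; then
        echo OK
    else
        echo Error
    fi
}
```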
Comment 23 Doug Ledford 2011-07-14 19:34:16 EDT
F14 shipped with a broken mdadm. A later mdadm update fixed it. If this problem still persists after updating to the later mdadm and rebuilding the dracut initramfs image, then there is more to be done; otherwise, this should already be fixed. I'm closing this out as fixed; please reopen if it is not.
