Red Hat Bugzilla – Bug 488038
Failed to start array (RAID0) with root fs
Last modified: 2015-03-03 17:34:07 EST
Description of problem:
The system fails to boot because it cannot assemble the RAID0 array holding the root (/) filesystem. It looks like the kernel, with the older mdadm in the initrd, successfully assembles all arrays (so root is accessible, at least read-only), but then the mdadm spawned from the initscripts breaks everything.
Version-Release number of selected component (if applicable):
Older version mdadm-3.0-0.devel2.1.fc11 is fine.
How reproducible:
Always, after boot.
Steps to Reproduce:
1. Boot the system (root=/dev/md1)
mdadm: /dev/md1 has been started with 2 drives.
Setting hostname localhost.localdomain: [ OK ]
mdadm: /dev/md0 is already in use
mdadm: /dev/md3 is already in use
mdadm: failed to RUN_ARRAY /dev/md/3_0: Cannot allocate memory
mdadm: Not enough devices to start the array.
mdadm: /dev/md/0_0 has been started with 1 drive (out of 2)
... boot then fails at fsck, which is unable to access the filesystem.
ARRAY /dev/md0 level=raid1 num-devices=2 metadata=0.90 UUID=0c2a49fd:ba124f6d:e0634eb5:9e0e1855
ARRAY /dev/md1 level=raid0 num-devices=2 metadata=0.90 UUID=2d64fe1d:e87e3bfe:18f25720:de7af605
ARRAY /dev/md3 level=raid0 num-devices=2 metadata=0.90 UUID=f8b1e8e6:7a83767a:00b7a568:dbd45bb9
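The ARRAY lines above are in mdadm.conf format. When an array refuses to assemble, one quick check is whether the UUIDs recorded there match the superblocks actually found on disk; a stale superblock shows up as an extra UUID. A minimal sketch, assuming awk is available and that each ARRAY entry fits on one line as in the excerpt above; the function name `array_uuids` is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: print the UUID= value of each ARRAY line read on
# stdin (mdadm.conf format). Continuation lines are not handled.
array_uuids() {
    awk '$1 == "ARRAY" {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^UUID=/) print substr($i, 6)
    }'
}

# Usage (bash): list UUIDs that appear in only one of the two sources.
# comm -3 <(array_uuids < /etc/mdadm.conf | sort) \
#         <(mdadm --examine --scan | array_uuids | sort)
```

Comparing UUIDs only, rather than whole lines, sidesteps differences in how the config file and `mdadm --examine --scan` name the devices.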
md: raid6 personality registered for level 6
md: raid5 personality registered for level 5
md: raid4 personality registered for level 4
md: md1 stopped.
md1: setting max_sectors to 128, segment boundary to 32767
raid0: looking at sda2
raid0: comparing sda2(7678976) with sda2(7678976)
raid0: ==> UNIQUE
raid0: 1 zones
raid0: looking at sdb2
raid0: comparing sdb2(7678976) with sda2(7678976)
raid0: FINAL 1 zones
raid0 : md_size is 15357952 blocks.
raid0 : conf->hash_spacing is 15357952 blocks.
raid0 : nb_zone is 1.
raid0 : Allocating 8 bytes for hash.
md1: unknown partition table
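The raid0 messages above are self-consistent: both members report 7678976 blocks, so the driver builds a single zone striped across both disks, and the array size is simply the sum of the two. As a quick arithmetic check:

```shell
#!/bin/bash
# Member sizes, in blocks, from the "raid0: comparing ..." lines above.
sda2_blocks=7678976
sdb2_blocks=7678976
# Equal-sized members need only one zone, so md_size is the plain sum.
md_size=$((sda2_blocks + sdb2_blocks))
echo "md_size is ${md_size} blocks"   # prints: md_size is 15357952 blocks
```

This matches the "md_size is 15357952 blocks" line, which supports the observation that the kernel-side assembly of /dev/md1 is fine.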
The cause of this bug has been identified, and fixing it properly requires a change to initscripts.
Specifically, in rc.sysinit, there is this line:
# Start any MD RAID arrays that haven't been started yet
[ -f /etc/mdadm.conf -a -x /sbin/mdadm ] && /sbin/mdadm -As --auto=yes --run
This needs to be changed as follows:
# Wait for local RAID arrays to finish incremental assembly before continuing
It turns out that the original line races with udev, which is simultaneously performing incremental assembly of the same arrays. udev ends up grabbing some member devices and placing them in a partially assembled array, while the mdadm call grabs other devices and places them in a *different* array, so neither array starts properly. With this change, the udev incremental-assembly rules work as expected.
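The replacement command itself is missing from this copy of the comment; the udevsettle warnings quoted later in this bug suggest it was a udev settle call. A minimal sketch of such a stanza, assuming a udev that provides `udevadm settle` and falling back to the older standalone `udevsettle`. This is an illustration, not the literal initscripts patch, and the function name is hypothetical:

```shell
#!/bin/bash
# Hypothetical rc.sysinit stanza: wait for udev's incremental-assembly
# rules to finish instead of racing them with "mdadm -As".
wait_for_md_assembly() {
    if command -v udevadm >/dev/null 2>&1; then
        # Newer udev: the standalone udevsettle is deprecated.
        udevadm settle --timeout=30
    elif [ -x /sbin/udevsettle ]; then
        # Older udev, such as the F9 test machine mentioned in this bug.
        /sbin/udevsettle
    fi
}

# In rc.sysinit this would be called where the mdadm -As line used to be:
# wait_for_md_assembly
```

`udevadm settle` returns once the udev event queue is empty (or the timeout expires), so every member device has been offered to the incremental rules before boot continues, and no second assembler competes for the same devices.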
Changing the component to initscripts.
*** Bug 487965 has been marked as a duplicate of this bug. ***
Hrm. With the change prescribed in comment #1, things are mildly better on one of my affected systems. Instead of getting at least two different arrays created for what is supposed to be my /boot volume, I now get only /dev/md0, but it contains just a single member.
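Whether a boot left an array degraded (like /dev/md0 here, with one member out of two) or inactive (a half-built array from the race described above) can be read from /proc/mdstat. A minimal sketch, assuming the usual mdstat layout where redundant levels print a `[wanted/working]` count on the line after the array header; the function name `report_md_problems` and its optional file argument are hypothetical:

```shell
#!/bin/bash
# Report md arrays that /proc/mdstat shows as inactive or degraded.
# The optional argument (for testing) points at a saved copy of mdstat.
report_md_problems() {
    local mdstat=${1:-/proc/mdstat}
    awk '
        # Array header, e.g. "md0 : active raid1 sda1[0]"
        $1 ~ /^md/ && $2 == ":" {
            name = $1
            if ($3 == "inactive") print name, "inactive"
            next
        }
        # Status line of redundant levels, e.g. "104320 blocks [2/1] [U_]",
        # where the bracket holds [wanted/working] member counts.
        name != "" && match($0, /\[[0-9]+\/[0-9]+\]/) {
            split(substr($0, RSTART + 1, RLENGTH - 2), n, "/")
            if (n[2] + 0 < n[1] + 0) print name, "degraded"
            name = ""
        }
    ' "$mdstat"
}
```

RAID0 arrays have no redundancy count, so they are only reported when inactive.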
Also, this change results in the following spew:
the program '/bin/bash' called 'udevsettle', it should use udevadm settle <options>', this will stop working in a future release
udevadm: the program '/bin/bash' called 'udevsettle', it should use 'udevadm settle <options>', this will stop working in a future release
Even after changing over to 'udevadm settle --timeout=30' and adding a 'sleep 5' after that, I'm still only getting a single drive added to /dev/md0 every time.
What version of mdadm are you using? I tested this with mdadm-3.0-0.devel3.1.fc11, which is not yet in rawhide (I only have a local build), and with that version it worked fine. As for udevsettle versus udevadm settle, that is because I'm testing this on an F9 machine with an older udev, so the call would need to be changed for the later udev versions in rawhide. However, I need neither a timeout nor any sleeps with the current mdadm (which also includes an updated mdadm rules file that could certainly play a role in what you are seeing). My impression is that to fully solve the problem you really need both updates, but a bug can only be filed against one component at a time, so I'll clone this bug for the mdadm half of the issue.
I was using an earlier version; I will re-test once the devel3-based build of mdadm hits rawhide.
I tried to install the new mdadm build and it had a file conflict with udev:
Transaction Check Error:
file /lib/udev/rules.d/64-md-raid.rules from install of mdadm-3.0-0.devel3.1.fc11.x86_64 conflicts with file from package udev-139-2.fc11.x86_64
I've corrected the issue and a new version will be built shortly.
The new version is available in rawhide.
I was able to solve this issue without requiring any changes to initscripts, so this bug can be closed.
By 'without requiring any changes', do you mean you want 'mdadm -As', 'udevadm settle', both, or neither?
Well, I don't have the F11 initscripts at hand to verify, but I was under the impression that they call mdadm -As --run (without the --auto=yes that earlier initscripts packages used). That is exactly how it needs to be. So I mean no changes to the current package... unless I'm wrong, and after all my various edits I've misremembered the original line and it isn't actually mdadm -As --run.
I had the same error messages on F10. It turned out that I had re-used a RAID drive that had previously been an array member as the whole disk, /dev/sde, and I had now re-partitioned it to use /dev/sde1. Manual assembly worked fine, since I was typing sde1, but the kernel autodetection found the old superblock on sde first and, not finding a matching array, set it up as its own auto-created array, /dev/md_d3, instead. I inspected and then cleared the stale whole-disk superblock with:
mdadm -E /dev/sde
mdadm --misc --zero-superblock /dev/sde
After that, auto-assembly started working (for RAID-0, anyway; I'm on to RAID-10 now...).
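The fix in this last comment can be turned into a check before reusing a disk. A minimal sketch, assuming `mdadm --examine` exits non-zero when it finds no superblock; the function name is hypothetical. One caveat: with 0.90 metadata the superblock lives near the end of the device, so when a partition ends close to the end of the disk, examining the whole disk can report the partition's own superblock. That is why the helper only prints commands and asks you to compare UUIDs rather than zeroing anything itself:

```shell
#!/bin/bash
# Hypothetical helper: warn if a whole disk still answers "mdadm --examine"
# even though its array is now built from a partition on that disk.
# It only inspects; clearing the superblock is left to the admin.
warn_stale_whole_disk_sb() {
    local disk=$1 part=$2
    if mdadm --examine "$disk" >/dev/null 2>&1; then
        echo "warning: $disk carries an md superblock; $part may be shadowed" >&2
        echo "compare UUIDs: mdadm -E $disk; mdadm -E $part" >&2
        echo "if the whole-disk one is stale: mdadm --misc --zero-superblock $disk" >&2
        return 1
    fi
    return 0
}

# Example: warn_stale_whole_disk_sb /dev/sde /dev/sde1
```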