Description of problem:
Installing onto a partition of an Intel RAID1 array doesn't produce a bootable system. Kernel boots, but even with rw kernel parameter, it says the device is write-protected, and mounts it read-only. fsck then fails on a write-protected device. In the shell, re-mounting with rw,remount also produces the same error about the disk being write-protected.
/dev/mdstat says that resync is pending, but not running.
Booting into installer rescue mode works fine, however. The volume gets mounted read-write under /mnt/sysimage. Resync progresses according to /proc/mdstat
I am not sure whether this is a dmraid or mdadm bug, the line between the two seems to have gotten somewhat blurred of late.
Version-Release number of selected component (if applicable):
Clean RHEL6 Beta 2 install.
Every time so far.
Steps to Reproduce:
1. Install onto Intel RAID1 (ICH/Matrix device mapper RAID)
2. Reboot. The system will bail when fsck bails due to the device being seen as write-protected.
This issue has been proposed when we are only considering blocker
issues in the current Red Hat Enterprise Linux release.
** If you would still like this issue considered for the current
release, ask your support representative to file as a blocker on
your behalf. Otherwise ask that it be considered for the next
Red Hat Enterprise Linux release. **
A little more information on this:
- The system in question is using ICH9 RAID1.
- If any RAID volumes are defined, even if they only cover parts of the disks available, those un-RAID-ed parts don't show up in the installer. This is broken behaviour because it doesn't coincide with the way other OS-es handle this situation, as it means the parts of the disks not claimed by RAID are rendered unusable if RAID is enabled.
- I tried re-installing, this time with ext4 root (my original install was ext3, no separate boot partition), and that managed to get as far as firstboot. I suspect, however, the crash kernel initrd rebuild killed it, and the same problem re-occurred on next boot that rendered the system unbootable (write-protected block device).
- Around the udev startup time, there is a warning/error message stating that /dev/mapper/<name of my RAID volume> cannot be statted / doesn't exist. Could it be that there is an issue where dmraid claims the underlying devices and makes them read-only, which then prevents anything else from accessing them? Perhaps a clash of integration between dmraid and MD raid subsystems (RAID1 seems to be handled by MD subsystem now, unlike in RHEL5 where dmraid was completely separate)?
Switching to correct component mdadm and reassigning.
Is this really a mdadm bug? It looks more like either:
1) dmraid bug in that it locks devices even though it didn't start the RAID
2) dracut bug in that initrd invokes dmraid when it should be using mdadm for
starting the RAID volumes.
From what I can see, it looks like MD RAID is on the receiving end of something
locking out the physical devices as read-only.
Or am I misunderstanding how this hangs together in RHEL6?
mdadm has to control Intel Matrix RAID devices in RHEL6.
Ie. dmraid should not be started at all if such devices are being discovered,
hence your assumption that dracut calls dmraid erroneously seems right.
I'm not even sure it is dracut at the moment (although I seem to remember seeing write-protect warnings last time, when the fs was ext3, before init was started). I'm thinking it might be rc.sysinit this time around.
That seems to be where the error about not being able to stat the
is coming from.
In fact, looking at rc.sysinit, lines 190-203:
if ! strstr "$cmdline" nodmraid && [ -x /sbin/dmraid ]; then
modprobe dm-mirror >/dev/null 2>&1
dmraidsets=$(LC_ALL=C /sbin/dmraid -s -c -i)
if [ "$?" = "0" ]; then
for dmname in $dmraidsets; do
if [[ "$dmname" =~ '^isw_.*' ]] && \
! strstr "$cmdline" noiswmd; then
/sbin/dmraid -ay -i --rm_partitions -p "$dmname" >/dev/null 2>&1
/sbin/kpartx -a -p p "/dev/mapper/$dmname"
I'll see if my problem goes away if I pass the nodmraid boot parameter, but that should either be unnecessary (except for debugging) or should at least be set correctly by anaconda. But that will still cause problems on systems with mixed fake RAID controllers if mdadm passthrough is only for Intel ICH chipsets.
I've got things mostly working OK now, but every once in a while, it still refuses to boot with the same symptoms. The issue seems to be that mdmon crashes in initrd. An error gets reported that it crashes in ld.so (IIRC), and it all fails from there onward. It's pretty intermittent, possibly a memory stomp dependant on what was in the memory before the reboot.
I'll jot down the exact error next time it occurs.
Oh,and this appears to get emitted on every boot from the above block from rc.sysinit:
failed to stat() /dev/mapper/isw_bcffhhfiji_System
What version of mdadm does rpm report you are using? (rpm -q mdadm?)
And here is the error:
mdmon trap invalid opcode ip:7f700723fe39 sp:7fff4c202d38 error:0 in ld-2.12.so[7f700722b000+1e000]
Thanks, this is a known and already fixed issue. We had to modify mdmon to use pthreads() instead of clone() when creating threads because glibc won't play nice with your multithread program if you *don't* use pthreads. This was fixed in mdadm-3.1.3-0.git20100722.2.el6. It will show up in later beta refreshes.
*** This bug has been marked as a duplicate of bug 604023 ***