Bug 620580
Summary:          Intel ICH RAID1 ends up write-protected, cannot be fsck-ed or mounted rw.

Product:          Red Hat Enterprise Linux 6
Component:        mdadm
Version:          6.0
Hardware:         All
OS:               Linux
Status:           CLOSED DUPLICATE
Severity:         high
Priority:         low
Reporter:         Gordan Bobic <gordan>
Assignee:         Doug Ledford <dledford>
QA Contact:       qe-baseos-daemons
CC:               agk, dwysocha, heinzm, mbroz, peterm, prockai
Target Milestone: rc
Keywords:         RHELNAK
Doc Type:         Bug Fix
Last Closed:      2010-08-04 00:36:51 UTC
Description
Gordan Bobic
2010-08-02 21:53:25 UTC
This issue has been proposed when we are only considering blocker issues in the current Red Hat Enterprise Linux release. ** If you would still like this issue considered for the current release, ask your support representative to file as a blocker on your behalf. Otherwise ask that it be considered for the next Red Hat Enterprise Linux release. **

A little more information on this:

- The system in question is using ICH9 RAID1.
- If any RAID volumes are defined, even if they cover only parts of the available disks, the un-RAID-ed parts do not show up in the installer. This behaviour is broken because it does not match the way other OSes handle the situation: the parts of the disks not claimed by RAID are rendered unusable whenever RAID is enabled.
- I tried re-installing, this time with an ext4 root (my original install was ext3, with no separate boot partition), and that managed to get as far as firstboot. I suspect, however, that the crash-kernel initrd rebuild killed it, and the same problem re-occurred on the next boot, rendering the system unbootable (write-protected block device).
- Around udev startup time, there is a warning/error message stating that /dev/mapper/<name of my RAID volume> cannot be statted / does not exist.

Could it be that dmraid claims the underlying devices and makes them read-only, which then prevents anything else from accessing them? Perhaps a clash of integration between the dmraid and MD RAID subsystems (RAID1 seems to be handled by the MD subsystem now, unlike in RHEL5 where dmraid was completely separate)?

Switching to the correct component, mdadm, and reassigning.

Is this really an mdadm bug? It looks more like either:

1) a dmraid bug, in that it locks devices even though it did not start the RAID volume, or
2) a dracut bug, in that the initrd invokes dmraid when it should be using mdadm to start the RAID volumes.
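Whether the kernel itself has marked a disk read-only at the block layer (as opposed to a filesystem merely being mounted ro) can be checked from sysfs. A minimal sketch for triaging the symptom above; it prints the read-only flag for every block device present, so the names it reports depend on the machine:

```shell
# Print the kernel's read-only flag for each block device.
# A value of 1 means the device is write-protected at the block
# layer, which matches the "write-protected block device" symptom.
for ro in /sys/block/*/ro; do
    [ -e "$ro" ] || continue
    dev=$(basename "$(dirname "$ro")")
    printf '%s: %s\n' "$dev" "$(cat "$ro")"
done
```

`blockdev --getro /dev/sdX` reports the same flag for a single device.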
From what I can see, it looks like MD RAID is on the receiving end of something locking out the physical devices as read-only. Or am I misunderstanding how this hangs together in RHEL6?

mdadm has to control Intel Matrix RAID devices in RHEL6. I.e. dmraid should not be started at all if such devices are discovered, so your assumption that dracut calls dmraid erroneously seems right.

I'm not even sure it is dracut at the moment (although I seem to remember seeing write-protect warnings last time, when the fs was ext3, before init was started). I'm thinking it might be rc.sysinit this time around. That seems to be where the error about not being able to stat /dev/mapper/<raid-type_device-id_volume_name> is coming from. In fact, looking at rc.sysinit, lines 190-203:

====================
if ! strstr "$cmdline" nodmraid && [ -x /sbin/dmraid ]; then
    modprobe dm-mirror >/dev/null 2>&1
    dmraidsets=$(LC_ALL=C /sbin/dmraid -s -c -i)
    if [ "$?" = "0" ]; then
        for dmname in $dmraidsets; do
            if [[ "$dmname" =~ '^isw_.*' ]] && \
               ! strstr "$cmdline" noiswmd; then
                continue
            fi
            /sbin/dmraid -ay -i --rm_partitions -p "$dmname" >/dev/null 2>&1
            /sbin/kpartx -a -p p "/dev/mapper/$dmname"
        done
    fi
fi
====================

I'll see if my problem goes away if I pass the nodmraid boot parameter, but that should either be unnecessary (except for debugging) or should at least be set correctly by anaconda. That will still cause problems on systems with mixed fake-RAID controllers if the mdadm passthrough covers only Intel ICH chipsets.

I've got things mostly working OK now, but every once in a while it still refuses to boot with the same symptoms. The issue seems to be that mdmon crashes in the initrd. An error gets reported that it crashes in ld.so (IIRC), and it all fails from there onward. It's pretty intermittent, possibly a memory stomp dependent on what was in memory before the reboot. I'll jot down the exact error next time it occurs.
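For reference, the `strstr` used in the rc.sysinit block above is a substring-test helper from the RHEL initscripts. This sketch shows roughly how the `nodmraid` gate works; the helper body is an approximation of the one in /etc/init.d/functions, and the cmdline value is a made-up example rather than a real /proc/cmdline:

```shell
# Approximation of the strstr helper from /etc/init.d/functions:
# succeeds if $2 occurs anywhere in $1.
strstr() {
    [ "${1#*$2*}" != "$1" ]
}

# Example kernel command line; on a live system this would be
# read from /proc/cmdline.
cmdline="ro root=/dev/mapper/vg-root nodmraid quiet"

if strstr "$cmdline" nodmraid; then
    echo "nodmraid set: rc.sysinit skips dmraid activation"
else
    echo "nodmraid not set: dmraid would be invoked"
fi
```

The `noiswmd` parameter in the same block works the other way around: without it, `isw_*` (Intel Matrix) sets are skipped by dmraid and left to mdadm/mdmon.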
Oh, and this appears to get emitted on every boot from the above block in rc.sysinit:

failed to stat() /dev/mapper/isw_bcffhhfiji_System

What version of mdadm does rpm report you are using? (rpm -q mdadm)

mdadm-3.1.2-11.el6.x86_64

And here is the error:

mdmon[511] trap invalid opcode ip:7f700723fe39 sp:7fff4c202d38 error:0 in ld-2.12.so[7f700722b000+1e000]

Thanks, this is a known and already fixed issue. We had to modify mdmon to use pthreads instead of clone() when creating threads, because glibc won't play nice with a multithreaded program that doesn't use pthreads. This was fixed in mdadm-3.1.3-0.git20100722.2.el6, which will show up in later beta refreshes.

*** This bug has been marked as a duplicate of bug 604023 ***
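Whether an installed system carries the affected build can be checked against the package versions named above. A sketch, not an authoritative fix checker: the only build confirmed affected in this report is mdadm-3.1.2-11.el6, and the query falls back to a placeholder when rpm or the package is unavailable:

```shell
# Report whether the installed mdadm matches the build known (from
# this report) to crash mdmon in ld.so due to clone()-based threads.
installed=$(rpm -q mdadm 2>/dev/null) || installed="mdadm-not-installed"
echo "installed: $installed"
case "$installed" in
    mdadm-3.1.2-*)
        echo "status: known-affected build, mdmon may crash in ld.so" ;;
    *)
        echo "status: not a known-affected build" ;;
esac
```

The fixed build, mdadm-3.1.3-0.git20100722.2.el6 or later, creates mdmon threads with pthreads and does not exhibit the crash.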