620580 – Intel ICH RAID1 ends up write-protected, cannot be fsck-ed or mounted rw.



Keywords:
Status:	CLOSED DUPLICATE of bug 604023
Alias:	None
Product:	Red Hat Enterprise Linux 6
Classification:	Red Hat
Component:	mdadm
Sub Component:
Version:	6.0
Hardware:	All
OS:	Linux
Priority:	low
Severity:	high
Target Milestone:	rc
Target Release:	---
Assignee:	Doug Ledford
QA Contact:	qe-baseos-daemons
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2010-08-02 21:53 UTC by Gordan Bobic
Modified:	2010-08-04 00:36 UTC
CC List:	6 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2010-08-04 00:36:51 UTC
Target Upstream Version:
Embargoed:

Description Gordan Bobic 2010-08-02 21:53:25 UTC

Description of problem:
Installing onto a partition of an Intel RAID1 array doesn't produce a bootable system. The kernel boots, but even with the rw kernel parameter it reports the device as write-protected and mounts it read-only. fsck then fails on the write-protected device. In the shell, re-mounting with rw,remount produces the same error about the disk being write-protected.

/proc/mdstat says that resync is pending, but not running.

Booting into installer rescue mode works fine, however. The volume gets mounted read-write under /mnt/sysimage. Resync progresses according to /proc/mdstat
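
The pending-resync state described above can be checked without reading /proc/mdstat by eye. The following is a minimal sketch, assuming the usual /proc/mdstat layout in which each array stanza starts with a line like `mdN : ...` and a stalled resync is shown as `resync=PENDING`. The function name `pending_resync_arrays` is hypothetical and not part of mdadm.

```shell
# Print the names of md arrays whose stanza in an mdstat-style file
# contains "resync=PENDING". Reads /proc/mdstat unless a file is given.
pending_resync_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }                       # start of a new array stanza
        /resync=PENDING/ && name != "" { print name }     # stalled resync in that stanza
    ' "${1:-/proc/mdstat}"
}

# Usage on the affected system (not executed here):
#   pending_resync_arrays
#   blockdev --getro /dev/<array>   # prints 1 if the kernel treats the device as read-only
```

Comparing the output in the failing boot and in rescue mode may help confirm whether the array is stuck read-only with its resync never started.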

I am not sure whether this is a dmraid or mdadm bug; the line between the two seems to have become somewhat blurred of late.

Version-Release number of selected component (if applicable):
Clean RHEL6 Beta 2 install.

How reproducible:
Every time so far.

Steps to Reproduce:
1. Install onto Intel RAID1 (ICH/Matrix device mapper RAID)
2. Reboot. The system will bail when fsck bails due to the device being seen as write-protected.

Comment 2 RHEL Program Management 2010-08-02 22:27:34 UTC

This issue has been proposed when we are only considering blocker
issues in the current Red Hat Enterprise Linux release.

** If you would still like this issue considered for the current
release, ask your support representative to file as a blocker on
your behalf. Otherwise ask that it be considered for the next
Red Hat Enterprise Linux release. **

Comment 3 Gordan Bobic 2010-08-03 09:19:36 UTC

A little more information on this:

- The system in question is using ICH9 RAID1.

- If any RAID volumes are defined, even if they only cover parts of the disks available, those un-RAID-ed parts don't show up in the installer. This is broken behaviour because it doesn't coincide with the way other OS-es handle this situation, as it means the parts of the disks not claimed by RAID are rendered unusable if RAID is enabled.

- I tried re-installing, this time with ext4 root (my original install was ext3, no separate boot partition), and that managed to get as far as firstboot. I suspect, however, the crash kernel initrd rebuild killed it, and the same problem re-occurred on next boot that rendered the system unbootable (write-protected block device).

- Around the udev startup time, there is a warning/error message stating that /dev/mapper/<name of my RAID volume> cannot be statted / doesn't exist. Could it be that there is an issue where dmraid claims the underlying devices and makes them read-only, which then prevents anything else from accessing them? Perhaps a clash of integration between dmraid and MD raid subsystems (RAID1 seems to be handled by MD subsystem now, unlike in RHEL5 where dmraid was completely separate)?

Comment 4 Heinz Mauelshagen 2010-08-03 10:09:48 UTC

Switching to correct component mdadm and reassigning.

Comment 5 Gordan Bobic 2010-08-03 13:02:25 UTC

Is this really a mdadm bug? It looks more like either:

1) dmraid bug in that it locks devices even though it didn't start the RAID
volume

2) dracut bug in that initrd invokes dmraid when it should be using mdadm for
starting the RAID volumes.

From what I can see, it looks like MD RAID is on the receiving end of something
locking out the physical devices as read-only.

Or am I misunderstanding how this hangs together in RHEL6?

Comment 6 Heinz Mauelshagen 2010-08-03 13:39:00 UTC

mdadm has to control Intel Matrix RAID devices in RHEL6.

I.e., dmraid should not be started at all if such devices are discovered,
hence your assumption that dracut calls dmraid erroneously seems right.

Comment 7 Gordan Bobic 2010-08-03 13:59:16 UTC

I'm not even sure it is dracut at the moment (although I seem to remember seeing write-protect warnings last time, when the fs was ext3, before init was started). I'm thinking it might be rc.sysinit this time around.

That seems to be where the error about not being able to stat the
/dev/mapper/<raid-type_device-id_volume_name>
is coming from.

In fact, looking at rc.sysinit, lines 190-203:

====================
if ! strstr "$cmdline" nodmraid && [ -x /sbin/dmraid ]; then
        modprobe dm-mirror >/dev/null 2>&1
        dmraidsets=$(LC_ALL=C /sbin/dmraid -s -c -i)
        if [ "$?" = "0" ]; then
                for dmname in $dmraidsets; do
                        if [[ "$dmname" =~ '^isw_.*' ]] && \
                           ! strstr "$cmdline" noiswmd; then
                                continue
                        fi
                        /sbin/dmraid -ay -i --rm_partitions -p "$dmname" >/dev/null 2>&1
                        /sbin/kpartx -a -p p "/dev/mapper/$dmname"
                done
        fi
fi
====================

I'll see if my problem goes away if I pass the nodmraid boot parameter, but that should either be unnecessary (except for debugging) or should at least be set correctly by anaconda. But that will still cause problems on systems with mixed fake RAID controllers if mdadm passthrough is only for Intel ICH chipsets.
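
One thing that may be worth checking in the rc.sysinit excerpt is the quoted pattern in `[[ "$dmname" =~ '^isw_.*' ]]`. In bash 3.2 and later, quoting the right-hand side of `=~` makes it a literal string rather than a regular expression, so the `continue` that is meant to skip `isw_` sets might never be reached. The sketch below only demonstrates that bash behaviour. The function names are hypothetical, and whether this explains the stat() failure on this system is an assumption, not something confirmed in this report.

```shell
#!/bin/bash
# Set name taken from the boot message quoted in this report.
dmname="isw_bcffhhfiji_System"

matches_quoted() {
    # Quoted pattern: bash 3.2+ compares it as a literal string,
    # so "^" and ".*" are not treated as regex operators.
    [[ "$1" =~ '^isw_.*' ]]
}

matches_unquoted() {
    # Unquoted pattern: treated as a POSIX extended regular expression.
    [[ "$1" =~ ^isw_ ]]
}

if matches_quoted "$dmname"; then echo "quoted: match"; else echo "quoted: no match"; fi
if matches_unquoted "$dmname"; then echo "unquoted: match"; else echo "unquoted: no match"; fi
```

If the quoted form reports no match on the installed bash, the loop would fall through to `dmraid -ay` and `kpartx` for Intel sets even without `noiswmd` on the kernel command line.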

Comment 8 Gordan Bobic 2010-08-03 23:45:41 UTC

I've got things mostly working OK now, but every once in a while, it still refuses to boot with the same symptoms. The issue seems to be that mdmon crashes in initrd. An error gets reported that it crashes in ld.so (IIRC), and it all fails from there onward. It's pretty intermittent, possibly a memory stomp dependent on what was in the memory before the reboot.

I'll jot down the exact error next time it occurs.

Comment 9 Gordan Bobic 2010-08-03 23:47:08 UTC

Oh, and this appears to get emitted on every boot from the above block from rc.sysinit:
failed to stat() /dev/mapper/isw_bcffhhfiji_System

Comment 10 Doug Ledford 2010-08-04 00:05:25 UTC

What version of mdadm does rpm report you are using? (rpm -q mdadm?)

Comment 11 Gordan Bobic 2010-08-04 00:24:13 UTC

mdadm-3.1.2-11.el6.x86_64

Comment 12 Gordan Bobic 2010-08-04 00:26:43 UTC

And here is the error:

mdmon[511] trap invalid opcode ip:7f700723fe39 sp:7fff4c202d38 error:0 in ld-2.12.so[7f700722b000+1e000]
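
The trap line above gives the faulting instruction pointer and the base address and size of the ld-2.12.so mapping, so the offset of the crash inside that mapping is just ip minus base. A minimal sketch of that arithmetic follows. The function name `trap_offset` is hypothetical, and the parsing assumes the line has exactly the shape shown in this comment.

```shell
# Compute the offset of the faulting instruction within the mapped object
# from a kernel trap line of the form "... ip:<hex> sp:<hex> ... in lib[<base>+<size>]".
trap_offset() {
    ip=$(printf '%s\n' "$1" | sed -n 's/.* ip:\([0-9a-f]*\) .*/\1/p')
    base=$(printf '%s\n' "$1" | sed -n 's/.*\[\([0-9a-f]*\)+[0-9a-f]*]$/\1/p')
    [ -n "$ip" ] && [ -n "$base" ] || return 1
    printf '0x%x\n' $(( 0x$ip - 0x$base ))
}

trap_offset 'mdmon[511] trap invalid opcode ip:7f700723fe39 sp:7fff4c202d38 error:0 in ld-2.12.so[7f700722b000+1e000]'
# prints 0x14e39
```

The resulting offset could be looked up in a matching ld-2.12.so binary with a disassembler, assuming the mapping starts at the beginning of the file.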

Comment 13 Doug Ledford 2010-08-04 00:36:00 UTC

Thanks, this is a known and already fixed issue. We had to modify mdmon to use pthreads instead of clone() when creating threads, because glibc won't play nice with a multithreaded program that *doesn't* use pthreads. This was fixed in mdadm-3.1.3-0.git20100722.2.el6. It will show up in later beta refreshes.
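
To check whether an installed mdadm predates the fixed build named above, the two version-release strings can be compared. The sketch below uses GNU `sort -V` as an approximation. It is not rpm's own comparison algorithm, and the function name `version_lt` is hypothetical.

```shell
# True if $1 sorts strictly before $2 under GNU "sort -V" version ordering.
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

installed="3.1.2-11.el6"              # from "rpm -q mdadm" in comment 11
fixed="3.1.3-0.git20100722.2.el6"     # build named as containing the fix

if version_lt "$installed" "$fixed"; then
    echo "installed mdadm predates the fix"
else
    echo "installed mdadm is at or past the fixed build"
fi
```

For the versions quoted in this report, 3.1.2 sorts before 3.1.3, so the installed package predates the fix.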

Comment 14 Doug Ledford 2010-08-04 00:36:51 UTC


*** This bug has been marked as a duplicate of bug 604023 ***
