Bug 184570
Summary: | multiple problems with md autodetect | |
---|---|---|---
Product: | Red Hat Enterprise Linux 4 | Reporter: | Charlie Brady <charlieb-redhat-bugzilla>
Component: | kernel | Assignee: | Doug Ledford <dledford>
Status: | CLOSED NOTABUG | QA Contact: | Brian Brock <bbrock>
Severity: | high | Priority: | medium
Version: | 4.0 | CC: | bugzilla, jbaron
Hardware: | i386 | OS: | Linux
Doc Type: | Bug Fix | Last Closed: | 2006-08-27 23:55:33 UTC
Description
Charlie Brady
2006-03-09 21:53:20 UTC
I've discovered a technique to ensure that the md devices with the correct UUIDs are started. The mdadm package source code includes a file, mdassemble.c. If you run "make mdassemble", you obtain a statically linked program of about 150 kB. If you include this program and a suitable /etc/mdadm.conf in an initrd file, and replace these lines:

    raidautorun /dev/md1
    raidautorun /dev/md2

in /init with:

    mknod /dev/md1 b 9 1
    mknod /dev/md2 b 9 2
    mdassemble

then the raid arrays are correctly assembled.

A suitable /etc/mdadm.conf can be created by capturing the output of "mdadm --examine --scan":

    [root@test7 ~]# mdadm --examine --scan
    ARRAY /dev/md2 level=raid1 num-devices=2 \
       UUID=a347e7f8:61d99d7b:bbbfa329:a5175902
       devices=/dev/hda2,/dev/hdb2
    ARRAY /dev/md1 level=raid1 num-devices=2 \
       UUID=294182c7:af337fcf:2513036f:3c6231a9
       devices=/dev/hda1,/dev/hdb1
    [root@test7 ~]#

[Note that although I've reported this against RHEL4, I haven't seen anything to suggest the same problem doesn't also affect recent FC releases. I consider this a serious problem (people commonly reuse disks, and don't expect the last disk added to be the one which actually boots). I'm surprised this issue hasn't received any attention from Red Hat.]

The linux kernel's autodetect feature is working as best it can. In this situation, you have presented it with two different raid devices that both claim to be md0 and both have valid, but different, uuids and both think they are up to date. The kernel has *no* way of knowing which one is right, so you only have a 50/50 chance of getting the right device started. In order to avoid a situation like this, there are multiple options:

1) Don't use autodetect and instead use manual startup (which is what your second post describes, although you really want to leave the device lines out of each array definition and add the line 'DEVICE partitions' to your mdadm.conf).
2) Before taking a disk out of service where it might be reused elsewhere, wipe the partition table so that it will be seen as clean when you put it into another machine.

3) If you don't want to wipe the partition table, you can at least set the partition type to Linux instead of Raid Autodetect, which will keep the drives from interfering with the normal startup of raid arrays in whatever machine you put them into. (I should also note that if you are using manual startup by uuid like in your second post, you can switch all of your partitions to Linux instead of Raid Autodetect and mdassemble will still assemble them just fine.)

4) If you do put the drive into a machine and hit the problem you posted about, all you need to do is run fdisk on the new drive, switch the partition types to Linux, and reboot. You'll now be on your original raid arrays. You can then hot add the replacement disk's partitions into your running arrays, which will overwrite the superblocks on the new disk to match your running arrays. Finally, re-enable the Raid Autodetect partition type on the replacement disk's partitions and reboot; the system will come up on the correct drive with the arrays fully assembled.

So, the long and short of it is that we don't support both A) using autodetect and B) putting disks with valid superblocks and raid autodetect partitions into a system that already has existing md devices of the same md device name and is already using autodetect. It is the system administrator's responsibility to control which devices are labelled with RAID superblocks and tagged in the partition table for autodetect startup, especially when shifting drives between machines.

> The linux kernel's autodetect feature is working as best it can.

That's debatable. I think it would be less surprising, and more likely to be correct, if it searched devices first to last, rather than last to first.
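One way to spot the conflict discussed above before a reboot is to look for an md device name that is claimed by more than one UUID. The following is only a sketch under stated assumptions: `find_conflicting_arrays` and `sample_conflict` are hypothetical names, the second UUID in the sample is made up, and the parser assumes each ARRAY line carries its UUID on the same line, which may not hold for every mdadm version.

```shell
#!/bin/sh
# Hypothetical helper: read `mdadm --examine --scan`-style text on stdin and
# report every md device name that appears with more than one distinct UUID.
find_conflicting_arrays() {
    awk '
        $1 == "ARRAY" {
            name = $2
            uuid = ""
            # Find the UUID=... word on this ARRAY line (assumed layout).
            for (i = 3; i <= NF; i++)
                if ($i ~ /^UUID=/) uuid = substr($i, 6)
            if (uuid == "") next
            key = name SUBSEP uuid
            if (!(key in seen)) {
                seen[key] = 1
                count[name]++
            }
        }
        END {
            for (name in count)
                if (count[name] > 1)
                    print name " is claimed by " count[name] " different UUIDs"
        }
    '
}

# Made-up sample: two arrays both claim /dev/md1; the second UUID is invented.
sample_conflict() {
    cat <<'EOF'
ARRAY /dev/md1 level=raid1 num-devices=2 UUID=294182c7:af337fcf:2513036f:3c6231a9
ARRAY /dev/md1 level=raid1 num-devices=2 UUID=11111111:22222222:33333333:44444444
ARRAY /dev/md2 level=raid1 num-devices=2 UUID=a347e7f8:61d99d7b:bbbfa329:a5175902
EOF
}

sample_conflict | find_conflicting_arrays
```

On a real system the input would come from `mdadm --examine --scan` instead of the sample function. The check only reports the clash; it cannot say which array is the intended one.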
> The kernel has *no* way of knowing which one is right, so you only
> have a 50/50 chance of getting the right device started.

Correct. This is why I believe Red Hat should not leave it to the kernel, but should provide some assistance via initrd.

> 1) Don't use autodetect and instead use manual startup (which is what your
> second post describes, although you really want to leave the device lines
> out of each array definition and add the line 'DEVICE partitions' to your
> mdadm.conf).

Isn't this what Red Hat/FC should do, so that the mounted root partition is the correct one, matching the booting kernel and the grub entry?

> So, the long and short of it is that we don't support both [A and B]

I think your product would be more reliable if you did(*), and it would be a simple modification to mkinitrd and mdadm to make it so.

However, if you choose not to make it so, then I think you should document this gotcha. If, as you say, the sysadmin has a responsibility to avoid this problem, then he/she should be made aware of that responsibility. It's certainly a surprising one.

(*) It seems that Debian's initrd makes efforts to ensure that the array with the correct uuid is mounted as root.
See, e.g.:

http://www.mail-archive.com/debian-bugs-closed@lists.debian.org/msg84008.html
http://www.mail-archive.com/debian-bugs-dist@lists.debian.org/msg227364.html
http://lists.debian.org/debian-kernel/2005/03/msg00180.html

This change to the mdadm spec file adds mdassemble to the system:

@@ -30,6 +30,7 @@
 %build
 make CXFLAGS="$RPM_OPT_FLAGS" SYSCONFDIR="%{_sysconfdir}" mdadm
 make CXFLAGS="$RPM_OPT_FLAGS" SYSCONFDIR="%{_sysconfdir}" -C mdmpd mdmpd
+make CXFLAGS="$RPM_OPT_FLAGS" SYSCONFDIR="%{_sysconfdir}" mdassemble
 
 %install
 make DESTDIR=$RPM_BUILD_ROOT MANDIR=%{_mandir} BINDIR=/sbin install
@@ -40,6 +41,8 @@
 mkdir -p -m 700 $RPM_BUILD_ROOT/var/run/mdmpd
 mkdir -p -m 700 $RPM_BUILD_ROOT/var/run/mdadm
 
+install -D -m750 mdassemble $RPM_BUILD_ROOT/sbin/mdassemble
+
 %clean
 [ $RPM_BUILD_ROOT != / ] && rm -rf $RPM_BUILD_ROOT
 
@@ -75,6 +78,9 @@
 %attr(0700,root,root) %dir /var/run/mdadm
 
 %changelog
+* Mon Aug 28 2006 Charlie Brady <charlieb> 1.6.0-3sme01
+- Add mdassemble
+

And this patch to mkinitrd will use mdassemble instead of raidautorun in an initrd:

@@ -702,8 +702,12 @@
 if [ -n "$startraid" ]; then
     for dev in $raiddevices; do
         cp -a /dev/${dev} $MNTIMAGE/dev
-        echo "raidautorun /dev/${dev}" >> $RCFILE
     done
+    cp -a /sbin/mdassemble $MNTIMAGE/sbin
+    mkdir -p $MNTIMAGE/etc
+    echo DEVICE partitions > $MNTIMAGE/etc/mdadm.conf
+    mdadm --examine --scan | sed '/devices=/d' >> $MNTIMAGE/etc/mdadm.conf
+    echo "/sbin/mdassemble" >> $RCFILE
 fi
 
 if [ -z "$USE_UDEV" ]; then

This mkinitrd patch actually works. The '/devices=/d' sed line in the previous patch stripped rather too much of the config. I don't understand why we need mknod rather than static copies of the device node files, but it seems we do.
@@ -705,9 +705,16 @@
 if [ -n "$startraid" ]; then
     for dev in $raiddevices; do
-        cp -a /dev/${dev} $MNTIMAGE/dev
-        echo "raidautorun /dev/${dev}" >> $RCFILE
+        echo mknod /dev/${dev} b 9 $(echo $dev | sed s/md//) >> $RCFILE
     done
+    cp -a /sbin/mdassemble $MNTIMAGE/sbin
+    mkdir -p $MNTIMAGE/etc
+    echo DEVICE partitions > $MNTIMAGE/etc/mdadm.conf
+    mdadm --examine --scan | \
+        sed -r \
+            -e '/^ +devices=/d' \
+            -e 's/ num-devices=[0-9]+//' >> $MNTIMAGE/etc/mdadm.conf
+    echo "/sbin/mdassemble" >> $RCFILE
 fi
 
 if [ -z "$USE_UDEV" ]; then
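
The filtering step in the patch above can be tried on its own, outside mkinitrd. This is a minimal sketch. It assumes the scan output puts each `devices=` list on an indented continuation line and the UUID on the ARRAY line itself; the `^ +devices=` pattern suggests the former, but the thread does not show the raw layout. `make_mdadm_conf` and `sample_scan` are hypothetical names, and `sed -E` stands in for the patch's `sed -r`. Under that assumption it also suggests why the earlier '/devices=/d' expression removed too much: that pattern matches "num-devices=" as well, so it would delete every ARRAY line.

```shell
#!/bin/sh
# Hypothetical helper: turn `mdadm --examine --scan`-style text on stdin into
# mdadm.conf content, using the same two sed expressions as the patch above.
make_mdadm_conf() {
    echo "DEVICE partitions"
    # 1st expression: drop the indented "devices=" continuation lines.
    # 2nd expression: drop the " num-devices=N" word from each ARRAY line.
    sed -E \
        -e '/^ +devices=/d' \
        -e 's/ num-devices=[0-9]+//'
}

# Sample input built from the arrays quoted earlier in this report; the exact
# line layout is an assumption.
sample_scan() {
    cat <<'EOF'
ARRAY /dev/md2 level=raid1 num-devices=2 UUID=a347e7f8:61d99d7b:bbbfa329:a5175902
   devices=/dev/hda2,/dev/hdb2
ARRAY /dev/md1 level=raid1 num-devices=2 UUID=294182c7:af337fcf:2513036f:3c6231a9
   devices=/dev/hda1,/dev/hdb1
EOF
}

sample_scan | make_mdadm_conf
# Prints:
# DEVICE partitions
# ARRAY /dev/md2 level=raid1 UUID=a347e7f8:61d99d7b:bbbfa329:a5175902
# ARRAY /dev/md1 level=raid1 UUID=294182c7:af337fcf:2513036f:3c6231a9
```

In the mkinitrd context the input would be the live `mdadm --examine --scan` output and the result would be written to $MNTIMAGE/etc/mdadm.conf, as the patch does.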