Red Hat Bugzilla – Bug 222787
lvm picks up md component devices automatically
Last modified: 2007-11-30 17:11:53 EST
Description of problem:
I added some new drives in raid 5 with LVM on top and then got bit by this bug:
In order to get some semblance of performance (3X extremely slow is still
extremely slow) I disabled the drive on the slow port. I then had a system hang
while copying stuff from one drive to another (a different partition, but the
drives are the same ones used in the soft raid array). Upon reboot (FC5
system) I received several errors and a kernel panic (I can't scroll back to see
what really happened). The only meaningful thing I can gather is that it can't
access root, but I don't know why.
I reboot with the FC6 live cd and see the problem:
raid5: cannot start dirty degraded array for md1
However, the device mapper still seems to find my lvs.
I know that I need to force the assemble once the system is up to get it working
again. However, I can't, because the device mapper has taken sda2 (the first of
the 4 partitions in the raid set) as the pv for my vg in lieu of the
non-existent md1. This partition is of type fd (definitely not the 8e it would
be if it were LVM). I can't free up the partition, because doing a pvremove
would probably really mess up the data on that partition, and it's already dirty
and degraded. I surmise that what's going on is that the lvm metadata happens to
fit in one block (chunk size=64k), so the partition looks like an lvm pv if you
ignore the partition type.
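That surmise lines up with the on-disk layouts: an md 0.90 superblock lives near
the end of each component device, while the LVM2 label ("LABELONE") sits in one
of the first four 512-byte sectors, so a scan of the start of the first
component can see what looks like a valid pv. A sketch with a scratch file
standing in for the partition (paths and sizes are illustrative, not from the
report):

```shell
# Why the start of an md component can look like a pv: the LVM2 label
# ("LABELONE") lives near sector 1, far from the md superblock at the end.
img=$(mktemp)                                   # stand-in for /dev/sda2
dd if=/dev/zero of="$img" bs=512 count=16 2>/dev/null
printf 'LABELONE' | dd of="$img" bs=512 seek=1 conv=notrunc 2>/dev/null
# A naive label scan of the first sectors "finds" a pv label (prints 1):
dd if="$img" bs=512 skip=1 count=1 2>/dev/null | tr -d '\0' | grep -c LABELONE
rm -f "$img"
```

A scan that only reads the first sectors, and ignores the partition type, has no
way to tell this component apart from a real pv.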
Internet searching does not yield a good way to disable this functionality.
Interactive startup is not interactive until after the LVM autodetect. I later
worked around the problem by starting up without the first drive present,
plugging it in after the system was up, and then getting it to register with
echo "scsi add-single-device 3 0 0 0" >/proc/scsi/scsi
(my primary controller doesn't support hot-plug, so I had to swap drives around).
Later research finds I might've avoided some of the mess with:
but I don't think this mode of operation is desirable. If I had hit the same
problem on the FC5 install on that machine, I could have really messed things up
before noticing.
Version-Release number of selected component (if applicable):
All versions are what comes on the live cd:
Haven't attempted reproduction yet (still trying to rebuild my array at 2MB/sec)
Steps to Reproduce:
1. Create a raid 5 (maybe also 0 or 4) array with a fairly large chunk size (I
used 64k) and put lvm on top of it. Partitions should be of raid type.
2. Find some way to disable md auto assemble (degrade it and then try mounting
it while dirty is what I did) as long as the first disk is intact.
The
/sbin/lvm.static vgchange -a y --ignorelockingfailure
that rc.sysinit runs at boot pretends nothing is wrong with using the first
partition of your array as the pv instead of the raid device. The closest thing
to an error is a message about the sizes not seeming to match up.
It should notice that the partition is not of type lvm and skip it, or at least
release the resources when it discovers the size doesn't match. I understand
that with a raw drive (a bad idea, but whatever) or a loopback/md/etc. device
there is no partition table, so some deeper checking might be required, but if
the partition type is incorrect or the size is wrong, it should require an
override (perhaps on the kernel command line).
Calling it high severity because it *could* lead to loss of data.
As far as I recall, running lvm on top of md is not recommended (although I
have no idea whether this is mentioned anywhere in the documentation). You
should always blacklist md-participating devices in your lvm.conf to avoid lvm
picking them up. I don't think a partition type check is sufficient or even a
good idea, since someone issuing pvcreate /dev/foo3, where foo3 previously
held, say, ext3 and was of type Linux, usually does not bother to fdisk the
drive and flip partition types, yet he probably does not want the next boot to
fail due to an incomplete volume group (since foo3 would be excluded, being of
a non-lvm partition type).
What -could- work is adding md metadata detection and refusing to use devices
that have an md signature on them (maybe unless forced). Thoughts, comments?
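A minimal lvm.conf sketch of the blacklist approach described above; the device
patterns are illustrative and must be adapted to the actual component
partitions (e.g. the sda2 from this report):

```
devices {
    # Accept the assembled arrays, reject the raw md component partitions,
    # accept everything else.  First matching pattern wins.
    filter = [ "a|^/dev/md|", "r|^/dev/sda2$|", "a|.*|" ]
}
```

Remember that an initrd carries its own copy of lvm.conf, so the filter has to
be present there too to take effect at boot.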
There is already md component detection, see lvm.conf
# By default, LVM2 will ignore devices used as components of
# software RAID (md) devices by looking for md superblocks.
# 1 enables; 0 disables.
md_component_detection = 1
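Roughly what that detection looks for (a sketch, not the actual lvm2 code): an
md 0.90 superblock sits at a 64KiB-aligned offset within the last 128KiB of the
device and begins with the magic 0xa92b4efc (bytes fc 4e 2b a9 on a
little-endian machine). With a scratch file standing in for a component device:

```shell
# Sketch of an md 0.90 superblock check (illustrative only).
img=$(mktemp)
truncate -s 1M "$img"                       # stand-in for a component device
size=$(stat -c %s "$img")
sb_off=$(( (size & ~65535) - 65536 ))       # 0.90 superblock offset
# Plant the magic 0xa92b4efc (octal escapes: fc 4e 2b a9):
printf '\374\116\053\251' | dd of="$img" bs=1 seek="$sb_off" conv=notrunc 2>/dev/null
magic=$(dd if="$img" bs=1 skip="$sb_off" count=4 2>/dev/null | od -An -tx1 | tr -d ' ')
[ "$magic" = "fc4e2ba9" ] && echo "md component detected"
rm -f "$img"
```

A device that passes this check is skipped by the pv scan even if its first
sectors happen to hold a valid-looking LVM label.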
Confirming that the md component detection works as it should.
[root@dhcp-lab-168 ~]# mdadm --assemble /dev/md0
mdadm: /dev/md0 has been started with 1 drive (out of 2).
[root@dhcp-lab-168 ~]# pvscan
PV /dev/md0 VG vg0 lvm2 [5.99 GB / 5.95 GB free]
Total: 1 [5.99 GB] / in use: 1 [5.99 GB] / in no VG: 0 [0 ]
[root@dhcp-lab-168 ~]# mdadm --stop /dev/md0
mdadm: stopped /dev/md0
[root@dhcp-lab-168 ~]# pvscan
No matching physical volumes found
[root@dhcp-lab-168 ~]# vgchange -a y
No volume groups found
Let me try to put root partition on an lv atop md to see if initrd adds some
confusion to the equation.
I do not recall any specific warnings against this, although that doesn't mean
there weren't any; I might not have read them, or might not remember, since I
originally set up the LVM years ago (though with the plan of an easy move to
raid later) and only set up the raid drives in January. I do crazy things
anyway, like run raid on top of raid. That made sense at the time too: two
8.4GB 5400rpm drives striped together and then mirrored against a 7200rpm
drive.
My lvm.conf contains md_component_detection = 1, as does the one in my initrd
(which brings me to another issue: the mkinitrd in fc5 doesn't seem to figure
out that it should use raid456 for raid5, but I just pulled in the fc6 version).
I haven't had the chance to check what the fc6 live cd has; perhaps that's
where the fault lies.
That sounds possible. I have checked the codepaths in lvm2 and they look all
right, and the test above worked as expected, so I am closing this as
WORKSFORME. If you run into the problem again, please check whether
md_component_detection in the relevant configuration file is turned on, and if
so, reopen this report.