Description of problem:

It appears that newly allocated mimages are numbered starting just above the greatest number still present in the mirror, regardless of whether that slot was freed by one of the images that failed. Could we have them start at the next integer never previously used in the mirror? So if we have a mirror with:

mimage_0 mimage_1 mimage_2

and we fail mimage_2, shouldn't the next image logically be mimage_3? (If we fail mimage_1 we don't reuse _1 for the newly allocated image; we get _3, as expected.)

The current way makes automating lvm mirror failure testing much more difficult, because now I not only have to know how many legs I randomly failed, but also which positions they were in, and then figure out which goofy way lvm will rebuild the mirror before I can verify which image(s) should be left and which new one(s) should have appeared.

Version-Release number of selected component (if applicable):
2.6.18-160.el5

lvm2-2.02.56-2.el5             BUILT: Thu Dec 10 09:38:13 CST 2009
lvm2-cluster-2.02.56-2.el5     BUILT: Thu Dec 10 09:38:41 CST 2009
device-mapper-1.02.39-1.el5    BUILT: Wed Nov 11 12:31:44 CST 2009
cmirror-1.1.39-2.el5           BUILT: Mon Jul 27 15:39:05 CDT 2009
kmod-cmirror-0.1.22-1.el5      BUILT: Mon Jul 27 15:28:46 CDT 2009
After researching this some more, it turns out the allocate-new-mimage logic is inconsistent, which makes testing (and especially automated testing) super difficult.

If you have a 2-way mirror like so:

mimage_0 sda1
mimage_1 sdb1

and you fail the primary leg (sda1) with the allocate policy, you'll end up with the following:

mimage_0 sdb1
mimage_1 sdn1

If you have a 3-way mirror like so:

mimage_0 sda1
mimage_1 sdb1
mimage_2 sdc1

and you fail the primary leg (sda1) with the allocate policy, you'll end up with the following:

mimage_1 sdb1
mimage_2 sdc1
mimage_3 sdn1

Then add in the multiple-device failure scenarios listed in comment #0, and you can see why I'm pulling my hair out trying to write the logic to verify that, after each type of mirror failure in the matrix of possibilities, the correct mimages and devices are removed and the correct ones are added.
Corey, would it be possible to track the devices instead of the mimage numbers? What currently happens is this: upon failure (or any downconversion), the mimages are shifted so that the ones to be removed are at the end. Then the end is removed and possibly replaced with new images. The numbering of the mimage LVs is (apparently) done from left to right. The second case does look odd, though; so far I've gotten lost trying to understand the code that allocates these numbers. If it went like 0 -> sdb1, 2 -> sdc1 and 3 -> sdn1, that would be reasonable, right? (Looking at the 2-way case for reference.)
Petr, we do also track the actual devices. We attempt to track all devices and mimages both before and after a failure to ensure that everything is where it's expected. If it did end up going to "0 -> sdb1, 2 -> sdc1 and 3 -> sdn1" then yes, that would be reasonable, assuming all types of mirrors behave that way, because it would be consistent. However, I think that may still get confusing depending on the number of legs that were failed, as you'd have leg images getting shuffled all over. I think the best bet is to add them to the end. So if you have the following:

mimage_0 sdb1
mimage_1 sdc1
mimage_2 sdd1
mimage_3 sde1

and you fail sdc1 and sde1, then the newly allocated images should go to:

mimage_0 sdb1
mimage_2 sdd1
mimage_4 sdf1
mimage_5 sdg1

* Note how the new images start at 4, and not 3, even though 3 was failed.
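For what it's worth, the append-at-the-end scheme proposed above is simple to state as code. This is a hypothetical test-harness sketch (not lvm2 code); the function name and the dict-based layout are my own assumptions:

```python
# Sketch of the proposed numbering scheme: new images always take the next
# integer past the highest number ever used, and never reuse a failed slot.

def replace_failed_legs(images, failed, new_devices):
    """images: dict mapping mimage number -> device.
    failed: set of mimage numbers that failed.
    new_devices: replacement devices, one per failed leg."""
    next_num = max(images) + 1  # never reuse a number, even a failed one
    surviving = {n: dev for n, dev in images.items() if n not in failed}
    for dev in new_devices:
        surviving[next_num] = dev
        next_num += 1
    return surviving

# Example from above: fail sdc1 (mimage_1) and sde1 (mimage_3)
mirror = {0: "sdb1", 1: "sdc1", 2: "sdd1", 3: "sde1"}
print(replace_failed_legs(mirror, {1, 3}, ["sdf1", "sdg1"]))
# {0: 'sdb1', 2: 'sdd1', 4: 'sdf1', 5: 'sdg1'}
```

Under this rule the surviving legs never get renumbered, so verification only has to check that failed numbers disappeared and the new ones start past the old maximum.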
Ah, the reason 2-leg mirrors appear different is that they only have one leg remaining after the device failure. I just learned that n-way mirrors behave the same way when enough legs are failed to leave only one remaining. What happens is that the remaining device gets shuffled to mimage_0 (the new primary leg), and all the others that were failed and reallocated get incrementally numbered after it. So you end up with the exact same image numbers that you had before the failure.

Leg failures that leave more than one leg remaining get new images added starting at mimage_n (if the last leg was failed) or mimage_n+1 (if the last leg was not failed). So it's still odd, but a little more understandable than I had originally thought. Now we just need each customer to figure this out as well and nothing will appear random or crazy. :)
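To keep the test automation sane, the behavior described above can be captured in a small predictive model. This is my reading of the observed lvm2-2.02.56 behavior, not code derived from lvm2 itself; the function name and data layout are assumptions:

```python
# Rough model of the observed renumbering behavior after a mirror leg
# failure with the allocate policy. Not lvm2 source logic.

def predict_images(images, failed, new_devices):
    """images: list of (number, device) pairs in mirror order.
    failed: set of mimage numbers that failed.
    new_devices: replacement devices, one per failed leg.
    Returns the predicted (number, device) list after repair."""
    surviving = [(n, dev) for n, dev in images if n not in failed]
    if len(surviving) == 1:
        # All but one leg failed: the survivor becomes mimage_0 (the new
        # primary) and replacements fill in 1, 2, ... so the final numbers
        # match the pre-failure mirror.
        result = [(0, surviving[0][1])]
        next_num = 1
    else:
        result = list(surviving)  # survivors keep their numbers
        last = images[-1][0]
        # New legs start at mimage_n if the last leg failed, mimage_n+1 if not.
        next_num = last if last in failed else last + 1
    for dev in new_devices:
        result.append((next_num, dev))
        next_num += 1
    return result
```

Checking it against the two cases from comment #1: failing the primary of a 2-way mirror gives mimage_0 sdb1, mimage_1 sdn1 (one-survivor rule), and failing the primary of a 3-way mirror gives mimage_1 sdb1, mimage_2 sdc1, mimage_3 sdn1 (last leg survived, so the new leg starts at n+1).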
This should go into the man page.
Since it is too late to address this issue in RHEL 5.5, it has been proposed for RHEL 5.6. Contact your support representative if you need to escalate this issue.
Development Management has reviewed and declined this request. You may appeal this decision by reopening this request.