Bug 523324
Summary: | RFE: lvm should better attempt to use whole devices when adding mirror legs | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 5 | Reporter: | Corey Marthaler <cmarthal> | ||||
Component: | lvm2 | Assignee: | Alasdair Kergon <agk> | ||||
Status: | CLOSED ERRATA | QA Contact: | Cluster QE <mspqa-list> | ||||
Severity: | low | Docs Contact: | |||||
Priority: | low | ||||||
Version: | 5.4 | CC: | agk, bgollahe, dwysocha, heinzm, iannis, jbrassow, mbroz, prockai | ||||
Target Milestone: | rc | Keywords: | FutureFeature | ||||
Target Release: | --- | ||||||
Hardware: | All | ||||||
OS: | Linux | ||||||
Whiteboard: | |||||||
Fixed In Version: | lvm2-2.02.88-1.el5 | Doc Type: | Enhancement | ||||
Doc Text: |
The updated allocation policy now better handles allocation of new segments for multiple segmented mirrors (mirrors which were repeatedly extended).
|
Story Points: | --- | ||||
Clone Of: | Environment: | ||||||
Last Closed: | 2012-02-21 06:02:30 UTC | Type: | --- | ||||
Attachments: |
|
Description
Corey Marthaler
2009-09-14 21:21:33 UTC
Allocation code is hugely complex. I don't think it's wise to muddle around in it in the middle of a RHEL5 series. I'll take a quick look, and if it is an oversight, then I will fix it. However, if it requires allocation code changes, it would be much safer to leave this to a future release. (Probably 'devel_nack')

Post the actual layouts before and after the changes you're concerned about. If you run with -vvvv you'll see sections like this in the output, which is what we need to explain how the code is behaving (grep for ^metadata):

metadata/pv_map.c:49 Allowing allocation on /dev/loop0 start PE 3 length 21
metadata/pv_manip.c:272 /dev/loop0 0: 0 3: lvol0(0:0)
metadata/pv_manip.c:272 /dev/loop0 1: 3 3: lvol1(0:0)
metadata/pv_manip.c:272 /dev/loop0 2: 6 18: NULL(0:0)

In this example, the final convert of mirror mB from 2-way to 3-way resulted in a leg made up of 3 segments, on only 2 disks. That doesn't make any sense at all. Why start on sdc1 at 375, switch to sdf1, and then go back to sdc1 at 500? I'd think the entire mimage should just be completely on sdc1.
  [mB_mimage_2] taft iwi-ao 1.46G                /dev/sdc1(375)
  [mB_mimage_2] taft iwi-ao 1.46G                /dev/sdf1(375)
  [mB_mimage_2] taft iwi-ao 1.46G                /dev/sdc1(500)

[root@taft-01 ~]# lvs -a -o +devices
  LV            VG   Attr   LSize Log     Copy%  Devices
  mA            taft mwi-a- 1.46G mA_mlog 100.00 mA_mimage_0(0),mA_mimage_1(0)
  [mA_mimage_0] taft iwi-ao 1.46G                /dev/sdb1(0)
  [mA_mimage_0] taft iwi-ao 1.46G                /dev/sdb1(250)
  [mA_mimage_0] taft iwi-ao 1.46G                /dev/sdb1(500)
  [mA_mimage_1] taft iwi-ao 1.46G                /dev/sdc1(0)
  [mA_mlog]     taft lwi-ao 4.00M                /dev/sde1(0)
  mB            taft mwi-a- 1.46G mB_mlog  69.07 mB_mimage_0(0),mB_mimage_1(0)
  [mB_mimage_0] taft Iwi-ao 1.46G                /dev/sdd1(0)
  [mB_mimage_1] taft Iwi-ao 1.46G                /dev/sdb1(125)
  [mB_mimage_1] taft Iwi-ao 1.46G                /dev/sdb1(375)
  [mB_mimage_1] taft Iwi-ao 1.46G                /dev/sdb1(625)
  [mB_mlog]     taft lwi-ao 4.00M                /dev/sde1(1)

[root@taft-01 ~]# lvconvert -m 2 taft/mA
  taft/mA: Converted: 21.3%
  taft/mA: Converted: 31.7%
  taft/mA: Converted: 55.2%
  taft/mA: Converted: 78.1%
  taft/mA: Converted: 85.1%
  taft/mA: Converted: 98.4%
  taft/mA: Converted: 100.0%
  Logical volume mA converted.

[root@taft-01 ~]# lvs -a -o +devices
  LV            VG   Attr   LSize Log     Copy%  Devices
  mA            taft mwi-a- 1.46G mA_mlog 100.00 mA_mimage_0(0),mA_mimage_1(0),mA_mimage_2(0)
  [mA_mimage_0] taft iwi-ao 1.46G                /dev/sdb1(0)
  [mA_mimage_0] taft iwi-ao 1.46G                /dev/sdb1(250)
  [mA_mimage_0] taft iwi-ao 1.46G                /dev/sdb1(500)
  [mA_mimage_1] taft iwi-ao 1.46G                /dev/sdc1(0)
  [mA_mimage_2] taft iwi-ao 1.46G                /dev/sdf1(0)
  [mA_mlog]     taft lwi-ao 4.00M                /dev/sde1(0)
  mB            taft mwi-a- 1.46G mB_mlog 100.00 mB_mimage_0(0),mB_mimage_1(0)
  [mB_mimage_0] taft iwi-ao 1.46G                /dev/sdd1(0)
  [mB_mimage_1] taft iwi-ao 1.46G                /dev/sdb1(125)
  [mB_mimage_1] taft iwi-ao 1.46G                /dev/sdb1(375)
  [mB_mimage_1] taft iwi-ao 1.46G                /dev/sdb1(625)
  [mB_mlog]     taft lwi-ao 4.00M                /dev/sde1(1)

[root@taft-01 ~]# lvconvert -m 2 taft/mB
  taft/mB: Converted: 18.7%
  taft/mB: Converted: 40.3%
  taft/mB: Converted: 65.9%
  taft/mB: Converted: 85.3%
  taft/mB: Converted: 100.0%
  Logical volume mB converted.
[root@taft-01 ~]# lvs -a -o +devices
  LV            VG   Attr   LSize Log     Copy%  Devices
  mA            taft mwi-a- 1.46G mA_mlog 100.00 mA_mimage_0(0),mA_mimage_1(0),mA_mimage_2(0)
  [mA_mimage_0] taft iwi-ao 1.46G                /dev/sdb1(0)
  [mA_mimage_0] taft iwi-ao 1.46G                /dev/sdb1(250)
  [mA_mimage_0] taft iwi-ao 1.46G                /dev/sdb1(500)
  [mA_mimage_1] taft iwi-ao 1.46G                /dev/sdc1(0)
  [mA_mimage_2] taft iwi-ao 1.46G                /dev/sdf1(0)
  [mA_mlog]     taft lwi-ao 4.00M                /dev/sde1(0)
  mB            taft mwi-a- 1.46G mB_mlog 100.00 mB_mimage_0(0),mB_mimage_1(0),mB_mimage_2(0)
  [mB_mimage_0] taft iwi-ao 1.46G                /dev/sdd1(0)
  [mB_mimage_1] taft iwi-ao 1.46G                /dev/sdb1(125)
  [mB_mimage_1] taft iwi-ao 1.46G                /dev/sdb1(375)
  [mB_mimage_1] taft iwi-ao 1.46G                /dev/sdb1(625)
  [mB_mimage_2] taft iwi-ao 1.46G                /dev/sdc1(375)
  [mB_mimage_2] taft iwi-ao 1.46G                /dev/sdf1(375)
  [mB_mimage_2] taft iwi-ao 1.46G                /dev/sdc1(500)
  [mB_mlog]     taft lwi-ao 4.00M                /dev/sde1(1)

Created attachment 365577
verbose output from lvconvert cmd
This is the -vvvv from the lvconvert cmd that created the 3rd mimage
lvconvert -vvvv -m 2 taft/mB > out 2>&1
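The out file written by the command above is long, and the allocation-relevant lines are the ones that start with metadata/. Below is a minimal Python sketch for pulling those lines out and summarising the "Allowing allocation" entries. The function names are hypothetical, and the line format is assumed from the sample lines quoted in this report, so it may need adjusting for other lvm2 versions (for example, the sketch tolerates a leading "#" on debug lines as a guess).

```python
import re
from typing import Iterable, List, Tuple

# Pattern assumed from the sample line quoted in this report:
#   metadata/pv_map.c:49 Allowing allocation on /dev/loop0 start PE 3 length 21
ALLOW_RE = re.compile(r"Allowing allocation on (\S+) start PE (\d+) length (\d+)")


def metadata_lines(lines: Iterable[str]) -> List[str]:
    """Keep only debug lines that come from the metadata/ source files."""
    kept = []
    for line in lines:
        # Assumption: some builds may prefix debug lines with '#'.
        stripped = line.lstrip().lstrip("#")
        if stripped.startswith("metadata/"):
            kept.append(stripped.rstrip())
    return kept


def allowed_areas(lines: Iterable[str]) -> List[Tuple[str, int, int]]:
    """Return (pv, start_pe, length) for each 'Allowing allocation' line."""
    areas = []
    for line in metadata_lines(lines):
        match = ALLOW_RE.search(line)
        if match:
            areas.append((match.group(1), int(match.group(2)), int(match.group(3))))
    return areas


if __name__ == "__main__":
    sample = [
        "metadata/pv_map.c:49 Allowing allocation on /dev/loop0 start PE 3 length 21",
        "metadata/pv_manip.c:272 /dev/loop0 0: 0 3: lvol0(0:0)",
    ]
    for pv, start, length in allowed_areas(sample):
        print(pv, start, length)
```

To run it against the real capture, pass an open file instead of the sample list, e.g. allowed_areas(open("out")).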
[mB_mimage_2] taft iwi-ao 1.46G /dev/sdc1(375)
[mB_mimage_2] taft iwi-ao 1.46G /dev/sdf1(375)
[mB_mimage_2] taft iwi-ao 1.46G /dev/sdc1(500)
The reason it's doing this is it's divided into 3 segments and it's doing these in order, taking the largest remaining each time, and that largest bounces between the two available disks, so you get c then f then c. To fix this, it needs to understand that it should preferentially allocate contiguously to the space it just allocated I suppose (even if that causes a new split). I'll think about it.

This request was evaluated by Red Hat Product Management for inclusion in the current release of Red Hat Enterprise Linux. Because the affected component is not scheduled to be updated in the current release, Red Hat is unfortunately unable to address this request at this time. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux.

This request was erroneously denied for the current release of Red Hat Enterprise Linux. The error has been fixed and this request has been re-proposed for the current release.

I'm still struggling to specify the precise requirement here. Elsewhere, we have requests to maximise the use of available disks. We try policies contiguous, cling and normal in turn. Cling only takes account of space already part of the LV so doesn't apply in this example. But I'm wondering if we need to consider 'cling' within the 'normal' policy too, checking against devices used by not-yet-committed extents within the same allocation transaction.

I checked in some code. I extended part of the cling policy to take effect within the normal policy. I also extended it to take account of already-reserved-but-not-yet-committed extents. This way, it can fill some parallel areas using the cling policy and the remaining ones using the (old) normal policy. I also adjusted some of the -vvvv log messages to make some of the decisions it's taking a bit clearer. There's an lvm.conf setting to (largely) disable the new behaviour, in case new strange cases are discovered.
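The "c then f then c" behaviour described above, and the effect of clinging to the device used by not-yet-committed extents, can be illustrated with a toy model. This is only a sketch and not the lvm2 allocator: the allocate function, its cling flag, and the figure of 1000 free extents per PV are all assumptions made for illustration. Only the starting extent (375) and the three 125-extent segments are taken from the layouts in this report.

```python
from typing import Dict, List, Tuple

# pv name -> (next free physical extent, number of free extents)
FreeMap = Dict[str, Tuple[int, int]]


def allocate(free: FreeMap, seg_lengths: List[int], cling: bool) -> List[Tuple[str, int]]:
    """Place each segment in order and return a list of (pv, start_pe).

    cling=False: always take the PV with the largest remaining free area
                 (ties go to the first PV listed).
    cling=True:  prefer the PV used by the previous segment of this same
                 request, if it still has room; otherwise fall back to
                 the largest free area.
    """
    free = dict(free)  # work on a copy so the caller's map is untouched
    placed: List[Tuple[str, int]] = []
    last_pv = None
    for length in seg_lengths:
        candidates = [pv for pv, (_, avail) in free.items() if avail >= length]
        if not candidates:
            raise RuntimeError("insufficient free extents")
        if cling and last_pv in candidates:
            pv = last_pv
        else:
            pv = max(candidates, key=lambda name: free[name][1])
        start, avail = free[pv]
        placed.append((pv, start))
        free[pv] = (start + length, avail - length)
        last_pv = pv
    return placed


if __name__ == "__main__":
    # Hypothetical free space: both PVs are free from extent 375 onwards.
    free = {"/dev/sdc1": (375, 1000), "/dev/sdf1": (375, 1000)}
    segments = [125, 125, 125]  # the new leg mirrors three 125-extent segments

    print(allocate(free, segments, cling=False))
    # [('/dev/sdc1', 375), ('/dev/sdf1', 375), ('/dev/sdc1', 500)]
    print(allocate(free, segments, cling=True))
    # [('/dev/sdc1', 375), ('/dev/sdc1', 500), ('/dev/sdc1', 625)]
```

With cling disabled the model reproduces the sdc1(375), sdf1(375), sdc1(500) layout reported for mB_mimage_2. With it enabled the whole leg stays on sdc1, which is the outcome the reporter expected.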
http://www.redhat.com/archives/lvm-devel/2011-February/msg00143.html

This is still a work-in-progress as people find more configurations where the layout the tools select could be improved.

Changes to the allocation policy need to be tested in upstream code first. Because no upstream release includes them yet, making these changes in 5.7 is too risky. Moving this to the 5.8 timeframe (it will appear in the rebased lvm2 code in 5.8).

This functionality is expected in the rebased LVM.

Fixed in lvm2-2.02.88-1.el5.

This appears to be fixed in the latest rpms based on one iteration of segmented mirror device failure. That said, I'm unable to run more than one iteration of this test due to bug 751135.

2.6.18-274.el5
lvm2-2.02.88-4.el5 BUILT: Wed Nov 16 09:40:55 CST 2011
lvm2-cluster-2.02.88-4.el5 BUILT: Wed Nov 16 09:46:51 CST 2011
device-mapper-1.02.67-2.el5 BUILT: Mon Oct 17 08:31:56 CDT 2011
device-mapper-event-1.02.67-2.el5 BUILT: Mon Oct 17 08:31:56 CDT 2011
cmirror-1.1.39-10.el5 BUILT: Wed Sep 8 16:32:05 CDT 2010
kmod-cmirror-0.1.22-3.el5 BUILT: Tue Dec 22 13:39:47 CST 2009

================================================================================
Iteration 0.1 started at Mon Nov 28 18:24:01 CST 2011
================================================================================
warning: mirrorA_mimage_0 is segmented, returning only the first device in this mimage
warning: mirrorA_mimage_1 is segmented, returning only the first device in this mimage
warning: mirrorA_mimage_2 is segmented, returning only the first device in this mimage
Scenario kill_random_legs: Kill random legs
********* Mirror info for this scenario *********
* mirrors: mirrorA mirrorB
* leg devices: /dev/sdb1 /dev/sdc1 /dev/sdd1
* log devices: /dev/sde1
* failpv(s): /dev/sdd1
* failnode(s): taft-01 taft-02 taft-03 taft-04
* leg fault policy: allocate
* log fault policy: allocate
*************************************************

Mirror Structure(s):
  LV                 Attr   LSize Copy%  Devices
  mirrorA            mwi-ao 1.00G 100.00 mirrorA_mimage_0(0),mirrorA_mimage_1(0),mirrorA_mimage_2(0)
  [mirrorA_mimage_0] iwi-ao 1.00G        /dev/sdb1(0)
  [mirrorA_mimage_0] iwi-ao 1.00G        /dev/sdb1(250)
  [mirrorA_mimage_1] iwi-ao 1.00G        /dev/sdc1(0)
  [mirrorA_mimage_1] iwi-ao 1.00G        /dev/sdc1(250)
  [mirrorA_mimage_2] iwi-ao 1.00G        /dev/sdd1(0)
  [mirrorA_mimage_2] iwi-ao 1.00G        /dev/sdd1(250)
  [mirrorA_mlog]     lwi-ao 4.00M        /dev/sde1(0)
  mirrorB            mwi-ao 1.00G 100.00 mirrorB_mimage_0(0),mirrorB_mimage_1(0),mirrorB_mimage_2(0)
  [mirrorB_mimage_0] iwi-ao 1.00G        /dev/sdb1(125)
  [mirrorB_mimage_0] iwi-ao 1.00G        /dev/sdb1(381)
  [mirrorB_mimage_1] iwi-ao 1.00G        /dev/sdc1(125)
  [mirrorB_mimage_1] iwi-ao 1.00G        /dev/sdc1(381)
  [mirrorB_mimage_2] iwi-ao 1.00G        /dev/sdd1(125)
  [mirrorB_mimage_2] iwi-ao 1.00G        /dev/sdd1(381)
  [mirrorB_mlog]     lwi-ao 4.00M        /dev/sde1(1)

PV=/dev/sdd1
  mirrorA_mimage_2: 3.1
  mirrorA_mimage_2: 3.1
  mirrorB_mimage_2: 3.1
  mirrorB_mimage_2: 3.1
PV=/dev/sdd1
  mirrorA_mimage_2: 3.1
  mirrorA_mimage_2: 3.1
  mirrorB_mimage_2: 3.1
  mirrorB_mimage_2: 3.1

Writing verification files (checkit) to mirror(s) on...
  ---- taft-01 ----
  ---- taft-02 ----
  ---- taft-03 ----
  ---- taft-04 ----

Sleeping 10 seconds to get some outsanding GFS I/O locks before the failure
Verifying files (checkit) on mirror(s) on...
  ---- taft-01 ----
  ---- taft-02 ----
  ---- taft-03 ----
  ---- taft-04 ----

Disabling device sdd on taft-01
Disabling device sdd on taft-02
Disabling device sdd on taft-03
Disabling device sdd on taft-04

Attempting I/O to cause mirror down conversion(s) on taft-01
10+0 records in
10+0 records out
41943040 bytes (42 MB) copied, 0.387624 seconds, 108 MB/s
10+0 records in
10+0 records out
41943040 bytes (42 MB) copied, 0.289258 seconds, 145 MB/s

Verifying current sanity of lvm after the failure
  Couldn't find device with uuid zhgtC4-B7pd-z7vz-N7VN-glzU-uvUz-DZAmcX.

Mirror Structure(s):
  Couldn't find device with uuid zhgtC4-B7pd-z7vz-N7VN-glzU-uvUz-DZAmcX.
  LV                    Attr   LSize Copy%  Devices
  mirrorA               mwi-ao 1.00G 100.00 mirrorA_mimage_0(0),mirrorA_mimage_1(0),mirrorA_mimage_2(0)
  [mirrorA_mimage_0]    iwi-ao 1.00G        /dev/sdb1(0)
  [mirrorA_mimage_0]    iwi-ao 1.00G        /dev/sdb1(250)
  [mirrorA_mimage_1]    iwi-ao 1.00G        /dev/sdc1(0)
  [mirrorA_mimage_1]    iwi-ao 1.00G        /dev/sdc1(250)
  [mirrorA_mimage_2]    iwi-ao 1.00G        /dev/sdf1(0)
  [mirrorA_mlog]        lwi-ao 4.00M        /dev/sde1(0)
  mirrorB               cwi-ao 1.00G 100.00 mirrorB_mimagetmp_2(0),mirrorB_mimage_2(0)
  [mirrorB_mimage_0]    iwi-ao 1.00G        /dev/sdb1(125)
  [mirrorB_mimage_0]    iwi-ao 1.00G        /dev/sdb1(381)
  [mirrorB_mimage_1]    iwi-ao 1.00G        /dev/sdc1(125)
  [mirrorB_mimage_1]    iwi-ao 1.00G        /dev/sdc1(381)
  [mirrorB_mimage_2]    iwi-ao 1.00G        /dev/sdf1(256)
  [mirrorB_mimagetmp_2] mwi-ao 1.00G 100.00 mirrorB_mimage_0(0),mirrorB_mimage_1(0)
  [mirrorB_mlog]        lwi-ao 4.00M        /dev/sde1(1)

Verify that each of the mirror repairs finished successfully

Verifying FAILED device /dev/sdd1 is *NOT* in the volume(s)
  Couldn't find device with uuid zhgtC4-B7pd-z7vz-N7VN-glzU-uvUz-DZAmcX.
olog: 1
Verifying LOG device(s) /dev/sde1 *ARE* in the mirror(s)
  Couldn't find device with uuid zhgtC4-B7pd-z7vz-N7VN-glzU-uvUz-DZAmcX.
Verifying LEG device /dev/sdb1 *IS* in the volume(s)
  Couldn't find device with uuid zhgtC4-B7pd-z7vz-N7VN-glzU-uvUz-DZAmcX.
Verifying LEG device /dev/sdc1 *IS* in the volume(s)
  Couldn't find device with uuid zhgtC4-B7pd-z7vz-N7VN-glzU-uvUz-DZAmcX.
verify the newly allocated dm devices were added as a result of the failures
Checking EXISTENCE of mirrorA_mimage_2 on: taft-01 taft-02 taft-03 taft-04
Checking EXISTENCE of mirrorA_mimage_2 on: taft-01 taft-02 taft-03 taft-04
Checking EXISTENCE of mirrorB_mimage_2 on: taft-01 taft-02 taft-03 taft-04
Checking EXISTENCE of mirrorB_mimage_2 on: taft-01 taft-02 taft-03 taft-04

Verify that the mirror image order remains the same after the down conversion
  Couldn't find device with uuid zhgtC4-B7pd-z7vz-N7VN-glzU-uvUz-DZAmcX.
  Couldn't find device with uuid zhgtC4-B7pd-z7vz-N7VN-glzU-uvUz-DZAmcX.
warning: mirrorA_mimage_0 is segmented, returning only the first device in this mimage
  Couldn't find device with uuid zhgtC4-B7pd-z7vz-N7VN-glzU-uvUz-DZAmcX.
warning: mirrorA_mimage_1 is segmented, returning only the first device in this mimage
  Couldn't find device with uuid zhgtC4-B7pd-z7vz-N7VN-glzU-uvUz-DZAmcX.
  Couldn't find device with uuid zhgtC4-B7pd-z7vz-N7VN-glzU-uvUz-DZAmcX.
  Couldn't find device with uuid zhgtC4-B7pd-z7vz-N7VN-glzU-uvUz-DZAmcX.
warning: mirrorB_mimage_0 is segmented, returning only the first device in this mimage
  Couldn't find device with uuid zhgtC4-B7pd-z7vz-N7VN-glzU-uvUz-DZAmcX.
warning: mirrorB_mimage_1 is segmented, returning only the first device in this mimage
  Couldn't find device with uuid zhgtC4-B7pd-z7vz-N7VN-glzU-uvUz-DZAmcX.

Verifying files (checkit) on mirror(s) on...
  ---- taft-01 ----
  ---- taft-02 ----
  ---- taft-03 ----
  ---- taft-04 ----

Enabling device sdd on taft-01
Enabling device sdd on taft-02
Enabling device sdd on taft-03
Enabling device sdd on taft-04

  WARNING: Inconsistent metadata found for VG taft - updating to use version 23
Recreating PVs /dev/sdd1
  WARNING: Volume group taft is not consistent
  Writing physical volume data to disk "/dev/sdd1"
Extending the recreated PVs back into VG taft
Waiting until all mirrors become fully syncd...
  2/2 mirror(s) are fully synced: ( 100.00% 100.00% )

Verifying files (checkit) on mirror(s) on...
  ---- taft-01 ----
  ---- taft-02 ----
  ---- taft-03 ----
  ---- taft-04 ----

Stopping the io load (collie/xdoio) on mirror(s)

Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
The updated allocation policy now better handles allocation of new segments for multiple segmented mirrors (mirrors which were repeatedly extended).
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2012-0161.html |