Bug 799048 - Volume group inconsistency issues are causing mirror conversion to fail after leg failure
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: lvm2
6.3
x86_64 Linux
Severity: urgent
: rc
: ---
Assigned To: Petr Rockai
Cluster QE
: Regression
Depends On:
Blocks:
Reported: 2012-03-01 11:29 EST by Corey Marthaler
Modified: 2012-05-08 16:27 EDT (History)
11 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-05-08 16:27:00 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Corey Marthaler 2012-03-01 11:29:32 EST
Description of problem:
Nenad Peric came across this issue while running device failure regression tests.


Scenario kill_non_primary_leg: Kill non primary leg

********* Mirror info for this scenario *********
* mirrors:            mirror_1
* leg devices:        /dev/sde1 /dev/sdh1 /dev/sdg1
* log devices:        /dev/sde1
* failpv(s):          /dev/sdh1
* failnode(s):        taft-01
* leg fault policy:   allocate
* log fault policy:   allocate
*************************************************

Mirror Structure(s):
  LV                  Attr     LSize  Copy%  Devices
  mirror_1            mwi-aom-  2.00g 100.00 mirror_1_mimage_0(0),mirror_1_mimage_1(0),mirror_1_mimage_2(0)
  [mirror_1_mimage_0] iwi-aom-  2.00g        /dev/sde1(0)
  [mirror_1_mimage_1] iwi-aom-  2.00g        /dev/sdh1(0)
  [mirror_1_mimage_2] iwi-aom-  2.00g        /dev/sdg1(0)
  [mirror_1_mlog]     lwi-aom-  4.00m        /dev/sde1(512)

PVS IN VG: /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdh1

PV=/dev/sdh1
        mirror_1_mimage_1: 4.1
PV=/dev/sdh1
        mirror_1_mimage_1: 4.1

Writing verification files (checkit) to mirror(s) on...
        ---- taft-01 ----

Sleeping 10 seconds to get some outstanding EXT I/O locks before the failure
Verifying files (checkit) on mirror(s) on...
        ---- taft-01 ----

Disabling device sdh on taft-01

Attempting I/O to cause mirror down conversion(s) on taft-01
10+0 records in
10+0 records out

41943040 bytes (42 MB) copied, 0.274829 s, 153 MB/s

Verifying current sanity of lvm after the failure

Mirror Structure(s):
  Couldn't find device with uuid 76Japu-bkd7-ec1J-3q8W-Wa4g-P4mJ-B5JcTo.
  LV                  Attr     LSize  Copy%  Devices                                  
  mirror_1            mwi-aom-  2.00g 100.00 mirror_1_mimage_0(0),mirror_1_mimage_2(0)
  [mirror_1_mimage_0] iwi-aom-  2.00g        /dev/sde1(0)                             
  [mirror_1_mimage_2] iwi-aom-  2.00g        /dev/sdg1(0)                             
  [mirror_1_mlog]     lwi-aom-  4.00m        /dev/sde1(512)                           

Verify that each of the mirror repairs finished successfully

Verifying FAILED device /dev/sdh1 is *NOT* in the volume(s)
  Couldn't find device with uuid 76Japu-bkd7-ec1J-3q8W-Wa4g-P4mJ-B5JcTo.
olog: 1
Verifying LOG device(s) /dev/sde1 *ARE* in the mirror(s)
  Couldn't find device with uuid 76Japu-bkd7-ec1J-3q8W-Wa4g-P4mJ-B5JcTo.
Verifying LEG device /dev/sde1 *IS* in the volume(s)
  Couldn't find device with uuid 76Japu-bkd7-ec1J-3q8W-Wa4g-P4mJ-B5JcTo.
Verifying LEG device /dev/sdg1 *IS* in the volume(s)
  Couldn't find device with uuid 76Japu-bkd7-ec1J-3q8W-Wa4g-P4mJ-B5JcTo.
verify the dm devices associated with /dev/sdh1 have been removed as expected
Checking REMOVAL of mirror_1_mimage_1 on:  taft-01
verify the newly allocated dm devices were added as a result of the failures
Checking EXISTENCE of mirror_1_mimage_3 on:  taft-01
mirror_1_mimage_3 on taft-01 should now exist

Here's where lvm attempts the up-conversion:
Mar  1 10:01:47 taft-01 lvm[3060]: Volume group revolution_9 metadata is inconsistent
Mar  1 10:01:47 taft-01 lvm[3060]: Monitoring mirror device revolution_9-mirror_1 for events.
Mar  1 10:01:47 taft-01 lvm[3060]: Volume group revolution_9 metadata is inconsistent
Mar  1 10:01:47 taft-01 lvm[3060]: Monitoring mirror device revolution_9-mirror_1 for events.
Mar  1 10:01:47 taft-01 lvm[3060]: Mirror status: 1 of 3 images failed.
Mar  1 10:01:47 taft-01 lvm[3060]: Trying to up-convert to 3 images, 1 logs.
Mar  1 10:01:48 taft-01 lvm[3060]: Insufficient suitable allocatable extents for logical volume : 512 more required
Mar  1 10:01:48 taft-01 lvm[3060]: Unable to allocate extents for mirror(s).
Mar  1 10:01:48 taft-01 lvm[3060]: Trying to up-convert to 2 images, 1 logs.
Mar  1 10:01:48 taft-01 qarshd[3516]: Running cmdline: dd if=/dev/zero of=/mnt/mirror_1/ddfile count=10 bs=4M
Mar  1 10:01:48 taft-01 lvm[3060]: Volume group revolution_9 metadata is inconsistent
Mar  1 10:01:48 taft-01 lvm[3060]: Monitoring mirror device revolution_9-mirror_1 for events.
Mar  1 10:01:48 taft-01 lvm[3060]: WARNING: Failed to replace 1 of 3 images in volume mirror_1
Mar  1 10:01:48 taft-01 lvm[3060]: Repair of mirrored device revolution_9-mirror_1 finished successfully.
Mar  1 10:01:48 taft-01 lvm[3060]: revolution_9-mirror_1 is now in-sync.
Mar  1 10:01:48 taft-01 lvm[3060]: revolution_9-mirror_1 is now in-sync.


# I attempted the same thing by hand and it failed for me as well:
[root@taft-01 ~]# lvs -a -o +devices
  Couldn't find device with uuid 76Japu-bkd7-ec1J-3q8W-Wa4g-P4mJ-B5JcTo.
  LV                  VG           Attr     LSize  Log           Copy%  Devices
  mirror_1            revolution_9 mwi-aom-  2.00g mirror_1_mlog 100.00 mirror_1_mimage_0(0),mirror_1_mimage_2(0)
  [mirror_1_mimage_0] revolution_9 iwi-aom-  2.00g                      /dev/sde1(0)
  [mirror_1_mimage_2] revolution_9 iwi-aom-  2.00g                      /dev/sdg1(0)
  [mirror_1_mlog]     revolution_9 lwi-aom-  4.00m                      /dev/sde1(512)

[root@taft-01 ~]# pvscan
  Couldn't find device with uuid 76Japu-bkd7-ec1J-3q8W-Wa4g-P4mJ-B5JcTo.
  PV unknown device   VG revolution_9   lvm2 [135.66 GiB / 135.66 GiB free]
  PV /dev/sdg1        VG revolution_9   lvm2 [135.66 GiB / 133.66 GiB free]
  PV /dev/sdf1        VG revolution_9   lvm2 [135.66 GiB / 135.66 GiB free]
  PV /dev/sde1        VG revolution_9   lvm2 [135.66 GiB / 133.66 GiB free]
  PV /dev/sdd1        VG revolution_9   lvm2 [135.66 GiB / 135.66 GiB free]
  PV /dev/sdc1        VG revolution_9   lvm2 [135.66 GiB / 135.66 GiB free]
  PV /dev/sda2        VG vg_taft01      lvm2 [67.75 GiB / 0    free]
  Total: 7 [881.73 GiB] / in use: 7 [881.73 GiB] / in no VG: 0 [0   ]

[root@taft-01 ~]# lvconvert -m 2 revolution_9/mirror_1
  Couldn't find device with uuid 76Japu-bkd7-ec1J-3q8W-Wa4g-P4mJ-B5JcTo.
  Cannot change VG revolution_9 while PVs are missing.
  Consider vgreduce --removemissing.

[root@taft-01 ~]# vgreduce --removemissing revolution_9
  Couldn't find device with uuid 76Japu-bkd7-ec1J-3q8W-Wa4g-P4mJ-B5JcTo.
  Wrote out consistent volume group revolution_9

[root@taft-01 ~]# lvconvert -m 2 revolution_9/mirror_1
  Couldn't find device with uuid 76Japu-bkd7-ec1J-3q8W-Wa4g-P4mJ-B5JcTo.
  WARNING: Inconsistent metadata found for VG revolution_9 - updating to use version 19
  Removing PV unknown device (76Japu-bkd7-ec1J-3q8W-Wa4g-P4mJ-B5JcTo) that no longer belongs to VG revolution_9
  unknown device: Couldn't get size.
  Volume group "revolution_9" not found

# WHERE DID THE PREVIOUS DEVICES IN THE VG GO!?! These missing devices had been failed in a previous iteration.
[root@taft-01 ~]# pvscan
  Couldn't find device with uuid 76Japu-bkd7-ec1J-3q8W-Wa4g-P4mJ-B5JcTo.
  WARNING: Volume Group revolution_9 is not consistent
  PV /dev/sdg1   VG revolution_9   lvm2 [135.66 GiB / 133.66 GiB free]
  PV /dev/sde1   VG revolution_9   lvm2 [135.66 GiB / 133.66 GiB free]
  PV /dev/sda2   VG vg_taft01      lvm2 [67.75 GiB / 0    free]
  Total: 3 [339.08 GiB] / in use: 3 [339.08 GiB] / in no VG: 0 [0   ]

[root@taft-01 ~]# lvs -a -o +devices
  Couldn't find device with uuid 76Japu-bkd7-ec1J-3q8W-Wa4g-P4mJ-B5JcTo.
  WARNING: Inconsistent metadata found for VG revolution_9 - updating to use version 20
  Removing PV unknown device (76Japu-bkd7-ec1J-3q8W-Wa4g-P4mJ-B5JcTo) that no longer belongs to VG revolution_9
  unknown device: Couldn't get size.
  Recovery of volume group "revolution_9" failed.
  Skipping volume group revolution_9
  LV      VG        Attr     LSize  Pool Origin Data%  Move Log Copy%  Convert Devices


Version-Release number of selected component (if applicable):
2.6.32-236.el6.x86_64

lvm2-2.02.94-0.61.el6    BUILT: Thu Mar  1 07:03:29 CST 2012
lvm2-libs-2.02.94-0.61.el6    BUILT: Thu Mar  1 07:03:29 CST 2012
lvm2-cluster-2.02.94-0.61.el6    BUILT: Thu Mar  1 07:03:29 CST 2012
udev-147-2.40.el6    BUILT: Fri Sep 23 07:51:13 CDT 2011
device-mapper-1.02.73-0.61.el6    BUILT: Thu Mar  1 07:03:29 CST 2012
device-mapper-libs-1.02.73-0.61.el6    BUILT: Thu Mar  1 07:03:29 CST 2012
device-mapper-event-1.02.73-0.61.el6    BUILT: Thu Mar  1 07:03:29 CST 2012
device-mapper-event-libs-1.02.73-0.61.el6    BUILT: Thu Mar  1 07:03:29 CST 2012
cmirror-2.02.94-0.61.el6    BUILT: Thu Mar  1 07:03:29 CST 2012


How reproducible:
Most of the time
Comment 1 Alasdair Kergon 2012-03-02 19:52:42 EST
So, I'm about to release 2.02.94 proper.
Then assuming the problem is still present, we'll try to reproduce a simpler version of it and debug it.
Comment 2 Alasdair Kergon 2012-03-03 11:53:38 EST
I can't explain those test results, I'm afraid.

I've tried various things with 2.02.94 here and have failed to reproduce any unexpected behaviour in these parts of the code.

(This is non-cmirror, isn't it?  And without lvmetad running?)
Comment 3 Alasdair Kergon 2012-03-05 12:07:26 EST
Please check whether or not this is still present in 2.02.94-final.
Comment 4 Corey Marthaler 2012-03-05 18:28:32 EST
I'll attempt this again once the 2.02.94 build is out, but I'm still able to
reproduce this with the latest pre-release builds. This is w/o cmirror and w/o
lvmetad.

2.6.32-236.el6.x86_64
lvm2-2.02.95-0.67.el6    BUILT: Mon Mar  5 08:19:07 CST 2012
lvm2-libs-2.02.95-0.67.el6    BUILT: Mon Mar  5 08:19:07 CST 2012
lvm2-cluster-2.02.95-0.67.el6    BUILT: Mon Mar  5 08:19:07 CST 2012
udev-147-2.40.el6    BUILT: Fri Sep 23 07:51:13 CDT 2011
device-mapper-1.02.74-0.67.el6    BUILT: Mon Mar  5 08:19:07 CST 2012
device-mapper-libs-1.02.74-0.67.el6    BUILT: Mon Mar  5 08:19:07 CST 2012
device-mapper-event-1.02.74-0.67.el6    BUILT: Mon Mar  5 08:19:07 CST 2012
device-mapper-event-libs-1.02.74-0.67.el6    BUILT: Mon Mar  5 08:19:07 CST 2012
cmirror-2.02.95-0.67.el6    BUILT: Mon Mar  5 08:19:07 CST 2012

taft-01 lvm[4466]: Volume group revolution_9 metadata is inconsistent
taft-01 lvm[4466]: Monitoring mirror device revolution_9-mirror_1 for events.
taft-01 xinetd[1649]: EXIT: qarsh status=0 pid=5186 duration=1(sec)
taft-01 lvm[4466]: Volume group revolution_9 metadata is inconsistent
taft-01 lvm[4466]: Monitoring mirror device revolution_9-mirror_1 for events.
taft-01 lvm[4466]: Mirror status: 1 of 3 images failed.
taft-01 lvm[4466]: Trying to up-convert to 3 images, 1 logs.
taft-01 lvm[4466]: Insufficient suitable allocatable extents for logical volume : 512 more required
taft-01 lvm[4466]: Unable to allocate extents for mirror(s).
taft-01 lvm[4466]: Trying to up-convert to 2 images, 1 logs.
taft-01 lvm[4466]: Volume group revolution_9 metadata is inconsistent
taft-01 lvm[4466]: Monitoring mirror device revolution_9-mirror_1 for events.
taft-01 lvm[4466]: WARNING: Failed to replace 1 of 3 images in volume mirror_1
taft-01 lvm[4466]: Repair of mirrored device revolution_9-mirror_1 finished successfully.
taft-01 lvm[4466]: revolution_9-mirror_1 is now in-sync.
taft-01 lvm[4466]: revolution_9-mirror_1 is now in-sync.


[root@taft-01 tmp]# pvscan
  Couldn't find device with uuid oyeCfu-wDRe-kIhe-YJLx-bolq-e1oQ-IVJCMU.
  PV /dev/sdh1        VG revolution_9   lvm2 [135.66 GiB / 133.66 GiB free]
  PV unknown device   VG revolution_9   lvm2 [135.66 GiB / 135.66 GiB free]
  PV /dev/sdf1        VG revolution_9   lvm2 [135.66 GiB / 135.66 GiB free]
  PV /dev/sde1        VG revolution_9   lvm2 [135.66 GiB / 135.66 GiB free]
  PV /dev/sdd1        VG revolution_9   lvm2 [135.66 GiB / 135.66 GiB free]
  PV /dev/sdc1        VG revolution_9   lvm2 [135.66 GiB / 133.66 GiB free]
  PV /dev/sda2        VG vg_taft01      lvm2 [67.75 GiB / 0    free]
  Total: 7 [881.73 GiB] / in use: 7 [881.73 GiB] / in no VG: 0 [0   ]

[root@taft-01 tmp]# lvs -a -o +devices
  Couldn't find device with uuid oyeCfu-wDRe-kIhe-YJLx-bolq-e1oQ-IVJCMU.
  LV                  VG           Attr     LSize  Log           Copy%  Devices
  mirror_1            revolution_9 mwi-aom-  2.00g mirror_1_mlog 100.00 mirror_1_mimage_0(0),mirror_1_mimage_2(0)
  [mirror_1_mimage_0] revolution_9 iwi-aom-  2.00g                      /dev/sdc1(0)
  [mirror_1_mimage_2] revolution_9 iwi-aom-  2.00g                      /dev/sdh1(0)
  [mirror_1_mlog]     revolution_9 lwi-aom-  4.00m                      /dev/sdf1(0)

[root@taft-01 tmp]# lvconvert -m 2 revolution_9/mirror_1
  Couldn't find device with uuid oyeCfu-wDRe-kIhe-YJLx-bolq-e1oQ-IVJCMU.
  Cannot change VG revolution_9 while PVs are missing.
  Consider vgreduce --removemissing.

[root@taft-01 tmp]#  vgreduce --removemissing  revolution_9
  Couldn't find device with uuid oyeCfu-wDRe-kIhe-YJLx-bolq-e1oQ-IVJCMU.
  Wrote out consistent volume group revolution_9

[root@taft-01 tmp]#  lvconvert -m 2 revolution_9/mirror_1
  Couldn't find device with uuid oyeCfu-wDRe-kIhe-YJLx-bolq-e1oQ-IVJCMU.
  WARNING: Inconsistent metadata found for VG revolution_9 - updating to use version 28
  Removing PV unknown device (oyeCfu-wDRe-kIhe-YJLx-bolq-e1oQ-IVJCMU) that no longer belongs to VG revolution_9
  unknown device: Couldn't get size.
  Volume group "revolution_9" not found
Comment 5 Jonathan Earl Brassow 2012-03-14 14:24:05 EDT
I cannot reproduce this.  Can you try again with the latest kernel/RPMs?  It would also be helpful to know what LVM commands produce the error (i.e., how do I reproduce it without the test suite).
Comment 6 Corey Marthaler 2012-03-15 17:09:44 EDT
Both Nenad and I are still able to reproduce this with the latest rpms/kernel.

2.6.32-251.el6.x86_64
lvm2-2.02.95-1.el6    BUILT: Tue Mar  6 10:00:33 CST 2012
lvm2-libs-2.02.95-1.el6    BUILT: Tue Mar  6 10:00:33 CST 2012
lvm2-cluster-2.02.95-1.el6    BUILT: Tue Mar  6 10:00:33 CST 2012
udev-147-2.40.el6    BUILT: Fri Sep 23 07:51:13 CDT 2011
device-mapper-1.02.74-1.el6    BUILT: Tue Mar  6 10:00:33 CST 2012
device-mapper-libs-1.02.74-1.el6    BUILT: Tue Mar  6 10:00:33 CST 2012
device-mapper-event-1.02.74-1.el6    BUILT: Tue Mar  6 10:00:33 CST 2012
device-mapper-event-libs-1.02.74-1.el6    BUILT: Tue Mar  6 10:00:33 CST 2012
cmirror-2.02.95-1.el6    BUILT: Tue Mar  6 10:00:33 CST 2012

This doesn't happen every time, so your best bet at reproducing it is to run the test suite.
./revolution_9 -l /home/msp/cmarthal/work/sts/sts-root -r /usr/tests/sts-rhel6.3 -o taft-01 -i 10 -F -e kill_non_primary_leg

Otherwise, the "manual" way to reproduce this would be to create a 3-way mirror, fail a single device, watch allocation work once the mirror is written to, remove the mirror, and repeat until you see the VG get into a state where allocation no longer works.
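That manual loop could be scripted roughly as follows. This is only a sketch of the procedure described above: the VG name, device names, sizes, and iteration count are assumptions, the commands need root, and they are destructive, so they should only ever be run against scratch storage.

```shell
#!/bin/sh
# Sketch of the manual reproduction loop: make a 3-way mirror, fail a leg,
# write to trigger dmeventd repair, then tear down and repeat.
# Assumed: VG "revolution_9" with spare PVs; the failed leg lives on sdh.
for i in 1 2 3 4 5 6 7 8 9 10; do
    # Create a 3-way mirror (2 extra copies) with a disk log
    lvcreate -m 2 --mirrorlog disk -L 2G -n mirror_1 revolution_9

    # Fail one leg by taking the underlying SCSI device offline
    echo offline > /sys/block/sdh/device/state

    # Write to the mirror so the failure is noticed and repair is attempted
    dd if=/dev/zero of=/dev/revolution_9/mirror_1 bs=4M count=10 oflag=direct
    sleep 10

    # Bring the device back and clean up for the next iteration
    echo running > /sys/block/sdh/device/state
    lvremove -f revolution_9/mirror_1
done
```

Each iteration kicks the failed leg out of the mirror; the question is whether the VG eventually reaches a state where the repair's re-allocation can no longer succeed.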


Mar 15 15:26:47 taft-01 lvm[1250]: Volume group revolution_9 metadata is inconsistent
Mar 15 15:26:47 taft-01 lvm[1250]: Monitoring mirror device revolution_9-mirror_1 for events.
Mar 15 15:26:47 taft-01 xinetd[1851]: EXIT: qarsh status=0 pid=4241 duration=1(sec)
Mar 15 15:26:48 taft-01 lvm[1250]: Volume group revolution_9 metadata is inconsistent
Mar 15 15:26:48 taft-01 lvm[1250]: Monitoring mirror device revolution_9-mirror_1 for events.
Mar 15 15:26:48 taft-01 lvm[1250]: Mirror status: 1 of 3 images failed.
Mar 15 15:26:48 taft-01 lvm[1250]: Trying to up-convert to 3 images, 1 logs.
Mar 15 15:26:48 taft-01 lvm[1250]: Insufficient suitable allocatable extents for logical volume : 512 more required
Mar 15 15:26:48 taft-01 lvm[1250]: Unable to allocate extents for mirror(s).
Mar 15 15:26:48 taft-01 lvm[1250]: Trying to up-convert to 2 images, 1 logs.
Mar 15 15:26:49 taft-01 lvm[1250]: Volume group revolution_9 metadata is inconsistent
Mar 15 15:26:49 taft-01 lvm[1250]: Monitoring mirror device revolution_9-mirror_1 for events.
Mar 15 15:26:49 taft-01 lvm[1250]: WARNING: Failed to replace 1 of 3 images in volume mirror_1
Mar 15 15:26:49 taft-01 lvm[1250]: Repair of mirrored device revolution_9-mirror_1 finished successfully.
Mar 15 15:26:49 taft-01 lvm[1250]: revolution_9-mirror_1 is now in-sync.
Mar 15 15:26:49 taft-01 lvm[1250]: revolution_9-mirror_1 is now in-sync.
Comment 7 Petr Rockai 2012-04-22 21:26:53 EDT
Ok, I have spent a while looking into this, and I can report these findings:
1) I cannot for the life of me reproduce the problem locally by just repeatedly having a mirror repaired in a VG.
2) It turns out that when I run revolution_9 on taft, the PVs that are disabled by the test stay in a "missing" state (at least *sometimes*, but it seems to me that this happens reliably, every time), even after they are re-enabled
3) The above failure happens when we actually run out of PVs to use, and hence is not a "mirror repair" bug per se.

In this regard, doing a vgextend --restoremissing on the re-enabled device (e.g., immediately after re-enabling it) will fix the problem. As far as I can tell, this is not a bug at all (at least not as far as LVM is concerned), unless we want PVs to be automatically re-added to VGs when they have dropped out temporarily (but long enough to be kicked out of mirrors). I think this was never a requirement; rather the contrary: we want replacement of PVs to be manually confirmed (hence, vgextend --restoremissing).
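The workaround looks like this in practice (a sketch only; the VG name and PV path are taken from the test runs above and would differ on another system, and the commands require root):

```shell
# A PV that dropped out and came back is still flagged MISSING in the VG
# metadata, so the allocator ignores it. Re-admit it explicitly:
vgextend --restoremissing revolution_9 /dev/sdh1

# With the PV allocatable again, the up-conversion that previously failed
# with "Insufficient suitable allocatable extents" can find space:
lvconvert -m 2 revolution_9/mirror_1
```

This preserves the policy that a previously failed device is only reused once an administrator confirms it, rather than LVM silently trusting it again.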

So the two options regarding this bug:
1) Put a "vgextend --restoremissing VG PV" just after "Enabling device PV on localhost" happens in STS, to restore the PV for use
2) Change LVM to do this automatically

For various reasons, I don't like option 2, which comes with non-trivial regression risks *and* I think it's a bad policy (it could complicate post-failure analysis, and possibly even damage data, in real-world failure scenarios).

Opinions?
Comment 8 Alasdair Kergon 2012-04-23 07:14:42 EDT
I also prefer the status quo here: let's go down the route of adjusting the test to deal with this situation when it arises.
Comment 10 Corey Marthaler 2012-05-08 16:22:01 EDT
I will attempt to change the test(s) to deal with this issue.
