Bug 606528 - lvm device scan needed in order for re-enabled mirror device to show up
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: lvm2
Version: 6.0
Hardware: All Linux
Priority: medium
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Petr Rockai
QA Contact: Corey Marthaler
Keywords: Regression
Depends On:
Blocks:
Reported: 2010-06-21 16:57 EDT by Corey Marthaler
Modified: 2010-11-10 16:08 EST

See Also:
Fixed In Version: lvm2-2.02.72-2.el6
Doc Type: Bug Fix
Last Closed: 2010-11-10 16:08:07 EST

Attachments
/etc/lvm/backup/helter_skelter (3.71 KB, text/plain)
2010-06-25 17:52 EDT, Corey Marthaler
/etc/lvm/archive/helter_skelter_00000.vg (2.29 KB, text/plain)
2010-06-25 17:52 EDT, Corey Marthaler
/etc/lvm/archive/helter_skelter_00001.vg (2.31 KB, text/plain)
2010-06-25 17:53 EDT, Corey Marthaler
/etc/lvm/archive/helter_skelter_00002.vg (3.73 KB, text/plain)
2010-06-25 17:54 EDT, Corey Marthaler
lvconvert -vvvv (74.90 KB, text/plain)
2010-06-30 12:04 EDT, Corey Marthaler

Description Corey Marthaler 2010-06-21 16:57:29 EDT
Description of problem:
After a successful device failure and mirror repair, helter_skelter re-enables the failed device(s) and then assumes they are automatically added back into the VG before using them in an upconvert. This assumption appears to have regressed: lvm now notices that the devices have returned but does not allow them to be used again until a pvscan is run. This was the behavior in early RHEL5, it changed later in RHEL5, and it now appears to have changed again, which makes it difficult to keep the tests up to date.

[root@taft-01 ~]# lvscan
  WARNING: Inconsistent metadata found for VG helter_skelter - updating to use version 8
  Missing device /dev/sdh1 reappeared, updating metadata for VG helter_skelter to version 8.
  Missing device /dev/sdg1 reappeared, updating metadata for VG helter_skelter to version 8.
  ACTIVE            '/dev/helter_skelter/syncd_primary_log_2legs_1' [600.00 MiB] inherit
  ACTIVE            '/dev/vg_taft01/lv_root' [32.30 GiB] inherit
  ACTIVE            '/dev/vg_taft01/lv_home' [25.62 GiB] inherit
  ACTIVE            '/dev/vg_taft01/lv_swap' [9.83 GiB] inherit
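
Editorial sketch: a test harness that wants to know which devices lvm reported as returned could filter the scan output shown above. The helper name reappeared_devices is hypothetical (it is not part of lvm2 or of helter_skelter), and the pattern assumes the message wording stays exactly as in this output.

```shell
#!/bin/sh
# Hypothetical helper, not part of lvm2: read scan output on stdin and
# print the path of every device reported as "reappeared", one per line.
# Assumes the message format shown in the lvscan output above.
reappeared_devices() {
    sed -n 's/^[[:space:]]*Missing device \([^[:space:]]*\) reappeared,.*/\1/p'
}
```

It would be fed from a scan, for example by piping the combined stdout and stderr of lvscan into reappeared_devices. Whether the listed devices are usable without a further pvscan is exactly what this bug is about, so the helper only reports what lvm printed.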

Version-Release number of selected component (if applicable):
Version 2.02.67+ (custom build by Brassow)

How reproducible:
Every time
Comment 2 Petr Rockai 2010-06-23 10:53:14 EDT
Actually, I am not sure. Treatment of inconsistent metadata should be the same as it used to be in later RHEL5 -- at least I don't remember changing any of this. We have a rudimentary test for this that has not caught any change in behaviour in the past 12 months, but maybe we need to augment the test to cover some further scenarios.

I'll look into reproducing the problem. Corey, can you maybe provide the metadata from the system after the repair but before the inconsistent metadata is corrected? Thanks! (It should be available in the metadata backup directory.)
Comment 3 Corey Marthaler 2010-06-25 17:50:46 EDT
Posting everything found in /etc/lvm/[backup|cache], before a pvscan.
Comment 4 Corey Marthaler 2010-06-25 17:52:04 EDT
Created attachment 427002
/etc/lvm/backup/helter_skelter
Comment 5 Corey Marthaler 2010-06-25 17:52:59 EDT
Created attachment 427004
/etc/lvm/archive/helter_skelter_00000.vg
Comment 6 Corey Marthaler 2010-06-25 17:53:35 EDT
Created attachment 427006
/etc/lvm/archive/helter_skelter_00001.vg
Comment 7 Corey Marthaler 2010-06-25 17:54:06 EDT
Created attachment 427007
/etc/lvm/archive/helter_skelter_00002.vg
Comment 8 Petr Rockai 2010-06-28 15:54:07 EDT
Corey, what command do you use to upconvert the mirror? I have written the following script:

aux prepare_vg 3
lvcreate -m 1 --ig -L 1 -n 2way $vg $dev1 $dev2 $dev3:0
disable_dev $dev2
echo n | lvconvert --repair $vg/2way 2>&1 | tee 2way.out
lvs -a -o +devices | not grep unknown
# the device is linear at this point
enable_dev $dev2
lvconvert -m 1 $vg/2way $dev1 $dev2 $dev3:0
check mirror $vg 2way $dev3

and it seems to work as expected with current CVS: the last lvconvert is saying this:

+ lvconvert -m 1 LVMTEST28808vg/2way /srv/build/lvm2/cvs-upstream/default/test/LVMTEST28808.YJdmEzPsP6/dev/mapper/LVMTEST28808pv1 /srv/build/lvm2/cvs-upstream/default/test/LVMTEST28808.YJdmEzPsP6/dev/mapper/LVMTEST28808pv2 /srv/build/lvm2/cvs-upstream/default/test/LVMTEST28808.YJdmEzPsP6/dev/mapper/LVMTEST28808pv3:0
  WARNING: Inconsistent metadata found for VG LVMTEST28808vg - updating to use version 8
  Missing device /srv/build/lvm2/cvs-upstream/default/test/LVMTEST28808.YJdmEzPsP6/dev/mapper/LVMTEST28808pv2 reappeared, updating metadata for VG LVMTEST28808vg to version 8.
  WARNING: This metadata update is NOT backed up
  WARNING: This metadata update is NOT backed up

The volume is mirrored again after the lvconvert, as checked by the last line of the script.

Thanks again.
Comment 9 Corey Marthaler 2010-06-28 18:03:01 EDT
'lvconvert -m $legnum -b $vg/$mirror @pvlist'

The pvlist includes the devices that were just failed/re-enabled. So that appears to be the only difference between our two commands.
Comment 10 Petr Rockai 2010-06-29 18:06:18 EDT
Hm, could you get the output (ideally with -vvvv) of the failing lvconvert? I.e. your

lvconvert -m $legnum -b $vg/$mirror @pvlist

(with -vvvv added) on a volume group that has inconsistent metadata after the disabled device returned. Something is going wrong with that command -- it could be failing to get a lock or something like that maybe, or some other environment dependency is tripping the code. Also, by any chance, are you running in a cluster, or just locally?

Ta.
Comment 11 Corey Marthaler 2010-06-30 12:04:59 EDT
Created attachment 428028
lvconvert -vvvv
Comment 12 Petr Rockai 2010-06-30 16:34:02 EDT
I see. This shows up in the logs:

#metadata/metadata.c:3626   Cannot change VG helter_skelter while PVs are missing.
#metadata/metadata.c:3627   Consider vgreduce --removemissing.

This means that your VG is incomplete at this point and you cannot upconvert the mirror without first fixing it. It would be interesting to know how you got into this situation.
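
Editorial sketch: a test could check a saved -vvvv log for this refusal before deciding how to proceed. The function name vg_has_missing_pvs_error is hypothetical, and the match relies on the message text quoted above staying unchanged.

```shell
#!/bin/sh
# Hypothetical helper, not part of lvm2: succeed (exit 0) if the given
# log file contains the "PVs are missing" refusal quoted above.
# $1 is the path to a saved log, e.g. the output of lvconvert -vvvv.
vg_has_missing_pvs_error() {
    grep -q 'Cannot change VG .* while PVs are missing' "$1"
}
```

A caller would save the verbose lvconvert output to a file and branch on the function's exit status; the file name and how the log is captured are left to the harness.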

What we have:
- /dev/sdg is failed
- a mirror write happens, which trips dmeventd, which
  - runs lvconvert --repair --use-policies ... this recovers the mirror
  - which in turn does (kind of) vgreduce --removemissing ... this removes /dev/sdg from helter_skelter

If you run vgextend at this point, it should notice that /dev/sdg disappeared from helter_skelter, update the inconsistent metadata and all should be well. This is assuming that all of the above worked.
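
Editorial sketch: the sequence described above, written out as a dry run that only prints the commands instead of executing them. The function name print_readd_plan and its argument order are hypothetical; whether every step is needed on a given system is an assumption, and the commands should be reviewed before being run by hand.

```shell
#!/bin/sh
# Hypothetical dry-run helper, not part of lvm2. Prints the manual
# equivalent of the steps described above; it executes nothing.
# Usage: print_readd_plan VG LV MIRRORS DEVICE...
print_readd_plan() {
    vg=$1
    lv=$2
    mirrors=$3
    shift 3
    # Step the repair is expected to perform: drop missing PVs from the VG.
    printf '%s\n' "vgreduce --removemissing $vg"
    # Re-add each returned device to the VG.
    for dev in "$@"; do
        printf '%s\n' "vgextend $vg $dev"
    done
    # Upconvert the mirror, allowing allocation on the returned devices.
    printf '%s\n' "lvconvert -m $mirrors $vg/$lv $*"
}
```

Printing rather than executing keeps the sketch safe to try on any machine; for example, print_readd_plan helter_skelter syncd_primary_log_2legs_1 1 /dev/sdg1 lists three commands for review.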

So what I have found is that the second bullet under dmeventd, that is, the removal of /dev/sdg, never happens with the current code. This is a regression, and a likely cause for this bug. I will shortly submit a patch upstream. I have also corrected our upstream tests so this does not happen again...
Comment 13 Petr Rockai 2010-07-27 16:08:24 EDT
Fixed upstream in Version 2.02.70 - 6th July 2010: Restore the removemissing behaviour of lvconvert --repair --use-policies.
Comment 15 Corey Marthaler 2010-07-29 16:17:34 EDT
Fix verified in the latest build.

2.6.32-52.el6.x86_64

lvm2-2.02.72-3.el6    BUILT: Wed Jul 28 15:39:43 CDT 2010
lvm2-libs-2.02.72-3.el6    BUILT: Wed Jul 28 15:39:43 CDT 2010
lvm2-cluster-2.02.72-3.el6    BUILT: Wed Jul 28 15:39:43 CDT 2010
udev-147-2.21.el6    BUILT: Mon Jul 12 04:55:00 CDT 2010
device-mapper-1.02.53-3.el6    BUILT: Wed Jul 28 15:39:43 CDT 2010
device-mapper-libs-1.02.53-3.el6    BUILT: Wed Jul 28 15:39:43 CDT 2010
device-mapper-event-1.02.53-3.el6    BUILT: Wed Jul 28 15:39:43 CDT 2010
device-mapper-event-libs-1.02.53-3.el6    BUILT: Wed Jul 28 15:39:43 CDT 2010
cmirror-2.02.72-3.el6    BUILT: Wed Jul 28 15:39:43 CDT 2010
Comment 16 releng-rhel@redhat.com 2010-11-10 16:08:07 EST
Red Hat Enterprise Linux 6.0 is now available and should resolve
the problem described in this bug report. This report is therefore being closed
with a resolution of CURRENTRELEASE. You may reopen this bug report if the
solution does not work for you.
