Bug 829221

Summary: RFE: automatically restore PV from MISSING state after it becomes reachable again if it has no active MDA (ignoremetadata is true)
Product: Red Hat Enterprise Linux 6
Reporter: Leonid Natapov <lnatapov>
Component: lvm2
Assignee: Petr Rockai <prockai>
Status: CLOSED ERRATA
QA Contact: Cluster QE <mspqa-list>
Severity: urgent
Priority: urgent
Version: 6.2
CC: abaron, agk, bsarathy, cmarthal, coughlan, cpelland, dwysocha, ewarszaw, hateya, heinzm, jbrassow, msnitzer, prajnoha, prockai, thornber, wnefal+redhatbugzilla, yeylon, ykaul, zkabelac
Target Milestone: rc
Keywords: FutureFeature, Reopened, ZStream
Target Release: ---   
Hardware: Unspecified   
OS: Linux   
Whiteboard:
Fixed In Version: lvm2-2.02.98-4.el6
Doc Type: Enhancement
Doc Text:
Feature: Automatically restore a PV from the MISSING state after it becomes reachable again if it has no active metadata areas.
Reason: In cases of transient inaccessibility of a PV (for example with iSCSI or another unreliable transport), LVM required manual action to restore the PV for use, even when there was no room for conflict because the PV has no active MDA (metadata area).
Result: Manual action is no longer required if the transiently inaccessible PV had no active metadata areas.
Story Points: ---
Last Closed: 2013-02-21 08:10:29 UTC
Type: Bug
Bug Blocks: 883034, 886216    
Attachments (description / flags):
logs / none
logs / none
logs / none

Description Leonid Natapov 2012-06-06 09:05:48 UTC
Description of problem:
A PV remained in the MISSING state even though its LUN was reachable.
The PV was marked as missing (attributes "a-m") despite being reachable. We are convinced it was reachable the whole time, because the other PVs on the same storage that belong to the same VG were normal. We also could not find any documentation explaining what the MISSING flag means.

attached logs are:
/var/log/messages and /etc/lvm/archive

Comment 3 Peter Rajnoha 2012-06-11 13:36:08 UTC
(In reply to comment #0)
> attached logs are:
> /var/log/messages and /etc/lvm/archive

Please attach the logs mentioned above, plus the output of "pvs -vvvv" and of the "lsblk" command. Also, try to grab an "sosreport", which might give us more insight into the state of the system. Thanks.
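
A minimal way to collect what is being asked for above (the output file names are only suggestions, not part of the request):

pvs -vvvv > pvs-vvvv.txt 2>&1   # verbose PV listing
lsblk > lsblk.txt               # block device tree
sosreport --batch               # non-interactive system report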

Comment 4 RHEL Program Management 2012-07-10 08:24:37 UTC
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.

Comment 5 RHEL Program Management 2012-07-10 23:58:48 UTC
This request was erroneously removed from consideration in Red Hat Enterprise Linux 6.4, which is currently under development.  This request will be evaluated for inclusion in Red Hat Enterprise Linux 6.4.

Comment 6 Peter Rajnoha 2012-07-18 11:02:04 UTC
We need the logs to move forward with this problem. Please, if possible, attach the logs mentioned in comment #3.

Comment 7 Leonid Natapov 2012-07-18 11:09:40 UTC
Created attachment 598851 [details]
logs

Comment 8 Leonid Natapov 2012-07-18 11:10:06 UTC
Created attachment 598852 [details]
logs

Comment 10 Gadi Ickowicz 2012-10-24 16:08:42 UTC
Created attachment 632877 [details]
logs

I just ran into this problem again, when attempting to remove a VG that was created on a single LUN, then extended to a second LUN.

The scenario used was this (a rough command sketch follows the list):

1. Create a VG on a single LUN.
2. Extend the VG to a second LUN.
3. The iSCSI session is then closed, but reopened about a second later.
4. Both PVs are visible, however the VG has the partial attribute.
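
A minimal reproduction sketch of the above; the device and VG names are hypothetical, not taken from the attached logs:

pvcreate /dev/mapper/lun1 /dev/mapper/lun2
vgcreate testvg /dev/mapper/lun1            # step 1: VG on a single LUN
vgextend testvg /dev/mapper/lun2            # step 2: extend onto a second LUN
iscsiadm -m node -u                         # step 3: close the iSCSI session...
sleep 1
iscsiadm -m node -l                         # ...and reopen it about a second later
vgs -o vg_name,vg_attr,pv_count testvg      # step 4: vg_attr now contains 'p' (partial)
pvs -o pv_name,vg_name,pv_attr              # the reappeared PV carries the 'm' (missing) flag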

logs attached as per previous comments.

This was reproduced on: 
Red Hat Enterprise Virtualization Hypervisor release 6.3 (20120710.0.el6_3)

Comment 11 Ben Marzinski 2012-10-24 17:23:40 UTC
Just so we're on the same page here:

In step 1, you created the 013d86f5-5c0e-4f2c-b60a-4d0117ade7df VG on top of /dev/mapper/1qe-storage_sanity11340729.

In step 2, you grew the VG by adding /dev/mapper/1qe-storage_sanity_ext1340730.

Correct?

In step 4, the /dev/mapper/1qe-storage_sanity_ext1340730 PV is in the missing state:

/dev/mapper/1qe-storage_sanity_ext1340730 013d86f5-5c0e-4f2c-b60a-4d0117ade7df lvm2 a-m  29.62g 29.62g  30.00g aNTqKL-HLBz-JYAI-KbDP-cqGo-PxAB-Y244bu


Can you run

# multipath -ll

both before and after step 3? I assume that, as you say, both of these will show a working multipath device.

Looking at the messages, I can see

Oct 23 08:51:38 cyan-vdsh multipathd: sdb: remove path (uevent)
Oct 23 08:51:38 cyan-vdsh multipathd: 1qe-storage_sanity_ext1340730 Last path deleted, disabling queueing
Oct 23 08:51:38 cyan-vdsh multipathd: 1qe-storage_sanity_ext1340730: devmap removed
Oct 23 08:51:38 cyan-vdsh multipathd: 1qe-storage_sanity_ext1340730: stop event checker thread (139916255160064)
Oct 23 08:51:38 cyan-vdsh multipathd: 1qe-storage_sanity_ext1340730: removed map after removing all paths
...
Oct 23 08:52:13 cyan-vdsh multipathd: sdc: add path (uevent)
Oct 23 08:52:13 cyan-vdsh multipathd: 1qe-storage_sanity_ext1340730: load table [0 62914560 multipath 0 0 1 1 round-robin 0 1 1 8:32 1]
Oct 23 08:52:13 cyan-vdsh multipathd: 1qe-storage_sanity_ext1340730: event checker started
Oct 23 08:52:13 cyan-vdsh multipathd: sdc path added to devmap 1qe-storage_sanity_ext1340730

So, you lost your last path, and the multipath device wasn't open, so it got deleted. Less than a minute later, the path re-appears, and the multipath device gets recreated.

The reason that 1qe-storage_sanity11340729 isn't also marked as missing is that the device is in-use, so it doesn't get freed on the last delete.

Oct 23 08:51:38 cyan-vdsh multipathd: sda: remove path (uevent)
Oct 23 08:51:38 cyan-vdsh multipathd: 1qe-storage_sanity11340729 Last path deleted, disabling queueing
Oct 23 08:51:38 cyan-vdsh multipathd: 1qe-storage_sanity11340729: map in use
Oct 23 08:51:38 cyan-vdsh multipathd: 1qe-storage_sanity11340729: can't flush
Oct 23 08:51:38 cyan-vdsh multipathd: flush_on_last_del in progress
Oct 23 08:51:38 cyan-vdsh multipathd: 1qe-storage_sanity11340729: load table [0 62914560 multipath 0 0 0 0]
Oct 23 08:51:38 cyan-vdsh multipathd: sda: path removed from map 1qe-storage_sanity11340729

It looks like LVM is having a problem with clearing the MISSING flag on PVs that disappear and then reappear.  I don't think this is multipath specific.  Petr, I'm kicking this over to you since it looks similar to Bug 537913.  If the multipath -ll output shows that the multipath device is really not usable after step 3, or if there's some other multipath issue I missed, you can kick it back.
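
For reference (not part of the original analysis), one way to check whether a multipath map is still held open, which is what kept 1qe-storage_sanity11340729 from being freed:

dmsetup info -c -o name,open 1qe-storage_sanity11340729
# a non-zero "open" count means the map is in use and is not flushed
# when its last path goes away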

Comment 12 Petr Rockai 2012-10-24 21:52:37 UTC
Basically, since I believe the report is about a PV actually disappearing and coming back, this is not a bug but a feature. If the VG metadata changes while a PV is missing, that PV needs to be re-added by issuing "vgextend --restoremissing VG PV". This is to avoid accidental (and automatic) corruption in cases where the PV was modified (e.g. by being available to another machine) while missing.

If RHEV can reasonably assume that the PV that went offline and came back is in a state where it can be automatically made available to the VG again, it should issue vgextend --restoremissing on such PVs.
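
For reference, the manual recovery described above looks like this (the VG and PV names are placeholders):

vgextend --restoremissing myvg /dev/mapper/lun2   # clear the MISSING flag on the reappeared PV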

Comment 13 Ayal Baron 2012-11-10 23:11:04 UTC
Reopening and marking as an RFE, since LVM can determine this automatically in this case.

Comment 14 Alasdair Kergon 2012-11-11 15:25:23 UTC
So what are you asking for?

A new configuration setting in LVM to make it *assume* that any MISSING device was not written to while it was missing?  RHEV would have to take the responsibility for ensuring that condition was always true and supporting users.  

We would state clearly that lvm-team will not support situations caused by incorrect use of the setting and it should only be used in a fully-controlled environment where the necessary "no split write" condition can be guaranteed through other control mechanisms.

Comment 15 Ayal Baron 2012-11-14 22:31:46 UTC
(In reply to comment #14)
> So what are you asking for?
> 
> A new configuration setting in LVM to make it *assume* that any MISSING
> device was not written to while it was missing?  RHEV would have to take the
> responsibility for ensuring that condition was always true and supporting
> users.  
> 
> We would state clearly that lvm-team will not support situations caused by
> incorrect use of the setting and it should only be used in a
> fully-controlled environment where the necessary "no split write" condition
> can be guaranteed through other control mechanisms.

Since the missing PV doesn't have any active MDA (which users could alter), what changes are you protecting against?
The changes I can think of include:
1. The PV being added to another VG; this actually goes back to a different RFE I opened a long time ago, asking that LVM 'mark' each PV as belonging to a certain VG even if it doesn't have an active VG MDA.

2. Running 'pvcreate' on it (also covered by the VG UUID appearing in the PV metadata).
3. Just writing data directly to the PV (this could happen to any PV regardless of whether it is missing, so IMO it's not interesting).

Also, if I understand correctly, the PV would not require manual re-addition if there were no VG changes while it was gone, despite the fact that all the changes above could happen to it in the same way, so I'm not quite sure I understand what you're protecting against.

Comment 16 Petr Rockai 2012-11-19 12:04:48 UTC
Alasdair,

what is being asked for is to detect "safe" situations automatically. It's not entirely easy, but it should be possible, at least for some of the safe situations. This basically means that for every PV that is flagged MISSING but has actually been found, we verify that it has no active MDA and, if that is the case, clear the missing flag. I'll have to think a bit more about whether we are opening any holes, but it seems to be safe.

Ayal,

if there are no VG changes, there is no room for conflict; if the locally-missing PV had a metadata update, that update marks the locally-available PVs as MISSING. In other words, the MISSING situation is basically symmetric: each side marks the PVs it cannot see as MISSING. Either "side" of the split updating the VG while the other is gone will prompt recovery.
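
A by-hand version of that check (a sketch only; the field names are pvs report fields, the device path is hypothetical):

pvs -o pv_name,vg_name,pv_attr,pv_mda_count,pv_mda_used_count /dev/mapper/lun2
# 'm' in pv_attr means the PV is flagged MISSING; a pv_mda_used_count of 0
# means it has no active metadata area, i.e. the "safe" case described above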

Comment 17 Ayal Baron 2012-11-19 12:20:37 UTC
(In reply to comment #16)
> Alasdair,
> 
> what is being asked for is to detect "safe" situations automatically. It's
> not entirely easy, but it should be possible, at least for some of the safe
> situations. This basically means that for every PV that is flagged MISSING
> but has actually been found, we verify that it has no active MDA and, if
> that is the case, clear the missing flag. I'll have to think a bit more
> about whether we are opening any holes, but it seems to be safe.
> 
> Ayal,
> 
> if there are no VG changes, there is no room for conflict; if the
> locally-missing PV had a metadata update, that update marks the
> locally-available PVs as MISSING. In other words, the MISSING situation is
> basically symmetric: each side marks the PVs it cannot see as MISSING.
> Either "side" of the split updating the VG while the other is gone will
> prompt recovery.

Got it. As mentioned, I'm talking about "safe" situations where the missing PV did not previously have any active MDA, nor does it have one after reappearing.

Comment 18 Bhavna Sarathy 2012-11-20 15:17:28 UTC
This issue needs to be resolved in RHEL6.4 and in RHEL6.3.z.  RHEV 3.1 would like to take this fix post-GA, in RHEV 3.1.z.  This bug needs devel and QE acks.

Comment 19 Petr Rockai 2012-11-25 19:48:26 UTC
A fix will land upstream shortly (assuming tests pass), as 09d77d0..60668f8. Therefore, devel_ack+. I'll POST this bug as soon as the fix is actually upstream.

Comment 20 Petr Rockai 2012-11-25 21:32:25 UTC
Tests passed, including the one for this specific feature. The relevant upstream commit is 60668f823e830ce39e452234996910c51728aa76. Regarding QA, this is how I test the feature in the upstream suite: disable a device with no active MDA, do a write operation that is legal on partial VGs (in this case lvremove), which triggers a metadata write with the MISSING flag set for the disabled device, then re-enable the device and check that a subsequent write operation succeeds and wipes the MISSING flag:

. lib/test

aux prepare_vg 3

pvchange --metadataignore y $dev1
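# (the pvchange above leaves $dev1 with no active metadata area)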

lvcreate -m 1 -l 1 -n mirror $vg
lvchange -a n $vg/mirror
lvcreate -l 1 -n lv1 $vg "$dev1"

# try to just change metadata; we expect the new version (with MISSING_PV set
# on the reappeared volume) to be written out to the previously missing PV
aux disable_dev "$dev1"
lvremove $vg/mirror
not vgck $vg 2>&1 | tee log
grep "missing 1 physical volume" log
not lvcreate -m 1 -l 1 -n mirror $vg # write operations fail
aux enable_dev "$dev1"
lvcreate -m 1 -l 1 -n mirror $vg # no MDA => automatically restored
vgck $vg

Comment 23 Corey Marthaler 2012-12-11 21:18:22 UTC
Fix verified in the latest rpms.

2.6.32-343.el6.x86_64
lvm2-2.02.98-4.el6    BUILT: Wed Dec  5 08:35:04 CST 2012
lvm2-libs-2.02.98-4.el6    BUILT: Wed Dec  5 08:35:04 CST 2012
lvm2-cluster-2.02.98-4.el6    BUILT: Wed Dec  5 08:35:04 CST 2012
udev-147-2.43.el6    BUILT: Thu Oct 11 05:59:38 CDT 2012
device-mapper-1.02.77-4.el6    BUILT: Wed Dec  5 08:35:04 CST 2012
device-mapper-libs-1.02.77-4.el6    BUILT: Wed Dec  5 08:35:04 CST 2012
device-mapper-event-1.02.77-4.el6    BUILT: Wed Dec  5 08:35:04 CST 2012
device-mapper-event-libs-1.02.77-4.el6    BUILT: Wed Dec  5 08:35:04 CST 2012
cmirror-2.02.98-4.el6    BUILT: Wed Dec  5 08:35:04 CST 2012

The mirror failure test cases no longer need to run 'vgreduce --removemissing' and recreate the failed PVs when there is no MDA on the failed device (see the sketch of those old manual steps below).
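
For context, the manual recovery the tests previously had to perform looked roughly like this (a sketch only; the exact test-harness commands are not shown in this report, and the VG name is a placeholder):

vgreduce --removemissing testvg    # drop the failed PV from the VG
pvcreate /dev/sdd1                 # recreate the failed PV...
vgextend testvg /dev/sdd1          # ...and add it back to the VG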

================================================================================
Iteration 10.1 started at Tue Dec 11 13:07:59 CST 2012
================================================================================
Scenario kill_no_mda_primary_2_legs: Kill primary leg containing *no* MDA of synced 2 leg mirror(s)

********* Mirror hash info for this scenario *********
* names:              no_mda_primary_2legs_1
* sync:               1
* striped:            0
* leg devices:        /dev/sdd1 /dev/sdf1
* log devices:        /dev/sdg1
* no MDA devices:     /dev/sdd1
* failpv(s):          /dev/sdd1
* failnode(s):        taft-01
* leg fault policy:   remove
* log fault policy:   remove
******************************************************

================================================================================
Iteration 10.1 started at Tue Dec 11 13:07:59 CST 2012
================================================================================
Scenario kill_no_mda_primary_2_legs: Kill primary leg containing *no* MDA of synced 2 leg mirror(s)

********* Mirror hash info for this scenario *********
* names:              no_mda_primary_2legs_1
* sync:               1
* striped:            0
* leg devices:        /dev/sdd1 /dev/sdf1
* log devices:        /dev/sdg1
* no MDA devices:     /dev/sdd1
* failpv(s):          /dev/sdd1
* failnode(s):        taft-01
* leg fault policy:   allocate
* log fault policy:   allocate
******************************************************

Comment 24 errata-xmlrpc 2013-02-21 08:10:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-0501.html