
Bug 929193

Summary: Regular mirror devices are marked missing (a-m) even after the PVs are back online
Product: Red Hat Enterprise Linux 7 Reporter: Nenad Peric <nperic>
Component: lvm2 Assignee: LVM and device-mapper development team <lvm-team>
lvm2 sub component: Default / Unclassified QA Contact: Cluster QE <mspqa-list>
Status: CLOSED NOTABUG Docs Contact:
Severity: unspecified    
Priority: unspecified CC: agk, cmarthal, heinzm, jbrassow, msnitzer, prajnoha, prockai, thornber, zkabelac
Version: 7.0   
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-07-31 21:04:57 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
-vvvv output of pvs -a and vgs none

Description Nenad Peric 2013-03-29 13:07:26 UTC
Created attachment 718079 [details]
-vvvv output of pvs -a and vgs

Description of problem:

When an ordinary mirror (NOT raid1) is created and a few devices are taken offline, LVM still reports them as missing even after they come back (they are not unknown).

Here is an example of what happens:


[root@tardis-01-port1 ~]# pvs -a
  Couldn't find device with uuid 3zV5pT-Xr0L-XLWw-gPdp-p5IY-h1BD-AOkyVq.
  PV                             VG                   Fmt  Attr PSize   PFree 
  /dev/revolution_9/mirror_1                               ---       0      0 
  /dev/rhel_tardis-01-port1/home                           ---       0      0 
  /dev/rhel_tardis-01-port1/root                           ---       0      0 
  /dev/rhel_tardis-01-port1/swap                           ---       0      0 
  /dev/sda1                                                ---       0      0 
  /dev/sda2                      rhel_tardis-01-port1 lvm2 a--  278.88g     0 
  /dev/sdb1                                                ---       0      0 
  /dev/sdc1                                                ---       0      0 
  /dev/sdd1                                                ---       0      0 
  /dev/sde1                                                ---       0      0 
  /dev/sdf1                      revolution_9         lvm2 a-m   93.12g 93.12g
  /dev/sdg1                      revolution_9         lvm2 a-m   93.12g 93.12g
  /dev/sdi1                      revolution_9         lvm2 a-m   93.12g 93.12g
  /dev/sdj1                      revolution_9         lvm2 a--   93.12g 91.12g
  /dev/sdk1                      revolution_9         lvm2 a-m   93.12g 93.12g
[root@tardis-01-port1 ~]# dmsetup ls
revolution_9-mirror_1	(253:7)
rhel_tardis--01--port1-home	(253:2)
rhel_tardis--01--port1-swap	(253:0)
rhel_tardis--01--port1-root	(253:1)
[root@tardis-01-port1 ~]# dmsetup table
revolution_9-mirror_1: 0 4194304 linear 8:145 10240
rhel_tardis--01--port1-home: 0 471597056 linear 8:2 8390656
rhel_tardis--01--port1-swap: 0 8388608 linear 8:2 2048
rhel_tardis--01--port1-root: 0 104857600 linear 8:2 479987712
[root@tardis-01-port1 ~]# lvs -a
  Couldn't find device with uuid 3zV5pT-Xr0L-XLWw-gPdp-p5IY-h1BD-AOkyVq.
  LV       VG                   Attr      LSize   Pool Origin Data%  Move Log Cpy%Sync Convert
  mirror_1 revolution_9         -wi-ao---   2.00g                                             
  home     rhel_tardis-01-port1 -wi-ao--- 224.88g                                             
  root     rhel_tardis-01-port1 -wi-ao---  50.00g                                             
  swap     rhel_tardis-01-port1 -wi-ao---   4.00g                                             
[root@tardis-01-port1 ~]# vgs
  Couldn't find device with uuid 3zV5pT-Xr0L-XLWw-gPdp-p5IY-h1BD-AOkyVq.
  VG                   #PV #LV #SN Attr   VSize   VFree  
  revolution_9           6   1   0 wz-pn- 558.75g 556.75g
  rhel_tardis-01-port1   1   3   0 wz--n- 278.88g      0 


[root@tardis-01-port1 ~]# cat /sys/block/sd{f,g,i,k}/device/state 
running
running
running
running


As you can see, the devices are running but are marked missing.
Now I will bring back the remaining missing PV and repeat the vgs command (as you can see, all the devices now come back as available):

[root@tardis-01-port1 ~]# echo "running" >/sys/block/sdh/device/state
[root@tardis-01-port1 ~]# vgs
  WARNING: Inconsistent metadata found for VG revolution_9 - updating to use version 33
  Missing device /dev/sdk1 reappeared, updating metadata for VG revolution_9 to version 33.
  Missing device /dev/sdi1 reappeared, updating metadata for VG revolution_9 to version 33.
  Missing device /dev/sdh1 reappeared, updating metadata for VG revolution_9 to version 33.
  Missing device /dev/sdg1 reappeared, updating metadata for VG revolution_9 to version 33.
  Missing device /dev/sdf1 reappeared, updating metadata for VG revolution_9 to version 33.
  VG                   #PV #LV #SN Attr   VSize   VFree  
  revolution_9           6   1   0 wz--n- 558.75g 556.75g
  rhel_tardis-01-port1   1   3   0 wz--n- 278.88g      0 
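
The attribute check used above (reading the third character of the pvs Attr column, where "m" means missing) can be sketched as a small parser. This is only an illustration: the function name missing_pvs and the trimmed SAMPLE text are made up for this sketch, and it assumes the default six-column pvs layout shown in this report.

```python
from typing import List


def missing_pvs(pvs_output: str) -> List[str]:
    """Return the PV paths whose Attr column has 'm' in the third position."""
    missing = []
    for line in pvs_output.splitlines():
        fields = line.split()
        # PVs that belong to a VG have six columns:
        # PV, VG, Fmt, Attr, PSize, PFree. Other rows are skipped.
        if len(fields) != 6 or not fields[0].startswith("/dev/"):
            continue
        attr = fields[3]
        if len(attr) >= 3 and attr[2] == "m":
            missing.append(fields[0])
    return missing


# Trimmed copy of the pvs -a output from this report (illustrative input only).
SAMPLE = """\
  Couldn't find device with uuid 3zV5pT-Xr0L-XLWw-gPdp-p5IY-h1BD-AOkyVq.
  PV                             VG                   Fmt  Attr PSize   PFree
  /dev/sda1                                                ---       0      0
  /dev/sda2                      rhel_tardis-01-port1 lvm2 a--  278.88g     0
  /dev/sdf1                      revolution_9         lvm2 a-m   93.12g 93.12g
  /dev/sdj1                      revolution_9         lvm2 a--   93.12g 91.12g
  /dev/sdk1                      revolution_9         lvm2 a-m   93.12g 93.12g
"""

print(missing_pvs(SAMPLE))  # ['/dev/sdf1', '/dev/sdk1']
```

Comparing this list with the devices whose /sys/block/*/device/state reads "running" shows which PVs are online but still flagged missing.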




Version-Release number of selected component (if applicable):

lvm2-2.02.99-0.6.el7.x86_64
device-mapper-1.02.78-0.6.el7.x86_64


How reproducible:
Every time


Steps to Reproduce:

1. Create a regular mirror with multiple legs.
2. Fail random legs.
3. Bring them back, but leave at least one PV still missing (failed/offline).

Actual results:

The PVs that are back in the online/running state stay marked as missing (a-m) for as long as the last PV is offline.
As soon as it comes back online, all of the PVs lose the missing flag and LVM sees them as present.


Expected results:

Devices that have reappeared should not carry the missing flag when the ordinary mirror segment type is used.


Additional information:

This may be somehow related to Bug #921715 as well.

Comment 3 Zdenek Kabelac 2013-04-03 08:19:00 UTC
Have you tried to use:

vgextend --restoremissing vgname  pvnames...
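
A minimal sketch of assembling that command for a list of returned PVs. The helper name restoremissing_cmd is hypothetical, and the VG and PV names are taken from this report purely as an example. The sketch only prints the command; actually running it would need root and is left to the operator.

```python
import shlex
from typing import List, Sequence


def restoremissing_cmd(vg: str, pvs: Sequence[str]) -> List[str]:
    """Build the argv for 'vgextend --restoremissing VG PV...' without running it."""
    if not pvs:
        raise ValueError("at least one PV path is required")
    return ["vgextend", "--restoremissing", vg, *pvs]


# Example values copied from this report; adjust for the system at hand.
cmd = restoremissing_cmd("revolution_9", ["/dev/sdf1", "/dev/sdg1"])
print(shlex.join(cmd))
# vgextend --restoremissing revolution_9 /dev/sdf1 /dev/sdg1
```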

Comment 4 Nenad Peric 2013-04-03 09:45:45 UTC
I know that using vgextend/vgreduce is an option to change their state, but I have not tried to do so. They should not have been marked as a-m when using mirror_segtype_default=mirror. 

The activity of marking PVs as missing and then reinstating them via vgextend/vgreduce was expected to be used with raid1 segtype, but not with old mirror segtype afaik.

If I am correct in my assumption:

1) Why are PVs marked missing when they are back in ordinary mirrors?

2) Why don't they come back on their own, but instead come back all at once when the last PV returns, without any vgextend intervention in between (LVM commands were executed in the meantime, so it is not a timing issue)?


If I am not correct with my segtype=mirror assumption:

Is this marking of PVs as 'a-m' now an expected behavior for the returning (running) PVs of ordinary mirror segtype? 

If so, then:

a) it has to be documented and 
b) PVs should never lose the a-m flags on a whim, for as long as they are part of the same VG they were in when they went offline (otherwise what would be the point of such an unstable a-m flag).

Comment 5 Alasdair Kergon 2013-04-03 10:41:17 UTC
Is it using the 'missing' devices when they come back?  (It shouldn't in case something was changed on them while disconnected.  Unless you run --restoremissing to confirm that nothing was changed on them, the system has to do a full resync.)

Comment 6 Nenad Peric 2013-04-03 11:17:26 UTC
Well, it just removed the a-m flags from the returned devices (as you can see in my opening comment), without me running any --restoremissing.

They sat there as a-m:

  /dev/sdb1                                                ---       0      0 
  /dev/sdc1                                                ---       0      0 
  /dev/sdd1                                                ---       0      0 
  /dev/sde1                                                ---       0      0 
  /dev/sdf1                      revolution_9         lvm2 a-m   93.12g 93.12g
  /dev/sdg1                      revolution_9         lvm2 a-m   93.12g 93.12g
  /dev/sdi1                      revolution_9         lvm2 a-m   93.12g 93.12g
  /dev/sdj1                      revolution_9         lvm2 a--   93.12g 91.12g
  /dev/sdk1                      revolution_9         lvm2 a-m   93.12g 93.12g

(note that /dev/sdh1 is not there, since it is offline) 

The devices are all online, LVM sees their names, and they are a-m.

Now I return /dev/sdh1 from the dead (only that, no other commands done) and do a vgs:

[root@tardis-01-port1 ~]# echo "running" >/sys/block/sdh/device/state
[root@tardis-01-port1 ~]# vgs
  WARNING: Inconsistent metadata found for VG revolution_9 - updating to use version 33
  Missing device /dev/sdk1 reappeared, updating metadata for VG revolution_9 to version 33.
  Missing device /dev/sdi1 reappeared, updating metadata for VG revolution_9 to version 33.
  Missing device /dev/sdh1 reappeared, updating metadata for VG revolution_9 to version 33.
  Missing device /dev/sdg1 reappeared, updating metadata for VG revolution_9 to version 33.
  Missing device /dev/sdf1 reappeared, updating metadata for VG revolution_9 to version 33.
  VG                   #PV #LV #SN Attr   VSize   VFree  
  revolution_9           6   1   0 wz--n- 558.75g 556.75g
  rhel_tardis-01-port1   1   3   0 wz--n- 278.88g      0

Comment 8 Petr Rockai 2013-04-03 15:30:55 UTC
The segment type has nothing to do with the treatment of missing PVs. Even if you don't have mirrors at all, the same thing will happen: once you write to a VG, PVs will be marked as missing. Unless you write to the VG (whether due to recovery or an unrelated action), the missing flag won't be cleared.

Comment 9 Petr Rockai 2013-04-03 15:35:12 UTC
As for PVs coming back on a whim, I was always opposed to that, but code to sometimes do that has been added to LVM. It's not entirely correct but will work more often than not. It probably won't lose data as long as metadata backups are working correctly.

Comment 10 Petr Rockai 2013-07-31 21:04:57 UTC
Overall, I don't think this is a problem. If you think some behaviour needs to change in the code, please re-open the bug, ideally with an example of a run where you think the current logic fails. Two kinds of failures could happen:
- it is entirely clear that the PV in question could not have changed while it was missing, but the code did not detect this
- the code automatically re-integrated a PV which was modified externally and possibly wiped the changed metadata in the process

Please do keep in mind that when an automated mirror recovery kicks in (whether hotspare or downconvert), this also means that the metadata has diverged and we currently do not treat these cases as safe to reclaim (fails the "entirely clear" criterion from point 1).

(PS: Using a chain-(crypto)hashing scheme for metadata versioning would presumably allow us to establish clear relationships between random snapshots of the same VG. Whether this is viable is an exercise for the future.)
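
To make the chain-hashing idea concrete, here is a rough sketch. It is not how LVM stores metadata today: the function names, the GENESIS constant and the "seqno" strings are all invented for illustration. Each version's hash covers the previous hash plus the new metadata text, so two snapshots of the same VG can be compared to see whether one descends from the other or whether they have diverged.

```python
import hashlib
from typing import List

GENESIS = "0" * 64  # placeholder "previous hash" for the very first version


def chain_hash(prev_hash: str, metadata: str) -> str:
    """Hash of one metadata version, bound to the hash of its predecessor."""
    h = hashlib.sha256()
    h.update(prev_hash.encode("ascii"))
    h.update(b"\n")
    h.update(metadata.encode("utf-8"))
    return h.hexdigest()


def build_chain(versions: List[str]) -> List[str]:
    """Return the running list of chain hashes for successive metadata texts."""
    hashes = []
    prev = GENESIS
    for text in versions:
        prev = chain_hash(prev, text)
        hashes.append(prev)
    return hashes


def is_ancestor(old_chain: List[str], new_chain: List[str]) -> bool:
    """True if new_chain starts with old_chain, i.e. new descends from old."""
    return (len(old_chain) <= len(new_chain)
            and new_chain[:len(old_chain)] == old_chain)


base = ["seqno 31", "seqno 32"]
stale_pv = build_chain(base)                      # PV went offline here
current_vg = build_chain(base + ["seqno 33"])     # VG moved on without it
forked_pv = build_chain(base + ["seqno 33 (edited elsewhere)"])

print(is_ancestor(stale_pv, current_vg))   # True: PV is merely out of date
print(is_ancestor(forked_pv, current_vg))  # False: histories have diverged
```

In this toy model a stale PV is one whose chain is a prefix of the VG's chain, while a PV modified externally shows up as a fork. Keeping the whole chain around is a simplification; a real design would have to decide how much history to store.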