Bug 995440

Summary: lvmetad unable to update metadata after failed PV has returned to mirror.
Product: Red Hat Enterprise Linux 6 Reporter: Nenad Peric <nperic>
Component: lvm2 Assignee: Petr Rockai <prockai>
Status: CLOSED ERRATA QA Contact: Cluster QE <mspqa-list>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 6.5 CC: agk, cmarthal, dwysocha, heinzm, jbrassow, msnitzer, prajnoha, prockai, thornber, tlavigne, zkabelac
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: lvm2-2.02.100-5.el6 Doc Type: Bug Fix
Doc Text:
Cause: Repair of inconsistent metadata used a different code path depending on whether lvmetad was running and enabled.
Consequence: The lvmetad version of metadata repair failed to actually correct the metadata, and a warning was printed repeatedly by every command (until the problem was manually fixed).
Fix: The code paths have been reconciled.
Result: Metadata inconsistencies are automatically repaired as appropriate, whether lvmetad is enabled or not.
Last Closed: 2013-11-21 23:26:47 UTC Type: Bug

Description Nenad Peric 2013-08-09 11:27:11 UTC
Description of problem:
When a PV fails and leaves the VG (mirror leg failure) and then subsequently returns, lvmetad does not seem to cope with updating the metadata. Every command keeps repeating the same message over and over again.

This is the output:

[root@tardis-01 log]# vgs
  Missing device /dev/sdk1 reappeared, updating metadata for VG revolution_9 to version 11.
  Missing device /dev/sdk1 reappeared, updating metadata for VG revolution_9 to version 11.
  VG           #PV #LV #SN Attr   VSize   VFree  
  revolution_9   6   1   0 wz--n- 465.80g 459.80g
  vg_tardis01    1   3   0 wz--n- 278.88g      0 
[root@tardis-01 log]# pvs
  Missing device /dev/sdk1 reappeared, updating metadata for VG revolution_9 to version 11.
  Missing device /dev/sdk1 reappeared, updating metadata for VG revolution_9 to version 11.
  Missing device /dev/sdk1 reappeared, updating metadata for VG revolution_9 to version 11.
  Missing device /dev/sdk1 reappeared, updating metadata for VG revolution_9 to version 11.
  Missing device /dev/sdk1 reappeared, updating metadata for VG revolution_9 to version 11.
  Missing device /dev/sdk1 reappeared, updating metadata for VG revolution_9 to version 11.
  Missing device /dev/sdk1 reappeared, updating metadata for VG revolution_9 to version 11.
  PV         VG           Fmt  Attr PSize   PFree  
  /dev/sda2  vg_tardis01  lvm2 a--  278.88g      0 
  /dev/sdb1  revolution_9 lvm2 a--   93.12g  91.12g
  /dev/sdc1               lvm2 a--   93.13g  93.13g
  /dev/sdd1  revolution_9 lvm2 a--   93.12g  91.12g
  /dev/sde1  revolution_9 lvm2 a--  184.00m 184.00m
  /dev/sdf1               lvm2 a--   93.13g  93.13g
  /dev/sdg1               lvm2 a--   93.13g  93.13g
  /dev/sdi1  revolution_9 lvm2 a--   93.12g  93.12g
  /dev/sdk1  revolution_9 lvm2 a--   93.12g  93.12g
  /dev/sdl1               lvm2 a--   93.13g  93.13g
  /dev/sdm1  revolution_9 lvm2 a--   93.12g  91.12g
  /dev/sdo1               lvm2 a--   93.13g  93.13g
[root@tardis-01 log]# lvs
  Missing device /dev/sdk1 reappeared, updating metadata for VG revolution_9 to version 11.
  Missing device /dev/sdk1 reappeared, updating metadata for VG revolution_9 to version 11.
  LV       VG           Attr       LSize   Pool Origin Data%  Move Log           Cpy%Sync Convert
  mirror_1 revolution_9 mwi-a-m---   2.00g                         mirror_1_mlog   100.00        
  lv_home  vg_tardis01  -wi-ao---- 224.88g                                                       
  lv_root  vg_tardis01  -wi-ao----  50.00g                                                       
  lv_swap  vg_tardis01  -wi-ao----   4.00g                                                       


The message:
Missing device /dev/sdk1 reappeared, updating metadata for VG revolution_9 to version 11.
keeps repeating with every LVM command.
Note that the returned PV is actually shown as present in the VG.
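
One way to quantify the repetition described above is to count how often the warning shows up in a single command's output. The following is only a minimal sketch: count_reappeared is a hypothetical helper (not part of lvm2), and it assumes the warning text matches the transcripts in this report.

```shell
#!/bin/sh
# Hypothetical helper, not an lvm2 tool: count the
# "Missing device ... reappeared" warnings read from stdin.
count_reappeared() {
    # grep -c prints the number of matching lines; it exits non-zero
    # when the count is 0, so "|| true" keeps the helper's status clean.
    grep -c 'Missing device .* reappeared, updating metadata' || true
}
```

It could be used as: vgs 2>&1 | count_reappeared. With the behaviour reported here the count stays above zero on every run, whereas after a successful repair it would be expected to drop to zero.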


With lvmetad off, the output looks like this:

[root@tardis-01 log]# lvs --config 'global{use_lvmetad=0}'
  WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!
  WARNING: Inconsistent metadata found for VG revolution_9 - updating to use version 11
  Missing device /dev/sdk1 reappeared, updating metadata for VG revolution_9 to version 11.
  LV       VG           Attr       LSize   Pool Origin Data%  Move Log           Cpy%Sync Convert
  mirror_1 revolution_9 mwi-a-m---   2.00g                         mirror_1_mlog   100.00        
  lv_home  vg_tardis01  -wi-ao---- 224.88g                                                       
  lv_root  vg_tardis01  -wi-ao----  50.00g                                                       
  lv_swap  vg_tardis01  -wi-ao----   4.00g                                                       
[root@tardis-01 log]# lvs --config 'global{use_lvmetad=0}'
  WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!
  LV       VG           Attr       LSize   Pool Origin Data%  Move Log           Cpy%Sync Convert
  mirror_1 revolution_9 mwi-a-m---   2.00g                         mirror_1_mlog   100.00        
  lv_home  vg_tardis01  -wi-ao---- 224.88g                                                       
  lv_root  vg_tardis01  -wi-ao----  50.00g                                                       
  lv_swap  vg_tardis01  -wi-ao----   4.00g                                                  

As you can see, the message no longer repeats on the second run.


Turn on lvmetad again and we get:

[root@tardis-01 log]# lvs
  Missing device /dev/sdk1 reappeared, updating metadata for VG revolution_9 to version 11.
  Missing device /dev/sdk1 reappeared, updating metadata for VG revolution_9 to version 11.
  LV       VG           Attr       LSize   Pool Origin Data%  Move Log           Cpy%Sync Convert
  mirror_1 revolution_9 mwi-a-m---   2.00g                         mirror_1_mlog   100.00        
  lv_home  vg_tardis01  -wi-ao---- 224.88g                                                       
  lv_root  vg_tardis01  -wi-ao----  50.00g                                                       
  lv_swap  vg_tardis01  -wi-ao----   4.00g        




Version-Release number of selected component (if applicable):

lvm2-2.02.100-0.45.el6.x86_64

How reproducible:

Every time

Steps to Reproduce:
1. Create a VG, create a mirror LV, and wait for sync.
2. Fail a random PV (wait for repair/conversion).
3. Bring the PV back and try to execute any LVM command with lvmetad running.


Expected results:

LVM should update the metadata and its version, as it does when lvmetad is not in use.

Comment 3 Nenad Peric 2013-10-08 11:56:44 UTC
I can still reproduce this easily by running the revolution_9 test:

revolution_9 -i 5 -o virt-012 -e kill_random_legs,kill_random_devices

It is 100% reproducible for me (lvmetad on).

[root@virt-012 ~]# vgs
  PV LpymAu-GeS1-jD9f-OHgn-keyq-kPcv-Jj1qO8 not recognised. Is the device missing?
  Missing device /dev/sdj1 reappeared, updating metadata for VG revolution_9 to version 55.
  Missing device /dev/sdb1 reappeared, updating metadata for VG revolution_9 to version 55.
  PV LpymAu-GeS1-jD9f-OHgn-keyq-kPcv-Jj1qO8 not recognised. Is the device missing?
  Missing device /dev/sdj1 reappeared, updating metadata for VG revolution_9 to version 55.
  Missing device /dev/sdb1 reappeared, updating metadata for VG revolution_9 to version 55.
  VG           #PV #LV #SN Attr   VSize  VFree 
  revolution_9   6   1   0 wz-pn- 59.95g 55.95g
  vg_virt012     1   2   0 wz--n-  7.51g     0 
[root@virt-012 ~]# lvs
  PV LpymAu-GeS1-jD9f-OHgn-keyq-kPcv-Jj1qO8 not recognised. Is the device missing?
  Missing device /dev/sdj1 reappeared, updating metadata for VG revolution_9 to version 55.
  Missing device /dev/sdb1 reappeared, updating metadata for VG revolution_9 to version 55.
  PV LpymAu-GeS1-jD9f-OHgn-keyq-kPcv-Jj1qO8 not recognised. Is the device missing?
  Missing device /dev/sdj1 reappeared, updating metadata for VG revolution_9 to version 55.
  Missing device /dev/sdb1 reappeared, updating metadata for VG revolution_9 to version 55.
  LV       VG           Attr       LSize   Pool Origin Data%  Move Log           Cpy%Sync Convert
  mirror_1 revolution_9 mwi-aom---   2.00g                         mirror_1_mlog   100.00        
  lv_root  vg_virt012   -wi-ao----   6.71g                                                       
  lv_swap  vg_virt012   -wi-ao---- 816.00m                        

================================================================================
================================================================================

An easy way to reproduce WITHOUT this revolution test would be:

Turn off lvmetad and set use_lvmetad to 0. 
Have these set in lvm.conf:

	mirror_log_fault_policy="allocate"
	mirror_image_fault_policy="allocate"

	mirror_segtype_default="mirror"

Create a VG, and a mirrored LV, wait for sync, then fail a device:

- vgcreate newvg /dev/sd{a..f}1
- lvcreate -m1 -L3G -n mirror newvg
- echo 1 >/sys/block/sda/device/delete

Do some I/O to force repair and replacement of the device, then
wait for the sync to finish (so that LVM is not doing anything on the LV anymore).

When the device is replaced and the sync is done, turn on lvmetad by changing use_lvmetad to 1 in lvm.conf and starting the lvm2-lvmetad daemon (/etc/init.d/lvm2-lvmetad start).
Return the failed device by rescanning the SCSI bus,
for example:

echo "- - -" >/sys/class/scsi_host/host6/scan
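
The steps above can be collected into a small script. This is a hedged sketch, not a tested reproducer: the helper names (run, note, reproduce) are hypothetical, the device names, SCSI host number and VG/LV names are simply copied from this comment, and the init script name is assumed to be lvm2-lvmetad to match the daemon name. It defaults to a dry run that only prints the commands, because the real commands destroy data on the listed devices.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Set DRY_RUN=0 only on a disposable test machine.
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Manual step: print a reminder; in real mode wait for confirmation.
note() {
    printf '# %s\n' "$*"
    if [ "$DRY_RUN" != "1" ]; then
        printf 'press Enter when done: '
        read -r reply
    fi
}

reproduce() {
    note "lvmetad stopped, use_lvmetad=0, mirror fault policies set in lvm.conf"
    run vgcreate newvg /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1
    run lvcreate -m1 -L3G -n mirror newvg
    note "wait for the mirror to sync"
    # Fail one device (sda, as in the comment above).
    run sh -c 'echo 1 >/sys/block/sda/device/delete'
    note "do some I/O on the LV, then wait for repair and resync"
    note "set use_lvmetad=1 in lvm.conf"
    run /etc/init.d/lvm2-lvmetad start
    # Bring the failed device back (host6, as in the comment above).
    run sh -c 'echo "- - -" >/sys/class/scsi_host/host6/scan'
    run lvs
}
```

Calling reproduce with the default settings just lists the planned steps, which makes it easy to review them before running anything for real.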

Now try any lvm command, here are the results:

[root@virt-012 ~]# lvs
  Missing device /dev/sda1 reappeared, updating metadata for VG newvg to version 12.
  Missing device /dev/sda1 reappeared, updating metadata for VG newvg to version 12.
  LV      VG         Attr       LSize   Pool Origin Data%  Move Log         Cpy%Sync Convert
  mirror  newvg      mwi-a-m---   3.00g                         mirror_mlog   100.00        
  lv_root vg_virt012 -wi-ao----   6.71g                                                     
  lv_swap vg_virt012 -wi-ao---- 816.00m                                                     
[root@virt-012 ~]# vgs
  Missing device /dev/sda1 reappeared, updating metadata for VG newvg to version 12.
  Missing device /dev/sda1 reappeared, updating metadata for VG newvg to version 12.
  VG         #PV #LV #SN Attr   VSize  VFree 
  newvg        6   1   0 wz--n- 59.95g 53.95g
  vg_virt012   1   2   0 wz--n-  7.51g     0 
[root@virt-012 ~]# pvscan --cache
  WARNING: Inconsistent metadata found for VG newvg
[root@virt-012 ~]# vgs
  Missing device /dev/sda1 reappeared, updating metadata for VG newvg to version 12.
  Missing device /dev/sda1 reappeared, updating metadata for VG newvg to version 12.
  VG         #PV #LV #SN Attr   VSize  VFree 
  newvg        6   1   0 wz--n- 59.95g 53.95g
  vg_virt012   1   2   0 wz--n-  7.51g     0 


The metadata version number just stays the same.
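
To make the "stuck version" observation easier to check across several runs, the version numbers can be pulled out of the warnings. This is a minimal sketch under the assumption that the warning format is exactly as shown in the transcripts above; reappeared_versions is a hypothetical helper, not an lvm2 command.

```shell
#!/bin/sh
# Hypothetical helper: print the distinct metadata versions named in
# "Missing device ... reappeared" warnings read from stdin.
reappeared_versions() {
    sed -n 's/.*reappeared, updating metadata for VG .* to version \([0-9][0-9]*\).*/\1/p' |
        sort -un
}
```

For example: { vgs; pvscan --cache; vgs; } 2>&1 | reappeared_versions. If the same single version is printed on every run, the metadata is being "updated" to a version that is never actually reached, which matches the symptom reported here.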

Comment 4 Zdenek Kabelac 2013-10-08 14:24:42 UTC
The problem here seems to be with lvmetad updating its cache.

I've upstreamed a test case for the internal lvm test suite that covers this case:
https://www.redhat.com/archives/lvm-devel/2013-October/msg00021.html

A fix for the lvmetad case still needs to be added.

Comment 5 Petr Rockai 2013-10-09 13:32:36 UTC
This should be fixed upstream in 0decd7553ac9dcf4a7d81f5b10b1f4ca053ae9a5. (We have cleared up the matter about lvconvert --repair: without lvmetad, it accidentally repairs the metadata even though it's not supposed to do that. It's not a major issue, although it might be surprising that PVs are re-integrated by dmeventd. With "normal" commands, things now work as expected both with and without lvmetad.)

Comment 7 Nenad Peric 2013-10-11 12:44:28 UTC
The messages no longer repeat and metadata is updated as it should be.

[root@virt-011 ~]# vgs
  PV SRfdXr-cr5q-1lgz-UFi3-7uRd-dJhd-vcKszr not recognised. Is the device missing?
  PV SRfdXr-cr5q-1lgz-UFi3-7uRd-dJhd-vcKszr not recognised. Is the device missing?
  VG         #PV #LV #SN Attr   VSize  VFree 
  newvg        4   1   0 wz-pn- 39.97g 33.96g
  vg_virt011   1   2   0 wz--n-  7.51g     0 
[root@virt-011 ~]# echo "- - -" >/sys/class/scsi_host/host9/scan 
[root@virt-011 ~]# lvs -a
  Missing device /dev/sdb1 reappeared, updating metadata for VG newvg to version 10.
  LV                VG         Attr       LSize   Pool Origin Data%  Move Log         Cpy%Sync Convert
  mirror            newvg      mwi-a-m---   3.00g                         mirror_mlog   100.00        
  [mirror_mimage_0] newvg      iwi-aom---   3.00g                                                     
  [mirror_mimage_1] newvg      iwi-aom---   3.00g                                                     
  [mirror_mlog]     newvg      lwi-aom---   4.00m                                                     
  lv_root           vg_virt011 -wi-ao----   6.71g                                                     
  lv_swap           vg_virt011 -wi-ao---- 816.00m                                                     
[root@virt-011 ~]# lvs -a
  LV                VG         Attr       LSize   Pool Origin Data%  Move Log         Cpy%Sync Convert
  mirror            newvg      mwi-a-m---   3.00g                         mirror_mlog   100.00        
  [mirror_mimage_0] newvg      iwi-aom---   3.00g                                                     
  [mirror_mimage_1] newvg      iwi-aom---   3.00g                                                     
  [mirror_mlog]     newvg      lwi-aom---   4.00m                                                     
  lv_root           vg_virt011 -wi-ao----   6.71g                                                     
  lv_swap           vg_virt011 -wi-ao---- 816.00m                                  


marking VERIFIED with:

lvm2-2.02.100-5.el6.x86_64

Comment 8 errata-xmlrpc 2013-11-21 23:26:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1704.html