Bug 1215228

Summary: LVM (vgdisplay) does not show the true path hierarchy of underlying PVs
Product: Red Hat Enterprise Linux 6
Reporter: Stan Saner <ssaner>
Component: lvm2
Assignee: Alasdair Kergon <agk>
lvm2 sub component: Devices, Filtering and Stacking (RHEL6)
QA Contact: cluster-qe <cluster-qe>
Status: CLOSED ERRATA
Severity: medium
Priority: medium
CC: agk, cmarthal, heinzm, jbrassow, msnitzer, prajnoha, prockai, rbednar, robert.x.tomczyk, salmy, teigland, tlavigne, zkabelac
Version: 6.6
Target Milestone: rc
Target Release: 6.7
Hardware: x86_64
OS: Linux
Fixed In Version: lvm2-2.02.143-6.el6
Doc Type: Bug Fix
Last Closed: 2016-05-11 01:16:40 UTC
Type: Bug
Bug Blocks: 1172231, 1268411

Description Stan Saner 2015-04-24 16:30:46 UTC
Description of problem: 
-----------------------
LVM (vgdisplay) does not show the true path hierarchy of underlying PVs if LVM detects storage via non-multipathed device files rather than multipathed ones during boot.


Version-Release number of selected component (if applicable): RHEL 6


How reproducible:
-----------------
Usually reproducible when 3rd-party multipathing products such as Veritas VxDMP or EMC PowerPath are not activated early enough during boot for LVM to detect the multipathed storage hosting the root filesystem and primary swap, or when lvm.conf filtering in the initrd causes non-multipathed devices to be detected in preference to multipathed ones.

Later, when the system is fully booted and multipathing is fully active, vgdisplay reports multipathed devices, because it scans devices to generate its output.


Steps to Reproduce:
-------------------
1. Install and configure Symantec/Veritas Storage Foundation SFHA version 6.1 or 6.2

2. Configure VxDMP to control the multipathing to the physical volumes hosting root FS and primary swap



Actual results:
---------------
Example from the customer configuration:

dmsetup reports the mapping for the vg1_root and vg1_swap logical volumes as not using multipathed devices:

vg_root-vg1_swap (253:4)
 `- (8:209)                         ---> /dev/sdn1
vg_root-vg1_root (253:1)
 `-vg_root-vg1_root-real (253:0)
    `- (8:209)                      ---> /dev/sdn1

Note the major:minor number 8:209 corresponds to
brw-rw----.  1 root disk      8, 209 Apr  9 14:21 sdn1


Other logical volumes in the _same_ root volume group have the expected mapping to multipathed devices:

vg_root-vg1_VG1_FS1 (253:6)
 `-vg_root-vg1_VG1_FS1-real (253:5)
    `- (201:177)                     ---> emc_clariion0_12s1
vg_root-vg1_VG1_FS0 (253:10)
 `-vg_root-vg1_VG1_FS0-real (253:9)
    `- (201:177)
vg_root-litp_vg1_VG1_FS1_snapshot (253:8)
 |-vg_root-litp_vg1_VG1_FS1_snapshot-cow (253:7)
 |  `- (201:177)
 `-vg_root-vg1_VG1_FS1-real (253:5)
    `- (201:177)
vg_root-litp_vg1_VG1_FS0_snapshot (253:12)
 |-vg_root-litp_vg1_VG1_FS0_snapshot-cow (253:11)
 |  `- (201:177)
 `-vg_root-vg1_VG1_FS0-real (253:9)
    `- (201:177)



vgdisplay -vv reports physical volumes for the whole volume group and does not differentiate when some of the logical volumes use different PVs; it reports that the multipathed devices are in use.

  --- Logical volume ---
  LV Path                /dev/vg_root/vg1_root
  LV Name                vg1_root
  VG Name                vg_root

  LV Path                /dev/vg_root/vg1_swap
  LV Name                vg1_swap
  VG Name                vg_root

 VG Name               vg_root
  --- Physical volumes ---
  PV Name               /dev/vx/dmp/emc_clariion0_12s1
  PV Name               /dev/vx/dmp/emc_clariion0_43s1
  PV Name               /dev/vx/dmp/emc_clariion0_3s2


The VxDMP device files are

brw-------. 1 root root 201, 177 Apr  9 14:20 emc_clariion0_12s1
brw-------. 1 root root 201,  81 Apr  9 14:20 emc_clariion0_43s1
brw-------. 1 root root 201,  98 Apr  9 14:20 emc_clariion0_3s2


Expected results:
-----------------

vgdisplay should either report that the non-multipathed device files are being used, or print a warning that the devices actually in use by the kernel do not match the reported multipathed ones.
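Until such a warning exists, the mismatch can be checked by hand: take the major:minor that dmsetup shows under the LV and compare it with the device that the LVM tools name. A minimal sketch using the numbers quoted in this report (the values are hard-coded for illustration; in practice they would be captured from `dmsetup ls --tree` and `ls -l` on the reported PV):

```shell
# major:minor of the device the kernel actually has under vg_root-vg1_root,
# as shown by `dmsetup ls --tree` earlier in this report:
kernel_dev="8:209"     # /dev/sdn1
# major:minor of the PV that vgdisplay names for the same VG:
lvm_dev="201:177"      # /dev/vx/dmp/emc_clariion0_12s1

if [ "$kernel_dev" != "$lvm_dev" ]; then
  echo "MISMATCH: kernel uses $kernel_dev, LVM reports $lvm_dev"
fi
```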


Additional info:
----------------

I suppose the following questions need answering before deciding how to address this problem:

1. On the customer system only the root FS and primary swap had the LVM -> /dev/sd* mapping. The volumes in the root VG activated later during boot, when the multipathing layers were fully active, had correct mappings to multipathed devices.

vgdisplay reports only one set of PVs that applies to all logical volumes. Which ones should it report: the ones used by the root FS and primary swap, or the rest that use multipathed devices?

We should perhaps get at least some warning that some volumes use non-multipathed devices.

2. Is there a potential for memory corruption, perhaps due to caching, when some lvols go directly to /dev/sd* and others go through the multipathing layers?

3. Could we perhaps implement an extra option for vgdisplay that would report the mappings as dmsetup reports them?



Veritas is addressing their side of the problem by ensuring the correct filters are in place in lvm.conf and activating the root VG volumes via the multipathed device files, which should prevent the problem in the first place.

However, there is still potential exposure to such a problem with other 3rd-party multipathing solutions, so the vgdisplay / LVM code needs fortification.

Comment 2 David Teigland 2015-04-24 17:05:34 UTC
I'm in the middle of working on a patch in this area, and we should at a minimum be able to report warnings from the lvm commands that fully describe the situation.

If or how we could enhance the formal output of the reporting/display commands is a more difficult question that will take more time to sort out.

Comment 3 Stan Saner 2015-05-06 15:01:57 UTC
(In reply to David Teigland from comment #2)
> I'm in the middle of working a patch in this area, and we should at a
> minimum be able to report warnings from the lvm commands that fully describe
> the situation.
> 
> If or how we could enhance the formal output of the reporting/display
> commands is a more difficult question that will take more time to sort out.

Hi David,

   The customer has been extremely cooperative and said they would be willing to test at least the initial, simpler fix where the warnings are printed by the lvm commands.

I know it is rather unrealistic to expect the display commands' output enhancement at this stage, as the team needs to discuss how to approach that. But if you have the simpler implementation ready, please share it. The customer may still have the reproduction environment available, but it may need to be reused for other purposes soon. This is our chance to have it tested.

Thanks and regards,
Stan Saner

Comment 4 David Teigland 2015-05-08 16:42:49 UTC
I'd suggest using the last tagged release, which right now is 2.02.119.

commit bee2df3903d0956ba2e09ce9ae9ae55dfc5d3fd1
Author: Alasdair G Kergon <agk>
Date:   Sat May 2 01:41:17 2015 +0100

    pre-release

Comment 5 Alasdair Kergon 2015-05-09 00:35:58 UTC
(In reply to David Teigland from comment #4)
> I'd suggest using the last tagged release, which right now is 2.02.119

If providing any build of this, make it clear that this is NOT a supported release and must ONLY be installed on test machines that will be reinstalled after testing.

Comment 9 Stan Saner 2015-06-23 12:36:35 UTC
The customer testing of the patch created from the tagged release 2.02.119 (I branched off and extracted the relevant bits; see commit 64ba86e61c59a9f214db1d74ec311fef5400e299) did not produce the expected result. Testing was performed under two scenarios:

1. install the patch on a system with the existing problem and see the effect

# pvs
  PV                              VG      Fmt  Attr PSize   PFree  
  /dev/vx/dmp/emc_clariion0_116s2 vg_root lvm2 a--  169.51g  64.51g
  /dev/vx/dmp/emc_clariion0_137   vg_app  lvm2 a--  305.00g 200.00g

# lvs
  LV                  VG      Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  vg2_lv_etc          vg_app  -wi-ao----  5.00g                                                    
  vg2_lv_opt          vg_app  -wi-ao---- 20.00g                                                    
  vg2_lv_var_ericsson vg_app  -wi-ao---- 80.00g                                                    
  vg_root_lv_root     vg_root -wi-ao---- 50.00g                                                    
  vg_root_lv_swap     vg_root -wi-ao----  5.00g                                                    
  vg_root_lv_var      vg_root -wi-ao---- 50.00g                                                    

# dmsetup ls --tree
vg_app-vg2_lv_opt (253:2)
 └─ (201:112)
vg_app-vg2_lv_var_ericsson (253:4)
 └─ (201:112)
vg_root-vg_root_lv_var (253:5)
 └─ (201:82)
vg_root-vg_root_lv_swap (253:1)
 └─ (8:242)
vg_root-vg_root_lv_root (253:0)
 └─ (8:242)
vg_app-vg2_lv_etc (253:3)
 └─ (201:112)



brw-rw----.  1 root disk      8, 242 Jun 22 10:54 sdp2
lrwxrwxrwx.   1 root root   98 Jun 22 10:54 b8:242 -> /devices/pci0000:00/0000:00:03.0/0000:05:00.1/host2/rport-2:0-6/target2:0:0/2:0:0:0/block/sdp/sdp2

brw-------. 1 root root 201,  82 Jun 22 10:54 emc_clariion0_116s2

# rpm -qa | grep lvm2
lvm2-2.02.119-1.el6_6.0.bz1215288.x86_64
lvm2-libs-2.02.119-1.el6_6.0.bz1215288.x86_64

# rpm -qa | grep device-mapper
device-mapper-devel-1.02.88-1.el6_6.0.bz1215288.x86_64
device-mapper-persistent-data-0.3.2-1.el6.x86_64
device-mapper-event-1.02.88-1.el6_6.0.bz1215288.x86_64
device-mapper-event-libs-1.02.88-1.el6_6.0.bz1215288.x86_64
device-mapper-libs-1.02.88-1.el6_6.0.bz1215288.x86_64
device-mapper-event-devel-1.02.88-1.el6_6.0.bz1215288.x86_64
device-mapper-1.02.88-1.el6_6.0.bz1215288.x86_64


2. include the patch in the install image and perform the fresh install

# pvs
  PV                              VG      Fmt  Attr PSize   PFree  
  /dev/vx/dmp/emc_clariion0_116s2 vg_root lvm2 a--  169.51g  64.51g
  /dev/vx/dmp/emc_clariion0_137   vg_app  lvm2 a--  305.00g 200.00g

# lvs
  LV                  VG      Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  vg2_lv_etc          vg_app  -wi-ao----  5.00g                                                    
  vg2_lv_opt          vg_app  -wi-ao---- 20.00g                                                    
  vg2_lv_var_ericsson vg_app  -wi-ao---- 80.00g                                                    
  vg_root_lv_root     vg_root -wi-ao---- 50.00g                                                    
  vg_root_lv_swap     vg_root -wi-ao----  5.00g                                                    
  vg_root_lv_var      vg_root -wi-ao---- 50.00g                                                    

# dmsetup ls --tree
vg_app-vg2_lv_opt (253:2)
 └─ (201:112)
vg_app-vg2_lv_var_ericsson (253:4)
 └─ (201:112)
vg_root-vg_root_lv_var (253:5)
 └─ (201:82)
vg_root-vg_root_lv_swap (253:1)
 └─ (8:242)
vg_root-vg_root_lv_root (253:0)
 └─ (8:242)
vg_app-vg2_lv_etc (253:3)
 └─ (201:112)


brw-rw----.  1 root disk      8, 242 Jun 22 16:53 sdp2
lrwxrwxrwx.  1 root root   98 Jun 22 16:53 b8:242 -> /devices/pci0000:00/0000:00:03.0/0000:05:00.1/host2/rport-2:0-6/target2:0:0/2:0:0:0/block/sdp/sdp2

brw-------. 1 root root 201,  66 Jun 22 16:53 emc_clariion0_116s2


# rpm -qa | grep lvm2
lvm2-2.02.119-1.el6_6.0.bz1215288.x86_64
lvm2-libs-2.02.119-1.el6_6.0.bz1215288.x86_64

# rpm -qa | grep device-mapper
device-mapper-devel-1.02.88-1.el6_6.0.bz1215288.x86_64
device-mapper-persistent-data-0.3.2-1.el6.x86_64
device-mapper-event-1.02.88-1.el6_6.0.bz1215288.x86_64
device-mapper-event-libs-1.02.88-1.el6_6.0.bz1215288.x86_64
device-mapper-libs-1.02.88-1.el6_6.0.bz1215288.x86_64
device-mapper-event-devel-1.02.88-1.el6_6.0.bz1215288.x86_64
device-mapper-1.02.88-1.el6_6.0.bz1215288.x86_64



In both scenarios the root filesystem and the swap device in the root VG still map to the non-multipath devices, and no warning or other indication of this fact is printed. In essence the behaviour is exactly the same as without the patch; the patch has not helped to show or prevent the mismatch between the dmsetup view and the pvs view.


The customer is willing to perform further testing if we provide another iteration of the patch.

Comment 10 David Teigland 2015-06-23 14:22:46 UTC
A warning would be printed if the new code were actually finding duplicates.
Are any filters being used?

Could you have them run:
pvs -a -o+uuid

Comment 11 Stan Saner 2015-06-23 15:08:34 UTC
I asked for the pvs -a -o+uuid output, but I guess pvs -a -v as collected in the SOSreport would have the same info. Here it is from the test after a fresh install with the patched installation image:

  PV                              VG      Fmt  Attr PSize   PFree   DevSize PV UUID
  /dev/vx/dmp/disk_0                           ---       0       0  279.37g
  /dev/vx/dmp/emc_clariion0_116s1              ---       0       0  500.00m
  /dev/vx/dmp/emc_clariion0_116s2 vg_root lvm2 a--  169.51g  64.51g 169.51g FwdQ1G-58xs-8zYj-y2m6-bMqf-y5lD-cvme06
  /dev/vx/dmp/emc_clariion0_137   vg_app  lvm2 a--  305.00g 200.00g 305.00g PthTwG-2bof-9ZBM-qc7E-QQQF-qktA-DKvqYi
  /dev/vx/dmp/emc_clariion0_138                ---       0       0   25.00g
  /dev/vx/dmp/emc_clariion0_139                ---       0       0   25.00g
  /dev/vx/dmp/emc_clariion0_140s5              ---       0       0   32.12m
  /dev/vx/dmp/emc_clariion0_140s6              ---       0       0   44.92g
  /dev/vx/dmp/emc_clariion0_148                ---       0       0   60.00g
  /dev/vx/dmp/emc_clariion0_777                ---       0       0   60.00g


I will upload the SOSreport from the test performed as a fresh install with the patch being part of the install image.

Comment 13 David Teigland 2015-06-23 16:32:02 UTC
Here's the filter they are using:

global_filter = [ "r|^/dev/sd.*$|", "r|/dev/VxDMP.*|", "r|/dev/vx/dmpconfig|", "r|/dev/vx/rdmp/.*|", "r|/dev/dm-[0-9]*|", "r|/dev/mpath/mpath[0-9]*|", "r|/dev/mapper/mpath[0-9]*|", ]

It seems they are filtering out the duplicates, which is fine, but it means that they won't get any warnings since lvm will not see any duplicates.

If they remove the filter, then they should see the duplicate PV warnings.
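For clarity, LVM filter patterns are evaluated in order and the first match decides; with the global_filter above every pattern is a reject ("r"), and a device matching none of them is implicitly accepted. A toy re-implementation of that evaluation (not LVM's actual code, just the first-match-wins logic applied to the customer's reject patterns):

```shell
# First-match-wins evaluation of the reject patterns from the customer's
# global_filter. A device matching any "r" pattern is rejected; a device
# matching none is implicitly accepted.
filter_device() {
  dev="$1"
  for pat in '/dev/sd.*' '/dev/VxDMP.*' '/dev/vx/dmpconfig' '/dev/vx/rdmp/.*' \
             '/dev/dm-[0-9]*' '/dev/mpath/mpath[0-9]*' '/dev/mapper/mpath[0-9]*'; do
    if echo "$dev" | grep -Eq "^$pat"; then
      echo "reject"
      return
    fi
  done
  echo "accept"
}

filter_device /dev/sdn1                       # rejected: the duplicate is never scanned
filter_device /dev/vx/dmp/emc_clariion0_12s1  # accepted: only the DMP path is seen
```

This is why no duplicate-PV warning can appear with this filter in place: the /dev/sd* path is rejected before lvm ever scans it.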

Comment 14 Stan Saner 2015-06-24 14:57:51 UTC
David, the filter settings on the customer system are exactly the same as when the original problem was detected. This bug is about LVM commands reporting multipathed devices as being used for certain volumes in the root VG, when in fact non-multipathed devices carry the traffic.


The customer placed LVM filters in the initrd image and on the root FS. The filter in the initrd was removed in order to replicate the fault when the case was opened with Red Hat. The filter in the root FS stayed.

That is why LVM binds to the physical devices during the bootup process, yet when the system is up and running with the LVM filters from the root FS, it shows that it is now using the DMP devices, which is not true.

Let me recap the expectations:

LVM commands should either report that the non-multipathed device files are being used, or print a warning that the devices actually in use by the kernel do not match the reported multipathed ones.

Comment 15 David Teigland 2015-06-24 15:45:27 UTC
lvm will print a warning if it sees two devices for the same PV. If one of those devices is filtered out, lvm will not see any duplicate and will have no reason to print a warning.  So, in comment 11, please name the two devices in the 'pvs -a' output which are duplicates.  Then, provide the output from the following command and name the two devices in the output which are duplicates:

pvs -a -o+uuid --config 'devices/global_filter=[ "a|.*/|" ]'

Comment 24 Alasdair Kergon 2016-03-17 20:25:05 UTC
It is fairly easy to reproduce artificially:

Create 3 devices.
Put 2 of them into a VG.
Create and activate an LV across both of the devices in the VG.

Now, edit lvm.conf filters to hide just one of the two devices, and use dd to clone the hidden device onto the 3rd device.

Run 'pvs' etc.  You'll see the tools consistently telling you that the 3rd device is used, not the 1st (now hidden) one, and nothing appears wrong.  But if you check with dmsetup or lsblk, you'll see that it's actually still the 1st device that's being used.


Now with the patched code, you'll see messages like:

WARNING: Device inconsistency: Why is vg/lvol3_mlog using /dev/loop15 when its metadata uses /dev/loop3?

(precise wording still being discussed)

Comment 25 Alasdair Kergon 2016-03-17 20:36:22 UTC
The original report was similar to that, except instead of obtaining the 3rd device by using 'dd', the 1st device simply got wrapped up into a multipath device.  (So the 3rd device was a multipath device with the 1st device as one of its paths.)

Comment 26 Peter Rajnoha 2016-03-21 14:08:17 UTC
Patches:

https://git.fedorahosted.org/cgit/lvm2.git/commit/?id=03b0a786403ad1762bfbbe354756a9b83ee6629c

https://git.fedorahosted.org/cgit/lvm2.git/commit/?id=f231bdb20bdc885460dfc49db744147bb1bc90da

The WARNING message is:

WARNING: Device mismatch detected for <vg_name>/<lv_name> which is accessing <devA1>, <devA2>, ... instead of <devB1>, <devB2>...

Comment 29 Roman Bednář 2016-03-24 15:29:52 UTC
Using the reproducer from Comment #24 I found that the new warning message does not always appear when it should. This is caused by the default RHEL6 setting in lvm.conf (obtain_device_list_from_udev=0), which makes lvm commands use the .cache file whenever possible, thus bypassing the check of the underlying PVs in some cases. See the example below:

# dmsetup ls --tree
vg-testlv (253:2)
 ├─ (8:16)
 └─ (8:0)
vg_virt267-lv_swap (253:1)
 └─ (252:2)
vg_virt267-lv_root (253:0)
 └─ (252:2)

# lsblk
NAME                          MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
vda                           252:0    0  8.1G  0 disk 
├─vda1                        252:1    0  500M  0 part /boot
└─vda2                        252:2    0  7.6G  0 part 
  ├─vg_virt267-lv_root (dm-0) 253:0    0  6.8G  0 lvm  /
  └─vg_virt267-lv_swap (dm-1) 253:1    0  828M  0 lvm  [SWAP]
sdc                             8:32   0    1G  0 disk 
sdb                             8:16   0    1G  0 disk 
└─vg-testlv (dm-2)            253:2    0    1G  0 lvm  
sdd                             8:48   0    1G  0 disk 
sde                             8:64   0    1G  0 disk 
sda                             8:0    0    1G  0 disk 
└─vg-testlv (dm-2)            253:2    0    1G  0 lvm  


>>>Warning is shown as expected in 'pvs'
# pvs
  Found duplicate PV zAQkpWHrTh0Rx0v3N20QuU6kUS5XefVW: using /dev/sdc not /dev/sda
  Using duplicate PV /dev/sdc without holders, replacing /dev/sda
  WARNING: Device mismatch detected for vg/testlv which is accessing /dev/sda instead of /dev/sdc.
  PV         VG         Fmt  Attr PSize    PFree   
  /dev/sdb   vg         lvm2 a--u 1020.00m 1016.00m
  /dev/sdc   vg         lvm2 a--u 1020.00m       0 
  /dev/vda2  vg_virt267 lvm2 a--u    7.63g       0 

>>>Warning is missing in 'lvs'
# lvs
   Found duplicate PV zAQkpWHrTh0Rx0v3N20QuU6kUS5XefVW: using /dev/sdc not /dev/sda
   Using duplicate PV /dev/sdc without holders, replacing /dev/sda
   LV      VG         Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
   testlv  vg         -wi-a-----   1.00g                                                    
   lv_root vg_virt267 -wi-ao----   6.82g                                                    
   lv_swap vg_virt267 -wi-ao---- 828.00m 

======================================================================
2.6.32-634.el6.x86_64

lvm2-2.02.143-3.el6    BUILT: Tue Mar 22 15:26:10 CET 2016
lvm2-libs-2.02.143-3.el6    BUILT: Tue Mar 22 15:26:10 CET 2016
lvm2-cluster-2.02.143-3.el6    BUILT: Tue Mar 22 15:26:10 CET 2016
udev-147-2.72.el6    BUILT: Tue Mar  1 13:14:05 CET 2016
device-mapper-1.02.117-3.el6    BUILT: Tue Mar 22 15:26:10 CET 2016
device-mapper-libs-1.02.117-3.el6    BUILT: Tue Mar 22 15:26:10 CET 2016
device-mapper-event-1.02.117-3.el6    BUILT: Tue Mar 22 15:26:10 CET 2016
device-mapper-event-libs-1.02.117-3.el6    BUILT: Tue Mar 22 15:26:10 CET 2016
device-mapper-persistent-data-0.6.2-0.1.rc7.el6    BUILT: Tue Mar 22 14:58:09 CET 2016
cmirror-2.02.143-3.el6    BUILT: Tue Mar 22 15:26:10 CET 2016

Comment 30 Peter Rajnoha 2016-03-24 15:32:30 UTC
Yes, this is because the vgid/lvid index is not created when we're not scanning devices but are reading the persistent .cache file instead. We should fix this! (The index should also be created when we're reading the .cache file.)

Comment 35 Roman Bednář 2016-03-31 12:13:17 UTC
Display of the new warning message is still inconsistent in some cases.

See example below:
=============================================================
Keep the filter present:
# grep filter /etc/lvm/lvm.conf | grep -v "#"
	filter = [ "r|/dev/sdb|" ]

# pvs
  WARNING: Device mismatch detected for vg/lvol0 which is accessing /dev/sdb instead of /dev/sdc.
  PV         VG         Fmt  Attr PSize    PFree
  /dev/sda   vg         lvm2 a--u 1020.00m        0
  /dev/sdc   vg         lvm2 a--u 1020.00m 1016.00m
  /dev/vda2  vg_virt010 lvm2 a--u    7.63g        0
 
# lvs -o +devices
  LV      VG         Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices
  lvol0   vg         -wi-a-----   1.00g                                                     /dev/sda(0)
  lvol0   vg         -wi-a-----   1.00g                                                     /dev/sdc(0)
  lv_root vg_virt010 -wi-ao----   6.82g                                                     /dev/vda2(0)
  lv_swap vg_virt010 -wi-ao---- 828.00m                                                     /dev/vda2(1746)
 
# lvs
  LV      VG         Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  lvol0   vg         -wi-a-----   1.00g
  lv_root vg_virt010 -wi-ao----   6.82g
  lv_swap vg_virt010 -wi-ao---- 828.00m
 
# rm /etc/lvm/cache/.cache
rm: remove regular file `/etc/lvm/cache/.cache'? y
 
# lvs
  WARNING: Device mismatch detected for vg/lvol0 which is accessing /dev/sdb instead of /dev/sdc.
  LV      VG         Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  lvol0   vg         -wi-a-----   1.00g
  lv_root vg_virt010 -wi-ao----   6.82g
  lv_swap vg_virt010 -wi-ao---- 828.00m

Missing message here:
# lvs
  LV      VG         Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  lvol0   vg         -wi-a-----   1.00g
  lv_root vg_virt010 -wi-ao----   6.82g
  lv_swap vg_virt010 -wi-ao---- 828.00m
 
=============================================================
2.6.32-634.el6.x86_64
 
lvm2-2.02.143-5.el6    BUILT: Wed Mar 30 16:16:24 CEST 2016
lvm2-libs-2.02.143-5.el6    BUILT: Wed Mar 30 16:16:24 CEST 2016
lvm2-cluster-2.02.143-5.el6    BUILT: Wed Mar 30 16:16:24 CEST 2016
udev-147-2.72.el6    BUILT: Tue Mar  1 13:14:05 CET 2016
device-mapper-1.02.117-5.el6    BUILT: Wed Mar 30 16:16:24 CEST 2016
device-mapper-libs-1.02.117-5.el6    BUILT: Wed Mar 30 16:16:24 CEST 2016
device-mapper-event-1.02.117-5.el6    BUILT: Wed Mar 30 16:16:24 CEST 2016
device-mapper-event-libs-1.02.117-5.el6    BUILT: Wed Mar 30 16:16:24 CEST 2016
device-mapper-persistent-data-0.6.2-0.1.rc7.el6    BUILT: Tue Mar 22 14:58:09 CET 2016
cmirror-2.02.143-5.el6    BUILT: Wed Mar 30 16:16:24 CEST 2016

Comment 36 Peter Rajnoha 2016-04-01 12:58:00 UTC
Should be fixed now with https://git.fedorahosted.org/cgit/lvm2.git/commit/?id=15d1824facce1ac38c2669b17c8c0965b8c18f3e:

[0] fedora/~ # vgcreate vg /dev/sda
  Physical volume "/dev/sda" successfully created.
  Volume group "vg" successfully created

[0] fedora/~ # lvcreate -l1 vg
  Logical volume "lvol0" created.

[0] fedora/~ # dd if=/dev/sda of=/dev/sdb bs=1M
128+0 records in
128+0 records out
134217728 bytes (134 MB) copied, 0.789494 s, 170 MB/s

[0] fedora/~ # pvs                   
  Found duplicate PV 4S9oMTNhgKZJNVd1MfOCVDbPrmgPlOMe: using /dev/sdb not /dev/sda
  Using duplicate PV /dev/sdb without holders, replacing /dev/sda
  WARNING: Device mismatch detected for vg/lvol0 which is accessing /dev/sda instead of /dev/sdb.
  PV         VG     Fmt  Attr PSize   PFree  
  /dev/sdb   vg     lvm2 a--  124.00m 120.00m

[0] fedora/~ # lvs
  Found duplicate PV 4S9oMTNhgKZJNVd1MfOCVDbPrmgPlOMe: using /dev/sdb not /dev/sda
  Using duplicate PV /dev/sdb without holders, replacing /dev/sda
  WARNING: Device mismatch detected for vg/lvol0 which is accessing /dev/sda instead of /dev/sdb.
  LV    VG     Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  lvol0 vg     -wi-a-----   4.00m            

(now filtering out the /dev/sda which is actually used for the LV)

[0] fedora/~ # lvmconfig --type diff
global {
	use_lvmetad=0
}
devices {
	obtain_device_list_from_udev=0
	filter=["a|/dev/sdb|","r|.*|"]
}

[0] fedora/~ # pvs
  WARNING: Device mismatch detected for vg/lvol0 which is accessing /dev/sda instead of /dev/sdb.
  PV         VG Fmt  Attr PSize   PFree  
  /dev/sdb   vg lvm2 a--  124.00m 120.00m

[0] fedora/~ # vgs
  WARNING: Device mismatch detected for vg/lvol0 which is accessing /dev/sda instead of /dev/sdb.
  VG #PV #LV #SN Attr   VSize   VFree  
  vg   1   1   0 wz--n- 124.00m 120.00m

[0] fedora/~ # lvs
  WARNING: Device mismatch detected for vg/lvol0 which is accessing /dev/sda instead of /dev/sdb.
  LV    VG Attr       LSize Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  lvol0 vg -wi-a----- 4.00m

Comment 37 Peter Rajnoha 2016-04-01 13:16:58 UTC
Some explanation of the earlier failure to detect the mismatch:

RHEL6 uses obtain_device_list_from_udev=0 by default, and that also means the /etc/lvm/cache/.cache file is used. This file contains the devices which passed the filters in any previous LVM command.

(In reply to Roman Bednář from comment #35)
> # pvs
>   WARNING: Device mismatch detected for vg/lvol0 which is accessing /dev/sdb instead of /dev/sdc.
>   PV         VG         Fmt  Attr PSize    PFree
>   /dev/sda   vg         lvm2 a--u 1020.00m        0
>   /dev/sdc   vg         lvm2 a--u 1020.00m 1016.00m
>   /dev/vda2  vg_virt010 lvm2 a--u    7.63g        0
>  

- pvs does a full rescan and does not rely on the .cache file. So all devices are processed and we see exactly which device is used by an LV when scanning devices.

> # lvs -o +devices
>   LV      VG         Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices
>   lvol0   vg         -wi-a-----   1.00g                                                     /dev/sda(0)
>   lvol0   vg         -wi-a-----   1.00g                                                     /dev/sdc(0)
>   lv_root vg_virt010 -wi-ao----   6.82g                                                     /dev/vda2(0)
>   lv_swap vg_virt010 -wi-ao---- 828.00m                                                     /dev/vda2(1746)
>  
> # lvs
>   LV      VG         Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
>   lvol0   vg         -wi-a-----   1.00g
>   lv_root vg_virt010 -wi-ao----   6.82g
>   lv_swap vg_virt010 -wi-ao---- 828.00m
>  

- lvs relies on the .cache file and takes the list of devices it finds there as the complete list; anything else is treated as if it didn't exist. So if we filtered out /dev/sdb, we just didn't see that there's lvol0 over sdb while scanning devices - because we scanned only the devices which are in the .cache file. And then we also didn't have a chance to detect the device mismatch.

> # rm /etc/lvm/cache/.cache
> rm: remove regular file `/etc/lvm/cache/.cache'? y
>  
> # lvs
>   WARNING: Device mismatch detected for vg/lvol0 which is accessing /dev/sdb instead of /dev/sdc.
>   LV      VG         Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
>   lvol0   vg         -wi-a-----   1.00g
>   lv_root vg_virt010 -wi-ao----   6.82g
>   lv_swap vg_virt010 -wi-ao---- 828.00m

- by removing the .cache file, we do a full rescan and so we see the complete device list during the device scan, including the sdb over which lvol0 is mapped.

> 
> Missing message here:
> # lvs
>   LV      VG         Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
>   lvol0   vg         -wi-a-----   1.00g
>   lv_root vg_virt010 -wi-ao----   6.82g
>   lv_swap vg_virt010 -wi-ao---- 828.00m
>  

- and again, we're using the .cache file from the previous lvm command, so we have saved filtering results. So again, we're hitting the problem.


To resolve this issue, we have to iterate over devices in sysfs to gather information about which device actually sits under an LV when obtain_device_list_from_udev=0 and hence the .cache file is used. We then use this complete information for the device mismatch detection.

In summary, when we look up which devices are ACTUALLY used by an LV (the info we gather while building up the device cache during the device scan), we need the FULL list of devices, which is unfiltered.
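The effect can be modelled with a toy sketch (illustrative names only, not LVM code): when the scan is limited to the cached, already-filtered device list, the duplicate device is invisible and no mismatch can be detected; only a scan over the full device list can see it.

```shell
# Toy model of the .cache problem. An LV is really mapped over sdb, but sdb
# was filtered out by an earlier command and is therefore absent from .cache.
all_devices="sda sdb sdc"        # what a full sysfs scan would see
cached_devices="sda sdc"         # what the persistent .cache file contains

scan() {
  # A mismatch can only be reported if the scan actually sees sdb.
  case " $1 " in
    *" sdb "*) echo "WARNING: Device mismatch detected (LV is using sdb)" ;;
    *)         echo "no warning" ;;
  esac
}

scan "$cached_devices"   # .cache path: prints "no warning"
scan "$all_devices"      # full rescan: prints the warning
```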

Comment 39 Roman Bednář 2016-04-01 14:18:39 UTC
Marking as verified, using the same reproducer as mentioned above.
The warning message now always appears, regardless of the filter and cache usage.

========================================================================
Filter out the duplicated device (also tested without a filter):
# grep filter /etc/lvm/lvm.conf 
	...
	filter = [ "r|/dev/sdb|" ]
	...

# pvs
  Found duplicate PV KuN2MpmGB2QB4CeEPL2y2xHiCnNnwj7J: using /dev/sdc not /dev/sdb
  Using duplicate PV /dev/sdc without holders, replacing /dev/sdb
>>>WARNING: Device mismatch detected for vg/lvol0 which is accessing /dev/sdb instead of /dev/sdc.
  PV         VG         Fmt  Attr PSize    PFree   
  /dev/sda   vg         lvm2 a--u 1020.00m       0 
  /dev/sdc   vg         lvm2 a--u 1020.00m 1016.00m
  ...

# rm /etc/lvm/cache/.cache 
rm: remove regular file `/etc/lvm/cache/.cache'? y

# lvs
  Found duplicate PV KuN2MpmGB2QB4CeEPL2y2xHiCnNnwj7J: using /dev/sdc not /dev/sdb
  Using duplicate PV /dev/sdc without holders, replacing /dev/sdb
>>>WARNING: Device mismatch detected for vg/lvol0 which is accessing /dev/sdb instead of /dev/sdc.
  LV      VG         Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  lvol0   vg         -wi-a-----   1.00g                                                    
  ...                                                

# lvs
  Found duplicate PV KuN2MpmGB2QB4CeEPL2y2xHiCnNnwj7J: using /dev/sdc not /dev/sdb
  Using duplicate PV /dev/sdc without holders, replacing /dev/sdb
>>>WARNING: Device mismatch detected for vg/lvol0 which is accessing /dev/sdb instead of /dev/sdc.
  LV      VG         Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  lvol0   vg         -wi-a-----   1.00g                                                    
  ...      


========================================================================
Tested on:
2.6.32-634.el6.x86_64

lvm2-2.02.143-6.el6    BUILT: Fri Apr  1 15:13:37 CEST 2016
lvm2-libs-2.02.143-6.el6    BUILT: Fri Apr  1 15:13:37 CEST 2016
lvm2-cluster-2.02.143-6.el6    BUILT: Fri Apr  1 15:13:37 CEST 2016
udev-147-2.72.el6    BUILT: Tue Mar  1 13:14:05 CET 2016
device-mapper-1.02.117-6.el6    BUILT: Fri Apr  1 15:13:37 CEST 2016
device-mapper-libs-1.02.117-6.el6    BUILT: Fri Apr  1 15:13:37 CEST 2016
device-mapper-event-1.02.117-6.el6    BUILT: Fri Apr  1 15:13:37 CEST 2016
device-mapper-event-libs-1.02.117-6.el6    BUILT: Fri Apr  1 15:13:37 CEST 2016
device-mapper-persistent-data-0.6.2-0.1.rc7.el6    BUILT: Tue Mar 22 14:58:09 CET 2016
cmirror-2.02.143-6.el6    BUILT: Fri Apr  1 15:13:37 CEST 2016

Comment 41 errata-xmlrpc 2016-05-11 01:16:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0964.html