Bug 1215228
| Summary: | LVM (vgdisplay) does not show the true path hierarchy of underlying PVs | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Stan Saner <ssaner> |
| Component: | lvm2 | Assignee: | Alasdair Kergon <agk> |
| lvm2 sub component: | Devices, Filtering and Stacking (RHEL6) | QA Contact: | cluster-qe <cluster-qe> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | medium | | |
| Priority: | medium | CC: | agk, cmarthal, heinzm, jbrassow, msnitzer, prajnoha, prockai, rbednar, robert.x.tomczyk, salmy, teigland, tlavigne, zkabelac |
| Version: | 6.6 | | |
| Target Milestone: | rc | | |
| Target Release: | 6.7 | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | lvm2-2.02.143-6.el6 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-05-11 01:16:40 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1172231, 1268411 | | |
I'm in the middle of working on a patch in this area, and we should at a minimum be able to report warnings from the lvm commands that fully describe the situation. If or how we could enhance the formal output of the reporting/display commands is a more difficult question that will take more time to sort out.

(In reply to David Teigland from comment #2)
> I'm in the middle of working on a patch in this area, and we should at a
> minimum be able to report warnings from the lvm commands that fully describe
> the situation.
>
> If or how we could enhance the formal output of the reporting/display
> commands is a more difficult question that will take more time to sort out.

Hi David,

The customer has been extremely cooperative and said they would be willing to test at least the initial, simpler fix in which the warnings are printed by the lvm commands. I know it is rather unrealistic to expect the output enhancement of the display commands at this stage, as the team needs to discuss how to approach that. But if you have the simpler implementation ready, please share it. The customer may still have the reproduction environment available, but it may need to be reused for other purposes soon. This is our chance to have it tested.

Thanks and regards,
Stan Saner

I'd suggest using the last tagged release, which right now is 2.02.119
commit bee2df3903d0956ba2e09ce9ae9ae55dfc5d3fd1
Author: Alasdair G Kergon <agk>
Date:   Sat May 2 01:41:17 2015 +0100
    pre-release
(In reply to David Teigland from comment #4)
> I'd suggest using the last tagged release, which right now is 2.02.119

If providing any build of this, make it clear that this is NOT a supported release and must ONLY be installed on test machines that will be reinstalled after testing.

The customer's testing of the patch created from the tagged release 2.02.119 (I branched off and extracted the relevant bits, see commit 64ba86e61c59a9f214db1d74ec311fef5400e299) did not produce the expected result. The testing was performed under 2 scenarios:

1. Install the patch on a system with the existing problem and observe the effect:

    # pvs
      PV                              VG      Fmt  Attr PSize   PFree
      /dev/vx/dmp/emc_clariion0_116s2 vg_root lvm2 a--  169.51g  64.51g
      /dev/vx/dmp/emc_clariion0_137   vg_app  lvm2 a--  305.00g 200.00g

    # lvs
      LV                  VG      Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
      vg2_lv_etc          vg_app  -wi-ao----  5.00g
      vg2_lv_opt          vg_app  -wi-ao---- 20.00g
      vg2_lv_var_ericsson vg_app  -wi-ao---- 80.00g
      vg_root_lv_root     vg_root -wi-ao---- 50.00g
      vg_root_lv_swap     vg_root -wi-ao----  5.00g
      vg_root_lv_var      vg_root -wi-ao---- 50.00g

    # dmsetup ls --tree
    vg_app-vg2_lv_opt (253:2)
     └─ (201:112)
    vg_app-vg2_lv_var_ericsson (253:4)
     └─ (201:112)
    vg_root-vg_root_lv_var (253:5)
     └─ (201:82)
    vg_root-vg_root_lv_swap (253:1)
     └─ (8:242)
    vg_root-vg_root_lv_root (253:0)
     └─ (8:242)
    vg_app-vg2_lv_etc (253:3)
     └─ (201:112)

    brw-rw----. 1 root disk 8, 242 Jun 22 10:54 sdp2
    lrwxrwxrwx. 1 root root 98 Jun 22 10:54 b8:242 -> /devices/pci0000:00/0000:00:03.0/0000:05:00.1/host2/rport-2:0-6/target2:0:0/2:0:0:0/block/sdp/sdp2
    brw-------. 1 root root 201, 82 Jun 22 10:54 emc_clariion0_116s2

    # rpm -qa | grep lvm2
    lvm2-2.02.119-1.el6_6.0.bz1215288.x86_64
    lvm2-libs-2.02.119-1.el6_6.0.bz1215288.x86_64

    # rpm -qa | grep device-mapper
    device-mapper-devel-1.02.88-1.el6_6.0.bz1215288.x86_64
    device-mapper-persistent-data-0.3.2-1.el6.x86_64
    device-mapper-event-1.02.88-1.el6_6.0.bz1215288.x86_64
    device-mapper-event-libs-1.02.88-1.el6_6.0.bz1215288.x86_64
    device-mapper-libs-1.02.88-1.el6_6.0.bz1215288.x86_64
    device-mapper-event-devel-1.02.88-1.el6_6.0.bz1215288.x86_64
    device-mapper-1.02.88-1.el6_6.0.bz1215288.x86_64

2. Include the patch in the install image and perform a fresh install:

    # pvs
      PV                              VG      Fmt  Attr PSize   PFree
      /dev/vx/dmp/emc_clariion0_116s2 vg_root lvm2 a--  169.51g  64.51g
      /dev/vx/dmp/emc_clariion0_137   vg_app  lvm2 a--  305.00g 200.00g

    # lvs
      LV                  VG      Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
      vg2_lv_etc          vg_app  -wi-ao----  5.00g
      vg2_lv_opt          vg_app  -wi-ao---- 20.00g
      vg2_lv_var_ericsson vg_app  -wi-ao---- 80.00g
      vg_root_lv_root     vg_root -wi-ao---- 50.00g
      vg_root_lv_swap     vg_root -wi-ao----  5.00g
      vg_root_lv_var      vg_root -wi-ao---- 50.00g

    # dmsetup ls --tree
    vg_app-vg2_lv_opt (253:2)
     └─ (201:112)
    vg_app-vg2_lv_var_ericsson (253:4)
     └─ (201:112)
    vg_root-vg_root_lv_var (253:5)
     └─ (201:82)
    vg_root-vg_root_lv_swap (253:1)
     └─ (8:242)
    vg_root-vg_root_lv_root (253:0)
     └─ (8:242)
    vg_app-vg2_lv_etc (253:3)
     └─ (201:112)

    brw-rw----. 1 root disk 8, 242 Jun 22 16:53 sdp2
    lrwxrwxrwx. 1 root root 98 Jun 22 16:53 b8:242 -> /devices/pci0000:00/0000:00:03.0/0000:05:00.1/host2/rport-2:0-6/target2:0:0/2:0:0:0/block/sdp/sdp2
    brw-------. 1 root root 201, 66 Jun 22 16:53 emc_clariion0_116s2

    # rpm -qa | grep lvm2
    lvm2-2.02.119-1.el6_6.0.bz1215288.x86_64
    lvm2-libs-2.02.119-1.el6_6.0.bz1215288.x86_64

    # rpm -qa | grep device-mapper
    device-mapper-devel-1.02.88-1.el6_6.0.bz1215288.x86_64
    device-mapper-persistent-data-0.3.2-1.el6.x86_64
    device-mapper-event-1.02.88-1.el6_6.0.bz1215288.x86_64
    device-mapper-event-libs-1.02.88-1.el6_6.0.bz1215288.x86_64
    device-mapper-libs-1.02.88-1.el6_6.0.bz1215288.x86_64
    device-mapper-event-devel-1.02.88-1.el6_6.0.bz1215288.x86_64
    device-mapper-1.02.88-1.el6_6.0.bz1215288.x86_64

In both scenarios the root filesystem and the swap device in the root VG still map to the non-multipath devices, and no warning or any other indication of that fact is printed. In essence the behaviour is exactly the same as without the patch; the patch has not helped to show or prevent the mismatch between the dmsetup view and the pvs view. The customer is willing to perform further testing if we provide another iteration of the patch.

A warning would be printed if the new code was actually finding duplicates. Are any filters being used? Could you have them run:

    pvs -a -o+uuid

I asked for the pvs -a -o+uuid output, but I guess pvs -a -v as collected in the SOSreport would have the same info.
Here it is, from the test after a fresh install with the patched installation image:

    PV                              VG      Fmt  Attr PSize   PFree   DevSize PV UUID
    /dev/vx/dmp/disk_0                           ---        0       0 279.37g
    /dev/vx/dmp/emc_clariion0_116s1              ---        0       0 500.00m
    /dev/vx/dmp/emc_clariion0_116s2 vg_root lvm2 a--  169.51g  64.51g 169.51g FwdQ1G-58xs-8zYj-y2m6-bMqf-y5lD-cvme06
    /dev/vx/dmp/emc_clariion0_137   vg_app  lvm2 a--  305.00g 200.00g 305.00g PthTwG-2bof-9ZBM-qc7E-QQQF-qktA-DKvqYi
    /dev/vx/dmp/emc_clariion0_138                ---        0       0  25.00g
    /dev/vx/dmp/emc_clariion0_139                ---        0       0  25.00g
    /dev/vx/dmp/emc_clariion0_140s5              ---        0       0  32.12m
    /dev/vx/dmp/emc_clariion0_140s6              ---        0       0  44.92g
    /dev/vx/dmp/emc_clariion0_148                ---        0       0  60.00g
    /dev/vx/dmp/emc_clariion0_777                ---        0       0  60.00g

I will upload the SOSreport from the test performed as a fresh install with the patch included in the install image.

Here's the filter they are using:

    global_filter = [ "r|^/dev/sd.*$|", "r|/dev/VxDMP.*|", "r|/dev/vx/dmpconfig|", "r|/dev/vx/rdmp/.*|", "r|/dev/dm-[0-9]*|", "r|/dev/mpath/mpath[0-9]*|", "r|/dev/mapper/mpath[0-9]*|", ]

It seems they are filtering out the duplicates, which is fine, but it means that they won't get any warnings, since lvm will not see any duplicates. If they remove the filter, then they should see the duplicate PV warnings.

David, the filter settings on the customer system are exactly the same as when the original problem was detected. This bug is about LVM commands reporting multipathed devices as being used for certain volumes in the root VG when, in fact, non-multipathed devices carry the traffic. The customer placed LVM filters both in the initrd image and on the root FS. The filter in the initrd was removed in order to replicate the fault when the case was opened with Red Hat; the filter on the root FS stayed. That is why LVM binds to the physical devices during boot, and once the system is up and running with the LVM filters from the root FS, it reports that it is now using the DMP devices, which is not true.
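For context on why the filter above suppresses the warning: lvm.conf filter patterns are tried in order against each device path, `a|...|` accepts and `r|...|` rejects, the first matching pattern wins, and a device matching no pattern is accepted. A minimal sketch of a filter that hides only the raw SCSI paths while leaving the DMP nodes visible (the pattern is illustrative, not a recommendation for this customer's setup):

    devices {
        # Reject the raw /dev/sd* paths so only the multipath (DMP)
        # nodes are scanned; any device not matched by a pattern is
        # accepted by default.
        global_filter = [ "r|^/dev/sd.*$|" ]
    }

With a reject rule like this in place, LVM never sees the underlying path as a PV, so it cannot detect the duplicate and therefore prints no warning.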
Let me recap the expectations: LVM commands should either report that the non-multipathed device files are being used, or print a warning that the devices actually in use by the kernel do not match the reported multipathed ones.

lvm will print a warning if it sees two devices for the same PV. If one of those devices is filtered out, lvm will not see any duplicate and will have no reason to print a warning. So, in comment 11, please name the two devices in the 'pvs -a' output which are duplicates. Then, provide the output from the following command and name the two devices in the output which are duplicates:

    pvs -a -o+uuid --config 'devices/global_filter=[ "a|.*/|" ]'

It is fairly easy to reproduce artificially: Create 3 devices. Put 2 of them into a VG. Create and activate an LV across both of the devices in the VG. Now, edit the lvm.conf filters to hide just one of the two devices, and use dd to clone the hidden device onto the 3rd device. Run 'pvs' etc. You'll see the tools consistently telling you that the 3rd device is used, not the 1st (now hidden) one, and nothing appears wrong. But if you check with dmsetup or lsblk, you'll see that it's actually still the 1st device that's being used.

Now with the patched code, you'll see messages like:

    WARNING: Device inconsistency: Why is vg/lvol3_mlog using /dev/loop15 when its metadata uses /dev/loop3?

(precise wording still being discussed)

The original report was similar to that, except that instead of obtaining the 3rd device by using 'dd', the 1st device simply got wrapped up into a multipath device. (So the 3rd device was a multipath device with the 1st device as one of its paths.)

Patches:
https://git.fedorahosted.org/cgit/lvm2.git/commit/?id=03b0a786403ad1762bfbbe354756a9b83ee6629c
https://git.fedorahosted.org/cgit/lvm2.git/commit/?id=f231bdb20bdc885460dfc49db744147bb1bc90da

The WARNING message is:

    WARNING: Device mismatch detected for <vg_name>/<lv_name> which is accessing <devA1>, <devA2>, ... instead of <devB1>, <devB2>...
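The artificial reproduction recipe above can be sketched as a shell session. This is a sketch only, not meant to be run as-is: it needs root, destroys the contents of the devices involved, and the loop device names, image paths, and sizes are all illustrative. Run anything like it only on a throwaway test machine.

    # Reproducer sketch (illustrative device names; run as root on a
    # disposable test VM only).

    # 1. Create 3 identically sized devices.
    for i in 0 1 2; do
        dd if=/dev/zero of=/tmp/pv$i.img bs=1M count=128
        losetup /dev/loop$i /tmp/pv$i.img
    done

    # 2. Put 2 of them into a VG; create and activate an LV across both.
    pvcreate /dev/loop0 /dev/loop1
    vgcreate vg /dev/loop0 /dev/loop1
    lvcreate -l 100%FREE -n testlv vg

    # 3. Hide the 1st device via lvm.conf, e.g.:
    #      filter = [ "r|/dev/loop0|" ]
    #    then clone the hidden device onto the 3rd one:
    dd if=/dev/loop0 of=/dev/loop2 bs=1M

    # 4. 'pvs'/'lvs' now report /dev/loop2 as the PV in use, while
    #    dmsetup/lsblk show the LV still sitting on /dev/loop0 --
    #    the patched code prints the device mismatch warning here.
    pvs
    dmsetup ls --tree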
Using the reproducer from comment #24, I found that the new warning message does not appear every time, although it should. This is caused by the default RHEL6 setting in lvm.conf (obtain_device_list_from_udev=0), which makes lvm commands use the .cache file whenever possible, thus bypassing the check of the underlying PVs in some cases. See the example below:

    # dmsetup ls --tree
    vg-testlv (253:2)
     ├─ (8:16)
     └─ (8:0)
    vg_virt267-lv_swap (253:1)
     └─ (252:2)
    vg_virt267-lv_root (253:0)
     └─ (252:2)

    # lsblk
    NAME                            MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
    vda                             252:0    0  8.1G  0 disk
    ├─vda1                          252:1    0  500M  0 part /boot
    └─vda2                          252:2    0  7.6G  0 part
      ├─vg_virt267-lv_root (dm-0)   253:0    0  6.8G  0 lvm  /
      └─vg_virt267-lv_swap (dm-1)   253:1    0  828M  0 lvm  [SWAP]
    sdc                             8:32     0    1G  0 disk
    sdb                             8:16     0    1G  0 disk
    └─vg-testlv (dm-2)              253:2    0    1G  0 lvm
    sdd                             8:48     0    1G  0 disk
    sde                             8:64     0    1G  0 disk
    sda                             8:0      0    1G  0 disk
    └─vg-testlv (dm-2)              253:2    0    1G  0 lvm

>>> Warning is shown as expected in 'pvs':

    # pvs
    Found duplicate PV zAQkpWHrTh0Rx0v3N20QuU6kUS5XefVW: using /dev/sdc not /dev/sda
    Using duplicate PV /dev/sdc without holders, replacing /dev/sda
    WARNING: Device mismatch detected for vg/testlv which is accessing /dev/sda instead of /dev/sdc.
      PV        VG         Fmt  Attr PSize    PFree
      /dev/sdb  vg         lvm2 a--u 1020.00m 1016.00m
      /dev/sdc  vg         lvm2 a--u 1020.00m        0
      /dev/vda2 vg_virt267 lvm2 a--u    7.63g        0

>>> Warning is missing in 'lvs':

    # lvs
    Found duplicate PV zAQkpWHrTh0Rx0v3N20QuU6kUS5XefVW: using /dev/sdc not /dev/sda
    Using duplicate PV /dev/sdc without holders, replacing /dev/sda
      LV      VG         Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
      testlv  vg         -wi-a-----   1.00g
      lv_root vg_virt267 -wi-ao----   6.82g
      lv_swap vg_virt267 -wi-ao---- 828.00m

======================================================================
2.6.32-634.el6.x86_64
lvm2-2.02.143-3.el6 BUILT: Tue Mar 22 15:26:10 CET 2016
lvm2-libs-2.02.143-3.el6 BUILT: Tue Mar 22 15:26:10 CET 2016
lvm2-cluster-2.02.143-3.el6 BUILT: Tue Mar 22 15:26:10 CET 2016
udev-147-2.72.el6 BUILT: Tue Mar 1 13:14:05 CET 2016
device-mapper-1.02.117-3.el6 BUILT: Tue Mar 22 15:26:10 CET 2016
device-mapper-libs-1.02.117-3.el6 BUILT: Tue Mar 22 15:26:10 CET 2016
device-mapper-event-1.02.117-3.el6 BUILT: Tue Mar 22 15:26:10 CET 2016
device-mapper-event-libs-1.02.117-3.el6 BUILT: Tue Mar 22 15:26:10 CET 2016
device-mapper-persistent-data-0.6.2-0.1.rc7.el6 BUILT: Tue Mar 22 14:58:09 CET 2016
cmirror-2.02.143-3.el6 BUILT: Tue Mar 22 15:26:10 CET 2016

Yes, this is because the vgid/lvid index is not created when we're not scanning devices but reading the persistent .cache file instead. We should fix this! (The index should also be created when we're reading the .cache file.)

Additional fixes are upstream now:
https://git.fedorahosted.org/cgit/lvm2.git/commit/?id=91bb202ded059a4109ff4351825c77c1fcf9197b
https://git.fedorahosted.org/cgit/lvm2.git/commit/?id=8c27c5274980dddf64283602bc23b89a5623da0a
https://git.fedorahosted.org/cgit/lvm2.git/commit/?id=52e0d0db4460d90172e9bd45b9ef30e7f4f75ae7

Display of the new warning message is still inconsistent in some cases.
See the example below:

=============================================================
Keep the filter present:

    # grep filter /etc/lvm/lvm.conf | grep -v "#"
    filter = [ "r|/dev/sdb|" ]

    # pvs
    WARNING: Device mismatch detected for vg/lvol0 which is accessing /dev/sdb instead of /dev/sdc.
      PV        VG         Fmt  Attr PSize    PFree
      /dev/sda  vg         lvm2 a--u 1020.00m        0
      /dev/sdc  vg         lvm2 a--u 1020.00m 1016.00m
      /dev/vda2 vg_virt010 lvm2 a--u    7.63g        0

    # lvs -o +devices
      LV      VG         Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices
      lvol0   vg         -wi-a-----   1.00g                                                     /dev/sda(0)
      lvol0   vg         -wi-a-----   1.00g                                                     /dev/sdc(0)
      lv_root vg_virt010 -wi-ao----   6.82g                                                     /dev/vda2(0)
      lv_swap vg_virt010 -wi-ao---- 828.00m                                                     /dev/vda2(1746)

    # lvs
      LV      VG         Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
      lvol0   vg         -wi-a-----   1.00g
      lv_root vg_virt010 -wi-ao----   6.82g
      lv_swap vg_virt010 -wi-ao---- 828.00m

    # rm /etc/lvm/cache/.cache
    rm: remove regular file `/etc/lvm/cache/.cache'? y

    # lvs
    WARNING: Device mismatch detected for vg/lvol0 which is accessing /dev/sdb instead of /dev/sdc.
      LV      VG         Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
      lvol0   vg         -wi-a-----   1.00g
      lv_root vg_virt010 -wi-ao----   6.82g
      lv_swap vg_virt010 -wi-ao---- 828.00m /dev/vda2(1746)

Missing message here:

    # lvs
      LV      VG         Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
      lvol0   vg         -wi-a-----   1.00g
      lv_root vg_virt010 -wi-ao----   6.82g
      lv_swap vg_virt010 -wi-ao---- 828.00m

=============================================================
2.6.32-634.el6.x86_64
lvm2-2.02.143-5.el6 BUILT: Wed Mar 30 16:16:24 CEST 2016
lvm2-libs-2.02.143-5.el6 BUILT: Wed Mar 30 16:16:24 CEST 2016
lvm2-cluster-2.02.143-5.el6 BUILT: Wed Mar 30 16:16:24 CEST 2016
udev-147-2.72.el6 BUILT: Tue Mar 1 13:14:05 CET 2016
device-mapper-1.02.117-5.el6 BUILT: Wed Mar 30 16:16:24 CEST 2016
device-mapper-libs-1.02.117-5.el6 BUILT: Wed Mar 30 16:16:24 CEST 2016
device-mapper-event-1.02.117-5.el6 BUILT: Wed Mar 30 16:16:24 CEST 2016
device-mapper-event-libs-1.02.117-5.el6 BUILT: Wed Mar 30 16:16:24 CEST 2016
device-mapper-persistent-data-0.6.2-0.1.rc7.el6 BUILT: Tue Mar 22 14:58:09 CET 2016
cmirror-2.02.143-5.el6 BUILT: Wed Mar 30 16:16:24 CEST 2016

Should be fixed now with https://git.fedorahosted.org/cgit/lvm2.git/commit/?id=15d1824facce1ac38c2669b17c8c0965b8c18f3e:

    [0] fedora/~ # vgcreate vg /dev/sda
      Physical volume "/dev/sda" successfully created.
      Volume group "vg" successfully created
    [0] fedora/~ # lvcreate -l1 vg
      Logical volume "lvol0" created.
    [0] fedora/~ # dd if=/dev/sda of=/dev/sdb bs=1M
    128+0 records in
    128+0 records out
    134217728 bytes (134 MB) copied, 0.789494 s, 170 MB/s
    [0] fedora/~ # pvs
      Found duplicate PV 4S9oMTNhgKZJNVd1MfOCVDbPrmgPlOMe: using /dev/sdb not /dev/sda
      Using duplicate PV /dev/sdb without holders, replacing /dev/sda
      WARNING: Device mismatch detected for vg/lvol0 which is accessing /dev/sda instead of /dev/sdb.
      PV       VG Fmt  Attr PSize   PFree
      /dev/sdb vg lvm2 a--  124.00m 120.00m

    [0] fedora/~ # lvs
      Found duplicate PV 4S9oMTNhgKZJNVd1MfOCVDbPrmgPlOMe: using /dev/sdb not /dev/sda
      Using duplicate PV /dev/sdb without holders, replacing /dev/sda
      WARNING: Device mismatch detected for vg/lvol0 which is accessing /dev/sda instead of /dev/sdb.
      LV    VG Attr       LSize Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
      lvol0 vg -wi-a----- 4.00m

(now filtering out /dev/sda, which is actually used for the LV)

    [0] fedora/~ # lvmconfig --type diff
    global {
        use_lvmetad=0
    }
    devices {
        obtain_device_list_from_udev=0
        filter=["a|/dev/sdb|","r|.*|"]
    }

    [0] fedora/~ # pvs
      WARNING: Device mismatch detected for vg/lvol0 which is accessing /dev/sda instead of /dev/sdb.
      PV       VG Fmt  Attr PSize   PFree
      /dev/sdb vg lvm2 a--  124.00m 120.00m

    [0] fedora/~ # vgs
      WARNING: Device mismatch detected for vg/lvol0 which is accessing /dev/sda instead of /dev/sdb.
      VG #PV #LV #SN Attr   VSize   VFree
      vg   1   1   0 wz--n- 124.00m 120.00m

    [0] fedora/~ # lvs
      WARNING: Device mismatch detected for vg/lvol0 which is accessing /dev/sda instead of /dev/sdb.
      LV    VG Attr       LSize Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
      lvol0 vg -wi-a----- 4.00m

Some explanation for the earlier failure to detect the mismatch: RHEL6 uses obtain_device_list_from_udev=0 by default, and that also means the /etc/lvm/cache/.cache file is used. This file contains the devices which passed the filters in any previous LVM command.

(In reply to Roman Bednář from comment #35)
> # pvs
> WARNING: Device mismatch detected for vg/lvol0 which is accessing /dev/sdb
> instead of /dev/sdc.
>   PV        VG         Fmt  Attr PSize    PFree
>   /dev/sda  vg         lvm2 a--u 1020.00m        0
>   /dev/sdc  vg         lvm2 a--u 1020.00m 1016.00m
>   /dev/vda2 vg_virt010 lvm2 a--u    7.63g        0

- pvs does a full rescan and does not rely on the .cache file. So all devices are processed, and we see exactly which device is used by an LV while scanning devices.

> # lvs -o +devices
>   LV      VG         Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices
>   lvol0   vg         -wi-a-----   1.00g                                                     /dev/sda(0)
>   lvol0   vg         -wi-a-----   1.00g                                                     /dev/sdc(0)
>   lv_root vg_virt010 -wi-ao----   6.82g                                                     /dev/vda2(0)
>   lv_swap vg_virt010 -wi-ao---- 828.00m                                                     /dev/vda2(1746)
>
> # lvs
>   LV      VG         Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
>   lvol0   vg         -wi-a-----   1.00g
>   lv_root vg_virt010 -wi-ao----   6.82g
>   lv_swap vg_virt010 -wi-ao---- 828.00m

- lvs relies on the .cache file and takes the list of devices it finds there as the complete list; anything else is treated as if it didn't exist. So if we filtered out /dev/sdb, we simply didn't see that lvol0 sits on sdb while scanning devices, because we scanned only the devices listed in the .cache file. And then we also had no chance to detect the device mismatch.

> # rm /etc/lvm/cache/.cache
> rm: remove regular file `/etc/lvm/cache/.cache'? y
>
> # lvs
> WARNING: Device mismatch detected for vg/lvol0 which is accessing /dev/sdb
> instead of /dev/sdc.
>   LV      VG         Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
>   lvol0   vg         -wi-a-----   1.00g
>   lv_root vg_virt010 -wi-ao----   6.82g
>   lv_swap vg_virt010 -wi-ao---- 828.00m /dev/vda2(1746)

- by removing the .cache file, we do a full rescan and so we see the complete device list during the device scan, including the sdb over which lvol0 is mapped.

> Missing message here:
> # lvs
>   LV      VG         Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
>   lvol0   vg         -wi-a-----   1.00g
>   lv_root vg_virt010 -wi-ao----   6.82g
>   lv_swap vg_virt010 -wi-ao---- 828.00m

- and again, we're using the .cache file from the previous lvm command, so we have saved filtering results. So again, we're hitting the problem. To resolve this issue, we have to iterate over devices in sysfs to gather information about which device is really under an LV if obtain_device_list_from_udev=0 and hence the .cache file is used.
We then use this complete information for the device mismatch detection. In summary, when we look up which devices are ACTUALLY used by an LV (the info we gather while building up the device cache during the device scan), we need the FULL, unfiltered list of devices.

Marking as verified using the same reproducer as mentioned above. The warning message now always appears, regardless of the filter and the use of the cache.

========================================================================
Filter out the duplicated device (also tested without a filter):

    # grep filter /etc/lvm/lvm.conf
    ...
    filter = [ "r|/dev/sdb|" ]
    ...

    # pvs
    Found duplicate PV KuN2MpmGB2QB4CeEPL2y2xHiCnNnwj7J: using /dev/sdc not /dev/sdb
    Using duplicate PV /dev/sdc without holders, replacing /dev/sdb
>>> WARNING: Device mismatch detected for vg/lvol0 which is accessing /dev/sdb instead of /dev/sdc.
      PV       VG Fmt  Attr PSize    PFree
      /dev/sda vg lvm2 a--u 1020.00m        0
      /dev/sdc vg lvm2 a--u 1020.00m 1016.00m
    ...

    # rm /etc/lvm/cache/.cache
    rm: remove regular file `/etc/lvm/cache/.cache'? y

    # lvs
    Found duplicate PV KuN2MpmGB2QB4CeEPL2y2xHiCnNnwj7J: using /dev/sdc not /dev/sdb
    Using duplicate PV /dev/sdc without holders, replacing /dev/sdb
>>> WARNING: Device mismatch detected for vg/lvol0 which is accessing /dev/sdb instead of /dev/sdc.
      LV    VG Attr       LSize Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
      lvol0 vg -wi-a----- 1.00g
    ...

    # lvs
    Found duplicate PV KuN2MpmGB2QB4CeEPL2y2xHiCnNnwj7J: using /dev/sdc not /dev/sdb
    Using duplicate PV /dev/sdc without holders, replacing /dev/sdb
>>> WARNING: Device mismatch detected for vg/lvol0 which is accessing /dev/sdb instead of /dev/sdc.
      LV    VG Attr       LSize Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
      lvol0 vg -wi-a----- 1.00g
    ...
========================================================================
Tested on:
2.6.32-634.el6.x86_64
lvm2-2.02.143-6.el6 BUILT: Fri Apr 1 15:13:37 CEST 2016
lvm2-libs-2.02.143-6.el6 BUILT: Fri Apr 1 15:13:37 CEST 2016
lvm2-cluster-2.02.143-6.el6 BUILT: Fri Apr 1 15:13:37 CEST 2016
udev-147-2.72.el6 BUILT: Tue Mar 1 13:14:05 CET 2016
device-mapper-1.02.117-6.el6 BUILT: Fri Apr 1 15:13:37 CEST 2016
device-mapper-libs-1.02.117-6.el6 BUILT: Fri Apr 1 15:13:37 CEST 2016
device-mapper-event-1.02.117-6.el6 BUILT: Fri Apr 1 15:13:37 CEST 2016
device-mapper-event-libs-1.02.117-6.el6 BUILT: Fri Apr 1 15:13:37 CEST 2016
device-mapper-persistent-data-0.6.2-0.1.rc7.el6 BUILT: Tue Mar 22 14:58:09 CET 2016
cmirror-2.02.143-6.el6 BUILT: Fri Apr 1 15:13:37 CEST 2016

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0964.html
Description of problem:
-----------------------
LVM (vgdisplay) does not show the true path hierarchy of underlying PVs if LVM detects storage via non-multipathed device files rather than multipathed ones during boot.

Version-Release number of selected component (if applicable):
RHEL 6

How reproducible:
-----------------
Usually reproducible when 3rd party multipathing products like Veritas VxDMP or EMC PowerPath are not activated early enough during boot to allow LVM to detect the multipathed storage hosting the root filesystem and primary swap, or when lvm.conf filtering in the initrd causes detection of non-multipathed devices in preference to multipathed ones. Later, when the system is fully booted and multipathing is fully activated, vgdisplay will report multipathed devices, because it scans devices to generate its output.

Steps to Reproduce:
-------------------
1. Install and configure Symantec/Veritas Storage Foundation SFHA version 6.1 or 6.2
2. Configure VxDMP to control the multipathing to the physical volumes hosting the root FS and primary swap

Actual results:
---------------
Example from the customer configuration:

dmsetup reports the mapping for the vg1_root and vg1_swap logical volumes as not using multipathed devices:

    vg_root-vg1_swap (253:4)
     `- (8:209)                                  ---> /dev/sdn1
    vg_root-vg1_root (253:1)
     `-vg_root-vg1_root-real (253:0)
        `- (8:209)                               ---> /dev/sdn1

Note the major:minor number 8:209 corresponds to:

    brw-rw----. 1 root disk 8, 209 Apr 9 14:21 sdn1

Other logical volumes in the _same_ root volume group have the expected mapping with multipathed devices:

    vg_root-vg1_VG1_FS1 (253:6)
     `-vg_root-vg1_VG1_FS1-real (253:5)
        `- (201:177)                             ---> emc_clariion0_12s1
    vg_root-vg1_VG1_FS0 (253:10)
     `-vg_root-vg1_VG1_FS0-real (253:9)
        `- (201:177)
    vg_root-litp_vg1_VG1_FS1_snapshot (253:8)
     |-vg_root-litp_vg1_VG1_FS1_snapshot-cow (253:7)
     | `- (201:177)
     `-vg_root-vg1_VG1_FS1-real (253:5)
        `- (201:177)
    vg_root-litp_vg1_VG1_FS0_snapshot (253:12)
     |-vg_root-litp_vg1_VG1_FS0_snapshot-cow (253:11)
     | `- (201:177)
     `-vg_root-vg1_VG1_FS0-real (253:9)
        `- (201:177)

vgdisplay -vv reports physical volumes for the whole volume group and does not differentiate when some of the logical volumes use different PVs. It reports that the multipathed devices are in use:

    --- Logical volume ---
    LV Path    /dev/vg_root/vg1_root
    LV Name    vg1_root
    VG Name    vg_root

    LV Path    /dev/vg_root/vg1_swap
    LV Name    vg1_swap
    VG Name    vg_root

    --- Physical volumes ---
    PV Name    /dev/vx/dmp/emc_clariion0_12s1
    PV Name    /dev/vx/dmp/emc_clariion0_43s1
    PV Name    /dev/vx/dmp/emc_clariion0_3s2

The VxDMP device files are:

    brw-------. 1 root root 201, 177 Apr 9 14:20 emc_clariion0_12s1
    brw-------. 1 root root 201, 81 Apr 9 14:20 emc_clariion0_43s1
    brw-------. 1 root root 201, 98 Apr 9 14:20 emc_clariion0_3s2

Expected results:
-----------------
vgdisplay should either report that the non-multipathed device files are being used, or print a warning that the devices actually in use by the kernel do not match the reported multipathed ones.

Additional info:
----------------
I suppose the following questions need answering before deciding how to address this problem:

1. On the customer system, only the root FS and primary swap had the LVM -> /dev/sd* mapping. The volumes in the root VG activated later during boot, when the multipathing layers were fully active, had correct mappings to multipathed devices. vgdisplay reports only one set of PVs that applies to all logical volumes. Which ones should it report: the ones used by the root FS and primary swap, or the rest, which use multipathed devices? We should perhaps get at least some warning that some volumes use non-multipathed devices.

2. Is there a potential for memory corruption, perhaps due to caching, when some lvols go directly to /dev/sd* and others go through multipathing layers?

3. Could we perhaps implement an extra option for vgdisplay that would report the mappings as dmsetup reports them?

Veritas is addressing their problem by ensuring the correct filters are in place in lvm.conf and by activating the root VG volumes via the multipathed device files, which should prevent the problem in the first place. However, there is still a potential exposure to such a problem with other 3rd party multipathing solutions, so the vgdisplay / LVM code needs fortification.
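For reference, the kind of lvm.conf filtering Veritas's mitigation relies on can be sketched as follows. The patterns are illustrative only (the customer's actual global_filter appears in the comments above); the intent is to accept the VxDMP multipath nodes and reject the raw SCSI paths, so LVM only ever binds to the multipathed devices:

    devices {
        # Patterns are tried in order and the first match wins:
        # accept the DMP nodes, reject the underlying /dev/sd* paths.
        filter = [ "a|^/dev/vx/dmp/.*|", "r|^/dev/sd.*|" ]
    }

Note that for this to be effective at boot time, the same filter must also be present in the initrd image; as described in the comments above, the fault was reproduced precisely because the initrd copy of the filter had been removed while the root FS copy stayed.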