Bug 1330646 - Duplicate PVs found in DM-Multipath
Summary: Duplicate PVs found in DM-Multipath
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: lvm2
Version: 7.2
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: David Teigland
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On: 1458346
Blocks: 1385242
Reported: 2016-04-26 16:22 UTC by jblume@redhat.com
Modified: 2021-09-03 12:52 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-07-19 13:36:04 UTC
Target Upstream Version:


Attachments
PVS debug output (113.00 KB, text/plain)
2016-04-28 08:36 UTC, jblume@redhat.com
Second PVS debug output (93.19 KB, text/plain)
2016-04-28 08:37 UTC, jblume@redhat.com

Description jblume@redhat.com 2016-04-26 16:22:12 UTC
Description of problem:
A customer has several Netapp-LUNs attached to his 8-socket machine. He grouped the LUNs using DM-Multipath as follows:

- All LUNs are active and multipathd is running. When trying to list all 
  defined physical volumes with # pvs I get all the PVs on the mpath-devices
  plus the following output:
  WARNING: duplicate PV <long string> is being used from both devices \
  /dev/mapper/mpathX and /dev/sdX
- I then applied a filter in /etc/lvm/lvm.conf, which is accepted as best
  practice:
  filter = [ "a|/dev/mapper/mpath.*|", "r|.*|" ]
  When calling # pvs again, besides the PVs on the multipathed devices
  there is a new error message:
  Duplicate of PV <long string> dev /dev/mapper/mpathX exists on unknown \
  device <major>:<minor>
- The output of # multipath -ll seems to be correct.

Version-Release number of selected component (if applicable):
- RHEL7.2

How reproducible:
Import Netapp-LUNs into your system and group them with DM-Multipath

Steps to Reproduce:
1. Import Netapp-LUNs into your system and group them with DM-Multipath
2. Ensure that they are available with # lsblk
3. Modify your /etc/lvm/lvm.conf by adding the filter
4. Run # pvs

Actual results:
Getting the error message that duplicate PVs were found on unknown underlying devices

Expected results:
Just the multipathed devices are touched to detect the physical volumes

Additional info:
- Checked out https://access.redhat.com/solutions/39566.
  Made sure that /etc/multipath/bindings file is in sync with
  # multipath -ll | grep mpath | sort

Comment 1 David Teigland 2016-04-26 16:30:45 UTC
Does lvm.conf have multipath_component_detection set to 1?  That should make lvm ignore the multipath component devices.

Comment 3 jblume@redhat.com 2016-04-27 15:04:35 UTC
(In reply to David Teigland from comment #1)
> Does lvm.conf have multipath_component_detection set to 1?  That should make
> lvm ignore the multipath component devices.

Hi David,

I checked, and I can confirm that multipath_component_detection is set to 1.

Comment 4 David Teigland 2016-04-27 15:19:14 UTC
Thanks, so the multipath detection isn't working for some reason.  We'll need to look at debug output to figure out why.  Please attach the output of both:

pvs -vvvv --config devices/external_device_info_source=none |& tee pvs.1
pvs -vvvv --config devices/external_device_info_source=udev |& tee pvs.2

Comment 5 jblume@redhat.com 2016-04-28 08:36:58 UTC
Created attachment 1151778
PVS debug output

Comment 6 jblume@redhat.com 2016-04-28 08:37:40 UTC
Created attachment 1151779
Second PVS debug output

Comment 7 Zdenek Kabelac 2016-04-28 08:47:59 UTC
As a 'hotfix', please set "use_lvmetad=0" in lvm.conf
and stop/mask lvm2-lvmetad.service and lvm2-lvmetad.socket via systemctl.
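
The hotfix above could be scripted roughly as follows. This is only a sketch under stated assumptions: set_use_lvmetad_off is a hypothetical helper name, it assumes GNU sed, and it assumes lvm.conf already contains an uncommented use_lvmetad line. The systemctl steps are left as comments because they change the running system.

```shell
#!/bin/bash
# Hypothetical helper (not an LVM tool): switch an existing, uncommented
# "use_lvmetad = N" setting to 0 in the given lvm.conf file.
set_use_lvmetad_off() {
    local conf=$1
    # Keep the original indentation (\1) and rewrite only the value.
    sed -i -E \
        's/^([[:space:]]*)use_lvmetad[[:space:]]*=[[:space:]]*[0-9]+/\1use_lvmetad = 0/' \
        "$conf"
}

# Intended use on the affected host, as root (not executed here):
#   set_use_lvmetad_off /etc/lvm/lvm.conf
#   systemctl stop lvm2-lvmetad.service lvm2-lvmetad.socket
#   systemctl mask lvm2-lvmetad.service lvm2-lvmetad.socket
```

Working on a copy of lvm.conf first and comparing the two files with diff is a cheap way to confirm that only the intended line changed.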

Comment 8 jblume@redhat.com 2016-04-28 10:02:10 UTC
(In reply to Zdenek Kabelac from comment #7)
> As a 'hotfix'  - please  set  "use_lvmetad=0"  in lvm.conf
> and stop/mask   lvm2-lvmetad.service  and lvm2-lvmetad.socket via systemctl.

Thanks. Did it and it works fine.

Comment 9 David Teigland 2016-04-28 14:49:08 UTC
pvscan is seeing the underlying devs (because multipath_component_detection isn't working), and then sending them to lvmetad.  The 'pvs' command is then getting those underlying devs from lvmetad.

Using filters to work around the broken multipath_component_detection should work, but you'd need to use "global_filter" instead of "filter" to make it apply to lvmetad (if you want to try this, you'd need to go back to using lvmetad by reversing the steps in comment 7).

I'm not sure what data to collect to further debug the initial problem with multipath_component_detection.

Comment 10 jblume@redhat.com 2016-04-29 15:13:32 UTC
(In reply to David Teigland from comment #9)
> pvscan is seeing the underlying devs (because multipath_component_detection
> isn't working), and then sending them to lvmetad.  The 'pvs' command is then
> getting those underlying devs from lvmetad.
> 
> Using filters to work around the broken multipath_component_detection should
> work, but you'd need to use "global_filter" instead of "filter" to make it
> apply to lvmetad (if you want to try this, you'd need to go back to using
> lvmetad by reversing the steps in comment 7.)
> 
> I'm not sure what data to collect to further debug the initial problem with
> multipath_component_detection.

Thanks. Reverted the steps taken before and replaced filter by global_filter.
So the entry in /etc/lvm/lvm.conf is
global_filter = [ "a|/dev/mapper/mpath.*|", "r|/dev/.*|" ]

This actually works. Thanks a lot.
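
To see why this pattern list keeps only the multipath devices, the first-match-wins evaluation of an LVM filter list can be imitated in the shell. This is an illustrative sketch, not LVM code: filter_decision is a hypothetical helper, grep -E stands in for LVM's own regex engine, and device aliases (for example /dev/disk/by-id names) are not modeled. It also says nothing about the filter versus global_filter distinction, which concerns who applies the list (the command or the scan feeding lvmetad), not how it is evaluated.

```shell
#!/bin/bash
# Hypothetical helper: print "accept" or "reject" for one device path,
# given patterns written as in lvm.conf, e.g. 'a|regex|' or 'r|regex|'.
filter_decision() {
    local dev=$1; shift
    local pat action regex
    for pat in "$@"; do
        action=${pat:0:1}      # 'a' (accept) or 'r' (reject)
        regex=${pat#?|}        # drop the leading 'a|' or 'r|'
        regex=${regex%|}       # drop the trailing '|'
        if printf '%s\n' "$dev" | grep -Eq -- "$regex"; then
            # The first pattern that matches decides the outcome.
            if [ "$action" = "a" ]; then echo accept; else echo reject; fi
            return 0
        fi
    done
    # A path matched by no pattern is accepted.
    echo accept
}

# Example with the patterns from this comment:
#   filter_decision /dev/mapper/mpatha 'a|/dev/mapper/mpath.*|' 'r|/dev/.*|'
#   filter_decision /dev/sdb           'a|/dev/mapper/mpath.*|' 'r|/dev/.*|'
```

The order of the patterns matters: with the reject pattern listed first, the mpath devices would be rejected as well, since /dev/.* matches them too.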

Comment 11 Marko Myllynen 2016-05-11 11:33:53 UTC
FWIW, I saw the same "Duplicate of PV ... exists on unknown device" with 3PAR LUNs but didn't have multipath_component_detection set in multipath.conf. This is on RHEL 7 + latest errata. When using global_filter instead of filter, then I see no unexpected messages with pvs(8).

I need to release the system to other teams in a few days, but before that I could provide whatever logs might be helpful.

Thanks.

Comment 12 David Teigland 2016-05-11 14:46:58 UTC
The duplicate PV handling has recently been redesigned and replaced.  Once there are lvm builds containing this new code (don't know when that will be), it would be nice to try running it on some systems affected by duplicate PVs.

Comment 13 Peter Rajnoha 2016-05-12 08:17:45 UTC
Currently, we have an initial lvm2 build with v2.02.152 in 7.3 (lvm2-2.02.152-2.el7). I'll do a rebase when more bugs/solutions stack up for 7.3.

Comment 18 David Teigland 2016-07-14 14:50:32 UTC
Yes, 7.3 will have new duplicate PV handling.  I think there's another bz addressing the problem with Netapp being incompatible with current udev/multipath/etc, which is probably why the standard multipath_component_detection is not working.

Comment 22 David Teigland 2016-12-07 16:41:51 UTC
Could this be retested with 7.3?  I think this may be fixed in 7.3 by this commit:

commit 939f5310b9e58a560247c44cbd8a8f8af86aae7c
Author: David Teigland <teigland>
Date:   Wed Aug 31 13:05:53 2016 -0500

    lvmetad: use udev to ignore multipath components during scan
    
    When scanning devs to populate lvmetad during system startup,
    filter-mpath with native sysfs multipath component detection
    may not detect that a dev is multipath component.  This is
    because the multipath devices may not be set up yet.
    
    Because of this, pvscan will scan multipath components during
    startup, will see them as duplicate PVs, and will disable
    lvmetad.  This will leave lvmetad disabled on systems using
    multipath, unless something or someone runs pvscan --cache
    to rescan.
    
    To avoid this problem, the code that is scanning devices to
    populate lvmetad will now check the udev db to see if a
    dev is a multipath component that should be skipped.
    
    (This may not be perfect due to inherent udev races, but will
    cover most cases and will be at least as good as it's ever
    been.)
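
The udev check described in the commit message can be approximated from the shell. This is a rough sketch, not the code from the commit: is_mpath_component_props is a hypothetical helper, and the two udev properties it looks for (DM_MULTIPATH_DEVICE_PATH=1 and ID_FS_TYPE=mpath_member) are assumptions about what the multipath udev rules and blkid export for a component device. Neither property is named in this bug.

```shell
#!/bin/bash
# Hypothetical helper: read KEY=VALUE lines on stdin, in the format
# printed by `udevadm info --query=property`, and succeed (exit 0) if
# they mark the device as a multipath component.
# Assumed markers: DM_MULTIPATH_DEVICE_PATH=1 or ID_FS_TYPE=mpath_member.
is_mpath_component_props() {
    grep -Eq '^(DM_MULTIPATH_DEVICE_PATH=1|ID_FS_TYPE=mpath_member)$'
}

# Intended use on a live system (not executed here):
#   udevadm info --query=property --name=/dev/sdb | is_mpath_component_props \
#       && echo "/dev/sdb looks like a multipath component"
```

As the commit message notes, the udev db can lag behind during startup, so a negative result from a check like this does not prove that a device is not a component.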

Comment 23 Frank Danapfel 2016-12-19 16:38:03 UTC
(In reply to David Teigland from comment #22)
> Could this be retested with 7.3?  I think this may be fixed in 7.3 by this
> commit:
> 

Sorry, I can't help here since I don't have access to appropriate hardware.

Comment 24 David Teigland 2016-12-19 19:45:57 UTC
From what I can tell, this should be fixed, and it doesn't appear we will get any more information from this source.

Comment 26 David Teigland 2017-05-08 19:10:21 UTC
There were other commits related to the one above.  But more broadly, the entire duplicate PV handling was overhauled in 7.3.  It's not clear how much help this one fix would be on its own, that would need to be tested (I don't think we ever reproduced the specific startup race.)  Do they have an effective workaround for this in 7.2?

Comment 27 Sam Yangsao 2017-05-09 17:34:48 UTC
(In reply to David Teigland from comment #26)
> There were other commits related to the one above.  But more broadly, the
> entire duplicate PV handling was overhauled in 7.3.  It's not clear how much
> help this one fix would be on its own, that would need to be tested (I don't
> think we ever reproduced the specific startup race.)  Do they have an
> effective workaround for this in 7.2?

The customer is able to reproduce this easily in 7.2.z EUS and has a sandbox environment for us to grab data sets from.  They are not using any type of workaround at the moment and let the Duplicate PV messages pass through on reboot.

pvs commands seem to be working fine, but they are concerned about any potential issues that could arise from this, as they cannot upgrade to 7.3 (SAP HANA supportability).

Can we be confident enough to tell them that these error messages can be safely ignored?

Also, we can work with them to see if this can be addressed in a z-stream fix for 7.2 EUS as it will be around until 11/2019.

Thanks!

Comment 29 Marian Csontos 2017-05-10 11:23:38 UTC
If I am to package this, I say this looks fine to me.

There is no way this ignores devices unless the udev DB is incorrect, in which case the system is in a bad state anyway.

This may accept a multipath component when udev initialization times out, in which case using global_filter should be considered, as udev is not managing the load.

Comment 32 Marian Csontos 2017-06-29 13:58:47 UTC
IIUC this may need changes to configure from the commit b8779e706e[1] adding HAVE_LIBUDEV_UDEV_DEVICE_GET_IS_INITIALIZED but it would work even without it, as there is an alternative path for RHEL6.

[1]: https://sourceware.org/git/?p=lvm2.git;a=commitdiff;h=b8779e706e

Also there is an unrelated snippet in dev-ext.c touched by the commit - coming from commit f82e0210b7cb[2].

[2]: https://sourceware.org/git/?p=lvm2.git;a=commitdiff;h=f82e0210b7cb9f3926ef5051ca34a4ad383ba271

First question is whether to include that as well, or would we need an extra ZStream bug for that?

For now I am including only the HAVE_LIBUDEV_UDEV_DEVICE_GET_IS_INITIALIZED in config(.in) and removing the unrelated snippet from dev-ext.c.

Secondly for RHEL-7 it could be simplified and we could remove the #ifdef part and ignore the second patch altogether.

Peter, what's the customary/best thing to do? A minimal patch, or staying close to the original patch?

Comment 33 Peter Rajnoha 2017-06-29 14:56:14 UTC
The patch [2] was committed first and then I hit a problem with RHEL6 where the fn was not available and hence I committed the other patch [1] the same day. So the patch for configure to test whether the udev_device_get_is_initialized is available is only for those older libudev versions.

The patch [2] applies only if the user has devices/external_device_info_source="udev" in their configuration (and this one is not used by default, default is external_device_info_source="none"). With the patch [2], udev db records which are incomplete (due to a previous timeout in udev) are not used and we error out instead (the external_device_info_source="udev" is used for information on filtering, including detection of mpath components). But again, this is not default, default is external_device_info_source="none". So if we don't have reports about problems with external_device_info_source="udev", I wouldn't include the patch [2].

RHEL7 since 7.0 has the udev_device_get_is_initialized function available so we can say that the patch for "configure" [1] is not quite needed too, but if you want to be sure, feel free to add that.

Comment 34 Marian Csontos 2017-06-30 14:10:22 UTC
Making this depend on properly cloned Bug 1458346. Once confirmed, we will close this as duplicate.

Comment 35 Marian Csontos 2017-07-19 10:04:50 UTC
Bug 1458346 has an Errata released and this one is likely a duplicate. Is this still reproducible with the latest lvm2 build?

Comment 36 Sam Yangsao 2017-07-19 13:29:36 UTC
(In reply to Marian Csontos from comment #35)
> Bug 1458346 has an Errata released and this one is likely a duplicate. Is
> this still reproducible with the latest lvm2 build?

jblume no longer works here. I don't believe it is, as the customer was not able to reproduce the issue with 7.3, but we requested the backport to 7.2.z EUS because the lifecycle of this EUS release was extended from 2 to 4 years (for SAP HANA environments).  Thanks!

Comment 37 Sam Yangsao 2017-07-19 13:36:04 UTC
The backport for 7.2.z was discussed in this bug, as I had re-opened it, but the approval for 7.2.z EUS is in Bug 1458346 [1]. I am fixing the title and closing this bug to avoid any further confusion.

Thanks!

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1458346

