Description of problem:
A customer has several NetApp LUNs attached to his 8-socket machine. He grouped the LUNs using DM-Multipath as follows:

- All LUNs are active and multipathd is running. When trying to list all defined physical volumes with

  # pvs

  I get all the PVs on the mpath devices plus the following output:

  WARNING: duplicate PV <long string> is being used from both devices /dev/mapper/mpathX and /dev/sdX

- When applying a filter in /etc/lvm/lvm.conf, which is accepted as best practice, with

  filter = [ "a|/dev/mapper/mpath.*|", "r|.*|" ]

  and calling # pvs again, besides the PVs on the multipathed devices there is a new error message:

  Duplicate of PV <long string> dev /dev/mapper/mpathX exists on unknown device <major>:<minor>

- The output of # multipath -ll seems to be correct.

Version-Release number of selected component (if applicable):
- RHEL 7.2

How reproducible:
Always: import NetApp LUNs into your system and group them with DM-Multipath.

Steps to Reproduce:
1. Import NetApp LUNs into your system and group them with DM-Multipath.
2. Ensure that they are available with # lsblk.
3. Modify your /etc/lvm/lvm.conf by adding the filter above.
4. Run # pvs.

Actual results:
Error messages that duplicate PVs were found on unknown underlying devices.

Expected results:
Only the multipathed devices are touched to detect the physical volumes.

Additional info:
- Checked out https://access.redhat.com/solutions/39566. Made sure that the /etc/multipath/bindings file is in sync with # multipath -ll | grep mpath | sort
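Illustration only (this is not how LVM evaluates its filters internally, but an unanchored grep -E search on the same patterns gives the same accept/reject decisions for these device names), showing why the reported filter should keep only the mpath maps:

```shell
# Classify a few device names against the lvm.conf filter patterns
# "a|/dev/mapper/mpath.*|" (accept) and "r|.*|" (reject the rest).
# Device names are examples, not from the customer's system.
for dev in /dev/mapper/mpatha /dev/mapper/mpathb1 /dev/sda /dev/sdq2; do
    if echo "$dev" | grep -Eq '/dev/mapper/mpath.*'; then
        echo "accept $dev"
    else
        echo "reject $dev"
    fi
done
```

With the names above this prints two accept lines for the mpath maps and two reject lines for the sd devices, which is the behavior the reporter expected from pvs.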
Does lvm.conf have multipath_component_detection set to 1? That should make lvm ignore the multipath component devices.
(In reply to David Teigland from comment #1)
> Does lvm.conf have multipath_component_detection set to 1? That should make
> lvm ignore the multipath component devices.

Hi David, I checked that out and can confirm: multipath_component_detection is set to 1.
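For reference, the setting in question lives in the devices section of /etc/lvm/lvm.conf; a sketch (the comment is mine, the surrounding file content varies by release):

```
devices {
    # 1 = use native sysfs detection to filter out devices that are
    # components of a multipath map
    multipath_component_detection = 1
}
```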
Thanks, so the multipath detection isn't working for some reason. We'll need to look at debug output to figure out why. Please attach the output of both:

pvs -vvvv --config devices/external_device_info_source=none |& tee pvs.1
pvs -vvvv --config devices/external_device_info_source=udev |& tee pvs.2
Created attachment 1151778 [details] PVS debug output
Created attachment 1151779 [details] Second PVS debug output
As a 'hotfix' - please set "use_lvmetad=0" in lvm.conf and stop/mask lvm2-lvmetad.service and lvm2-lvmetad.socket via systemctl.
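A sketch of the hotfix above, shown against a scratch copy of lvm.conf so it can run anywhere; on a real host you would edit /etc/lvm/lvm.conf itself and then, as root, run the systemctl commands shown in the comments:

```shell
# On the affected host (as root) the service side of the hotfix would be:
#   systemctl stop lvm2-lvmetad.service lvm2-lvmetad.socket
#   systemctl mask lvm2-lvmetad.service lvm2-lvmetad.socket
# Below, the lvm.conf edit is demonstrated on a throwaway sample file.
cat > /tmp/lvm.conf.sample <<'EOF'
global {
    use_lvmetad = 1
}
EOF
sed -i 's/use_lvmetad = 1/use_lvmetad = 0/' /tmp/lvm.conf.sample
grep use_lvmetad /tmp/lvm.conf.sample
```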
(In reply to Zdenek Kabelac from comment #7)
> As a 'hotfix' - please set "use_lvmetad=0" in lvm.conf
> and stop/mask lvm2-lvmetad.service and lvm2-lvmetad.socket via systemctl.

Thanks. Did it and it works fine.
pvscan is seeing the underlying devs (because multipath_component_detection isn't working), and then sending them to lvmetad. The 'pvs' command is then getting those underlying devs from lvmetad.

Using filters to work around the broken multipath_component_detection should work, but you'd need to use "global_filter" instead of "filter" to make it apply to lvmetad (if you want to try this, you'd need to go back to using lvmetad by reversing the steps in comment 7).

I'm not sure what data to collect to further debug the initial problem with multipath_component_detection.
(In reply to David Teigland from comment #9)
> pvscan is seeing the underlying devs (because multipath_component_detection
> isn't working), and then sending them to lvmetad. The 'pvs' command is then
> getting those underlying devs from lvmetad.
>
> Using filters to workaround the broken multipath_component_detection should
> work, but you'd need to use "global_filter" instead of "filter" to make it
> apply to lvmetad (if you want to try this, you'd need to go back to using
> lvmetad by reversing the steps in comment 7.)
>
> I'm not sure what data to collect to further debug the initial problem with
> multipath_component_detection.

Thanks. Reverted the steps taken before and replaced filter with global_filter, so the entry in /etc/lvm/lvm.conf is now

global_filter = [ "a|/dev/mapper/mpath.*|", "r|/dev/.*|" ]

This actually works. Thanks a lot.
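For anyone following along, the working configuration would sit in lvm.conf roughly like this (a sketch; comments are mine, and unlike "filter", "global_filter" is also applied by the lvmetad scanning path, which is why it helps here):

```
devices {
    # applied by both commands and the lvmetad/pvscan scanning path:
    # accept multipath maps, reject everything else under /dev
    global_filter = [ "a|/dev/mapper/mpath.*|", "r|/dev/.*|" ]
}
```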
FWIW, I saw the same "Duplicate of PV ... exists on unknown device" with 3PAR LUNs, but didn't have multipath_component_detection set in lvm.conf. This is on RHEL 7 + latest errata. When using global_filter instead of filter, I see no unexpected messages from pvs(8). I need to release the system to other teams in a few days, but before that I could provide whatever logs might be helpful? Thanks.
The duplicate PV handling has recently been redesigned and replaced. Once there are lvm builds containing this new code (don't know when that will be), it would be nice to try running it on some systems affected by duplicate PVs.
Currently we have an initial lvm2 build with v2.02.152 in 7.3 (lvm2-2.02.152-2.el7). I'll do a rebase when more bugs/solutions stack up for 7.3.
Yes, 7.3 will have new duplicate PV handling. I think there's another bz addressing the problem with Netapp being incompatible with current udev/multipath/etc, which is probably why the standard multipath_component_detection is not working.
Could this be retested with 7.3? I think this may be fixed in 7.3 by this commit:

commit 939f5310b9e58a560247c44cbd8a8f8af86aae7c
Author: David Teigland <teigland>
Date:   Wed Aug 31 13:05:53 2016 -0500

    lvmetad: use udev to ignore multipath components during scan

    When scanning devs to populate lvmetad during system startup,
    filter-mpath with native sysfs multipath component detection may not
    detect that a dev is a multipath component. This is because the
    multipath devices may not be set up yet. Because of this, pvscan will
    scan multipath components during startup, will see them as duplicate
    PVs, and will disable lvmetad. This will leave lvmetad disabled on
    systems using multipath, unless something or someone runs
    pvscan --cache to rescan.

    To avoid this problem, the code that is scanning devices to populate
    lvmetad will now check the udev db to see if a dev is a multipath
    component that should be skipped. (This may not be perfect due to
    inherent udev races, but will cover most cases and will be at least
    as good as it's ever been.)
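The check the commit describes can be sketched in shell. The property name DM_MULTIPATH_DEVICE_PATH is what the device-mapper-multipath udev rules set on path devices on RHEL 7 (an assumption worth verifying on the target release); on a live host the properties would come from `udevadm info --query=property /dev/sdX`, but here a canned property dump stands in so the logic is visible:

```shell
# Sample udev property dump for an imaginary path device /dev/sdb.
# The property values are illustrative, not captured from a real host.
props='DEVNAME=/dev/sdb
ID_FS_TYPE=mpath_member
DM_MULTIPATH_DEVICE_PATH=1'

# Decision the scan code makes: skip devices the udev db marks as
# multipath components, scan everything else.
if printf '%s\n' "$props" | grep -q '^DM_MULTIPATH_DEVICE_PATH=1$'; then
    echo "skip /dev/sdb: multipath component"
else
    echo "scan /dev/sdb"
fi
```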
(In reply to David Teigland from comment #22)
> Could this be retested with 7.3? I think this may be fixed in 7.3 by this
> commit:

Sorry, I can't help here since I don't have access to appropriate hardware.
From what I can tell, this should be fixed, and it doesn't appear we will get any more information from this source.
There were other commits related to the one above. But more broadly, the entire duplicate PV handling was overhauled in 7.3. It's not clear how much help this one fix would be on its own; that would need to be tested (I don't think we ever reproduced the specific startup race). Do they have an effective workaround for this in 7.2?
(In reply to David Teigland from comment #26)
> There were other commits related to the one above. But more broadly, the
> entire duplicate PV handling was overhauled in 7.3. It's not clear how much
> help this one fix would be on its own, that would need to be tested (I don't
> think we ever reproduced the specific startup race.) Do they have an
> effective workaround for this in 7.2?

The customer is able to reproduce this easily in 7.2.z EUS and has a sandbox environment for us to grab data sets from. They are not using any workaround at the moment and let the duplicate PV messages pass through on reboot. pvs commands seem to be working fine, but they are concerned about any potential issues that could arise from this, as they cannot upgrade to 7.3 (SAP HANA supportability). Can we be confident enough to tell them that these error messages can be safely ignored? Also, we can work with them to see if this can be addressed in a z-stream fix for 7.2 EUS, as it will be around until 11/2019. Thanks!
If I am to package this, I'd say it looks fine to me. There is no way this ignores devices unless the udev DB is incorrect, in which case the system is in a bad state anyway. This may accept a multipath component when udev initialization times out; in that case, using global_filter should be considered, as udev is not managing the load.
IIUC this may need changes to configure from commit b8779e706e [1], adding HAVE_LIBUDEV_UDEV_DEVICE_GET_IS_INITIALIZED, but it would work even without it, as there is an alternative path for RHEL6.

[1]: https://sourceware.org/git/?p=lvm2.git;a=commitdiff;h=b8779e706e

Also there is an unrelated snippet in dev-ext.c touched by the commit, coming from commit f82e0210b7cb [2].

[2]: https://sourceware.org/git/?p=lvm2.git;a=commitdiff;h=f82e0210b7cb9f3926ef5051ca34a4ad383ba271

First question: should we include that as well, or would we need an extra ZStream bug for it? For now I am including only the HAVE_LIBUDEV_UDEV_DEVICE_GET_IS_INITIALIZED change in configure(.in) and removing the unrelated snippet from dev-ext.c. Secondly, for RHEL-7 it could be simplified: we could remove the #ifdef part and ignore the second patch altogether. Peter, what's the customary/best thing to do here: a minimal patch, or staying close to the original patch?
The patch [2] was committed first, and then I hit a problem with RHEL6 where the function was not available, hence I committed the other patch [1] the same day. So the configure patch, which tests whether udev_device_get_is_initialized is available, is only for those older libudev versions.

The patch [2] applies only if the user has devices/external_device_info_source="udev" in their configuration (this is not used by default; the default is external_device_info_source="none"). The patch [2] causes udev db records which are incomplete (due to a previous timeout in udev) not to be used; we error out instead (external_device_info_source="udev" is used for filtering information, including detection of mpath components). But again, this is not the default; the default is external_device_info_source="none". So if we don't have reports about problems with external_device_info_source="udev", I wouldn't include patch [2].

RHEL7 has had the udev_device_get_is_initialized function available since 7.0, so we can say that the configure patch [1] is not quite needed either, but if you want to be sure, feel free to add it.
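For clarity, the setting discussed above is an lvm.conf option; a sketch of the relevant fragment (comments are mine):

```
devices {
    # Default: use only LVM's native (sysfs-based) device information
    # for filtering, including multipath component detection.
    external_device_info_source = "none"

    # Alternative covered by patch [2]: consult the udev database instead.
    # external_device_info_source = "udev"
}
```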
Making this depend on the properly cloned Bug 1458346. Once confirmed, we will close this as a duplicate.
Bug 1458346 has an Errata released and this one is likely a duplicate. Is this still reproducible with the latest lvm2 build?
(In reply to Marian Csontos from comment #35)
> Bug 1458346 has an Errata released and this one is likely a duplicate. Is
> this still reproducible with the latest lvm2 build?

jblume no longer works here. I don't believe it is still reproducible: the customer was unable to reproduce the issue with 7.3, but we requested the backport to 7.2.z EUS because the lifecycle of this EUS release was extended from 2 to 4 years (for SAP HANA environments). Thanks!
The backport for 7.2.z was discussed in this bug, as it was re-opened by me, but the approval for 7.2.z EUS is in this bug [1]. Fixing the title and closing this bug to avoid any further confusion. Thanks!

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1458346