Bug 1290811

Summary: Multipath component detection (devices/multipath_component_detection=1) not always working correctly in lvm2
Product: [Fedora] Fedora Reporter: Peter Rajnoha <prajnoha>
Component: lvm2 Assignee: LVM and device-mapper development team <lvm-team>
Status: CLOSED EOL QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 24 CC: agk, bmarzins, bmr, dwysocha, heinzm, jonathan, lvm-team, msnitzer, prajnoha, prockai, teigland, zkabelac
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-08-08 12:30:48 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Peter Rajnoha 2015-12-11 14:25:44 UTC
There's a small window in which LVM can see an incomplete multipath setup even though multipath is configured correctly.

For example, in the simplest case, if multipath consists of 2 devices, /dev/sda and /dev/sdb, these devices never appear in the system at exactly the same time - there is a small delay between them. This is the sequence which can lead to LVM seeing duplicates:

1) /dev/sda appears

2) /dev/sda is recognized as mpath component (based on the wwids file)

3) mpath device is created and /dev/sda is added to this mpath device

4) udev event is generated for the mpath device (after table load + resume which added /dev/sda to the mpath device from previous step)

5) if the mpath device is a PV at the same time, pvscan --cache triggers in udev rules based on the event

6) if lvmetad is empty, the pvscan --cache from previous step will also cause full scan to be done to fill lvmetad with initial metadata info

7) if during step 6) the other mpath component (/dev/sdb) has not yet been added to the mpath device (another mpath component can appear at any time!), LVM's mpath component filter does not recognize it as a component - this filter works by checking /sys content for an mpath device on top of the device (in this case, it checks the sdb device and sees that it is *not yet* included in the mpath device, since the mpath device is not yet a holder of the sdb device in sysfs)

8) we end up with /dev/sdb identified as duplicate to the mpath device because of the same PV UUID that LVM sees
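
To illustrate steps 7) and 8), here is a minimal sketch of a sysfs-based holder check, written in Python only for illustration. It is not the actual LVM filter code. The function name has_mpath_holder and the sysfs_root parameter are hypothetical, and the sketch assumes that a multipath map exposes a DM UUID starting with "mpath-" under /sys/block/<holder>/dm/uuid. The point is that a component which has not yet been added to the mpath device has no such holder, so a check of this kind reports it as an ordinary device.

```python
from pathlib import Path


def has_mpath_holder(dev_name: str, sysfs_root: str = "/sys") -> bool:
    """Return True if any holder of dev_name looks like a multipath DM device.

    Assumption (not taken from the report): multipath maps have a DM UUID
    beginning with "mpath-" in <sysfs_root>/block/<holder>/dm/uuid.
    """
    holders_dir = Path(sysfs_root) / "block" / dev_name / "holders"
    if not holders_dir.is_dir():
        # Unknown device or no holders directory: nothing sits on top of it.
        return False

    for holder in holders_dir.iterdir():
        uuid_file = Path(sysfs_root) / "block" / holder.name / "dm" / "uuid"
        try:
            uuid = uuid_file.read_text().strip()
        except OSError:
            # Holder is not a DM device, or it vanished in the meantime.
            continue
        if uuid.startswith("mpath-"):
            return True

    # No mpath holder *yet*: during the window described above this is
    # also the answer for a component that is about to be added.
    return False
```

With this kind of check, sda (already in the mpath device) is filtered out, while sdb is not filtered until multipathd reloads the table with sdb included.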

This problem is a consequence of two factors - fixing either one of them solves the problem:
  A) the full scan which is done to fill lvmetad with initial info
  B) mpath filter in LVM which checks /sys content to decide whether a device is an mpath component or not

Factor B) is actually already resolved by using devices/external_device_info_source="udev", because in that case the multipath component filter checks the udev database instead of checking /sys content. The udev database contains direct information on whether a device is a multipath component, no matter whether the top-level mpath device is already set up or not. This is done by exporting information about a device being a multipath component by calling "multipath -c <device>" within udev rules (and multipath -c checks the /etc/multipath/wwids configuration file to decide whether a device is a multipath component or not).
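
A rough sketch of what the udev-based decision amounts to, again in Python purely for illustration. It assumes that the multipath udev rules mark components with the property DM_MULTIPATH_DEVICE_PATH=1; the property name is not stated in this report, so treat it as an assumption. The helper names are hypothetical.

```python
import subprocess


def parse_udev_properties(text: str) -> dict:
    """Parse KEY=VALUE lines as printed by 'udevadm info --query=property'."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def is_mpath_component_udev(props: dict) -> bool:
    # Assumption: the multipath udev rules set DM_MULTIPATH_DEVICE_PATH=1
    # on components, independently of whether the mpath device exists yet.
    return props.get("DM_MULTIPATH_DEVICE_PATH") == "1"


def query_udev(dev_path: str) -> dict:
    """Read the udev database record for dev_path (requires udevadm)."""
    result = subprocess.run(
        ["udevadm", "info", "--query=property", "--name=" + dev_path],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_udev_properties(result.stdout)
```

For example, is_mpath_component_udev(query_udev("/dev/sdb")) does not depend on the holder relationship in sysfs, which is why this source of information is not affected by the window described above.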

However, lvm does not use external_device_info_source="udev" by default. Also, we're still keeping the original mpath filter method. So the original multipath filter should be fixed to identify multipath components more reliably, presumably not by using /sys to decide this, but by using information taken directly from the multipath configuration, which needs to be exported to LVM somehow. I'd say calling "multipath -c" in LVM's mpath filter is not quite perfect, but that's one way we could do it; alternatively, the LVM mpath filter could read the wwids file directly.
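
As a sketch of the last option (reading the wwids file directly), something along these lines could work. This is an illustration in Python, not a proposed patch. It assumes the wwids file lists one WWID per line enclosed in slashes, with "#" starting a comment line. How the filter would obtain the WWID of the device being checked is left open here, and both function names are hypothetical.

```python
def parse_wwids(text: str) -> set:
    """Collect WWIDs from the contents of a wwids file.

    Assumed format: one entry per line written as /<wwid>/, with comment
    lines starting with '#'. Anything else is ignored.
    """
    wwids = set()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if len(line) > 2 and line.startswith("/") and line.endswith("/"):
            wwids.add(line[1:-1])
    return wwids


def is_known_mpath_component(device_wwid: str, wwids: set) -> bool:
    # The decision depends only on the multipath configuration, not on
    # whether the mpath device already holds the component in sysfs.
    return device_wwid in wwids
```

Usage would be along the lines of parse_wwids(open("/etc/multipath/wwids").read()), followed by a lookup of each candidate device's WWID.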

Alternatively, we can fix factor A) so that when pvscan --cache <device> is called, only that one device is scanned and nothing else. This has its own consequences - the full initial rescan to fill lvmetad is there on purpose as someone could have stopped lvmetad and now it needs to be filled with missing metadata info again!

One way or the other, if not fixed, people can end up seeing duplicates in lvm2, e.g.:

# pvs
  Found duplicate PV gK69H1zhNRD81es14seQvBwMcBIZvAU2: using /dev/mapper/mpath_test not /dev/sdt
  Using duplicate PV /dev/mapper/mpath_test from subsystem DM, ignoring /dev/sdt
  Found duplicate PV gK69H1zhNRD81es14seQvBwMcBIZvAU2: using /dev/mapper/mpath_test not /dev/sds
  Using duplicate PV /dev/mapper/mpath_test from subsystem DM, ignoring /dev/sds
  Found duplicate PV gK69H1zhNRD81es14seQvBwMcBIZvAU2: using /dev/mapper/mpath_test not /dev/sdr
  Using duplicate PV /dev/mapper/mpath_test from subsystem DM, ignoring /dev/sdr
  Found duplicate PV gK69H1zhNRD81es14seQvBwMcBIZvAU2: using /dev/mapper/mpath_test not /dev/sdq
  Using duplicate PV /dev/mapper/mpath_test from subsystem DM, ignoring /dev/sdq
  PV                     VG     Fmt  Attr PSize   PFree  
  /dev/mapper/mpath_test        lvm2 ---  104.00m 104.00m

Comment 1 David Teigland 2015-12-11 15:31:36 UTC
As we've discussed before, I'm hoping to fix 'pvscan --cache <device>' so that it only scans the one device.  I'd like to make lvmetad fork a 'pvscan --cache' to initialize itself when it starts.

Comment 2 Jan Kurik 2016-02-24 15:37:49 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 24 development cycle.
Changing version to '24'.

More information and reason for this action is here:
https://fedoraproject.org/wiki/Fedora_Program_Management/HouseKeeping/Fedora24#Rawhide_Rebase

Comment 3 Fedora End Of Life 2017-07-25 19:37:36 UTC
This message is a reminder that Fedora 24 is nearing its end of life.
Approximately 2 (two) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 24. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora 'version'
of '24'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 24 is end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.

Comment 4 Fedora End Of Life 2017-08-08 12:30:48 UTC
Fedora 24 changed to end-of-life (EOL) status on 2017-08-08. Fedora 24 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.