Bug 2053642

Summary: [RFE] Add support in multipathd to listen for FPIN-Li events and mark affected paths as marginal
Product: Red Hat Enterprise Linux 9 Reporter: Ben Marzinski <bmarzins>
Component: device-mapper-multipath Assignee: Ben Marzinski <bmarzins>
Status: CLOSED ERRATA QA Contact: Lin Li <lilin>
Severity: unspecified Docs Contact: Kristina Slaveykova <kslaveyk>
Priority: high    
Version: 9.0 CC: abhide, agk, bmarzins, emilne, heinzm, kslaveyk, lilin, mkumar, msnitzer, muneendra.kumar, prajnoha, zkabelac
Target Milestone: rc Keywords: FutureFeature, Triaged
Target Release: --- Flags: pm-rhel: mirror+
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: device-mapper-multipath-0.8.7-6.el9 Doc Type: Enhancement
Doc Text:
.`multipathd` now supports detecting FPIN-Li events
When you set the `marginal_pathgroups` config option to the new value `fpin`, `multipathd` monitors Link Integrity Fabric Performance Impact Notification (FPIN-Li) events and moves paths with link integrity issues to a marginal pathgroup. With the `fpin` value set, `multipathd` overrides its existing marginal path detection methods and relies on the Fibre Channel fabric to identify link integrity issues. With this enhancement, `multipathd` detects marginal paths more robustly on Fibre Channel fabrics that can issue FPIN-Li events.
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-05-17 15:56:11 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Ben Marzinski 2022-02-11 16:59:24 UTC
Description of problem:
When link integrity issues are detected on a Fibre Channel fabric, a Link Integrity Fabric Performance Impact Notification (FPIN-Li) can be sent to a node. If multipathd listens for these events, it can use them to control a path's marginal status, instead of trying to detect marginal paths internally. The path will remain marginal until a registered state change notification (RSCN) or Link Up event is received.
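
The event handling described above can be pictured as a small state change per path. The following is only an illustrative sketch in Python, not multipathd's actual code: the `FabricEvent`, `Path`, and `apply_event` names are hypothetical and exist only for this example.

```python
from dataclasses import dataclass
from enum import Enum, auto


class FabricEvent(Enum):
    """Hypothetical event types, named after the events in the description."""
    FPIN_LI = auto()   # Link Integrity notification affecting the path
    RSCN = auto()      # registered state change notification
    LINK_UP = auto()   # link came back up


@dataclass
class Path:
    """Hypothetical stand-in for a multipath path; only tracks marginal status."""
    name: str
    marginal: bool = False


def apply_event(path: Path, event: FabricEvent) -> Path:
    """Update the marginal flag of `path` for a single fabric event."""
    if event is FabricEvent.FPIN_LI:
        # The fabric reported a link integrity issue: mark the path marginal.
        path.marginal = True
    elif event in (FabricEvent.RSCN, FabricEvent.LINK_UP):
        # An RSCN or Link Up event clears the marginal status.
        path.marginal = False
    return path


path = Path("sdb")
apply_event(path, FabricEvent.FPIN_LI)
print(path.marginal)  # True
apply_event(path, FabricEvent.RSCN)
print(path.marginal)  # False
```

The point of the sketch is that the path's marginal status is driven entirely by fabric events, with no internal error counting or timers.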

Comment 4 Ben Marzinski 2022-02-15 18:39:43 UTC
RHEL-9 packages with this fix are available for testing at:

http://people.redhat.com/~bmarzins/device-mapper-multipath/rpms/RHEL9/2053642/

Muneendra, It would be really helpful for getting this into RHEL-9.0 if you could test these before the end of next week.

Comment 8 Ben Marzinski 2022-02-21 16:35:24 UTC
Muneendra, just a ping to remind you that it would be very helpful to get this tested before the end of this week.

Comment 10 MUNEENDRA (Broadcom) 2022-02-22 10:44:15 UTC
Hi Benjamin,
I have installed the packages and tested the same on top of RHEL9 Beta.
We have injected the FPIN from the switch while the host is running the traffic.
On the host, the affected paths and port_states have been marked as marginal, and the traffic has shifted to the active paths.
And things are working fine as expected.
With this testing we can conclude that the packages which you have sent are working fine.
And it is good to go.

Comment 13 Lin Li 2022-02-23 03:03:08 UTC
Move to verified according to comment 10 and comment 12.

Comment 15 errata-xmlrpc 2022-05-17 15:56:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (new packages: device-mapper-multipath), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:3971

Comment 17 Ewan D. Milne 2023-02-09 16:29:30 UTC
# man multipath.conf
defaults section 
     marginal_pathgroups
                        If set to off, the delay_*_checks, marginal_path_*, and san_path_err_* options will keep
                        marginal, or "shaky", paths from being reinstated until they  have  been  monitored  for
                        some time. This can cause situations where all non-marginal paths are down, and no paths
                        are usable until multipathd detects this and reinstates a marginal path. If  the  multi‐
                        path device is not configured to queue IO in this case, it can cause IO errors to occur,
                        even though there are marginal paths available.  However, if this option is set  to  on,
                        when  one  of the marginal path detecting methods determines that a path is marginal, it
                        will be reinstated and placed in a separate pathgroup that will only be used  after  all
                        the  non-marginal  pathgroups have been tried first. This prevents the possibility of IO
                        errors occurring while marginal paths are still usable. After the path has been monitored
                        for  the  configured  time,  and  is declared healthy, it will be returned to its normal
                        pathgroup.  If this option is set to fpin, multipathd will receive  fpin  notifications,
                        set  path  states to "marginal" accordingly, and regroup paths as described for on. This
                        option can't be used in combination with other options for "Shaky path  detection"  (see
                        below). Note: If this is set to fpin, the marginal_path_* and san_path_err_* options are
                        implicitly set to no. Also, this option cannot be switched either to or from fpin  on  a
                        multipathd reconfigure. multipathd must be restarted for the change to take effect.  See
                        "Shaky paths detection" below for more information.

                        The default is: off
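
The ordering rule in the man page text above (marginal pathgroups are only used after all non-marginal pathgroups have been tried) might be sketched as follows. This is a simplified illustration under stated assumptions, not how multipathd or the kernel selects pathgroups: `PathGroup`, its `usable` field, and `pick_pathgroup` are hypothetical names, and real pathgroup selection also involves priorities that are ignored here.

```python
from typing import List, NamedTuple, Optional


class PathGroup(NamedTuple):
    """Hypothetical pathgroup record for illustration only."""
    name: str
    marginal: bool   # True if the group holds marginal paths
    usable: bool     # assumption: True if at least one path in the group is up


def pick_pathgroup(groups: List[PathGroup]) -> Optional[PathGroup]:
    """Return the first usable group, trying non-marginal groups first."""
    # sorted() is stable and False sorts before True, so non-marginal
    # groups keep their relative order and come ahead of marginal ones.
    for group in sorted(groups, key=lambda g: g.marginal):
        if group.usable:
            return group
    return None


groups = [
    PathGroup("pg-marginal", marginal=True, usable=True),
    PathGroup("pg-normal", marginal=False, usable=False),
]
# The only non-marginal group is down, so the marginal group is used
# instead of failing IO.
print(pick_pathgroup(groups).name)  # pg-marginal
```

With the non-marginal group usable, the same call would return `pg-normal`, which matches the "tried first" wording in the man page.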


# cat /etc/multipath.conf
defaults {
	user_friendly_names yes
	find_multipaths yes
	marginal_pathgroups fpin
}


# multipathd show config
defaults {
	marginal_pathgroups "off"