Bug 1845915

Summary: NVMe-FC: mpatha: nvme0n1 - tur checker doesn't support this device
Product: Red Hat Enterprise Linux 8
Reporter: Marco Patalano <mpatalan>
Component: device-mapper-multipath
Assignee: Ben Marzinski <bmarzins>
Status: CLOSED ERRATA
QA Contact: Marco Patalano <mpatalan>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 8.3
CC: agk, bmarzins, heinzm, hong.chung, lilin, msnitzer, prajnoha, zkabelac
Target Milestone: rc
Target Release: 8.3
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: device-mapper-multipath-0.8.4-3.el8
Doc Type: Bug Fix
Doc Text:
Cause: A change in how multipath auto-detected checkers caused it to select the TUR checker for all devices where auto-detection failed.
Consequence: multipath incorrectly assigned the TUR checker to NVMe devices.
Fix: multipath now automatically selects the TUR checker only for devices that successfully report that they support ALUA.
Result: multipath no longer incorrectly assigns the TUR checker to devices.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-11-04 01:59:31 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1842946    
Attachments:
output of multipath -t and multipath -v4 -ll (Flags: none)

Description Marco Patalano 2020-06-10 12:25:17 UTC
Description of problem: On a RHEL-8.3 host, we connect to an NVMe-FC namespace over 4 LIFs with DM-Multipath configured. multipath -ll shows the checker state of every path as undef instead of ready:

mpatha (uuid.e8b8f505-afe0-4c77-b5ac-19c0f5460f84) dm-4 NVME,NetApp ONTAP Controller               
size=100G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=0 status=enabled
| |- 1:35840:1:1 nvme1n1 259:0 active undef running
| `- 2:35904:1:1 nvme2n1 259:2 active undef running
`-+- policy='service-time 0' prio=0 status=enabled
  |- 0:25857:1:1 nvme0n1 259:1 active undef running
  `- 3:25921:1:1 nvme3n1 259:3 active undef running
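
The symptom can also be checked mechanically. The following is a minimal Python sketch, not part of the original report, that picks out paths whose checker state is anything other than ready. It assumes the path-line layout seen in the listing above (H:C:T:L, device name, major:minor, dm state, checker state, online state); the function name paths_not_ready and the regular expression are illustrative only.

```python
import re

# Path line layout as it appears in the listing above (an assumption, not a spec):
#   <H:C:T:L> <devname> <major:minor> <dm state> <checker state> <online state>
PATH_LINE = re.compile(
    r"(?P<hctl>\d+:\d+:\d+:\d+)\s+(?P<dev>\S+)\s+(?P<devt>\d+:\d+)\s+"
    r"(?P<dm_state>\S+)\s+(?P<chk_state>\S+)\s+(?P<online>\S+)"
)


def paths_not_ready(multipath_ll_output):
    """Return (device, checker_state) pairs for paths whose checker state is not 'ready'."""
    result = []
    for line in multipath_ll_output.splitlines():
        m = PATH_LINE.search(line)
        if m and m.group("chk_state") != "ready":
            result.append((m.group("dev"), m.group("chk_state")))
    return result


# Sample input copied from the multipath -ll output in this report.
sample = """\
mpatha (uuid.e8b8f505-afe0-4c77-b5ac-19c0f5460f84) dm-4 NVME,NetApp ONTAP Controller
size=100G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=0 status=enabled
| |- 1:35840:1:1 nvme1n1 259:0 active undef running
| `- 2:35904:1:1 nvme2n1 259:2 active undef running
`-+- policy='service-time 0' prio=0 status=enabled
  |- 0:25857:1:1 nvme0n1 259:1 active undef running
  `- 3:25921:1:1 nvme3n1 259:3 active undef running
"""

for dev, state in paths_not_ready(sample):
    print(dev, state)
```

Run against the output of a fixed package, where every path reports ready, the function should return an empty list.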

The following messages are logged continuously:

Jun  9 11:17:25 storageqe-14 multipathd[980]: nvme0n1: unusable path (wild) - checker failed
Jun  9 11:17:25 storageqe-14 multipathd[980]: mpatha: nvme0n1 - tur checker doesn't support this device
Jun  9 11:17:27 storageqe-14 multipathd[980]: nvme2n1: unusable path (wild) - checker failed
Jun  9 11:17:27 storageqe-14 multipathd[980]: mpatha: nvme2n1 - tur checker doesn't support this device
Jun  9 11:17:28 storageqe-14 multipathd[980]: nvme3n1: unusable path (wild) - checker failed
Jun  9 11:17:28 storageqe-14 multipathd[980]: mpatha: nvme3n1 - tur checker doesn't support this device
Jun  9 11:17:29 storageqe-14 multipathd[980]: nvme1n1: unusable path (wild) - checker failed
Jun  9 11:17:29 storageqe-14 multipathd[980]: mpatha: nvme1n1 - tur checker doesn't support this device
Jun  9 11:17:30 storageqe-14 multipathd[980]: nvme0n1: unusable path (wild) - checker failed
Jun  9 11:17:30 storageqe-14 multipathd[980]: mpatha: nvme0n1 - tur checker doesn't support this device
Jun  9 11:17:32 storageqe-14 multipathd[980]: nvme2n1: unusable path (wild) - checker failed
Jun  9 11:17:32 storageqe-14 multipathd[980]: mpatha: nvme2n1 - tur checker doesn't support this device
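
To see which devices are affected and how often, the log can be scanned for the checker message. This is a rough Python sketch, not taken from the report: it assumes the syslog line format shown above, and the name count_tur_failures is hypothetical.

```python
import re
from collections import Counter

# Matches lines such as:
#   ... multipathd[980]: mpatha: nvme0n1 - tur checker doesn't support this device
TUR_UNSUPPORTED = re.compile(
    r"multipathd\[\d+\]: (?P<map>\S+): (?P<dev>\S+)"
    r" - tur checker doesn't support this device"
)


def count_tur_failures(log_text):
    """Count 'tur checker doesn't support this device' messages per path device."""
    counts = Counter()
    for line in log_text.splitlines():
        m = TUR_UNSUPPORTED.search(line)
        if m:
            counts[m.group("dev")] += 1
    return counts


# Sample lines copied from the log excerpt in this report.
sample = """\
Jun  9 11:17:25 storageqe-14 multipathd[980]: nvme0n1: unusable path (wild) - checker failed
Jun  9 11:17:25 storageqe-14 multipathd[980]: mpatha: nvme0n1 - tur checker doesn't support this device
Jun  9 11:17:27 storageqe-14 multipathd[980]: mpatha: nvme2n1 - tur checker doesn't support this device
Jun  9 11:17:30 storageqe-14 multipathd[980]: mpatha: nvme0n1 - tur checker doesn't support this device
"""

for dev, n in sorted(count_tur_failures(sample).items()):
    print(dev, n)
```

An empty result on a host running the fixed package would be consistent with the verification reported later in this bug.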

Below are the contents of the multipath.conf that we used for RHEL-8.2 testing:

[root@storageqe-14 crash]# cat /etc/multipath.conf
defaults {
        user_friendly_names yes
        find_multipaths yes
}
devices {
    device {
        vendor "NVME"
        product "NetApp ONTAP Controller"
        path_grouping_policy group_by_prio
        prio ana
        failback immediate
        no_path_retry queue
        }
    device {
        vendor "NETAPP"
        product "LUN.*"
        path_grouping_policy group_by_prio
        path_checker tur
        features "3 queue_if_no_path pg_init_retries 50"
        hardware_handler 0
        prio ontap
        failback immediate
        rr_weight uniform
        rr_min_io 128
        flush_on_last_del yes
        dev_loss_tmo infinity
        retain_attached_hw_handler yes
        detect_prio yes
    }
}
blacklist {
}


Version-Release number of selected component (if applicable):
# uname -r
4.18.0-211.el8.x86_64

# rpm -qa device-mapper-multipath
device-mapper-multipath-0.8.4-1.el8.x86_64


How reproducible: 100%


Steps to Reproduce:
1. See above


Additional info: Ben has indicated that this is a problem with the checker autodetection code. The issue can be worked around for now by adding:

detect_checker no

to the device config for the NVMe devices.
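
Applied to the NVMe stanza of the multipath.conf shown earlier in this report, the workaround might look like the fragment below. This is a sketch under that assumption: only the detect_checker line is new, and the remaining settings are copied unchanged from the reporter's configuration.

```
devices {
    device {
        vendor "NVME"
        product "NetApp ONTAP Controller"
        path_grouping_policy group_by_prio
        prio ana
        failback immediate
        no_path_retry queue
        # Workaround: skip checker autodetection for these devices
        detect_checker no
    }
}
```

multipathd has to reload its configuration before the change takes effect.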

Comment 1 Marco Patalano 2020-06-10 12:26:42 UTC
Created attachment 1696482 [details]
output of multipath -t and multipath -v4 -ll

Comment 3 Ben Marzinski 2020-07-07 16:12:28 UTC
A change in how multipath autodetected checkers caused it to set the TUR checker on devices that it shouldn't, including NVMe devices. This has been fixed to work as before.

Comment 6 Marco Patalano 2020-07-21 15:15:40 UTC
Verified with device-mapper-multipath-0.8.4-3.el8:

# rpm -qa device-mapper-multipath
device-mapper-multipath-0.8.4-3.el8.x86_64


[root@storageqe-01 ~]# multipath -ll mpatha
mpatha (uuid.15542735-1e61-4cbe-919f-9dc1ba282a6e) dm-4 NVME,NetApp ONTAP Controller                 
size=100G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| |- 0:448:1:1   nvme0n1 259:1 active ready running
| `- 1:512:1:1   nvme1n1 259:2 active ready running
`-+- policy='service-time 0' prio=1 status=enabled
  |- 2:58305:1:1 nvme2n1 259:0 active ready running
  `- 3:58369:1:1 nvme3n1 259:3 active ready running


No "tur checker doesn't support this device" messages in the logs.

Comment 7 Ben Marzinski 2020-07-30 16:35:58 UTC
*** Bug 1861818 has been marked as a duplicate of this bug. ***

Comment 10 errata-xmlrpc 2020-11-04 01:59:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (device-mapper-multipath bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:4540