Bug 1948690 - ANA state missing when Native NVMe Multipath is disabled
Summary: ANA state missing when Native NVMe Multipath is disabled
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: kernel
Version: 9.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: unspecified
Target Milestone: beta
Target Release: ---
Assignee: Mike Snitzer
QA Contact: Marco Patalano
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-04-12 18:33 UTC by Marco Patalano
Modified: 2021-12-07 21:57 UTC
CC List: 10 users

Fixed In Version: kernel-5.13.0-0.rc4.33.el9
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-12-07 21:55:02 UTC
Type: Bug
Target Upstream Version:
Embargoed:
pm-rhel: mirror+



Description Marco Patalano 2021-04-12 18:33:52 UTC
Description of problem: With RHEL-9, if Native NVMe Multipath is disabled, the ANA state is not present when issuing an "nvme list-subsys":

[root@storageqe-14 ~]# uname -r
5.11.0-2.el9.x86_64

[root@storageqe-14 ~]# cat /sys/module/nvme_core/parameters/multipath
N

[root@storageqe-14 ~]# nvme list-subsys /dev/nvme0n1
nvme-subsys1 - NQN=nqn.1992-08.com.netapp:sn.e18bfca87d5e11e98c0800a098cbcac6:subsystem.st14_nvme_ss_1_1
\
 +- nvme0 fc traddr=nn-0x204600a098cbcac6:pn-0x204900a098cbcac6 host_traddr=nn-0x20000090fae0b5f6:pn-0x10000090fae0b5f6 live 
 +- nvme1 fc traddr=nn-0x204600a098cbcac6:pn-0x204800a098cbcac6 host_traddr=nn-0x20000090fae0b5f5:pn-0x10000090fae0b5f5 live 
 +- nvme2 fc traddr=nn-0x204600a098cbcac6:pn-0x204700a098cbcac6 host_traddr=nn-0x20000090fae0b5f5:pn-0x10000090fae0b5f5 live 
 +- nvme3 fc traddr=nn-0x204600a098cbcac6:pn-0x204a00a098cbcac6 host_traddr=nn-0x20000090fae0b5f6:pn-0x10000090fae0b5f6 live 


This deviates from the behavior observed in RHEL-8.4:

[root@storageqe-01 ~]# uname -r
4.18.0-293.el8.x86_64


[root@storageqe-01 ~]# cat /sys/module/nvme_core/parameters/multipath
N

[root@storageqe-01 ~]# nvme list-subsys /dev/nvme2n1
nvme-subsys2 - NQN=nqn.1992-08.com.netapp:sn.c9ecc9187b1111e98c0800a098cbcac6:subsystem.vs_nvme_multipath_1_subsystem_468
\
 +- nvme0 fc traddr=nn-0x203b00a098cbcac6:pn-0x203d00a098cbcac6 host_traddr=nn-0x20000024ff19bb62:pn-0x21000024ff19bb62 live optimized
 +- nvme1 fc traddr=nn-0x203b00a098cbcac6:pn-0x204c00a098cbcac6 host_traddr=nn-0x20000024ff19bb63:pn-0x21000024ff19bb63 live inaccessible
 +- nvme2 fc traddr=nn-0x203b00a098cbcac6:pn-0x203100a098cbcac6 host_traddr=nn-0x20000024ff19bb62:pn-0x21000024ff19bb62 live optimized
 +- nvme3 fc traddr=nn-0x203b00a098cbcac6:pn-0x204d00a098cbcac6 host_traddr=nn-0x20000024ff19bb63:pn-0x21000024ff19bb63 live inaccessible

If this RHEL-9 behavior is expected, then how would path priority work with device-mapper-multipath when setting prio to "ana" in multipath.conf?
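For context, the multipath.conf setting in question would look something like this (a minimal sketch; the vendor/product matching shown is an assumption, not the exact config used on this host):

devices {
        device {
                vendor  "NVME"
                product ".*"
                prio    "ana"
        }
}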

Version-Release number of selected component (if applicable):
kernel-5.11.0-2.el9

Comment 1 Mike Snitzer 2021-04-13 13:47:34 UTC
The NVMe patch we're carrying in RHEL8 needs to be forward-ported to RHEL9.

I'll do so ASAP.

Comment 2 Mike Snitzer 2021-04-13 13:58:29 UTC
(In reply to Mike Snitzer from comment #1)
> The NVMe patch we're carrying in RHEL8 needs to be forward-ported to RHEL9.

Turns out we're carrying 4 patches in RHEL8:

ef4ab90c12db [nvme] nvme: Return BLK_STS_TARGET if the DNR bit is set
f8fb6ea1226e [nvme] nvme: update failover handling to work with REQ_FAILFAST_TRANSPORT
b904f4b8e0f9 [nvme] nvme: decouple basic ANA log page re-read support from native multipathing
7dadadb07251 [nvme] nvme: allow retry for requests with REQ_FAILFAST_TRANSPORT set

I'll forward-port them to RHEL9... I'm really not interested in revisiting attempts to get the upstream NVMe maintainers to take these changes (given how "loaded" allowing proper DM multipath support for NVMe is for Christoph Hellwig).
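For context on the first of those, the DNR (Do Not Retry) handling boils down to classifying such errors as target errors rather than path errors, so DM multipath doesn't pointlessly retry them on another path. Roughly (a sketch of the idea only, not the literal RHEL hunk):

	/* Target set DNR: retrying on another path won't help,
	 * so report a target error instead of a path error. */
	if (nvme_req(req)->status & NVME_SC_DNR)
		return BLK_STS_TARGET;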

Comment 3 Mike Snitzer 2021-04-13 14:38:46 UTC
(In reply to Marco Patalano from comment #0)

> If this RHEL-9  behavior is expected, then how would path priority work with
> device-mapper-multipath when setting prio to ana in the multipath.conf?

BTW, this _should_ still "just work". My understanding is that device-mapper-multipath will fallback to doing its own ana_state collection if the sysfs file doesn't exist. Did you find that not to be the case?
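(For reference, the sysfs file in question is the per-path "ana_state" attribute, which can also be checked by hand; the exact block device name below is an assumption:)

[root@storageqe-14 ~]# cat /sys/block/nvme0n1/ana_state
optimized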

Comment 4 Ewan D. Milne 2021-04-13 15:03:55 UTC
RHEL8 also has this.  I'm not sure if we want to change RHEL9
to be different from the upstream default like we did in RHEL8,
given how RHEL9 is supposed to more closely track upstream.

Also, NetApp at least prefers to only support Native NVMe multipath
for their products; they say they don't have the resources to certify
DM-multipath as well (even though we tell them they really have to for RHEL).

However, I'm not sure we can do this if we need to support upgrading,
e.g., a boot-from-SAN NVMe fabrics system (when we add this feature)
from RHEL8 to RHEL9.

---

commit 8be4b84f8e35dc6e5453b72fe5e60acef795a299
Author: Ewan Milne <emilne>
Date:   Fri Mar 22 17:33:52 2019 -0400

    [nvme] nvme: multipath: Change default of kernel NVMe multipath to be disabled
    
    Message-id: <20190322173354.8197-20-emilne>
    Patchwork-id: 247393
    O-Subject: [RHEL8.1 PATCH 19/21] nvme: multipath: Change default of kernel NVMe multipath to be disabled
    Bugzilla: 1690940
    RH-Acked-by: Tony Camuso <tcamuso>
    RH-Acked-by: Mike Snitzer <snitzer>
    
    Now that CONFIG_NVME_MULTIPATH is enabled, make the default behavior
    of the kernel to be "disabled" for compatibility with earlier RHEL8.
    
    RHEL-only
    
    Signed-off-by: Ewan D. Milne <emilne>
    Signed-off-by: Herton R. Krzesinski <herton>

diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 1c9c3d39ca76..5a3f11439402 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -7,7 +7,7 @@
 #include <trace/events/block.h>
 #include "nvme.h"
 
-static bool multipath = true;
+static bool multipath = false;
 module_param(multipath, bool, 0444);
 MODULE_PARM_DESC(multipath,
        "turn on native support for multiple controllers per subsystem");

Comment 5 Ewan D. Milne 2021-04-13 15:07:48 UTC
BTW

When I asked about RHEL-specific changes in RHEL9 in the KWF meeting
I was told that we need to use CONFIG_RHEL_DIFFERENCES so that we
don't end up changing Fedora as well.

Supposedly RHEL-specific changes will survive the RHEL9 rebasing.
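Presumably the forward-port would then guard the default rather than hard-code it, along these lines (a sketch of the idea only, assuming the CONFIG_RHEL_DIFFERENCES mechanism works as described):

-static bool multipath = true;
+static bool multipath = !IS_ENABLED(CONFIG_RHEL_DIFFERENCES);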

Comment 6 Ewan D. Milne 2021-04-13 15:10:55 UTC
(In reply to Mike Snitzer from comment #3)
> (In reply to Marco Patalano from comment #0)
> 
> > If this RHEL-9  behavior is expected, then how would path priority work with
> > device-mapper-multipath when setting prio to ana in the multipath.conf?
> 
> BTW, this _should_ still "just work". My understanding is that
> device-mapper-multipath will fallback to doing its own ana_state collection
> if the sysfs file doesn't exist. Did you find that not to be the case?

I think Marco said it didn't work right.

Also, without the RHEL-specific changes the error classification
(path vs. device-reported) doesn't get handled properly for DM, right?

Comment 7 Ben Marzinski 2021-04-13 15:50:24 UTC
If it really doesn't work correctly without the sysfs file, then that needs looking into.  The RHEL9 patch is identical to the RHEL8 patch, and if multipath doesn't get a valid state when trying the sysfs file method, it will just fall back to using the regular ioctl method.  If it's not working without the sysfs file, then it's possible that multipath's regular ana priority method is broken, and we've never noticed it because we've always been using the sysfs method.
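(The "regular ioctl method" fetches the same ANA log page — log ID 0x0c — that can be pulled by hand for comparison; the log length below is an arbitrary assumption:)

[root@storageqe-14 ~]# nvme get-log /dev/nvme0 --log-id=0xc --log-len=4096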

Comment 8 Mike Snitzer 2021-04-13 17:32:21 UTC
(In reply to Ewan D. Milne from comment #5)
> BTW
> 
> When I asked about RHEL-specific changes in RHEL9 in the KWF meeting
> I was told that we need to use CONFIG_RHEL_DIFFERENCES so that we
> don't end up changing Fedora as well.
> 
> Supposedly RHEL-specific changes will survive the RHEL9 rebasing.

I talked to Don Zickus about that today.  He said that _may_ be needed but we'll see.

I'll get the commit you pointed out added to my MR now (I needed to figure out how to do that...).

But here is the MR I created for ARK:
https://gitlab.com/cki-project/kernel-ark/-/merge_requests/1024

Comment 9 Marco Patalano 2021-04-14 13:33:24 UTC
Hello Mike,

With Native NVMe Multipath disabled and using device-mapper-multipath, I have 4 paths to the NVMe namespace (2 should be optimized and 2 inaccessible, although I am not able to find this info using nvme-cli).

[root@storageqe-14 ~]# multipath -ll
mpatha (uuid.e8b8f505-afe0-4c77-b5ac-19c0f5460f84) dm-4 NVME,NetApp ONTAP Controller                 
size=100G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| |- 1:27968:1:1 nvme1n1 259:0 active ready running
| `- 2:28032:1:1 nvme2n1 259:2 active ready running
`-+- policy='service-time 0' prio=1 status=enabled
  |- 0:63041:1:1 nvme0n1 259:1 active ready running
  `- 3:63105:1:1 nvme3n1 259:3 active ready running

I then begin to generate I/O to /dev/mapper/mpatha and subsequently perform a storage failover on the array. The 2 paths are removed as expected:

[root@storageqe-14 ~]# multipath -ll
mpatha (uuid.e8b8f505-afe0-4c77-b5ac-19c0f5460f84) dm-4 NVME,NetApp ONTAP Controller                 
size=100G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=50 status=active
  |- 0:63041:1:1 nvme0n1 259:1 active ready running
  `- 3:63105:1:1 nvme3n1 259:3 active ready running
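(The I/O load here was a plain fio job against the multipath device; a hypothetical invocation of that kind — all job parameters below are assumptions:)

[root@storageqe-14 ~]# fio --name=failover-test --filename=/dev/mapper/mpatha --direct=1 --ioengine=libaio --rw=randrw --bs=4k --iodepth=16 --numjobs=4 --time_based --runtime=600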

The FIO job completes successfully and the paths are restored after the failover/failback completes:

[root@storageqe-14 ~]# multipath -ll
mpatha (uuid.e8b8f505-afe0-4c77-b5ac-19c0f5460f84) dm-4 NVME,NetApp ONTAP Controller                 
size=100G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| |- 4:64:1:1    nvme4n1 259:0 active ready running
| `- 2:128:1:1   nvme2n1 259:2 active ready running
`-+- policy='service-time 0' prio=1 status=enabled
  |- 0:63041:1:1 nvme0n1 259:1 active ready running
  `- 3:63105:1:1 nvme3n1 259:3 active ready running


Therefore, I would say that this is working properly, and that my concern is just that I am not able to find the ANA state with nvme-cli as we could with RHEL-8.4.

Marco

Comment 17 Marco Patalano 2021-06-03 18:30:30 UTC
Verified with kernel-5.13.0-0.rc4.33.el9:

[root@storageqe-01 ~]# uname -r
5.13.0-0.rc4.33.el9.x86_64

Native NVMe Multipath Disabled:

[root@storageqe-01 ~]# cat /sys/module/nvme_core/parameters/multipath
N

[root@storageqe-01 ~]# multipath -ll
3600a098038304267573f4d37784f6849 dm-3 NETAPP,LUN C-Mode
size=15G features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=50 status=active
  `- 0:0:1:0    sdb     8:16   active ready running
mpatha (uuid.15542735-1e61-4cbe-919f-9dc1ba282a6e) dm-4 NVME,NetApp ONTAP Controller                 
size=100G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| |- 0:192:1:1  nvme0n1 259:0  active ready running
| `- 1:256:1:1  nvme1n1 259:3  active ready running
`-+- policy='service-time 0' prio=1 status=enabled
  |- 2:3585:1:1 nvme2n1 259:6  active ready running
  `- 3:3649:1:1 nvme3n1 259:7  active ready running
mpathb (uuid.5e417006-37c1-408d-91af-d1fb74d4786b) dm-5 NVME,NetApp ONTAP Controller                 
size=5.0G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| |- 0:192:2:2  nvme0n2 259:1  active ready running
| `- 1:256:2:2  nvme1n2 259:4  active ready running
`-+- policy='service-time 0' prio=1 status=enabled
  |- 2:3585:2:2 nvme2n2 259:8  active ready running
  `- 3:3649:2:2 nvme3n2 259:9  active ready running
mpathc (uuid.4a067514-ce49-40d7-ad7e-45342a31e754) dm-6 NVME,NetApp ONTAP Controller                 
size=7.0G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| |- 0:192:3:3  nvme0n3 259:2  active ready running
| `- 1:256:3:3  nvme1n3 259:5  active ready running
`-+- policy='service-time 0' prio=1 status=enabled
  |- 2:3585:3:3 nvme2n3 259:10 active ready running
  `- 3:3649:3:3 nvme3n3 259:11 active ready running


[root@storageqe-01 ~]# nvme list-subsys /dev/nvme0n1
nvme-subsys0 - NQN=nqn.1992-08.com.netapp:sn.c9ecc9187b1111e98c0800a098cbcac6:subsystem.vs_nvme_multipath_1_subsystem_468
\
 +- nvme0 fc traddr=nn-0x203b00a098cbcac6:pn-0x203d00a098cbcac6 host_traddr=nn-0x20000024ff19bb62:pn-0x21000024ff19bb62 live optimized
 +- nvme1 fc traddr=nn-0x203b00a098cbcac6:pn-0x203100a098cbcac6 host_traddr=nn-0x20000024ff19bb62:pn-0x21000024ff19bb62 live optimized
 +- nvme2 fc traddr=nn-0x203b00a098cbcac6:pn-0x204d00a098cbcac6 host_traddr=nn-0x20000024ff19bb63:pn-0x21000024ff19bb63 live inaccessible
 +- nvme3 fc traddr=nn-0x203b00a098cbcac6:pn-0x204c00a098cbcac6 host_traddr=nn-0x20000024ff19bb63:pn-0x21000024ff19bb63 live inaccessible
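(For completeness: the native multipath default can be flipped per boot with the module parameter shown above, e.g. nvme_core.multipath=Y on the kernel command line, or via a modprobe.d options line plus an initramfs rebuild — a sketch:)

[root@storageqe-01 ~]# cat /etc/modprobe.d/nvme.conf
options nvme_core multipath=Y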

