Bug 1948690
| Summary: | ANA state missing when Native NVMe Multipath is disabled | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 9 | Reporter: | Marco Patalano <mpatalan> |
| Component: | kernel | Assignee: | Mike Snitzer <msnitzer> |
| kernel sub component: | NVMe | QA Contact: | Marco Patalano <mpatalan> |
| Status: | CLOSED CURRENTRELEASE | Docs Contact: | |
| Severity: | unspecified | ||
| Priority: | high | CC: | agk, bmarzins, emilne, gtiwari, heinzm, hkrzesin, minlei, msnitzer, prajnoha, zkabelac |
| Version: | 9.0 | Keywords: | Triaged |
| Target Milestone: | beta | Flags: | pm-rhel: mirror+ |
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | kernel-5.13.0-0.rc4.33.el9 | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2021-12-07 21:55:02 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
Description
Marco Patalano
2021-04-12 18:33:52 UTC
---

The NVMe patch we're carrying in RHEL8 needs to be forward-ported to RHEL9. I'll do so ASAP.

---

(In reply to Mike Snitzer from comment #1)
> The NVMe patch we're carrying in RHEL8 needs to be forward-ported to RHEL9.

Turns out we're carrying 4 patches in RHEL8:

ef4ab90c12db [nvme] nvme: Return BLK_STS_TARGET if the DNR bit is set
f8fb6ea1226e [nvme] nvme: update failover handling to work with REQ_FAILFAST_TRANSPORT
b904f4b8e0f9 [nvme] nvme: decouple basic ANA log page re-read support from native multipathing
7dadadb07251 [nvme] nvme: allow retry for requests with REQ_FAILFAST_TRANSPORT set

I'll forward-port them to RHEL9... really not interested in revisiting trying to get upstream NVMe maintainers to take these changes (given how "loaded" allowing proper DM multipath support for NVMe is for Christoph Hellwig).

---

(In reply to Marco Patalano from comment #0)
> If this RHEL-9 behavior is expected, then how would path priority work with
> device-mapper-multipath when setting prio to ana in the multipath.conf?

BTW, this _should_ still "just work". My understanding is that device-mapper-multipath will fallback to doing its own ana_state collection if the sysfs file doesn't exist. Did you find that not to be the case?

---

RHEL8 also has this. I'm not sure if we want to change RHEL9 to be different from the upstream default like we did in RHEL8, given how RHEL9 is supposed to more closely track upstream.

Also, NetApp at least prefers to only support Native NVMe multipath for their products; they say they don't have the resources to also certify DM-multipath (even though we tell them they really have to for RHEL).

However, I'm not sure we can do this if we need to support upgrades from RHEL8 to RHEL9, e.g. of a boot-from-SAN NVMe fabrics system (when we add this feature).
---
commit 8be4b84f8e35dc6e5453b72fe5e60acef795a299
Author: Ewan Milne <emilne>
Date: Fri Mar 22 17:33:52 2019 -0400
[nvme] nvme: multipath: Change default of kernel NVMe multipath to be disabled
Message-id: <20190322173354.8197-20-emilne>
Patchwork-id: 247393
O-Subject: [RHEL8.1 PATCH 19/21] nvme: multipath: Change default of kernel NVMe multipath to be disabled
Bugzilla: 1690940
RH-Acked-by: Tony Camuso <tcamuso>
RH-Acked-by: Mike Snitzer <snitzer>
Now that CONFIG_NVME_MULTIPATH is enabled, make the default behavior
of the kernel to be "disabled" for compatibility with earlier RHEL8.
RHEL-only
Signed-off-by: Ewan D. Milne <emilne>
Signed-off-by: Herton R. Krzesinski <herton>
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 1c9c3d39ca76..5a3f11439402 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -7,7 +7,7 @@
#include <trace/events/block.h>
#include "nvme.h"
-static bool multipath = true;
+static bool multipath = false;
module_param(multipath, bool, 0444);
MODULE_PARM_DESC(multipath,
"turn on native support for multiple controllers per subsystem");
---

BTW

When I asked about RHEL-specific changes in RHEL9 in the KWF meeting I was told that we need to use CONFIG_RHEL_DIFFERENCES so that we don't end up changing Fedora as well.

Supposedly RHEL-specific changes will survive the RHEL9 rebasing.

---

(In reply to Mike Snitzer from comment #3)
> (In reply to Marco Patalano from comment #0)
> > If this RHEL-9 behavior is expected, then how would path priority work with
> > device-mapper-multipath when setting prio to ana in the multipath.conf?
>
> BTW, this _should_ still "just work". My understanding is that
> device-mapper-multipath will fallback to doing its own ana_state collection
> if the sysfs file doesn't exist. Did you find that not to be the case?

I think Marco said it didn't work right. Also, without the RHEL-specific changes the error classification (path vs. device-reported) doesn't get handled properly for DM, right?

---

If it really doesn't work correctly without the sysfs file, then that needs looking into. The RHEL9 patch is identical to the RHEL8 patch, and if multipath doesn't get a valid state when trying the sysfs file method, it will just fall back to using the regular ioctl method. If it's not working without the sysfs file, then it's possible that multipath's regular ana priority method is broken, and we've never noticed it because we've always been using the sysfs method.

---

(In reply to Ewan D. Milne from comment #5)
> BTW
>
> When I asked about RHEL-specific changes in RHEL9 in the KWF meeting
> I was told that we need to use CONFIG_RHEL_DIFFERENCES so that we
> don't end up changing Fedora as well.
>
> Supposedly RHEL-specific changes will survive the RHEL9 rebasing.

I talked to Don Zickus about that today. He said that _may_ be needed but we'll see.

I'll get the commit you pointed out added to my MR now (needed to figure out how to do that..)

But here is the MR I created for ARK:
https://gitlab.com/cki-project/kernel-ark/-/merge_requests/1024
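To make the configuration being discussed concrete, a hedged sketch of pointing device-mapper-multipath at the ana prioritizer and checking for the sysfs attribute it tries before falling back to the ioctl method. The multipath.conf stanza, file paths, and device names are illustrative assumptions, not settings captured from this system:

# Hypothetical multipath.conf stanza selecting the "ana" prioritizer for the
# NetApp ONTAP namespaces (vendor/product strings match the multipath -ll
# output below; the stanza itself is illustrative)
cat >> /etc/multipath.conf <<'EOF'
devices {
        device {
                vendor  "NVME"
                product "NetApp ONTAP Controller"
                prio    "ana"
        }
}
EOF
multipathd reconfigure   # or: systemctl reload multipathd

# The ana prioritizer tries a per-namespace sysfs attribute first and falls
# back to reading the ANA log page itself if the attribute is absent; the
# exact sysfs location shown here is an assumption
cat /sys/block/nvme0n1/ana_state 2>/dev/null || echo "no sysfs ana_state exported"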
---

Hello Mike,

With Native NVMe Multipath disabled and using device-mapper-multipath, I have 4 paths to the NVMe namespace (2 should be optimized and 2 inaccessible although I am not able to find this info using nvme-cli).

[root@storageqe-14 ~]# multipath -ll
mpatha (uuid.e8b8f505-afe0-4c77-b5ac-19c0f5460f84) dm-4 NVME,NetApp ONTAP Controller
size=100G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| |- 1:27968:1:1 nvme1n1 259:0 active ready running
| `- 2:28032:1:1 nvme2n1 259:2 active ready running
`-+- policy='service-time 0' prio=1 status=enabled
  |- 0:63041:1:1 nvme0n1 259:1 active ready running
  `- 3:63105:1:1 nvme3n1 259:3 active ready running

I then begin to generate I/O to /dev/mapper/mpatha and subsequently perform a storage failover on the array. The 2 paths are removed as expected:

[root@storageqe-14 ~]# multipath -ll
mpatha (uuid.e8b8f505-afe0-4c77-b5ac-19c0f5460f84) dm-4 NVME,NetApp ONTAP Controller
size=100G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=50 status=active
  |- 0:63041:1:1 nvme0n1 259:1 active ready running
  `- 3:63105:1:1 nvme3n1 259:3 active ready running

The FIO job completes successfully and the paths are restored after the failover/failback completes:

[root@storageqe-14 ~]# multipath -ll
mpatha (uuid.e8b8f505-afe0-4c77-b5ac-19c0f5460f84) dm-4 NVME,NetApp ONTAP Controller
size=100G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| |- 4:64:1:1 nvme4n1 259:0 active ready running
| `- 2:128:1:1 nvme2n1 259:2 active ready running
`-+- policy='service-time 0' prio=1 status=enabled
  |- 0:63041:1:1 nvme0n1 259:1 active ready running
  `- 3:63105:1:1 nvme3n1 259:3 active ready running

Therefore, I would say that this is working properly and that my concern is that I am not able to find the ANA state with nvme-cli as we used to with RHEL-8.4.

Marco
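The exact I/O workload used in the failover test above is not shown; a hypothetical fio invocation along these lines (all parameters are assumptions, not the job actually run) would drive I/O to the multipath device during the failover:

# Hypothetical fio job against the dm-multipath device (all parameters are
# illustrative, not the job file actually used)
fio --name=failover-test \
    --filename=/dev/mapper/mpatha \
    --ioengine=libaio --direct=1 \
    --rw=randrw --bs=4k --iodepth=32 --numjobs=4 \
    --time_based --runtime=600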
---

Verified with kernel-5.13.0-0.rc4.33.el9:

[root@storageqe-01 ~]# uname -r
5.13.0-0.rc4.33.el9.x86_64

Native NVMe Multipath Disabled:

[root@storageqe-01 ~]# cat /sys/module/nvme_core/parameters/multipath
N

[root@storageqe-01 ~]# multipath -ll
3600a098038304267573f4d37784f6849 dm-3 NETAPP,LUN C-Mode
size=15G features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=50 status=active
  `- 0:0:1:0 sdb 8:16 active ready running
mpatha (uuid.15542735-1e61-4cbe-919f-9dc1ba282a6e) dm-4 NVME,NetApp ONTAP Controller
size=100G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| |- 0:192:1:1 nvme0n1 259:0 active ready running
| `- 1:256:1:1 nvme1n1 259:3 active ready running
`-+- policy='service-time 0' prio=1 status=enabled
  |- 2:3585:1:1 nvme2n1 259:6 active ready running
  `- 3:3649:1:1 nvme3n1 259:7 active ready running
mpathb (uuid.5e417006-37c1-408d-91af-d1fb74d4786b) dm-5 NVME,NetApp ONTAP Controller
size=5.0G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| |- 0:192:2:2 nvme0n2 259:1 active ready running
| `- 1:256:2:2 nvme1n2 259:4 active ready running
`-+- policy='service-time 0' prio=1 status=enabled
  |- 2:3585:2:2 nvme2n2 259:8 active ready running
  `- 3:3649:2:2 nvme3n2 259:9 active ready running
mpathc (uuid.4a067514-ce49-40d7-ad7e-45342a31e754) dm-6 NVME,NetApp ONTAP Controller
size=7.0G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| |- 0:192:3:3 nvme0n3 259:2 active ready running
| `- 1:256:3:3 nvme1n3 259:5 active ready running
`-+- policy='service-time 0' prio=1 status=enabled
  |- 2:3585:3:3 nvme2n3 259:10 active ready running
  `- 3:3649:3:3 nvme3n3 259:11 active ready running

[root@storageqe-01 ~]# nvme list-subsys /dev/nvme0n1
nvme-subsys0 - NQN=nqn.1992-08.com.netapp:sn.c9ecc9187b1111e98c0800a098cbcac6:subsystem.vs_nvme_multipath_1_subsystem_468
\
 +- nvme0 fc traddr=nn-0x203b00a098cbcac6:pn-0x203d00a098cbcac6 host_traddr=nn-0x20000024ff19bb62:pn-0x21000024ff19bb62 live optimized
 +- nvme1 fc traddr=nn-0x203b00a098cbcac6:pn-0x203100a098cbcac6 host_traddr=nn-0x20000024ff19bb62:pn-0x21000024ff19bb62 live optimized
 +- nvme2 fc traddr=nn-0x203b00a098cbcac6:pn-0x204d00a098cbcac6 host_traddr=nn-0x20000024ff19bb63:pn-0x21000024ff19bb63 live inaccessible
 +- nvme3 fc traddr=nn-0x203b00a098cbcac6:pn-0x204c00a098cbcac6 host_traddr=nn-0x20000024ff19bb63:pn-0x21000024ff19bb63 live inaccessible
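As a follow-up on reading the ANA state with nvme-cli: the per-controller "live optimized" / "live inaccessible" states shown by nvme list-subsys above reflect the ANA log page (log ID 0x0c), which can also be dumped directly, assuming an nvme-cli version that provides the ana-log subcommand. The device name is just an example:

# Dump the raw ANA log page for one controller (device name is an example)
nvme ana-log /dev/nvme0

# Equivalent generic form; 4096 bytes is an arbitrary length for illustration
nvme get-log /dev/nvme0 --log-id=0x0c --log-len=4096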