Bug 1524966
Summary: | [RFE]: NetApp 7.5: Add group_by_prio support in DM-multipath for NVMe namespaces | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | gowrav <gowrav.mahadevaiah> |
Component: | device-mapper-multipath | Assignee: | Ben Marzinski <bmarzins> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Lin Li <lilin> |
Severity: | high | Docs Contact: | |
Priority: | unspecified | ||
Version: | 7.5 | CC: | agk, akarlsso, bcodding, bmarzins, bubrown, coughlan, dick.kennedy, dinil, emilne, fgarciad, gokulnat, gowrav.mahadevaiah, heinzm, james.smart, jbrassow, laurie.barry, lilin, marting, matt.schulte, msnitzer, ng-hsg-engcustomer-iop-bz, ng-redhat-bugzilla, prajnoha, revers, rhandlin, sschremm |
Target Milestone: | rc | Keywords: | FutureFeature |
Target Release: | 7.8 | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2020-01-09 19:56:12 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1500798, 1500889, 1563290 |
Description
gowrav
2017-12-12 11:37:17 UTC
Are you using the default settings for NVMe multipaths? The default path grouping policy is FAILOVER, and your output from multipath -ll looks correct for that.

This isn't a bug. The only multipath NVMe support in rhel-7.5 is simple failover. I'll leave this bug open as a feature request for better path grouping support, but that won't be in rhel-7.5.

(In reply to Ben Marzinski from comment #3)
> This isn't a bug. The only multipath NVMe support in rhel-7.5 is simple
> failover. I'll leave this bug open as a feature request for better path
> grouping support, but that won't be in rhel-7.5.

Hi Ben,

I shall change the bug title to reflect it as a feature request. Is there a possibility of supporting the "group_by_prio" option in the future, for ONTAP storage?

Also, in the current release, if we explicitly override the "failover" option with the "multibus" option, will it work?

(In reply to gowrav from comment #5)
> Hi Ben,
>
> I shall change the bug title to reflect it as a feature request. Is there a
> possibility of supporting the "group_by_prio" option in the future, for
> ONTAP storage?

Yes, we are actively looking at improving multipath support for NVMe, including better path grouping. That's why I'm leaving this bug open as a feature request.

> Also, in the current release, if we explicitly override the "failover"
> option with the "multibus" option, will it work?

From multipath's point of view, yes, it will create a device with multiple paths that can all get IO. Whether this will actually work in reality depends on your device and drivers. This is not tested at all, and I would definitely recommend against it except for experimenting in non-production setups, but I don't personally know of any specific reason why it absolutely can't work.

cc'ing Mike Snitzer; my understanding is that kernel changes are required to support load balancing (multiple simultaneous active paths) on NVMe devices in dm-multipath.

There is currently an upstream NetApp builtin config like this:

    {
            /*
             * NVMe-FC namespace devices: MULTIBUS, queueing preferred
             *
             * The hwtable is searched backwards, so place this after "Generic NVMe"
             */
            .vendor        = "NVME",
            .product       = "^NetApp ONTAP Controller",
            .pgpolicy      = MULTIBUS,
            .no_path_retry = NO_PATH_RETRY_QUEUE,
    },

which has been working fine for people, at least with recent Fedora releases. Did you end up doing any testing with MULTIBUS in RHEL-7? At least from a multipath-tools perspective this should work fine, and I don't know of any kernel work that needs to be done for RHEL-7.7 to make this work.

Mike, there's not anything missing in the kernel to handle multiple NVMe paths per pathgroup in RHEL-7 (non-failback setups), is there?
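[Editorial aside: for readers on a RHEL 7 build that does not yet ship the builtin above, the hwtable entry translates to roughly the following /etc/multipath.conf override. This is a sketch for non-production experimentation only, per the earlier caveat that MULTIBUS with NVMe was untested on RHEL 7.5; it is not an officially supported default.]

    # Sketch: user-side equivalent of the upstream "NetApp ONTAP Controller"
    # builtin shown above; untested on RHEL 7.5, non-production setups only.
    devices {
            device {
                    vendor  "NVME"
                    product "^NetApp ONTAP Controller"
                    path_grouping_policy multibus
                    no_path_retry queue
            }
    }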
(In reply to Ewan D. Milne from comment #8)
> cc'ing Mike Snitzer; my understanding is that kernel changes are required
> to support load balancing (multiple simultaneous active paths) on NVMe
> devices in dm-multipath.

Sorry for the late reply. That is only the case for bio-based NVMe (when "queue_mode bio" is specified on DM multipath table load), which was added to model what native NVMe multipathing supports. But if "queue_mode rq" (the default) or "queue_mode mq" (blk-mq) is used, then round-robin will work as usual.

(In reply to Ben Marzinski from comment #9)
> Did you end up doing any testing with MULTIBUS in RHEL-7? At least from a
> multipath-tools perspective this should work fine, and I don't know of any
> kernel work that needs to be done for RHEL-7.7 to make this work.
>
> Mike, there's not anything missing in the kernel to handle multiple NVMe
> paths per pathgroup in RHEL-7 (non-failback setups), is there?

No, should be cool. But can you or others test? Or do you need me to?

I'm trying to understand the outcome of this discussion. For NetApp E-Series, where we ultimately want to end up is a multipath configuration something like this:

    device {
            vendor "NVME"
            product "NetApp E-Series*"
            path_grouping_policy group_by_prio
            prio ana
            failback immediate
            no_path_retry 30
    }

Once we have the updated device-mapper-multipath package installed, which has the ANA prio, is it okay to use 'group_by_prio', or do we need to be using 'failover' for now?

Also, do you have a recommendation on what to use for queue_mode for NVMe multipath?

Thanks,
Steve

(In reply to Steve Schremmer from comment #11)
> I'm trying to understand the outcome of this discussion. For NetApp
> E-Series, where we ultimately want to end up is a multipath configuration
> something like this:
>     device {
>             vendor "NVME"
>             product "NetApp E-Series*"
>             path_grouping_policy group_by_prio
>             prio ana
>             failback immediate
>             no_path_retry 30
>     }
>
> Once we have the updated device-mapper-multipath package installed, which
> has the ANA prio, is it okay to use 'group_by_prio', or do we need to be
> using 'failover' for now?

Ben would be the better person to ask, but I _think_ you'd use 'group_by_prio'.

> Also, do you have a recommendation on what to use for queue_mode for NVMe
> multipath?

NVMe is only blk-mq, so 'queue_mode mq' would be needed (or dm_mod.use_blk_mq=Y on the kernel commandline). I doubt it is worthwhile to use 'queue_mode bio' because it doesn't support path selectors -- it forces the use of failover.

We're testing with RHEL 7.7 with the following config in /etc/multipath.conf:

    devices {
            device {
                    vendor "NVME"
                    product "NetApp E-Series*"
                    path_grouping_policy group_by_prio
                    failback immediate
                    no_path_retry 30
            }
    }

detect_prio defaults to yes and the ANA prio gets used, as expected. I just updated to RHEL 7.7 SS2 (including kernel-3.10.0-1053 and device-mapper-multipath-0.4.9-127). We're not applying any specific settings to queue_mode. Is this okay?

(In reply to Steve Schremmer from comment #13)
> We're not applying any specific settings to queue_mode. Is this okay?

When I tested the rhel-7 backport of the nvme code, I didn't set queue_mode, and everything appeared to work fine. Mike, is this really necessary in rhel7?

Mike, please see Ben's question.

(In reply to Ben Marzinski from comment #14)
> (In reply to Steve Schremmer from comment #13)
> >
> > We're not applying any specific settings to queue_mode. Is this okay?
>
> When I tested the rhel-7 backport of the nvme code, I didn't set
> queue_mode, and everything appeared to work fine. Mike, is this really
> necessary in rhel7?

If you want the DM multipath device to use blk-mq, then _yes_, it is required to set "queue_mode mq" (or you can establish dm_mod.use_blk_mq=Y on the kernel commandline, and then all request-based DM multipath devices will use blk-mq). Otherwise, as just verified against RHEL7.6, even if the DM-multipath device's underlying paths are all blk-mq, the DM-multipath device will still use the old .request_fn request-queue interface.

FYI: both RHEL8 and upstream no longer allow stacking the old .request_fn (non-blk-mq) interface on top of blk-mq paths, because old .request_fn support no longer exists in those kernels.

All said, you don't need to use blk-mq for the DM multipath device, but for layering multipath on top of fast NVMe underlying paths it really should offer a performance advantage (because it avoids the locking overhead associated with the old .request_fn interface). On the other hand, using blk-mq in RHEL7 does eliminate the use of traditional IO schedulers (e.g. deadline, cfq), which could prove to be an unwelcome change for devices that benefit from that upfront IO scheduling.

I hope I've been clear. If not, please feel free to ask follow-up questions (and set needinfo for me accordingly).
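[Editorial aside: to make the kernel-commandline route above concrete, this is a minimal sketch of how dm_mod.use_blk_mq=Y could be applied on a RHEL 7 host; it assumes grubby manages the boot entries and is illustrative only. As the rest of the thread settles, leaving queue_mode at its default is also fine.]

    # Sketch: enable blk-mq for all request-based DM devices, as Mike
    # describes above (takes effect after a reboot).
    grubby --update-kernel=ALL --args="dm_mod.use_blk_mq=Y"

    # After rebooting, read back the module parameter (assuming dm_mod is
    # loaded and the parameter is present on this kernel):
    cat /sys/module/dm_mod/parameters/use_blk_mq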
Ben, please see Mike's comments.

Yeah. Running without changing the queue_mode should be fine in RHEL7, and that's what QA and our partners have tested, so I would rather not change the default in rhel-7.8. Switching it to blk-mq is also fine if you want to try to optimize performance.

So Steve, if you've tested multipath running with group_by_prio and it's running fine, then I don't know of any reason why it shouldn't be, and I'm fine with changing the default config to what you've used in comment 13.

Steve, do you want to go ahead with changing the default config to the one listed in comment 13?

The current plan is to have our E-Series customers modify their multipath.conf with the settings shown in comment 13. We are also using the defaults for any queuing modes. The default config shown in comment 9 is for a different product.

O.k. Then I'll pull the config from comment 13 into the default configs.

(In reply to Ben Marzinski from comment #21)
> O.k. Then I'll pull the config from comment 13 into the default configs.

We'd rather not add defaults at this time. Our plan is to have our documentation tell the customer to add the proper lines to /etc/multipath.conf.

Fair enough. So, is there any reason to keep this bug open, or can I close it?

The bug was opened by Gowrav, so I think we should ask whether he needs it to stay open. Should this be closed?

Awaiting verification from NetApp that the functionality works; then we should close.

Yes, this functionality works now, so this bz may be marked as FIXED. That said, there are still a couple of outstanding issues with this feature, tracked in bug 1757348 and bug 1718361.

OK, thank you. Closing this BZ as the group_by_prio functionality is present.
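[Editorial aside: for anyone arriving at this closed bug, the customer-facing flow described above amounts to adding the comment 13 device section to /etc/multipath.conf and re-reading the configuration. The commands below are a sketch using standard multipath-tools commands, not steps quoted from this report.]

    # Sketch: after adding the comment 13 device section to /etc/multipath.conf,
    # re-read the configuration and confirm that each namespace's paths are
    # split into ANA-based priority groups.
    multipathd -k"reconfigure"
    multipath -ll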