Bug 1291406 - [Nimble Storage] no_path_retry not working as expected with active/passive arrays when tur path checker is used. [NEEDINFO]
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: device-mapper-multipath
Version: 7.1
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Ben Marzinski
QA Contact: Zhang Yi
Docs Contact: Steven J. Levine
URL:
Whiteboard:
Keywords: OtherQA
Depends On:
Blocks: 1322532 1414128
 
Reported: 2015-12-14 19:49 UTC by shivamerla1
Modified: 2017-02-16 06:00 UTC (History)

The `multipathd` daemon no longer reinstates unusable Implicit ALUA ghost paths.


Previously, the `multipathd` daemon automatically reinstated Implicit ALUA devices in the GHOST state, which were not usable. Multipath would continuously retry unusable devices, if they were the only ones present, instead of failing I/O operations. With this fix, `multipathd` no longer reinstates unusable Implicit ALUA ghost paths. As a result, multipath no longer continually retries I/O operations when only unusable Implicit ALUA paths are available.
Clone Of:
: 1322532 1350931 (view as bug list)
Last Closed: 2016-11-04 08:14:51 UTC
lilin: needinfo? (shiva.krishna)




External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2016:2536 normal SHIPPED_LIVE device-mapper-multipath bug fix and enhancement update 2016-11-03 14:18:10 UTC

Description shivamerla1 2015-12-14 19:49:35 UTC
Description of problem:
The TUR path checker treats a "StandBy" path as active (ghost) and reinstates the path. This causes I/O hangs and many "change" udev events when only standby paths are present.

This can happen during system boot when only standby paths are discovered first. The continuous I/O retries by dm-multipath and the resulting change events hog multipathd and slow down the entire boot process when a large number of volumes (hundreds) are mapped.

Version-Release number of selected component (if applicable):
3.10.0-229.el7.x86_64

How reproducible:
Easily reproducible.

Steps to Reproduce:

multipath.conf.

devices {
    device {
        vendor               "Nimble"
        product              "Server"
        path_grouping_policy group_by_prio
        prio                 "alua"
        hardware_handler     "1 alua"
        path_checker         tur
        failback             immediate
        fast_io_fail_tmo     10
        no_path_retry        30
        path_selector        "round-robin 0"
    }
}


1. Delete all active paths.

mpathal (2e0176ad6309077166c9ce90033bfa248_1) dm-2 Nimble,Server
size=20G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  |- 8:0:5:1 sdg 8:96  active ghost running
  `- 8:0:6:1 sdh 8:112 active ghost running

2. Issue I/O on mpath with zero active paths.

dd if=/dev/mapper/mpathal of=/dev/null bs=512 count=1 iflag=direct &

Nov 23 15:10:36 hitdev-rhel67 multipathd: mpathal: sdg - tur checker reports path is in standby state
Nov 23 15:10:36 hitdev-rhel67 multipathd: 8:96: reinstated
Nov 23 15:10:36 hitdev-rhel67 multipathd: mpathal: queue_if_no_path enabled
Nov 23 15:10:36 hitdev-rhel67 multipathd: mpathal: Recovered to normal mode
Nov 23 15:10:36 hitdev-rhel67 multipathd: mpathal: remaining active paths: 1
Nov 23 15:10:36 hitdev-rhel67 kernel: sd 8:0:5:1: alua: port group 02 state S non-preferred supports tolusna
Nov 23 15:10:36 hitdev-rhel67 kernel: device-mapper: multipath: Failing path 8:96.
Nov 23 15:10:36 hitdev-rhel67 multipathd: 8:96: mark as failed
Nov 23 15:10:36 hitdev-rhel67 multipathd: mpathal: Entering recovery mode: max_retries=20
Nov 23 15:10:36 hitdev-rhel67 multipathd: mpathal: remaining active paths: 0

3. Monitor udev events every 5 seconds.

[root@hitdev-rhel67 ~]# udevadm monitor
monitor will print the received events for:
UDEV - the event which udev sends out after rule processing
KERNEL - the kernel uevent

KERNEL[1448320131.061837] change   /devices/virtual/block/dm-2 (block)
KERNEL[1448320131.064802] change   /devices/virtual/block/dm-2 (block)
UDEV  [1448320131.082838] change   /devices/virtual/block/dm-2 (block)
UDEV  [1448320131.102134] change   /devices/virtual/block/dm-2 (block)
KERNEL[1448320135.551737] change   /devices/virtual/block/dm-2 (block)
KERNEL[1448320135.552701] change   /devices/virtual/block/dm-2 (block)
UDEV  [1448320135.571634] change   /devices/virtual/block/dm-2 (block)
UDEV  [1448320135.591017] change   /devices/virtual/block/dm-2 (block)
KERNEL[1448320136.553368] change   /devices/virtual/block/dm-2 (block)
KERNEL[1448320136.554298] change   /devices/virtual/block/dm-2 (block)
UDEV  [1448320136.572733] change   /devices/virtual/block/dm-2 (block)
UDEV  [1448320136.592089] change   /devices/virtual/block/dm-2 (block)
KERNEL[1448320140.555389] change   /devices/virtual/block/dm-2 (block)
KERNEL[1448320140.556369] change   /devices/virtual/block/dm-2 (block)
UDEV  [1448320140.574944] change   /devices/virtual/block/dm-2 (block)
UDEV  [1448320140.594076] change   /devices/virtual/block/dm-2 (block)
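
A capture like the one above can be summarized per device with a short script. The following is a minimal sketch in Python, not part of multipath-tools or udev; the function names are hypothetical and the regular expression only assumes the line layout visible in this output.

```python
import re
from collections import Counter

# Matches lines such as:
#   KERNEL[1448320131.061837] change   /devices/virtual/block/dm-2 (block)
# The layout is assumed from the capture in this report.
EVENT_RE = re.compile(
    r"^(KERNEL|UDEV)\s*\[(\d+\.\d+)\]\s+(\S+)\s+(\S+)\s+\((\S+)\)\s*$"
)


def parse_events(text):
    """Return (source, timestamp, action, devpath, subsystem) tuples.

    Lines that are not events (for example the monitor banner) are skipped.
    """
    events = []
    for line in text.splitlines():
        match = EVENT_RE.match(line.strip())
        if match:
            source, stamp, action, devpath, subsystem = match.groups()
            events.append((source, float(stamp), action, devpath, subsystem))
    return events


def count_kernel_changes(events):
    """Count kernel 'change' uevents per device path."""
    return Counter(
        devpath
        for source, _stamp, action, devpath, _subsystem in events
        if source == "KERNEL" and action == "change"
    )
```

Feeding it the saved output of `udevadm monitor` gives one counter entry per dm device, which makes it easier to see which maps generate the repeated change events.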


Actual results:
I/O stuck forever and no_path_retry has no effect.


Expected results:
TUR checker should not re-instate "stand-by" paths as I/O will always fail. no_path_retry should work as expected even with only stand-by paths.


Additional info:

Comment 2 Ben Marzinski 2015-12-14 23:04:16 UTC
According to http://www.t10.org/lists/asc-num.htm

04/0B  DZTPROMAEBKVF  LOGICAL UNIT NOT ACCESSIBLE, TARGET PORT IN STANDBY STATE

The tur checker returns that the path is in standby state when it gets a key code qualifier of

key=0x02
asc=0x04
ascq=0x0b

As far as I can tell, this code really does mean that the controller connected to this path is in standby state.  Either the storage needs to not return this value when this path really isn't in a standby state, or you could try using a different path checker, such as directio, that wouldn't be affected by this.
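
The key/asc/ascq test described in this comment can be illustrated with a small decoder. This is a sketch in Python, not the tur checker's C code; the function names are hypothetical, and the byte offsets assume the standard SPC fixed-format (response code 0x70/0x71) and descriptor-format (0x72/0x73) sense data layouts.

```python
from typing import Tuple

# Values quoted in this comment.
SENSE_KEY_NOT_READY = 0x02
ASC_LU_NOT_ACCESSIBLE = 0x04
ASCQ_TARGET_PORT_STANDBY = 0x0B


def decode_sense(sense: bytes) -> Tuple[int, int, int]:
    """Return (key, asc, ascq) from a raw SCSI sense buffer."""
    if not sense:
        raise ValueError("empty sense buffer")
    response_code = sense[0] & 0x7F
    if response_code in (0x70, 0x71):
        # Fixed format: key in byte 2 (low nibble), ASC/ASCQ in bytes 12/13.
        if len(sense) < 14:
            raise ValueError("fixed-format sense buffer too short")
        return sense[2] & 0x0F, sense[12], sense[13]
    if response_code in (0x72, 0x73):
        # Descriptor format: key in byte 1 (low nibble), ASC/ASCQ in bytes 2/3.
        if len(sense) < 4:
            raise ValueError("descriptor-format sense buffer too short")
        return sense[1] & 0x0F, sense[2], sense[3]
    raise ValueError("unknown sense response code 0x%02x" % response_code)


def is_standby(sense: bytes) -> bool:
    """True for 02/04/0B: logical unit not accessible, target port in standby."""
    return decode_sense(sense) == (
        SENSE_KEY_NOT_READY,
        ASC_LU_NOT_ACCESSIBLE,
        ASCQ_TARGET_PORT_STANDBY,
    )
```

Any other Not Ready combination, such as 02/04/01, is deliberately not treated as standby here.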

Comment 3 shivamerla1 2015-12-15 00:29:32 UTC
Hi Ben, 

Yes, this is a standby controller path. I manually deleted all active paths to simulate this. The point here is that with only standby paths connected to the host, I/O will hang forever, i.e., no_path_retry is not working.

The following cycle continues.

1. tur checker will reinstate all standby paths.
2. I/O's will be tried by dm-multipath and will fail as well.
3. dm-multipath fails path.
4. no active paths to device and enters recovery_mode
5. tur checker will reinstate path again and puts device in normal mode.
6. continues again with step 2.

Is there a way we can prevent the tur checker/dm-multipath from reinstating stand-by paths?

During system boot, when stand-by paths are discovered first, due to sequential SCSI scanning, dm-multipath is hogged with udev "change" events for path failures. Hence cannot process "add" events soon for active paths subsequently.
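
The cycle in steps 1 to 6 can be expressed as a toy model that shows why the retry budget never runs out. This is a simplified sketch in Python, not multipathd's logic: the function and parameter names are hypothetical, and it assumes one retry is consumed per checker pass while the map stays in recovery mode.

```python
from typing import Optional


def simulate(no_path_retry: int, checker_passes: int,
             reinstate_ghost: bool) -> Optional[int]:
    """Return the checker pass on which queueing is disabled, or None.

    Toy model of a map that has only standby (ghost) paths and one queued I/O.
    """
    retries_left = no_path_retry  # the map starts in recovery mode
    for checker_pass in range(1, checker_passes + 1):
        if reinstate_ghost:
            # Steps 1 and 5: the checker reports ghost, the path is
            # reinstated, and the map goes back to normal mode.
            # Steps 2 to 4: the queued I/O is retried, fails on the standby
            # path, the path is failed, and recovery mode is entered again
            # with a full retry budget.
            retries_left = no_path_retry
            continue
        # Without reinstatement the map stays in recovery mode and the
        # budget shrinks by one on every pass.
        retries_left -= 1
        if retries_left == 0:
            return checker_pass
    return None
```

With `reinstate_ghost=True` the function returns `None` for any number of passes, which matches the observed hang. With `reinstate_ghost=False` queueing ends after `no_path_retry` passes and the queued I/O can be failed.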

Comment 4 Ben Marzinski 2015-12-15 05:05:40 UTC
I suppose my next question is, why isn't multipath's hardware handler making these paths active?  These paths would be usable if they were switched to active, correct?

If there are standby paths accessible, multipath is designed to not fail back the IOs. Instead, it should be switching them to active and using them. Does this array allow the node to change the state with the SET TARGET PORT GROUPS ALUA command (or, to ask it a different way, does this array support explicit ALUA)?  That's what the alua hardware handler does.

Comment 5 shivamerla1 2015-12-15 05:57:32 UTC
Nimble arrays only support implicit ALUA mode, and only the active controller owns all LUNs. Controller failover is triggered implicitly only when all hosts connected to the array lose access to the active controller.

Also, the problem with using other path checkers (directio, readsector0) with active/passive arrays is they will continuously throw errors about path failures on standby paths and might be of concern to user.

Does it make sense to enhance multipath to treat standby paths as non-active paths and fail I/O's after no_path_retry timer?. This is in cases where active paths are not coming back and controller failover is not triggered implicitly.( Case where more than one host is connected to array and only one host loses active array access).

Currently the behavior is causing I/O hang and lots of spurious udev change events for continuous path failures.

Comment 6 shivamerla1 2016-01-14 17:08:31 UTC
Hi Ben, 

any update on this?.

Comment 7 Ben Marzinski 2016-01-14 23:54:53 UTC
Seeing as this is the only case that I know where the array returns Standby but doesn't either use a hardware handler, or automatically failover when IO goes to the passive path, I can't change the tur checker, which gives the correct results for all the other devices that use it.

Probably the best option is to create a new checker. There is already an HP checker that's based on the tur checker, which uses the same code with some #ifdefs to change what's needed.  I could do the same here, and just compile the tur checker code again with a different set of #defines to enable this behaviour.

Comment 8 shivamerla1 2016-01-15 17:29:03 UTC
Ben, I think its more to do with path reinstate rather than path checker behavior.
In this case the TUR checker is working fine, returning the path state as GHOST. Is it possible to prevent path_reinstate when the path is in the stand-by (GHOST) state? Currently, when oldstate is not PATH_GHOST or PATH_UP and the new state is PATH_GHOST, path_reinstate will be initiated. Can this be prevented so that, instead of being reinstated, paths are just marked as GHOST again if they are in the FAILED state? This is needed when only IMPLICIT TPGS mode is supported.

Even with EXPLICIT TPGS mode, if stand-by paths are discovered first, currently with this behavior multipath-tools cause unnecessary lun failover(STPG).
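
The state transition discussed in this comment can be written as a pair of predicates. This is a hedged sketch in Python rather than multipathd's C code: the enum, the function names, and the `implicit_tpgs_only` flag are hypothetical, and treating a new PATH_UP state the same way as PATH_GHOST in the current behaviour is an assumption.

```python
from enum import Enum


class PathState(Enum):
    UP = "up"
    GHOST = "ghost"
    FAILED = "failed"


USABLE = (PathState.UP, PathState.GHOST)


def reinstates_currently(old: PathState, new: PathState) -> bool:
    """Behaviour described in this comment: reinstate whenever a path that
    was neither UP nor GHOST is now reported as GHOST (UP is assumed to be
    handled the same way)."""
    return old not in USABLE and new in USABLE


def reinstates_as_requested(old: PathState, new: PathState,
                            implicit_tpgs_only: bool) -> bool:
    """Requested behaviour: never reinstate a GHOST path when the target
    supports only implicit TPGS, because I/O sent to it will fail."""
    if new is PathState.GHOST and implicit_tpgs_only:
        return False
    return reinstates_currently(old, new)
```

For a FAILED path that the checker now reports as GHOST, the first predicate returns `True` and the second returns `False` when `implicit_tpgs_only` is set, which is the change being asked for.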

Comment 9 Ben Marzinski 2016-01-15 18:13:32 UTC
The way multipath works, the kernel doesn't know anything about the PATH_GHOST state.  It either views paths as up or down. The way the multipath tools use ghost state is equivalent to saying that this path is up, and the kernel is free to fail over to it.

While it's not very hard to add a new checker function, since these are made to be device specific, changing how the ghost state is handled for only this device is much harder to do.

I understand that you don't want users to see the passive paths as failed. Perhaps the best thing to do would be to add a new state that works like PATH_FAILED, but returns something more useful like "unavailable" or "passive", and have a new path checker that returns this.

Comment 10 shivamerla1 2016-01-15 18:27:42 UTC
Ok, That will work as well. Please let me know once the changes are available, i can test the private package. Thanks.

Comment 11 shivamerla1 2016-01-25 19:51:09 UTC
Hi, Any update on this, can you share the packages to test?. Also we need to port the same fix to 6.7 hosts, can you provide packages for those as well?.

Comment 12 Lin Li 2016-02-02 09:09:29 UTC
Hello  shivamerla1,
Because we don't have Nimble Storage in our lab, could you provide test result on RHEL-7.3?
thanks a lot!

Comment 13 shivamerla1 2016-02-02 20:03:23 UTC
Hi, I don't think anything changed in 7.3 with respect to this. We still use TUR checker which causes unnecessary I/O to be issued on stand-by paths and udev change events for path failures. We need this fix for both 6.x and 7.x.

Comment 14 shivamerla1 2016-02-04 17:48:49 UTC
Hi Ben, Any update on the new path checker to handle active/stand-by paths with implicit TPGS mode?. Please let us know once you have private builds available, we can help with testing. Thanks.

Comment 15 shivamerla1 2016-02-22 20:54:59 UTC
Hi, We have submitted a patch upstream for this. Can you please port the same patch to RHEL 7.x?. Also for 6.x do i need to raise a separate bug or the same can be cloned?.

https://www.redhat.com/archives/dm-devel/2016-February/msg00115.html

Thanks.

Comment 16 shivamerla1 2016-02-24 21:46:49 UTC
We have certain customers waiting on this fix. Can you please provide an update on this case?.

Comment 17 Ben Marzinski 2016-02-26 15:56:33 UTC
I replied to the upstream patch post with the issues I have with the current patch. I'd be happy to make the changes I'd like to see and repost it. Otherwise, you are welcome to rework the patch.

Comment 18 shivamerla1 2016-02-26 17:01:09 UTC
Hi Ben, Thanks for the inputs. please go ahead and fix this issue. With changes affecting other parts of the code with your suggestions, i don't have hardware to test other vendor cases. Let me know once you have the fix, i can help verify with Nimble array.

Comment 19 shivamerla1 2016-02-26 21:12:40 UTC
I saw your latest comment. I will add checks to select_prio() and post an updated patch v3. Let me know if you have other inputs.

Comment 21 shivamerla1 2016-03-18 21:36:14 UTC
Hi Ben, Can you pull the following commit that was applied upstream.

http://git.opensvc.com/gitweb.cgi?p=multipath-tools/.git;a=commit;h=ecf84a828ab12311d45d1de442aa2ff3280109f4

Also, can you port the similar fix for RHEL 6.x as well?.

Thanks.

Comment 22 Ben Marzinski 2016-03-21 14:46:45 UTC
I have. It will go into the next build of the package.

Comment 23 Ben Marzinski 2016-03-30 03:17:55 UTC
Pulled in commit. Thanks for all the work here!

Comment 24 shivamerla1 2016-03-30 03:20:33 UTC
Thanks Ben. Can you port these changes to 6.7 as well.

Comment 25 Ben Marzinski 2016-03-30 16:47:03 UTC
It's too late for this to make RHEL-6.8, but I've created a bug for RHEL-6.9 (Bug 1322532), and it's possible that this could get fixed in a z-stream release.

Comment 26 shivamerla1 2016-03-30 16:55:56 UTC
ok, z-stream release is fine too. Thanks.

Comment 27 Raunak Kumar 2016-06-07 18:33:06 UTC
When is the ETA for this fix for both Redhat 6.8 and 7.2 ?

Comment 28 Ben Marzinski 2016-06-07 20:10:53 UTC
The projected rhel-7.3 release date is sometime in October, I believe. If you need a zstream for this, you need to talk to a support person.

The rhel-6.9 release hasn't even entered the planning phase yet. Again, if you need a rhel-6.8 zstream for Bug 1322532 (the rhel-6 version of this), you need to talk to someone else. I have no authority to approve zstream releases.

Comment 31 Zhang Yi 2016-08-17 08:25:28 UTC
Refer https://bugzilla.redhat.com/show_bug.cgi?id=1351430#c14, change to VERIFIED.

Thanks
Yi

Comment 33 errata-xmlrpc 2016-11-04 08:14:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2536.html

Comment 34 Lin Li 2017-02-16 06:00:17 UTC
hello shivamerla1,
A new bug, 1322532, was created as a clone of this bug for RHEL6.9, and it has a needinfo request for you.
Could you help test the package and provide feedback for RHEL6.9 on your environment ?
The fixed version is device-mapper-multipath-0.4.9-94.el6.

thanks!

