Description of problem:
The TUR path checker treats a standby path as active (ghost) and reinstates it. This causes I/O hangs and a flood of "change" udev events when only standby paths are present.
This can happen during system boot, when only standby paths are discovered first. With a large number of volumes mapped (hundreds), the continuous I/O retries by dm-multipath and the resulting change events hog multipathd and slow down the entire boot process.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
hardware_handler "1 alua"
path_selector "round-robin 0"
1. Delete all active paths.
mpathal (2e0176ad6309077166c9ce90033bfa248_1) dm-2 Nimble,Server
size=20G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
|- 8:0:5:1 sdg 8:96 active ghost running
`- 8:0:6:1 sdh 8:112 active ghost running
2. Issue I/O on mpath with zero active paths.
dd if=/dev/mapper/mpathal of=/dev/null bs=512 count=1 iflag=direct &
Nov 23 15:10:36 hitdev-rhel67 multipathd: mpathal: sdg - tur checker reports path is in standby state
Nov 23 15:10:36 hitdev-rhel67 multipathd: 8:96: reinstated
Nov 23 15:10:36 hitdev-rhel67 multipathd: mpathal: queue_if_no_path enabled
Nov 23 15:10:36 hitdev-rhel67 multipathd: mpathal: Recovered to normal mode
Nov 23 15:10:36 hitdev-rhel67 multipathd: mpathal: remaining active paths: 1
Nov 23 15:10:36 hitdev-rhel67 kernel: sd 8:0:5:1: alua: port group 02 state S non-preferred supports tolusna
Nov 23 15:10:36 hitdev-rhel67 kernel: device-mapper: multipath: Failing path 8:96.
Nov 23 15:10:36 hitdev-rhel67 multipathd: 8:96: mark as failed
Nov 23 15:10:36 hitdev-rhel67 multipathd: mpathal: Entering recovery mode: max_retries=20
Nov 23 15:10:36 hitdev-rhel67 multipathd: mpathal: remaining active paths: 0
3. Monitor udev events every 5 seconds.
[root@hitdev-rhel67 ~]# udevadm monitor
monitor will print the received events for:
UDEV - the event which udev sends out after rule processing
KERNEL - the kernel uevent
KERNEL[1448320131.061837] change /devices/virtual/block/dm-2 (block)
KERNEL[1448320131.064802] change /devices/virtual/block/dm-2 (block)
UDEV [1448320131.082838] change /devices/virtual/block/dm-2 (block)
UDEV [1448320131.102134] change /devices/virtual/block/dm-2 (block)
KERNEL[1448320135.551737] change /devices/virtual/block/dm-2 (block)
KERNEL[1448320135.552701] change /devices/virtual/block/dm-2 (block)
UDEV [1448320135.571634] change /devices/virtual/block/dm-2 (block)
UDEV [1448320135.591017] change /devices/virtual/block/dm-2 (block)
KERNEL[1448320136.553368] change /devices/virtual/block/dm-2 (block)
KERNEL[1448320136.554298] change /devices/virtual/block/dm-2 (block)
UDEV [1448320136.572733] change /devices/virtual/block/dm-2 (block)
UDEV [1448320136.592089] change /devices/virtual/block/dm-2 (block)
KERNEL[1448320140.555389] change /devices/virtual/block/dm-2 (block)
KERNEL[1448320140.556369] change /devices/virtual/block/dm-2 (block)
UDEV [1448320140.574944] change /devices/virtual/block/dm-2 (block)
UDEV [1448320140.594076] change /devices/virtual/block/dm-2 (block)
I/O is stuck forever and no_path_retry has no effect.
The TUR checker should not reinstate standby paths, as I/O on them will always fail. no_path_retry should work as expected even when only standby paths are present.
According to http://www.t10.org/lists/asc-num.htm
04/0B DZTPROMAEBKVF LOGICAL UNIT NOT ACCESSIBLE, TARGET PORT IN STANDBY STATE
The tur checker returns that the path is in standby state when it gets a key code qualifier of
As far as I can tell, this code really does mean that the controller connected to this path is in standby state. Either the storage needs to not return this value when the path really isn't in a standby state, or you could try using a different path checker, such as directio, that wouldn't be affected by this.
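As a rough illustration of the check described above, the sketch below decodes fixed-format SCSI sense data and maps the 04/0B combination to a ghost (standby) result. The function name `classify_tur_sense`, the state strings, and the sample buffer are hypothetical, and pairing 04/0B with the NOT READY sense key is an assumption based on the SCSI standard rather than something stated in this report. The real tur checker is C code and may differ in detail.

```python
# Sketch only: names and state strings are illustrative, not multipath-tools API.
NOT_READY = 0x02  # SCSI sense key "NOT READY" (assumed to accompany 04/0B)

def classify_tur_sense(sense: bytes) -> str:
    """Classify fixed-format sense data returned for a failed TEST UNIT READY."""
    if len(sense) < 14 or (sense[0] & 0x7F) not in (0x70, 0x71):
        return "down"  # too short or not fixed-format: treat as failed
    key = sense[2] & 0x0F   # sense key
    asc = sense[12]         # additional sense code
    ascq = sense[13]        # additional sense code qualifier
    if key == NOT_READY and asc == 0x04 and ascq == 0x0B:
        # LOGICAL UNIT NOT ACCESSIBLE, TARGET PORT IN STANDBY STATE
        return "ghost"
    return "down"

# Hypothetical sense buffer: response code 0x70, key 0x02, ASC/ASCQ 04/0B.
standby_sense = bytes([0x70, 0, 0x02, 0, 0, 0, 0, 10,
                       0, 0, 0, 0, 0x04, 0x0B, 0, 0, 0, 0])
print(classify_tur_sense(standby_sense))
```

Any other sense data falls through to "down" in this sketch, which is a simplification: a real checker distinguishes more cases.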
Yes, this is a standby controller path. I manually deleted all active paths to simulate this. The point here is that with only standby paths connected to the host, I/O will hang forever, i.e., no_path_retry is not working.
The following cycle continues.
1. tur checker will reinstate all standby paths.
2. I/Os will be tried by dm-multipath and will fail as well.
3. dm-multipath fails path.
4. no active paths to device and enters recovery_mode
5. tur checker will reinstate path again and puts device in normal mode.
6. continues again with step 2.
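The cycle above can be pictured with a toy model. This is not multipathd code: the function `simulate` and its parameters are hypothetical, and the only value taken from this report is max_retries=20 from the "Entering recovery mode" log line. It shows why the retry counter never runs out while standby paths keep being reinstated.

```python
# Toy model of the cycle; not multipathd code. All names are hypothetical.
def simulate(checker_passes: int, reinstate_ghost: bool):
    """Return (reinstates, retries_left) after a number of checker passes.

    Every path is standby, so each reinstated path fails on the next I/O.
    """
    max_retries = 20            # value from the "Entering recovery mode" log line
    retries_left = max_retries
    reinstates = 0
    for _ in range(checker_passes):
        if reinstate_ghost:
            reinstates += 1             # steps 1/5: checker reinstates the path
            retries_left = max_retries  # map is back in normal mode
            # steps 2-4: queued I/O is retried, fails, the path is failed,
            # and the map enters recovery mode again with a fresh counter
        elif retries_left > 0:
            retries_left -= 1           # recovery mode counts down undisturbed
    return reinstates, retries_left

print(simulate(100, reinstate_ghost=True))   # counter never reaches zero
print(simulate(100, reinstate_ghost=False))  # counter runs out, queueing can stop
```

In the first call the counter is reset on every pass, which matches the observed behavior of no_path_retry having no effect.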
Is there a way to prevent the tur checker/dm-multipath from reinstating standby paths?
During system boot, when standby paths are discovered first because of sequential SCSI scanning, dm-multipath is swamped with udev "change" events for path failures. As a result, it cannot promptly process the "add" events for the active paths that follow.
I suppose my next question is, why isn't multipath's hardware handler making these paths active? These paths would be usable if they were switched to active, correct?
If there are standby paths accessible, multipath is designed to not fail back the IOs. Instead, it should be switching them to active and using them. Does this array allow the node to change the state with the SET TARGET PORT GROUPS ALUA command (or to ask it a different way, does this array support explicit ALUA)? That's what the alua hardware handler does.
Nimble arrays only support implicit ALUA mode, and only the active controller owns all LUNs. Controller failover is triggered implicitly only when all hosts connected to the array lose access to the active controller.
Also, the problem with using other path checkers (directio, readsector0) with active/passive arrays is that they will continuously throw errors about path failures on standby paths, which might concern users.
Does it make sense to enhance multipath to treat standby paths as non-active paths and fail I/Os after the no_path_retry timer expires? This is for cases where active paths are not coming back and controller failover is not triggered implicitly (for example, more than one host is connected to the array and only one host loses access to the active controller).
Currently the behavior causes I/O hangs and lots of spurious udev change events for the continuous path failures.
Any update on this?
Seeing as this is the only case that I know where the array returns Standby but doesn't either use a hardware handler, or automatically failover when IO goes to the passive path, I can't change the tur checker, which gives the correct results for all the other devices that use it.
Probably the best option is to create a new checker. There is already an HP checker that's based on the tur checker, which uses the same code with some #ifdefs to change what's needed. I could do the same here, and just compile the tur checker code again with a different set of #defines to enable this behaviour.
Ben, I think it's more to do with path reinstate than with path checker behavior.
In this case the TUR checker is working fine, returning the path state as GHOST. Is it possible to prevent path_reinstate when the path is in standby (GHOST) state? Currently, when oldstate is not PATH_GHOST or PATH_UP and the new state is PATH_GHOST, path_reinstate will be initiated. Can this be prevented, so that instead of being reinstated, paths are just marked as GHOST again if they are in FAILED state? This is needed when only IMPLICIT TPGS mode is supported.
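The transition described above can be sketched as follows. This is a simplified model, not the actual multipathd source: the function names and the "reinstate", "mark_ghost" and "none" action strings are hypothetical, and only the PATH_UP, PATH_GHOST and FAILED state names come from the discussion.

```python
# Simplified model; state names mirror the thread, everything else is hypothetical.
PATH_DOWN, PATH_UP, PATH_GHOST = "failed", "up", "ghost"

def current_action(oldstate: str, newstate: str) -> str:
    """Behavior as described: a path that becomes UP or GHOST is reinstated."""
    if newstate in (PATH_UP, PATH_GHOST) and oldstate not in (PATH_UP, PATH_GHOST):
        return "reinstate"
    return "none"

def proposed_action(oldstate: str, newstate: str) -> str:
    """Proposal: a failed path that checks as GHOST is only re-marked, not reinstated."""
    if newstate == PATH_GHOST and oldstate not in (PATH_UP, PATH_GHOST):
        return "mark_ghost"
    return current_action(oldstate, newstate)

for old, new in [(PATH_DOWN, PATH_GHOST), (PATH_DOWN, PATH_UP), (PATH_GHOST, PATH_GHOST)]:
    print(old, "->", new, ":", current_action(old, new), "/", proposed_action(old, new))
```

Only the failed-to-ghost transition differs between the two functions; a path that comes back as fully up is still reinstated in both.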
Even with EXPLICIT TPGS mode, if standby paths are discovered first, the current behavior makes multipath-tools cause an unnecessary LUN failover (STPG).
The way multipath works, the kernel doesn't know anything about the PATH_GHOST state. It views paths as either up or down. The way the multipath tools use the ghost state is equivalent to saying that this path is up, and the kernel is free to fail over to it.
While it's not very hard to add a new checker function, since these are made to be device specific, changing how the ghost state is handled for only this device is much harder to do.
I understand that you don't want users to see the passive paths as failed. Perhaps the best thing to do would be to add a new state that works like PATH_FAILED, but returns something more useful like "unavailable" or "passive", and have a new path checker that returns this.
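One way to picture this: the kernel only sees up or down, so each checker state has to collapse into one of those two values, while the text shown to the user can differ. The sketch below is a hypothetical model. The `passive` state, the display strings and the table layout are assumptions for illustration and do not reflect the actual multipath-tools data structures.

```python
# Hypothetical model: checker state -> (usable by kernel?, text shown to the user).
CHECKER_STATES = {
    "up":      (True,  "ready"),
    "ghost":   (True,  "ghost"),    # today: standby is reported to the kernel as usable
    "failed":  (False, "faulty"),
    "passive": (False, "passive"),  # proposed: behaves like failed, reads as passive
}

def kernel_view(state: str) -> str:
    """Collapse a checker state into the only two values the kernel knows."""
    usable, _ = CHECKER_STATES[state]
    return "up" if usable else "down"

def display_name(state: str) -> str:
    """Text a user would see for this state in the sketch."""
    return CHECKER_STATES[state][1]

for state in CHECKER_STATES:
    print(f"{state:8} kernel={kernel_view(state):5} shown={display_name(state)}")
```

With such a state, a standby path would never be handed to the kernel as usable, yet users would not see it reported as faulty.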
OK, that will work as well. Please let me know once the changes are available, and I can test the private package. Thanks.
Hi, any update on this? Can you share the packages to test? We also need to port the same fix to 6.7 hosts; can you provide packages for those as well?
Because we don't have Nimble Storage in our lab, could you provide test result on RHEL-7.3?
thanks a lot!
Hi, I don't think anything changed in 7.3 with respect to this. We still use the TUR checker, which causes unnecessary I/O to be issued on standby paths and udev change events for path failures. We need this fix for both 6.x and 7.x.
Hi Ben, any update on the new path checker to handle active/standby paths with implicit TPGS mode? Please let us know once you have private builds available; we can help with testing. Thanks.
Hi, we have submitted a patch upstream for this. Can you please port the same patch to RHEL 7.x? Also, for 6.x, do I need to raise a separate bug, or can this one be cloned?
We have certain customers waiting on this fix. Can you please provide an update on this case?
I replied to the upstream patch post with the issues I have with the current patch. I'd be happy to make the changes I'd like to see and repost it. Otherwise, you are welcome to rework the patch.
Hi Ben, thanks for the inputs. Please go ahead and fix this issue. Since your suggestions involve changes that affect other parts of the code, and I don't have hardware to test other vendors' cases, it is better that you make them. Let me know once you have the fix, and I can help verify it with a Nimble array.
I saw your latest comment. I will add checks to select_prio() and post an updated patch v3. Let me know if you have other inputs.
Hi Ben, Can you pull the following commit that was applied upstream.
Also, can you port a similar fix to RHEL 6.x as well?
I have. It will go into the next build of the package.
Pulled in commit. Thanks for all the work here!
Thanks Ben. Can you port these changes to 6.7 as well?
It's too late for this to make RHEL-6.8, but I've created a bug for RHEL-6.9 (Bug 1322532), and it's possible that this could get fixed in a z-stream release.
ok, z-stream release is fine too. Thanks.
What is the ETA for this fix for both Red Hat 6.8 and 7.2?
The projected rhel-7.3 release date is sometime in October, I believe. If you need a zstream for this, you need to talk to a support person.
The rhel-6.9 release hasn't even entered the planning phase yet. Again, if you need a rhel-6.8 zstream for Bug 1322532 (the rhel-6 version of this), you need to talk to someone else. I have no authority to approve zstream releases.
Refer https://bugzilla.redhat.com/show_bug.cgi?id=1351430#c14, change to VERIFIED.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
A new bug, 1322532, was created as a clone of this bug for RHEL 6.9, and it has a needinfo request for you.
Could you help test the package and provide feedback for RHEL6.9 on your environment ?
The fixed version is device-mapper-multipath-0.4.9-94.el6.