Bug 1398031

Summary: I/O stuck on dm-mpath device even when physical paths are recovered
Product: Red Hat Enterprise Linux 6
Component: device-mapper-multipath
Version: 6.7
Hardware: x86_64
OS: Linux
Status: CLOSED INSUFFICIENT_DATA
Severity: high
Priority: unspecified
Target Milestone: rc
Reporter: shivamerla1 <shiva.krishna>
Assignee: Ben Marzinski <bmarzins>
QA Contact: Lin Li <lilin>
CC: agk, bmarzins, heinzm, jbrassow, lilin, msnitzer, prajnoha, rbalakri, shiva.krishna, zkabelac
Flags: bmarzins: needinfo? (shiva.krishna)
Last Closed: 2017-09-29 22:46:06 UTC
Type: Bug

Description shivamerla1 2016-11-23 22:42:07 UTC
Description of problem:
During controller failover tests with a Nimble array, we have seen that I/O on one of the dm devices did not resume after the stand-by controller takeover. We also see that the physical paths have been recovered, but I/O was not retried on those paths. While debugging, we issued a few reads manually on the device, after which all of the stuck I/Os were flushed.
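
For reference, a quick way to spot this condition is to compare the in-flight counters of the dm device against those of its slave paths. The loop below is only a suggested check using the device name from this report (dm-13); it is not output captured from the affected host:

  # request counts stuck at the multipath layer vs. the underlying paths
  cat /sys/block/dm-13/inflight
  for s in /sys/block/dm-13/slaves/*; do
      echo "$(basename "$s"): $(cat "$s"/inflight)"
  done

A non-zero count on the dm device with all-zero counts on the slaves matches the state shown further below.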

Version-Release number of selected component (if applicable):
2.6.32-573.el6.x86_64

How reproducible:
Seen a few times

Steps to Reproduce:
1. Reboot or fail the active controller.
2. The stand-by controller takes over and all iSCSI sessions are redirected to it.
3. Takeover completes, but I/O remains hung on the dm device even after the physical paths are recovered (verification commands below).
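
At step 3, the following commands can be used to confirm that the iSCSI sessions and physical paths are back while the map still holds I/O. The device and map names are the ones from this report and would differ on other setups:

  iscsiadm -m session              # sessions re-established to the stand-by controller
  multipath -ll mpathbx            # both paths should show active/ready
  cat /sys/block/dm-13/inflight    # non-zero counters mean I/O is still queued on the map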

Actual results:
I/O remains hung on dm-13, even after the paths are recovered.

Expected results:
I/O to resume after controller failover.

Additional info:

mpathbx (28ed72d3288ecea7c6c9ce900d2416567) dm-13 Nimble,Server
size=117G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='round-robin 0' prio=50 status=active
  |- 25:0:0:0 sdz  65:144 active ready running
  `- 24:0:0:0 sdy  65:128 active ready running

No pending I/O on physical paths:

[root@rtp-smc-qa24-vm2 ~]# cat /sys/block/sdz/inflight 
       0        0
[root@rtp-smc-qa24-vm2 ~]# cat /sys/block/sdy/inflight 
       0        0

The dm device still has I/O stuck:

[root@rtp-smc-qa24-vm2 ~]# cat /sys/block/dm-13/inflight 
       4        1

[root@rtp-smc-qa24-vm2 ~]# ls -l  /sys/block/dm-13/slaves/
total 0
lrwxrwxrwx. 1 root root 0 Nov 23 13:02 sdy -> ../../../../platform/host24/session25/target24:0:0/24:0:0:0/block/sdy
lrwxrwxrwx. 1 root root 0 Nov 23 13:02 sdz -> ../../../../platform/host25/session26/target25:0:0/25:0:0:0/block/sdz

dmsetup status:
mpathbx: 0 245760000 multipath 2 0 1 0 1 1 A 0 2 0 65:144 A 1 65:128 A 0
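
As a cross-check of the kernel status above, the multipathd view of the same map can be queried with the standard interactive commands below; this is a suggested follow-up rather than output collected from the affected host:

  multipathd -k"show maps status"    # per-map path counts and queueing state
  multipathd -k"show paths"          # dm_st/dev_st for every path
  dmsetup table mpathbx              # kernel table, including the queue_if_no_path feature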

After manually running some reads, the stuck I/Os are flushed.

[root@rtp-smc-qa24-vm2 ~]# dd if=/dev/dm-13 of=/dev/null bs=512 count=10  iflag=direct
10+0 records in
10+0 records out
5120 bytes (5.1 kB) copied, 0.852584 s, 6.0 kB/s

[root@rtp-smc-qa24-vm2 ~]# cat /sys/block/dm-13/inflight 
       0        0