Bug 1398031 - I/O stuck on dm-mpath device even when physical paths are recovered [NEEDINFO]
Summary: I/O stuck on dm-mpath device even when physical paths are recovered
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: device-mapper-multipath
Version: 6.7
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Ben Marzinski
QA Contact: Lin Li
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-11-23 22:42 UTC by shivamerla1
Modified: 2017-09-29 22:46 UTC
CC List: 10 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-09-29 22:46:06 UTC
Target Upstream Version:
bmarzins: needinfo? (shiva.krishna)



Description shivamerla1 2016-11-23 22:42:07 UTC
Description of problem:
During controller failover tests with a Nimble array, we have seen that I/O on one of the dm devices does not resume after the stand-by controller takes over. We also see that the physical paths have recovered, but I/O is not retried on those paths. While debugging, we issued some reads manually on the device, after which all of the stuck I/Os were flushed.

Version-Release number of selected component (if applicable):
2.6.32-573.el6.x86_64

How reproducible:
Seen a few times

Steps to Reproduce:
1. Reboot or fail the active controller.
2. The stand-by controller takes over and all iSCSI sessions are redirected to it.
3. Takeover completes, but I/O remains hung on the dm device even after the physical paths are recovered (a rough way to observe this state is sketched below).
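
A minimal sketch of how the hung state can be observed during step 3, assuming the affected map is mpathbx on dm-13 with slave paths sdy/sdz as in the output further down:

# path and queueing state as seen by multipath/multipathd
multipath -ll mpathbx
multipathd -k"show paths"
# in-flight I/O on the dm device vs. its physical paths
cat /sys/block/dm-13/inflight /sys/block/sdy/inflight /sys/block/sdz/inflight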

Actual results:
I/O remains hung on dm-13 even after the paths are recovered.

Expected results:
I/O to resume after controller failover.

Additional info:

mpathbx (28ed72d3288ecea7c6c9ce900d2416567) dm-13 Nimble,Server
size=117G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='round-robin 0' prio=50 status=active
  |- 25:0:0:0 sdz  65:144 active ready running
  `- 24:0:0:0 sdy  65:128 active ready running
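
Since the map has features='1 queue_if_no_path', I/O is held in the dm-multipath queue while no path is usable; the symptom above looks like that queued I/O is not redispatched once the paths come back. A possible way to confirm this (a diagnostic sketch, not a fix) is to temporarily turn queueing off on the map, which should fail or flush any I/O still held in the queue, and then turn it back on:

# toggle queueing on the mpathbx map via dm target messages
dmsetup message mpathbx 0 "fail_if_no_path"
dmsetup message mpathbx 0 "queue_if_no_path"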

No pending I/O on physical paths:

[root@rtp-smc-qa24-vm2 ~]# cat /sys/block/sdz/inflight 
       0        0
[root@rtp-smc-qa24-vm2 ~]# cat /sys/block/sdy/inflight 
       0        0

Dm device has I/O stuck:

[root@rtp-smc-qa24-vm2 ~]# cat /sys/block/dm-13/inflight 
       4        1

[root@rtp-smc-qa24-vm2 ~]# ls -l  /sys/block/dm-13/slaves/
total 0
lrwxrwxrwx. 1 root root 0 Nov 23 13:02 sdy -> ../../../../platform/host24/session25/target24:0:0/24:0:0:0/block/sdy
lrwxrwxrwx. 1 root root 0 Nov 23 13:02 sdz -> ../../../../platform/host25/session26/target25:0:0/25:0:0:0/block/sdz
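
To confirm the requests are genuinely stuck rather than just slow, a simple sampling loop over the same inflight files can be used (illustration only; device names taken from the output above):

# sample in-flight request counts once per second
while sleep 1; do
    date
    for d in dm-13 sdy sdz; do
        printf '%-6s ' "$d"; cat /sys/block/"$d"/inflight
    done
done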

dmsetup status:
mpathbx: 0 245760000 multipath 2 0 1 0 1 1 A 0 2 0 65:144 A 1 65:128 A 0

After manually running some reads, the stuck I/Os are flushed:

[root@rtp-smc-qa24-vm2 ~]# dd if=/dev/dm-13 of=/dev/null bs=512 count=10  iflag=direct
10+0 records in
10+0 records out
5120 bytes (5.1 kB) copied, 0.852584 s, 6.0 kB/s

[root@rtp-smc-qa24-vm2 ~]# cat /sys/block/dm-13/inflight 
       0        0
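
As a stop-gap, the same manual-read trick could be scripted across all multipath maps. This is only a hypothetical workaround sketch (it assumes the maps are exposed as /dev/mapper/mpath*), not a fix for the underlying queueing problem:

# issue a small direct read against every multipath map to kick queued I/O
for m in /dev/mapper/mpath*; do
    dd if="$m" of=/dev/null bs=512 count=10 iflag=direct
done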

