Description of problem:
During controller failover tests with a Nimble array, we have seen that I/O on one of the dm devices did not resume after the stand-by controller took over. We also see that the physical paths have recovered, but I/O was not retried on those paths. While debugging, we issued a few reads manually on the device, and all of the stuck I/Os were then flushed.

Version-Release number of selected component (if applicable):
2.6.32-573.el6.x86_64

How reproducible:
Seen a few times

Steps to Reproduce:
1. Reboot/fail the active controller.
2. The stand-by controller takes over and all iSCSI sessions are redirected to it.
3. Takeover completes, but I/O is hung on the dm device even after the physical paths have recovered.

Actual results:
I/O hung on dm-13, even after the paths are recovered.

Expected results:
I/O resumes after controller failover.

Additional info:

Multipath topology for the affected device:

mpathbx (28ed72d3288ecea7c6c9ce900d2416567) dm-13 Nimble,Server
size=117G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='round-robin 0' prio=50 status=active
  |- 25:0:0:0 sdz 65:144 active ready running
  `- 24:0:0:0 sdy 65:128 active ready running

No pending I/O on the physical paths:

[root@rtp-smc-qa24-vm2 ~]# cat /sys/block/sdz/inflight
0 0
[root@rtp-smc-qa24-vm2 ~]# cat /sys/block/sdy/inflight
0 0

The dm device has I/O stuck:

[root@rtp-smc-qa24-vm2 ~]# cat /sys/block/dm-13/inflight
4 1

[root@rtp-smc-qa24-vm2 ~]# ls -l /sys/block/dm-13/slaves/
total 0
lrwxrwxrwx. 1 root root 0 Nov 23 13:02 sdy -> ../../../../platform/host24/session25/target24:0:0/24:0:0:0/block/sdy
lrwxrwxrwx. 1 root root 0 Nov 23 13:02 sdz -> ../../../../platform/host25/session26/target25:0:0/25:0:0:0/block/sdz

dmsetup status:

mpathbx: 0 245760000 multipath 2 0 1 0 1 1 A 0 2 0 65:144 A 1 65:128 A 0

After manually issuing a few reads, the stuck I/Os are flushed:

[root@rtp-smc-qa24-vm2 ~]# dd if=/dev/dm-13 of=/dev/null bs=512 count=10 iflag=direct
10+0 records in
10+0 records out
5120 bytes (5.1 kB) copied, 0.852584 s, 6.0 kB/s
[root@rtp-smc-qa24-vm2 ~]# cat /sys/block/dm-13/inflight
0 0
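
For convenience, below is a minimal shell sketch of the checks above, assuming the same sysfs layout as in this report. The DM variable defaults to dm-13 (the device from this report) and should be substituted for the affected device; the commented-out dd at the end is the manual-read workaround we used, not a fix.

#!/bin/bash
# Sketch only: compare in-flight I/O counters on a dm device and its slave paths,
# using the same /sys/block/*/inflight files shown above.
DM=${1:-dm-13}   # affected dm device; dm-13 is taken from this report

echo "== ${DM} inflight (reads writes) =="
cat /sys/block/${DM}/inflight

for slave in /sys/block/${DM}/slaves/*; do
    path=$(basename "$slave")
    echo "== ${path} inflight (reads writes) =="
    cat /sys/block/${path}/inflight
done

# Workaround used during debugging: a few direct reads against the dm device
# flushed the stuck I/Os. Uncomment to apply.
# dd if=/dev/${DM} of=/dev/null bs=512 count=10 iflag=direct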