Bug 591608
Created attachment 413489 [details]
output of "ps aux" after the failure. I could see numerous processes in D states
Created attachment 413490 [details]
Process traces (output of "echo t > /proc/sysrq-trigger")
Created attachment 413491 [details]
output of "multipath -ll"
Created attachment 413492 [details]
/etc/multipath.conf
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux major release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux major release. This request is not yet committed for inclusion.

Ben, please take a look at the messages. Is this a case where we don't switch to an alternate path, so I/O hangs indefinitely? Thanks, Tom

I'm confused about when this output was gathered. Looking at the ps aux output, there are a lot of kdmflush processes in the D state, but there are none in the process traces. Did these get out of the D state before you ran the sysrq? Also, I don't understand the state that the devices are in, in the multipath -ll output. For instance, looking at mpathah, I see the following messages related to it in /var/log/messages:

May 11 11:29:30 txmpp1 multipathd: sdah: add path (uevent)
May 11 11:29:30 txmpp1 multipathd: mpathah: load table [0 8388608 multipath 2 pg_init_retries 50 1 rdac 1 1 round-robin 0 1 1 66:16 1000]
May 11 11:29:30 txmpp1 multipathd: mpathah: event checker started
May 11 11:29:30 txmpp1 multipathd: sdah path added to devmap mpathah
May 11 11:29:43 txmpp1 multipathd: mpathah: load table [0 8388608 multipath 2 pg_init_retries 50 1 rdac 2 1 round-robin 0 1 1 66:16 1000 round-robin 0 1 1 66:272
May 11 11:29:43 txmpp1 multipathd: sdkd path added to devmap mpathah
May 11 11:46:06 txmpp1 kernel: device-mapper: multipath: Failing path 66:16.
May 11 11:48:06 txmpp1 multipathd: 66:16: mark as failed
May 11 11:48:06 txmpp1 multipathd: mpathah: remaining active paths: 1
May 11 11:48:15 txmpp1 multipathd: sdah: rdac checker reports path is ghost
May 11 11:48:15 txmpp1 multipathd: 66:16: reinstated
May 11 11:48:15 txmpp1 multipathd: mpathah: remaining active paths: 2
May 11 11:48:18 txmpp1 multipathd: mpathah: switch to path group #1
May 11 11:48:18 txmpp1 multipathd: mpathah: switch to path group #1
May 11 11:48:18 txmpp1 kernel: sd 8:0:0:28: rdac: array SnowMass_126_127, ctlr 0, queueing MODE_SELECT command
May 11 11:48:35 txmpp1 kernel: sd 8:0:0:28: rdac: array SnowMass_126_127, ctlr 0, MODE_SELECT completed
May 11 11:48:49 txmpp1 multipathd: sdah: rdac checker reports path is up
May 11 11:48:49 txmpp1 multipathd: 66:16: reinstated

So, at the end, sdah should be "ready"; however, it's "ghost" according to the multipath -ll output:

mpathah (360080e50001b09820000599b4be8866e) dm-1 LSI,INF-01-00
size=4.0G features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw
|-+- policy='round-robin 0' prio=4 status=enabled
| `- 8:0:0:28 sdah 66:16  failed ghost running
`-+- policy='round-robin 0' prio=3 status=enabled
  `- 9:0:0:28 sdkd 66:272 failed ready running

So, I'm just wondering when all these pieces of information were gathered in relation to the hang and to each other. Also, looking at your /var/log/messages output, it appears that this reproduced right away. Is it possible to do this manually, and with only one multipath device? Or does it need a large number of devices and/or the script to reproduce it? Also, what does LinuxSmash64 do?

My best guess is that multipathd switched back to the new path group, and the rdac handler activated the other controller (which can be seen in the /var/log/messages output), but the queued I/O never got sent. However, I can't be sure.
If the multipath -ll output was taken after all this happened, then you should see the high-priority path in the "ready" state, not the "ghost" state. There is also a bug that is causing multipathd to switch the pathgroup twice in a row under some circumstances. I'm not sure if this is part of the problem, but I can definitely fix that.

The multipath -ll output was taken before the failure, when everything was optimal. The ps aux and sysrq output was taken almost together, after the failure. Just to be clear, the logs were collected well after the failure. LinuxSmash64 is our I/O tool, which sends I/O to raw devices. I will attempt one more recreate manually today. Let me know if you want me to try your fix (switch pathgroup), and please point me to where I can get it.

I have recreated this issue one more time, this time with only 10 LUNs. The failure timestamp is May 21 14:43:29. This is the time when one of the paths is coming back from a temporary offline state; the LUNs should become optimal after this point. I took the output of the sg_map command, and I see that all the paths are online and responding to commands. I am going to attach the logs.

Created attachment 415774 [details]
messages file from the second recreate (failure timestamp is May 21 14:43:29)
Created attachment 415775 [details]
multipath -ll output before the failure
Created attachment 415776 [details]
multipath -ll output after the failure
This output shows some of the physical paths as failed, but I found that the physical paths have not actually failed: I could send inquiry commands to them, and they responded.
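This kind of before/after comparison can be quantified. The following is a sketch I'm adding for illustration; it assumes the `multipath -ll` path-line format shown elsewhere in this report, where each path line ends with two state words followed by "running":

```shell
# Hypothetical helper: counts path lines in saved `multipath -ll` output
# that are not "active ready" -- i.e. paths that multipath considers
# failed, faulty, or ghost -- so captures can be compared quickly.
bad_paths() {
    grep 'running$' | grep -vc 'active ready'
}

# Typical use on a live system:
#   multipath -ll | bad_paths
```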
Created attachment 415777 [details]
ps aux output after the failure.
Created attachment 415778 [details]
output of "echo t > /proc/sysrq-trigger"
Created attachment 415779 [details]
output of sg_map -x to show that the devices are responding
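For reference, the responsiveness check described above can be scripted. This is a sketch of my own (it assumes the sg3_utils `sg_inq` command is available), not the exact commands used in the test:

```shell
# Hypothetical helper: sends a SCSI INQUIRY to each named device and
# reports whether it answered, mirroring the manual sg_map/inquiry check.
check_paths() {
    for dev in "$@"; do
        if sg_inq "$dev" >/dev/null 2>&1; then
            echo "$dev responding"
        else
            echo "$dev NOT responding"
        fi
    done
}

# Typical use on a live system:
#   check_paths /dev/sg*
```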
Here are some packages that have the fix for switching the pathgroup multiple times: http://people.redhat.com/bmarzins/device-mapper-multipath/rpms/RHEL6/ Unfortunately, I don't think this is likely to help. But I could be wrong, so if you can get around to testing these, that would be great. I'm going to try reproducing this with some RDAC storage, since I can't reproduce it without it.

Ben, I am able to reproduce this issue even with 1 LUN. I am also seeing the same hang during failover and all-path-failure conditions. I verified with the new device-mapper-multipath tool mentioned in the above link, and it did not help much. I am attaching the logs for the hang during failover. I ran the following steps to reproduce it:

Step 1: Start I/O on the dm device
Step 2: Fail one controller

The I/O hang was seen after the controller was failed. multipath -ll output after the I/O hang:

[root@linux ~]# multipath -ll
3600a0b800029ebc00000cd604bfa1055 dm-0 LSI,INF-01-00
size=2.0G features='2 pg_init_retries 46' hwhandler='1 rdac' wp=rw
|-+- policy='round-robin 0' prio=6 status=active
| |- 1:0:1:1 sdc 8:32 active ready running
| `- 4:0:0:1 sdd 8:48 active ready running
`-+- policy='round-robin 0' prio=0 status=enabled
  |- #:#:#:# - #:# active faulty running
  `- #:#:#:# - #:# active faulty running

Created attachment 416876 [details]
syslog for IO hang during failover
Attached is the syslog with output of "echo t > /proc/sysrq-trigger"
Ben, any updates on this? This is blocking all our multipath tests. Let us know if you need any more recreates from our side. Thanks.

Ben, a few more inputs from my side: the dm device is going into the suspended state during a table reload and never resumes. That looks to me like the reason for the I/O hang. Do you have any update?

What version of device-mapper and the kernel are you using? Can you please attach the output of:

# dmsetup info

from when the device hangs. Also, could you attach the output from:

# dmsetup table --inactive

I'm trying to figure out where the device is getting suspended. You should see an error calling domap from all of the places that would reload the map. I wonder if there's some bug where device-mapper can return success and leave the map suspended.

Created attachment 418881 [details]
systemtap script to figure out why multipath is getting stuck in suspend

This is a systemtap script which will hopefully narrow down where things are going wrong to cause multipath devices to get stuck in the suspended state. You will need the systemtap package, and possibly some kernel packages, to use it. To run it, you simply do:

# stap bz591608.stp

Once you see it print "starting" you can start your test. This will print out a lot of information, so you don't want to run the test with too many devices. When you are done collecting information, you can stop the script with Ctrl-C.
You should see something that looks like:

starting
multipath[5374] entered dev_suspend for mpathb with flags 3588
multipath[5374] entered dm_suspend
multipath[5374] exitted dm_suspend with 0
multipath[5374] entered dm_swap_table
multipath[5374] exitted dm_swap_table with -131936637831680
multipath[5374] entered dm_resume
multipath[5374] exitted dm_resume with 0
multipath[5374] exitted dev_suspend with 0
multipath[5374] entered dev_suspend for mpathc with flags 3588
multipath[5374] entered dm_suspend
multipath[5374] exitted dm_suspend with 0
multipath[5374] entered dm_swap_table
multipath[5374] exitted dm_swap_table with -131936615876608
multipath[5374] entered dm_resume
multipath[5374] exitted dm_resume with 0
multipath[5374] exitted dev_suspend with 0
ending

Please attach the entire output of the script, along with the output of /var/log/messages from the test.

Created attachment 418885 [details]
newer version with timestamps to make it easier to correlate with logs
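As an aside, the check Ben asked for above (a dm device stuck in the suspended state) can be automated. This is a hedged sketch of my own; it assumes the standard multi-line `dmsetup info` output format with `Name:` and `State:` fields:

```shell
# Hypothetical helper: reads `dmsetup info` output on stdin and prints
# the name of every device-mapper device whose State is SUSPENDED.
suspended_devs() {
    awk '/^Name:/  { name = $2 }
         /^State:/ && $2 == "SUSPENDED" { print name }'
}

# Typical use on a live system:
#   dmsetup info | suspended_devs
```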
Following is the output captured after the hang:

[root@linux ~]# dmsetup info
Name: 3600a0b800029ebc00000cd604bfa1055
State: SUSPENDED
Read Ahead: 256
Tables present: LIVE
Open count: 1
Event number: 1
Major, minor: 253, 0
Number of targets: 1
UUID: mpath-3600a0b800029ebc00000cd604bfa1055

[root@linux ~]# dmsetup table --inactive
3600a0b800029ebc00000cd604bfa1055:

[root@linux-214-RHEL6beta1 ~]# rpm -qa | grep device-mapper
device-mapper-event-1.02.43-1.el6.x86_64
device-mapper-multipath-0.4.9-21.el6.x86_64
device-mapper-libs-1.02.43-1.el6.x86_64
device-mapper-1.02.43-1.el6.x86_64
device-mapper-devel-1.02.38-2.fc12.x86_64
device-mapper-event-libs-1.02.43-1.el6.x86_64
device-mapper-multipath-libs-0.4.9-21.el6.x86_64

I will configure systemtap on my system and will share the results. Thanks.

I mentioned that you may need some more kernel packages for this script. To be specific, you need the appropriate kernel-debuginfo, kernel-debuginfo-common, kernel-headers, and kernel-devel packages.

Vijay, by posting the FTP location in a public Bugzilla you are violating the NDA. Please avoid doing so.

*** Bug 590069 has been marked as a duplicate of this bug. ***

Created attachment 419469 [details]
yet another version. This one prints out more information from dm_swap_table
There are also some debug packages available at: http://people.redhat.com/bmarzins/device-mapper-multipath/rpms/RHEL6/x86_64/ These will print out the size and parameters of the new table whenever multipath tries to reload it. Hopefully, with these packages and the systemtap script, we should be able to figure out what's going wrong here. The output will go to /var/log/messages, so I still just need the output of the script and /var/log/messages from when you reproduce this issue.

I upgraded my host to RHEL 6 Snapshot 6 and I am not seeing this issue anymore. Thanks.

I'm reopening this issue since I dup'ed 590069 to it, and the multipath package shouldn't have changed. Since I'm not sure what was wrong, I'm not willing to say for certain that it's fixed. Vijay, could you please attach your current device-mapper-multipath and kernel versions?

(In reply to comment #35)
> I'm reopening this issue since I dup'ed 590069 to it, and the multipath
> package shouldn't have changed. Since I'm not sure what was wrong, I'm not
> willing to say for certain that it's fixed. Vijay, could you please attach
> your current device-mapper-multipath and kernel versions.

Whoops! Sorry Ben!

Oh, and can you also attach the version of the device-mapper package?

Ben, the kernel version is kernel-2.6.32-30.el6.x86_64, and the device-mapper-multipath version is device-mapper-multipath-0.4.9-21.el6.x86_64, with device-mapper-1.02.48-2.el6.x86_64.

I can confirm that I am no longer seeing the issue described in 590069 with Snapshot 6.

Well, I suppose that if neither of you can see it, whatever it was must have been fixed. I'm assuming that it was a kernel issue.
Created attachment 413488 [details]
/var/log/messages

Description of problem:
We have noticed an I/O session hang with the max LUN test (multipathd with exception). The host had 256 LUNs mapped, with two paths to each LUN (one active and one passive). The session would hang forever, and we cannot close it; the only way to get out of this problem is to reboot. I could see the kdmflush processes in the "D" state when I collected "ps aux". The same is seen in the process traces (echo t > /proc/sysrq-trigger). I will attach all these logs.

Version-Release number of selected component (if applicable):
RHEL 6 Snapshot 2.
Linux txmpp1 2.6.32-23.el6.x86_64 #1 SMP Tue Apr 27 21:17:28 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux

How reproducible:
Always

Steps to Reproduce:
1. Install RHEL 6
2. Attach the active-passive storage
3. Present numerous LUNs (in this case, 256)
4. Configure multipath (I had one active and one passive path)
5. Start raw I/O on all 256 LUNs
6. Reboot the active and passive controllers one after the other, with a 15-minute gap between each reboot
7. Run it for 24 hours

Actual results:
I/O got stuck, and the I/O session can't be closed.

Expected results:
I/O should continue, and I should be able to close the session.

Additional info:
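The controller-reboot cycle in the steps above could be sketched roughly as follows. This is purely illustrative: `REBOOT_CMD` is a hypothetical placeholder for the array-specific controller reset command, and the interval matches the 15-minute gap described.

```shell
# Illustrative sketch of step 6: alternate controller reboots with a
# fixed gap. REBOOT_CMD is a hypothetical placeholder; substitute the
# array vendor's controller reset command.
INTERVAL=${INTERVAL:-900}                      # 15 minutes, in seconds
REBOOT_CMD=${REBOOT_CMD:-"echo would-reboot"}  # placeholder command

reboot_cycle() {
    for ctlr in A B; do
        $REBOOT_CMD "$ctlr"   # reset one controller
        sleep "$INTERVAL"     # wait before hitting the other one
    done
}

# Run for ~24 hours (step 7):
#   while true; do reboot_cycle; done
```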