Bug 591608

Summary: [LSI 6.0 bug] IO session hang with max(256) Lun test with multipath enabled during exception test
Product: Red Hat Enterprise Linux 6 Reporter: Babu Moger <babu.moger>
Component: kernelAssignee: Ben Marzinski <bmarzins>
Status: CLOSED CURRENTRELEASE QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: high Docs Contact:
Priority: medium    
Version: 6.0CC: andriusb, bdonahue, coughlan, dl-iop-bugzilla, lvm-team, revers, Sean.Stewart, vijay.chauhan
Target Milestone: rcKeywords: OtherQA, Reopened
Target Release: 6.0   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2010-06-07 18:57:15 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 554559    
Attachments:
Description Flags
/var/log/messages
none
output of "ps aux" after the failure. I could see numerous processes in D states
none
Process traces(output of "echo t >/proc/sysrq-trigger")
none
output of "multipath -ll"
none
/etc/multipath.conf
none
messages file from second recreate(failure timestamp is May 21 14:43:29)
none
multipath -ll output before the failure
none
multipath -ll output after the failure
none
ps aux output after the failure.
none
output of "echo t > /proc/sysrq-trigger"
none
output of sg_map -x to show devices are respoding.
none
syslog for IO hang during failover
none
systemtap script to figure out why multipath is getting stuck in suspend.
none
newer version with timestamps to make it easer to correlate with logs
none
yet another version. This one prints out more information from dm_swap_table none

Description Babu Moger 2010-05-12 17:02:41 UTC
Created attachment 413488 [details]
/var/log/messages

Description of problem:

We have noticed an I/O session hang with max Lun test(multipathd with exception)

Host had 256 Luns mapped with two paths to each LUN(one active and one passive).

Session would hanng forever and we cannot close the session. Only way to come out this problem is to reboot.

I could see the kdmflush process in "D" states when collected "ps aux".
Also same is seen when collect process traces(echo t > /proc/sysrq-trigger).

I will attach all these logs.

Version-Release number of selected component (if applicable):

RHEL 6 Snapshot 2.
Linux txmpp1 2.6.32-23.el6.x86_64 #1 SMP Tue Apr 27 21:17:28 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux


How reproducible:
Always

Steps to Reproduce:
1. Install RHEL 6
2. Attach the active-passive storage
3. Present numerous luns(in this case 256)
4. Configure multipath(I had one active and one passive paths)
5. Start raw I/O on all 256 luns
6. Reboot the active/passive controler one after the other with 15 minutes gap between each reboot.
7. Run it for 24 hours
  
Actual results:

I/O got stuck and can't close I/O session

Expected results:
I/O should continue and I should be able to close the session.

Additional info:

Comment 1 Babu Moger 2010-05-12 17:05:14 UTC
Created attachment 413489 [details]
output of "ps aux" after the failure. I could see numerous processes in D states

Comment 2 Babu Moger 2010-05-12 17:06:10 UTC
Created attachment 413490 [details]
Process traces(output of "echo t >/proc/sysrq-trigger")

Comment 4 Babu Moger 2010-05-12 17:06:58 UTC
Created attachment 413491 [details]
output of "multipath -ll"

Comment 5 Babu Moger 2010-05-12 17:07:34 UTC
Created attachment 413492 [details]
/etc/multipath.conf

Comment 6 RHEL Program Management 2010-05-12 18:07:26 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux major release.  Product Management has requested further
review of this request by Red Hat Engineering, for potential inclusion in a Red
Hat Enterprise Linux Major release.  This request is not yet committed for
inclusion.

Comment 7 Tom Coughlan 2010-05-19 13:00:21 UTC
Ben,

Please take a look at the messages. Is this a case where we don't switch to an alternate path, so I/O hangs indefinitely? 

Thanks.

Tom

Comment 8 Ben Marzinski 2010-05-20 03:24:46 UTC
I'm confused about when this output was gathered.  Looking at the ps aux output, there are a lot of kdmflush processes in the D state, but there are none in the
process traces.  Did these get out of the D state before you ran the sysrq? Also, I don't understand the state that the devicees are in, in the multipath -ll output. For, instance, looking at mpathah:

from /var/log/messages, I see the following messages related to mpathah:

May 11 11:29:30 txmpp1 multipathd: sdah: add path (uevent)
May 11 11:29:30 txmpp1 multipathd: mpathah: load table [0 8388608 multipath 2 pg_init_retries 50 1 rdac 1 1 round-robin 0 1 1 66:16 1000]
May 11 11:29:30 txmpp1 multipathd: mpathah: event checker started
May 11 11:29:30 txmpp1 multipathd: sdah path added to devmap mpathah
May 11 11:29:43 txmpp1 multipathd: mpathah: load table [0 8388608 multipath 2 pg_init_retries 50 1 rdac 2 1 round-robin 0 1 1 66:16 1000 round-robin 0 1 1 66:272
May 11 11:29:43 txmpp1 multipathd: sdkd path added to devmap mpathah
May 11 11:46:06 txmpp1 kernel: device-mapper: multipath: Failing path 66:16.
May 11 11:48:06 txmpp1 multipathd: 66:16: mark as failed
May 11 11:48:06 txmpp1 multipathd: mpathah: remaining active paths: 1
May 11 11:48:15 txmpp1 multipathd: sdah: rdac checker reports path is ghost
May 11 11:48:15 txmpp1 multipathd: 66:16: reinstated
May 11 11:48:15 txmpp1 multipathd: mpathah: remaining active paths: 2
May 11 11:48:18 txmpp1 multipathd: mpathah: switch to path group #1
May 11 11:48:18 txmpp1 multipathd: mpathah: switch to path group #1
May 11 11:48:18 txmpp1 kernel: sd 8:0:0:28: rdac: array SnowMass_126_127, ctlr 0, queueing MODE_SELECT command
May 11 11:48:35 txmpp1 kernel: sd 8:0:0:28: rdac: array SnowMass_126_127, ctlr 0, MODE_SELECT completed
May 11 11:48:49 txmpp1 multipathd: sdah: rdac checker reports path is up
May 11 11:48:49 txmpp1 multipathd: 66:16: reinstated

So, at the end, sdah should be in "ready", however, its in "ghost" according
to the multipath -ll output.

mpathah (360080e50001b09820000599b4be8866e) dm-1 LSI,INF-01-00
size=4.0G features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw
|-+- policy='round-robin 0' prio=4 status=enabled
| `- 8:0:0:28  sdah 66:16   failed ghost running
`-+- policy='round-robin 0' prio=3 status=enabled
  `- 9:0:0:28  sdkd 66:272  failed ready running

So, I'm just wondering when all these pieces of information were gathered in relation to the hang and to eachother.

Also, looking at the your /var/log/messages output, it appears thea this reproduced right away.  Is it possible to manually do this manually, and with only one multipath device? Or does it need a large number of devices and/or the script to reproduce it? Also, what does LinuxSmash64 do?

Comment 9 Ben Marzinski 2010-05-20 03:46:18 UTC
My best guess is that multipathd switched back to the new path group, and the rdac handler activated the other controller (which can be seen in the /var/log/messages output), but the queued IO never got sent.  However, I can't be sure. If the multipath -ll output was taken from after all this happened, then you should see the high priority path in the "ready" state, not the "ghost" state.

There is also a bug that is causing multipathd to switch the pathgroup twice in a row under some circumstances. I'm not sure if this is part of the problem, but I can definitely fix that.

Comment 10 Babu Moger 2010-05-20 15:21:51 UTC
multipath -ll output was taken before the failure when everything was optimal.
ps aux and sysrq output was taken almost together after the failure. Just to be clear, logs are collected much after the failure. LinuxSmash64 is out I/O tool, which sends I/O to raw devices.

I will attempt one more recreate manually today ..  Let me know if you want me to try your fix(switch pathgroup). Please point me where i can get your fix.

Comment 11 Babu Moger 2010-05-21 21:02:31 UTC
I have recreated this issue one more time. This time only with 10 luns. Failure timestamp is May 21 14:43:29.  This is the time when one of the path is coming back from temparory offline state.  Luns should become optimal after this point. I am taken the output of sg_map command.  I see that all the paths are online and responding to the commands.  I am going to attach the logs..

Comment 12 Babu Moger 2010-05-21 21:05:17 UTC
Created attachment 415774 [details]
messages file from second recreate(failure timestamp is May 21 14:43:29)

Comment 13 Babu Moger 2010-05-21 21:07:25 UTC
Created attachment 415775 [details]
multipath -ll output before the failure

Comment 14 Babu Moger 2010-05-21 21:09:32 UTC
Created attachment 415776 [details]
multipath -ll output after the failure

This output shows some of the physical paths as failed. But I found that physical paths are actually not failed.  I could send inquiry commands and they are responding.

Comment 15 Babu Moger 2010-05-21 21:10:13 UTC
Created attachment 415777 [details]
ps aux output after the failure.

Comment 16 Babu Moger 2010-05-21 21:11:23 UTC
Created attachment 415778 [details]
output of "echo t > /proc/sysrq-trigger"

Comment 17 Babu Moger 2010-05-21 21:12:10 UTC
Created attachment 415779 [details]
output of sg_map -x to show devices are respoding.

Comment 18 Ben Marzinski 2010-05-22 00:25:20 UTC
Here are some packages that have the fix to switching the pathgroup multiple times:

http://people.redhat.com/bmarzins/device-mapper-multipath/rpms/RHEL6/

unfortunately, I don't think this is likely to help.. But I could be wrong, so if you can get around to testing these, that would be great.  I'm going to try reproducing this with some RDAC storage, since I can't reproduce it without it.

Comment 19 Vijay Kumar 2010-05-26 15:09:21 UTC
Ben,

I am able reproduce this issue even with 1 LUN. Also I am seeing same hang during failover and all path failure condition too. 

I verified with new device-mapper-multipath tool mentioned in the above link and it did not help much.

I am attaching the logs for the hang during failover. Following steps I ran to repoduce it.

step 1: Start IO on the dm device
Step 2: fail one contoller

IO hang was seen after the controller was failed.

multipath -ll output after IO hang:
========================
[root@linux ~]# multipath -ll
3600a0b800029ebc00000cd604bfa1055 dm-0 LSI,INF-01-00
size=2.0G features='2 pg_init_retries 46' hwhandler='1 rdac' wp=rw
|-+- policy='round-robin 0' prio=6 status=active
| |- 1:0:1:1 sdc 8:32 active ready running
| `- 4:0:0:1 sdd 8:48 active ready running
`-+- policy='round-robin 0' prio=0 status=enabled
  |- #:#:#:# -   #:#  active faulty running
  `- #:#:#:# -   #:#  active faulty running

Comment 20 Vijay Kumar 2010-05-26 15:15:03 UTC
Created attachment 416876 [details]
syslog for IO hang during  failover

Attached is the syslog with output of "echo t > /proc/sysrq-trigger"

Comment 21 Babu Moger 2010-05-28 15:18:52 UTC
Ben,
 Any updates on this?  This is blocking all our multipath tests.  Let us know if you need any more recreates from our side..
Thanks

Comment 22 Vijay Kumar 2010-06-01 12:08:43 UTC
Ben,
Few more inputs from my side. Dm device is going in suspend state during table reload and never resumes back. Looks to me the reason IO hang. Do you have any update?

Comment 23 Ben Marzinski 2010-06-01 17:25:41 UTC
What version of device-mapper and the kernel are you using?  Can you please attach the output of:

# dmsetup info

from when the device hangs. Also, could you attach the output from

# dmsetup table --inactive

I'm trying to figure out where the device is getting suspended... You should see an error calling domap from all of the places that would reload the map. I wonder if there's some bug where you can device-mapper can return success and leave the map suspended.

Comment 24 Ben Marzinski 2010-06-02 02:56:48 UTC
Created attachment 418881 [details]
systemtap script to figure out why multipath is getting stuck in suspend.

This is a systemtap script which will hopefully narrow down where things are going wrong to cause multipath devices to get stuck in the suspended state. You will need the systemtap package, and possibly some kernel packages to use this. To run it, you simply do

# stap bz591608.stp

Once you see it print

starting

you can start your test.  This will print  lots of information out, so don't want to run the test with too many devices. When you are done collecting information, you can stop the script with CTRL-C. You should see something that looks like

starting
multipath[5374] entered dev_suspend for mpathb with flags 3588
multipath[5374] entered dm_suspend
multipath[5374] exitted dm_suspend with 0
multipath[5374] entered dm_swap_table
multipath[5374] exitted dm_swap_table with -131936637831680
multipath[5374] entered dm_resume
multipath[5374] exitted dm_resume with 0
multipath[5374] exitted dev_suspend with 0
multipath[5374] entered dev_suspend for mpathc with flags 3588
multipath[5374] entered dm_suspend
multipath[5374] exitted dm_suspend with 0
multipath[5374] entered dm_swap_table
multipath[5374] exitted dm_swap_table with -131936615876608
multipath[5374] entered dm_resume
multipath[5374] exitted dm_resume with 0
multipath[5374] exitted dev_suspend with 0
ending

Please attach the entire output of the script, along with the output of /var/log/messages from the test.

Comment 25 Ben Marzinski 2010-06-02 03:18:01 UTC
Created attachment 418885 [details]
newer version with timestamps to make it easer to correlate with logs

Comment 26 Vijay Kumar 2010-06-02 08:09:03 UTC
Following are the o/p captured after the hang:
-------------------------------------------------------
[root@linux ~]# dmsetup info
Name:              3600a0b800029ebc00000cd604bfa1055
State:             SUSPENDED
Read Ahead:        256
Tables present:    LIVE
Open count:        1
Event number:      1
Major, minor:      253, 0
Number of targets: 1
UUID: mpath-3600a0b800029ebc00000cd604bfa1055


[root@linux ~]# dmsetup table --inactive
3600a0b800029ebc00000cd604bfa1055:

[root@linux-214-RHEL6beta1 ~]# rpm -qa |grep device-mapper
device-mapper-event-1.02.43-1.el6.x86_64
device-mapper-multipath-0.4.9-21.el6.x86_64
device-mapper-libs-1.02.43-1.el6.x86_64
device-mapper-1.02.43-1.el6.x86_64
device-mapper-devel-1.02.38-2.fc12.x86_64
device-mapper-event-libs-1.02.43-1.el6.x86_64
device-mapper-multipath-libs-0.4.9-21.el6.x86_64

I will configure systemtab in my system and will share the results.

Thanks.

Comment 27 Ben Marzinski 2010-06-02 14:44:52 UTC
I mentioned that you may need some more kernel packages for this script.  To be specific, you need to have the appropriate kernel-debuginfo, kernel-debuginfo-common, kernel-headers, and kernel-devel packages.

Comment 30 Andrius Benokraitis 2010-06-02 19:09:31 UTC
Vijay, by posting the FTP location in a public Bugzilla you are violating the NDA. Please avoid doing so.

Comment 31 Ben Marzinski 2010-06-03 14:19:03 UTC
*** Bug 590069 has been marked as a duplicate of this bug. ***

Comment 32 Ben Marzinski 2010-06-03 18:08:18 UTC
Created attachment 419469 [details]
yet another version.  This one prints out more information from dm_swap_table

Comment 33 Ben Marzinski 2010-06-04 04:22:50 UTC
There are also some debug packages available at:

http://people.redhat.com/bmarzins/device-mapper-multipath/rpms/RHEL6/x86_64/

These will print out the size and parameters of the new table, whenever multipath tries to reload it.  Hopefully, with these packages, and the systemtap script, we
should be able to figure out what's going wrong here.

The output will go to /var/log/messages, so I still just need the output of the script and /var/log/messages from when you reproduce this issue.

Comment 34 Vijay Kumar 2010-06-04 11:24:30 UTC
I upgraded my host to RHEL6 snapshot6 and I am not seeing this issue anymore. Thanks.

Comment 35 Ben Marzinski 2010-06-04 14:46:12 UTC
I'm reopening this issue since I dup'ed 590069 to it, and the multipath package shouldn't have changed. Since I'm not sure what was wrong, I'm not willing to say for certain that it's fixed.  Vijay, could you please attach your current device-mapper-multipath and kernel versions.

Comment 36 Andrius Benokraitis 2010-06-04 15:09:48 UTC
(In reply to comment #35)
> I'm reopening this issue since I dup'ed 590069 to it, and the multipath package
> shouldn't have changed. Since I'm not sure what was wrong, I'm not willing to
> say for certain that it's fixed.  Vijay, could you please attach your current
> device-mapper-multipath and kernel versions.    

Whoops! Sorry Ben!

Comment 37 Ben Marzinski 2010-06-04 16:16:44 UTC
Oh, and can you also attach the version of the device-mapper package?

Comment 38 Vijay Kumar 2010-06-04 16:24:16 UTC
Ben,

Kernel version is kernel-2.6.32-30.el6.x86_64 and device-mapper-multipath version is device-mapper-multipath-0.4.9-21.el6.x86_64

Comment 39 Vijay Kumar 2010-06-04 16:25:43 UTC
device-mapper-1.02.48-2.el6.x86_64

Comment 40 Sean Stewart 2010-06-07 16:41:18 UTC
I can confirm that I am no longer seeing the issue described in 590069 with Snapshot 6.

Comment 41 Ben Marzinski 2010-06-07 18:57:15 UTC
Well, I suppose that if neither of you can see it, whatever it was must have been fixed. I'm assuming that it was a kernel issue.