Bug 460455

Summary: [FOCUS][24] R2:SAN:Hang triggered by filesystem testing on SAN
Product: Red Hat Enterprise MRG
Component: realtime-kernel
Version: 1.0
Target Milestone: 1.0.3
Hardware: x86_64
OS: All
Reporter: IBM Bug Proxy <bugproxy>
Assignee: Steven Rostedt <srostedt>
CC: bhu, davids, lgoncalv, pzijlstr, srostedt, tglx, williams
Status: CLOSED ERRATA
Severity: urgent
Priority: medium
Doc Type: Bug Fix
Last Closed: 2008-10-07 19:21:11 UTC
Attachments:
  online/offline test (flags: none)
  patch to fix compat semaphore in qla2x00 driver (flags: none)
  patch to catch misuses of rtmutexes (flags: none)

Description IBM Bug Proxy 2008-08-28 10:40:28 UTC
Problem description:
====================
Set up the SAN blast test and left the system overnight.  The next day the
filesystem was locked and the box had to be rebooted; dmesg could not be pulled :(.
The box was not hung, but the / filesystem was read-only as best I can tell.

In /var/log/messages:
Jul 23 18:39:31 elm3c24 kernel: WARNING: at kernel/rtmutex.c:1732
rt_read_slowunlock()
Jul 23 18:39:31 elm3c24 kernel: Pid: 9733, comm: pdflush Not tainted
2.6.24.7-72ibmrt2.5 #1
Jul 23 18:39:31 elm3c24 kernel: 
Jul 23 18:39:31 elm3c24 kernel: Call Trace:
Jul 23 18:39:31 elm3c24 kernel:  [<ffffffff81285ef1>] rt_read_slowunlock+0x14c/0x43e
Jul 23 18:39:31 elm3c24 kernel:  [<ffffffff8105dc1d>]
rt_rwlock_read_unlock+0x258/0x25d
Jul 23 18:39:31 elm3c24 kernel:  [<ffffffff81287257>] rt_read_unlock+0x9/0xb
Jul 23 18:39:31 elm3c24 kernel:  [<ffffffff8819ce80>] :dm_mod:dm_get_table+0x35/0x3d
Jul 23 18:39:31 elm3c24 kernel:  [<ffffffff8819d045>]
:dm_mod:dm_any_congested+0x18/0x53
Jul 23 18:39:31 elm3c24 kernel:  [<ffffffff8108b3ee>] write_cache_pages+0x207/0x2f7
Jul 23 18:39:31 elm3c24 kernel:  [<ffffffff8108acb2>] ? __writepage+0x0/0x36
Jul 23 18:39:31 elm3c24 kernel:  [<ffffffff8108b500>] generic_writepages+0x22/0x28
Jul 23 18:39:31 elm3c24 kernel:  [<ffffffff8108b538>] do_writepages+0x32/0x3a
Jul 23 18:39:31 elm3c24 kernel:  [<ffffffff810cce87>]
__writeback_single_inode+0x1a4/0x30a
Jul 23 18:39:31 elm3c24 kernel:  [<ffffffff81287257>] ? rt_read_unlock+0x9/0xb
Jul 23 18:39:31 elm3c24 kernel:  [<ffffffff810cd38e>] sync_sb_inodes+0x1be/0x27c
Jul 23 18:39:31 elm3c24 kernel:  [<ffffffff810cd661>] writeback_inodes+0x74/0xd1
Jul 23 18:39:31 elm3c24 kernel:  [<ffffffff8108bd38>] background_writeout+0x87/0xb9
Jul 23 18:39:31 elm3c24 kernel:  [<ffffffff8108c549>] ? pdflush+0x0/0x1fc
Jul 23 18:39:31 elm3c24 kernel:  [<ffffffff8108c69a>] pdflush+0x151/0x1fc
Jul 23 18:39:31 elm3c24 kernel:  [<ffffffff8108bcb1>] ? background_writeout+0x0/0xb9
Jul 23 18:39:31 elm3c24 kernel:  [<ffffffff8105143b>] kthread+0x49/0x76
Jul 23 18:39:31 elm3c24 kernel:  [<ffffffff8100d048>] child_rip+0xa/0x12
Jul 23 18:39:31 elm3c24 kernel:  [<ffffffff810513f2>] ? kthread+0x0/0x76
Jul 23 18:39:31 elm3c24 kernel:  [<ffffffff8100d03e>] ? child_rip+0x0/0x12
Jul 23 18:39:31 elm3c24 kernel: 

Jul 23 18:47:01 elm3c24 kernel: WARNING: at kernel/rtmutex.c:1732
rt_read_slowunlock()
Jul 23 18:47:01 elm3c24 kernel: Pid: 9862, comm: pdflush Not tainted
2.6.24.7-72ibmrt2.5 #1
Jul 23 18:47:01 elm3c24 kernel: 
Jul 23 18:47:01 elm3c24 kernel: Call Trace:
Jul 23 18:47:01 elm3c24 kernel:  [<ffffffff81285ef1>] rt_read_slowunlock+0x14c/0x43e
Jul 23 18:47:01 elm3c24 kernel:  [<ffffffff8105dc1d>]
rt_rwlock_read_unlock+0x258/0x25d
Jul 23 18:47:01 elm3c24 kernel:  [<ffffffff81287257>] rt_read_unlock+0x9/0xb
Jul 23 18:47:01 elm3c24 kernel:  [<ffffffff8819ce80>] :dm_mod:dm_get_table+0x35/0x3d
Jul 23 18:47:01 elm3c24 kernel:  [<ffffffff8819d0d2>] :dm_mod:__split_bio+0x20/0x3b4
Jul 23 18:47:01 elm3c24 kernel:  [<ffffffff8112b509>] ? __make_request+0x57c/0x5c7
Jul 23 18:47:01 elm3c24 kernel:  [<ffffffff8819dcc0>] :dm_mod:dm_request+0x1fd/0x223
Jul 23 18:47:01 elm3c24 kernel:  [<ffffffff811283aa>]
generic_make_request+0x332/0x36d
Jul 23 18:47:01 elm3c24 kernel:  [<ffffffff81086b57>] ? mempool_alloc_slab+0x11/0x13
Jul 23 18:47:01 elm3c24 kernel:  [<ffffffff88071ce7>] ?
:ext3:ext3_get_block+0x0/0xfc
Jul 23 18:47:01 elm3c24 kernel:  [<ffffffff8112850a>] submit_bio+0x125/0x12e
Jul 23 18:47:01 elm3c24 kernel:  [<ffffffff88071ce7>] ?
:ext3:ext3_get_block+0x0/0xfc
Jul 23 18:47:01 elm3c24 kernel:  [<ffffffff810d0cf3>] submit_bh+0xed/0x111
Jul 23 18:47:01 elm3c24 kernel:  [<ffffffff810d2921>]
__block_write_full_page+0x1cb/0x2b7
Jul 23 18:47:01 elm3c24 kernel:  [<ffffffff88071ce7>] ?
:ext3:ext3_get_block+0x0/0xfc
Jul 23 18:47:01 elm3c24 kernel:  [<ffffffff88071ce7>] ?
:ext3:ext3_get_block+0x0/0xfc
Jul 23 18:47:01 elm3c24 kernel:  [<ffffffff810d2ace>]
block_write_full_page+0xc1/0xcd
Jul 23 18:47:01 elm3c24 kernel:  [<ffffffff880733a0>]
:ext3:ext3_ordered_writepage+0xc6/0x171
Jul 23 18:47:01 elm3c24 kernel:  [<ffffffff8108acc7>] __writepage+0x15/0x36
Jul 23 18:47:01 elm3c24 kernel:  [<ffffffff8108b38c>] write_cache_pages+0x1a5/0x2f7
Jul 23 18:47:01 elm3c24 kernel:  [<ffffffff8108acb2>] ? __writepage+0x0/0x36
Jul 23 18:47:01 elm3c24 kernel:  [<ffffffff81287193>] ? rt_spin_lock+0x9/0xb
Jul 23 18:47:01 elm3c24 kernel:  [<ffffffff81045f67>] ? lock_timer_base+0x26/0x4b
Jul 23 18:47:01 elm3c24 kernel:  [<ffffffff8108b500>] generic_writepages+0x22/0x28
Jul 23 18:47:01 elm3c24 kernel:  [<ffffffff8108b538>] do_writepages+0x32/0x3a
Jul 23 18:47:01 elm3c24 kernel:  [<ffffffff810cce87>]
__writeback_single_inode+0x1a4/0x30a
Jul 23 18:47:01 elm3c24 kernel:  [<ffffffff81287257>] ? rt_read_unlock+0x9/0xb
Jul 23 18:47:01 elm3c24 kernel:  [<ffffffff810cd38e>] sync_sb_inodes+0x1be/0x27c
Jul 23 18:47:01 elm3c24 kernel:  [<ffffffff810cd661>] writeback_inodes+0x74/0xd1
Jul 23 18:47:01 elm3c24 kernel:  [<ffffffff8108bd38>] background_writeout+0x87/0xb9
Jul 23 18:47:01 elm3c24 kernel:  [<ffffffff8108c549>] ? pdflush+0x0/0x1fc
Jul 23 18:47:01 elm3c24 kernel:  [<ffffffff8108c69a>] pdflush+0x151/0x1fc
Jul 23 18:47:01 elm3c24 kernel:  [<ffffffff8108bcb1>] ? background_writeout+0x0/0xb9
Jul 23 18:47:01 elm3c24 kernel:  [<ffffffff8105143b>] kthread+0x49/0x76
Jul 23 18:47:01 elm3c24 kernel:  [<ffffffff8100d048>] child_rip+0xa/0x12
Jul 23 18:47:01 elm3c24 kernel:  [<ffffffff810513f2>] ? kthread+0x0/0x76
Jul 23 18:47:01 elm3c24 kernel:  [<ffffffff8100d03e>] ? child_rip+0x0/0x12


Kernel version 2.6.24.7-72ibmrt2.5. The logs show not that much time before the
repro. Again, dmesg was not available, so this just means the disk was rw.

=Comment: #1=================================================
KEITH MANNTHEY <mannthey.com> - 2008-07-24 19:51 EDT
The SAN switch topology in this blade center is still unstable :( and might be
related.  Working on it...

=Comment: #2=================================================
KEITH MANNTHEY <mannthey.com> - 2008-07-24 19:52 EDT
Kicked the tests off again on the box.  A few minutes into the test:

krcupreemptd new prio is 115??
WARNING: at kernel/rtmutex.c:1732 rt_read_slowunlock()
Pid: 9478, comm: pdflush Not tainted 2.6.24.7-72ibmrt2.5 #1

Call Trace:
 [<ffffffff81287257>] ? rt_read_unlock+0x9/0xb
 [<ffffffff81285ef1>] rt_read_slowunlock+0x14c/0x43e
 [<ffffffff8105f455>] rt_mutex_up_read+0x25b/0x260
 [<ffffffff8105fd52>] rt_up_read+0x9/0xb
 [<ffffffff810cd675>] writeback_inodes+0x88/0xd1
 [<ffffffff8108bd38>] background_writeout+0x87/0xb9
 [<ffffffff8108c549>] ? pdflush+0x0/0x1fc
 [<ffffffff8108c69a>] pdflush+0x151/0x1fc
 [<ffffffff8108bcb1>] ? background_writeout+0x0/0xb9
 [<ffffffff8105143b>] kthread+0x49/0x76
 [<ffffffff8100d048>] child_rip+0xa/0x12
 [<ffffffff810513f2>] ? kthread+0x0/0x76
 [<ffffffff8100d03e>] ? child_rip+0x0/0x12

=Comment: #3=================================================
KEITH MANNTHEY <mannthey.com> - 2008-07-24 20:05 EDT
Grabbed a dmesg:
IPMI kcs interface initialized
ipmi device interface
krcupreemptd new prio is 115??
WARNING: at kernel/rtmutex.c:1732 rt_read_slowunlock()
Pid: 9478, comm: pdflush Not tainted 2.6.24.7-72ibmrt2.5 #1

Call Trace:
 [<ffffffff81287257>] ? rt_read_unlock+0x9/0xb
 [<ffffffff81285ef1>] rt_read_slowunlock+0x14c/0x43e
 [<ffffffff8105f455>] rt_mutex_up_read+0x25b/0x260
 [<ffffffff8105fd52>] rt_up_read+0x9/0xb
 [<ffffffff810cd675>] writeback_inodes+0x88/0xd1
 [<ffffffff8108bd38>] background_writeout+0x87/0xb9
 [<ffffffff8108c549>] ? pdflush+0x0/0x1fc
 [<ffffffff8108c69a>] pdflush+0x151/0x1fc
 [<ffffffff8108bcb1>] ? background_writeout+0x0/0xb9
 [<ffffffff8105143b>] kthread+0x49/0x76
 [<ffffffff8100d048>] child_rip+0xa/0x12
 [<ffffffff810513f2>] ? kthread+0x0/0x76
 [<ffffffff8100d03e>] ? child_rip+0x0/0x12

APIC error on CPU1: 00(40)
APIC error on CPU5: 00(40)
APIC error on CPU0: 00(40)
APIC error on CPU3: 00(40)
APIC error on CPU4: 00(40)
APIC error on CPU6: 00(40)
APIC error on CPU7: 00(40)
APIC error on CPU2: 00(40)
Machine check events logged

I was able to log in via ssh; it appears the SAN is up but the system is
unusable.  Not sure what is going on, but it is bad. Can't get uptime or top, or
tab-complete in a bash shell. After a while I can't do anything.  It feels like OOM or
something like that. Need to get an strace off when it is in this state. Rebooting the box.

=Comment: #4=================================================
KEITH MANNTHEY <mannthey.com> - 2008-07-24 20:26 EDT
more pain... 

eth2: no IPv6 routers present
krcupreemptd new prio is 115??
WARNING: at kernel/rtmutex.c:1732 rt_read_slowunlock()
Pid: 9470, comm: pdflush Not tainted 2.6.24.7-72ibmrt2.5 #1

Call Trace:
 [<ffffffff81285ef1>] rt_read_slowunlock+0x14c/0x43e
 [<ffffffff8105dc1d>] rt_rwlock_read_unlock+0x258/0x25d
 [<ffffffff81287257>] rt_read_unlock+0x9/0xb
 [<ffffffff8819ce80>] :dm_mod:dm_get_table+0x35/0x3d
 [<ffffffff8819d045>] :dm_mod:dm_any_congested+0x18/0x53
 [<ffffffff8108b3ee>] write_cache_pages+0x207/0x2f7
 [<ffffffff8108acb2>] ? __writepage+0x0/0x36
 [<ffffffff8108b500>] generic_writepages+0x22/0x28
 [<ffffffff8108b538>] do_writepages+0x32/0x3a
 [<ffffffff810cce87>] __writeback_single_inode+0x1a4/0x30a
 [<ffffffff81287257>] ? rt_read_unlock+0x9/0xb
 [<ffffffff810cd38e>] sync_sb_inodes+0x1be/0x27c
 [<ffffffff810cd661>] writeback_inodes+0x74/0xd1
 [<ffffffff8108bd38>] background_writeout+0x87/0xb9
 [<ffffffff8108c549>] ? pdflush+0x0/0x1fc
 [<ffffffff8108c69a>] pdflush+0x151/0x1fc
 [<ffffffff8108bcb1>] ? background_writeout+0x0/0xb9
 [<ffffffff8105143b>] kthread+0x49/0x76
 [<ffffffff8100d048>] child_rip+0xa/0x12
 [<ffffffff810513f2>] ? kthread+0x0/0x76
 [<ffffffff8100d03e>] ? child_rip+0x0/0x12

WARNING: at kernel/rtmutex.c:1732 rt_read_slowunlock()
Pid: 9491, comm: pdflush Not tainted 2.6.24.7-72ibmrt2.5 #1

Call Trace:
 [<ffffffff881c0d64>] ? :dm_multipath:map_io+0x130/0x140
 [<ffffffff81285ef1>] rt_read_slowunlock+0x14c/0x43e
 [<ffffffff8105dc1d>] rt_rwlock_read_unlock+0x258/0x25d
 [<ffffffff81287257>] rt_read_unlock+0x9/0xb
 [<ffffffff8819ce80>] :dm_mod:dm_get_table+0x35/0x3d
 [<ffffffff8819d0d2>] :dm_mod:__split_bio+0x20/0x3b4
 [<ffffffff8112b509>] ? __make_request+0x57c/0x5c7
 [<ffffffff8819dcc0>] :dm_mod:dm_request+0x1fd/0x223
 [<ffffffff811283aa>] generic_make_request+0x332/0x36d
 [<ffffffff81086b57>] ? mempool_alloc_slab+0x11/0x13
 [<ffffffff88071ce7>] ? :ext3:ext3_get_block+0x0/0xfc
 [<ffffffff8112850a>] submit_bio+0x125/0x12e
 [<ffffffff88071ce7>] ? :ext3:ext3_get_block+0x0/0xfc
 [<ffffffff810d0cf3>] submit_bh+0xed/0x111
 [<ffffffff810d2921>] __block_write_full_page+0x1cb/0x2b7
 [<ffffffff88071ce7>] ? :ext3:ext3_get_block+0x0/0xfc
 [<ffffffff88071ce7>] ? :ext3:ext3_get_block+0x0/0xfc
 [<ffffffff810d2ace>] block_write_full_page+0xc1/0xcd
 [<ffffffff880733a0>] :ext3:ext3_ordered_writepage+0xc6/0x171
 [<ffffffff8108acc7>] __writepage+0x15/0x36
 [<ffffffff8108b38c>] write_cache_pages+0x1a5/0x2f7
 [<ffffffff8108acb2>] ? __writepage+0x0/0x36
 [<ffffffff8108b500>] generic_writepages+0x22/0x28
 [<ffffffff8108b538>] do_writepages+0x32/0x3a
 [<ffffffff810cce87>] __writeback_single_inode+0x1a4/0x30a
 [<ffffffff81287257>] ? rt_read_unlock+0x9/0xb
 [<ffffffff810cd38e>] sync_sb_inodes+0x1be/0x27c
 [<ffffffff810cd661>] writeback_inodes+0x74/0xd1
 [<ffffffff8108bd38>] background_writeout+0x87/0xb9
 [<ffffffff8108c549>] ? pdflush+0x0/0x1fc
 [<ffffffff8108c69a>] pdflush+0x151/0x1fc
 [<ffffffff8108bcb1>] ? background_writeout+0x0/0xb9
 [<ffffffff8105143b>] kthread+0x49/0x76
 [<ffffffff8100d048>] child_rip+0xa/0x12
 [<ffffffff810513f2>] ? kthread+0x0/0x76
 [<ffffffff8100d03e>] ? child_rip+0x0/0x12

=Comment: #5=================================================
KEITH MANNTHEY <mannthey.com> - 2008-07-24 20:27 EDT
After the above warning the box was just fine.

=Comment: #6=================================================
Darren V. Hart <dvhltc.com> - 2008-07-24 20:44 EDT
Keith,
krcupreemptd new prio is 115??

should be fixed in current R2.  What are you testing on here?

=Comment: #7=================================================
KEITH MANNTHEY <mannthey.com> - 2008-07-24 21:11 EDT
Umm, let's see, R2 from about a week ago. This box was set up for SAN testing a
while ago.   Latest.tar was linux-rt-Developer_Release-1122.

I am moving to a current svn-built kernel that includes a small fix (I don't
suspect it is related).

Check out the weird top output below. This is weird: the box has bad CPUs and a
pre-GA SAN card.  I am setting up on a "supported" platform to reproduce there.

top - 18:27:17 up 57 min,  4 users,  load average: 20.03, 19.68, 16.40
Tasks: 272 total,   1 running, 271 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0%us,  0.1%sy,  0.0%ni, 25.0%id, 74.9%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   8053068k total,  1869144k used,  6183924k free,   227428k buffers
Swap:  2031608k total,        0k used,  2031608k free,  1299264k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND            
   37 root     -31  -5     0    0    0 S    0  0.0   0:01.11 sirq-sched/2       
 8267 root      39  19     0    0    0 S    0  0.0   0:25.45 kipmi0             
 9373 root      20   0 12852 1224  820 R    0  0.0   0:28.73 top                
    1 root      20   0 10332  700  588 S    0  0.0   0:05.44 init               
    2 root      15  -5     0    0    0 S    0  0.0   0:00.00 kthreadd           
    3 root      RT  -5     0    0    0 S    0  0.0   0:00.00 migration/0        
    4 root      RT  -5     0    0    0 S    0  0.0   0:00.00 posix_cpu_timer    
    5 root     -31  -5     0    0    0 S    0  0.0   0:00.00 sirq-high/0        
    6 root     -31  -5     0    0    0 S    0  0.0   0:03.77 sirq-timer/0       
    7 root     -91  -5     0    0    0 S    0  0.0   0:00.00 sirq-net-tx/0      
    8 root     -91  -5     0    0    0 S    0  0.0   0:00.06 sirq-net-rx/0      
    9 root     -31  -5     0    0    0 S    0  0.0   0:42.42 sirq-block/0       
   10 root     -31  -5     0    0    0 S    0  0.0   0:00.00 sirq-tasklet/0     
   11 root     -31  -5     0    0    0 S    0  0.0   0:01.12 sirq-sched/0       
   12 root     -93  -5     0    0    0 S    0  0.0   0:00.00 sirq-hrtimer/0     
   13 root     -31  -5     0    0    0 S    0  0.0   0:00.79 sirq-rcu/0         
   14 root      RT  -5     0    0    0 S    0  0.0   0:00.00 watchdog/0         
   15 root      10 -10     0    0    0 S    0  0.0   0:00.00 desched/0          
   16 root      RT  -5     0    0    0 S    0  0.0   0:00.00 migration/1        
   17 root      RT  -5     0    0    0 S    0  0.0   0:00.00 posix_cpu_timer    
   18 root     -31  -5     0    0    0 S    0  0.0   0:00.00 sirq-high/1        
   19 root     -31  -5     0    0    0 S    0  0.0   0:04.41 sirq-timer/1       
   20 root     -91  -5     0    0    0 S    0  0.0   0:00.00 sirq-net-tx/1      
   21 root     -91  -5     0    0    0 S    0  0.0   0:00.01 sirq-net-rx/1      
   22 root     -31  -5     0    0    0 S    0  0.0   0:27.28 sirq-block/1       
   23 root     -31  -5     0    0    0 S    0  0.0   0:00.00 sirq-tasklet/1     
   24 root     -31  -5     0    0    0 S    0  0.0   0:00.95 sirq-sched/1       
   25 root     -93  -5     0    0    0 S    0  0.0   0:00.04 sirq-hrtimer/1     
   26 root     -31  -5     0    0    0 S    0  0.0   0:00.74 sirq-rcu/1         
   27 root      RT  -5     0    0    0 S    0  0.0   0:00.00 watchdog/1         
   28 root      10 -10     0    0    0 S    0  0.0   0:00.00 desched/1          
   29 root      RT  -5     0    0    0 S    0  0.0   0:00.00 migration/2        
   30 root      RT  -5     0    0    0 S    0  0.0   0:00.00 posix_cpu_timer    
   31 root     -31  -5     0    0    0 S    0  0.0   0:00.00 sirq-high/2        
   32 root     -31  -5     0    0    0 S    0  0.0   0:03.36 sirq-timer/2       

=Comment: #8=================================================
KEITH MANNTHEY <mannthey.com> - 2008-07-24 21:18 EDT
OK, asked blast to stop.

My top shell was OK; 3 other shells were dead in the water. I called sync on one
and pressed tab in the other 2.  Totally dead shells, but my top shell was alive.
 I was able to quit top and run a few commands (tab even).  This lasted for a
few minutes, then I pressed tab again and the shell locked up.  Once a shell locks up
I don't ever see it come back.

I am running the tests as root.

=Comment: #11=================================================
KEITH MANNTHEY <mannthey.com> - 2008-07-25 12:13 EDT
This issue has only been seen with code that is in R2.  I don't know the root
cause yet, but as this is a SAN box I don't think we should head to MRG yet.

=Comment: #12=================================================
KEITH MANNTHEY <mannthey.com> - 2008-07-25 13:01 EDT
Hmm moved to 2.6.24.7-74ibmrt2.5 and am still seeing the weird hang state. 

=Comment: #13=================================================
JEFFREY D. LANE <jdlane.com> - 2008-07-29 10:18 EDT
(In reply to comment #12)
> Hmm moved to 2.6.24.7-74ibmrt2.5 and am still seeing the weird hang state. 

Just to add a second datapoint, I am seeing this as well on HS21XM running R2
RC1 in a SAN boot configuration.

In my case, however, I am using MPx and not blast, and my system remains up and
stable, but /var/log/messages is full of similar rt_read_slowunlock() traces:

(also, mine seem to mostly be centered around sirq-net-rx)

Jul 29 06:17:21 HS21XM-R2RC1 kernel:  rport-1:0-32: blocked FC remote port time
out: removing target and saving binding
Jul 29 06:17:21 HS21XM-R2RC1 kernel:  rport-0:0-32: blocked FC remote port time
out: removing target and saving binding
Jul 29 09:36:11 HS21XM-R2RC1 kernel: WARNING: at kernel/rtmutex.c:1732
rt_read_slowunlock()
Jul 29 09:36:11 HS21XM-R2RC1 kernel: Pid: 8, comm: sirq-net-rx/0 Not tainted
2.6.24.7-68ibmrt2.5 #1
Jul 29 09:36:11 HS21XM-R2RC1 kernel: 
Jul 29 09:36:11 HS21XM-R2RC1 kernel: Call Trace:
Jul 29 09:36:11 HS21XM-R2RC1 kernel:  [<ffffffff81288591>]
rt_read_slowunlock+0x14c/0x43e
Jul 29 09:36:11 HS21XM-R2RC1 kernel:  [<ffffffff810a8c9a>] ?
kmem_cache_free+0x63/0x70
Jul 29 09:36:11 HS21XM-R2RC1 kernel:  [<ffffffff8105db19>]
rt_rwlock_read_unlock+0x258/0x25d
Jul 29 09:36:11 HS21XM-R2RC1 kernel:  [<ffffffff812898f7>] rt_read_unlock+0x9/0xb
Jul 29 09:36:11 HS21XM-R2RC1 kernel:  [<ffffffff8125753b>]
__udp4_lib_rcv+0x4a0/0x7d4
Jul 29 09:36:11 HS21XM-R2RC1 kernel:  [<ffffffff81252e3a>] ? tcp_v4_rcv+0x707/0xab5
Jul 29 09:36:11 HS21XM-R2RC1 kernel:  [<ffffffff81257884>] udp_rcv+0x15/0x17
Jul 29 09:36:11 HS21XM-R2RC1 kernel:  [<ffffffff81235c73>]
ip_local_deliver_finish+0x169/0x231
Jul 29 09:36:11 HS21XM-R2RC1 kernel:  [<ffffffff81236157>]
ip_local_deliver+0x72/0x7b
Jul 29 09:36:11 HS21XM-R2RC1 kernel:  [<ffffffff81235ad9>] ip_rcv_finish+0x331/0x362
Jul 29 09:36:11 HS21XM-R2RC1 kernel:  [<ffffffff81289833>] ? rt_spin_lock+0x9/0xb
Jul 29 09:36:11 HS21XM-R2RC1 kernel:  [<ffffffff810a9d89>] ?
kmem_cache_alloc_node+0xe6/0x10f
Jul 29 09:36:11 HS21XM-R2RC1 kernel:  [<ffffffff812360b1>] ip_rcv+0x274/0x2a8
Jul 29 09:36:11 HS21XM-R2RC1 kernel:  [<ffffffff81212fdd>]
netif_receive_skb+0x3a9/0x462
Jul 29 09:36:11 HS21XM-R2RC1 kernel:  [<ffffffff81143222>] ?
swiotlb_map_single+0x3b/0xb4
Jul 29 09:36:11 HS21XM-R2RC1 kernel:  [<ffffffff882826ce>]
:bnx2:bnx2_poll+0xc3a/0xe17
Jul 29 09:36:11 HS21XM-R2RC1 kernel:  [<ffffffff81036769>] ?
finish_task_switch+0x4c/0xdc
Jul 29 09:36:11 HS21XM-R2RC1 kernel:  [<ffffffff81215402>] net_rx_action+0xab/0x1e9
Jul 29 09:36:11 HS21XM-R2RC1 kernel:  [<ffffffff81042a18>] ksoftirqd+0x16a/0x26f
Jul 29 09:36:11 HS21XM-R2RC1 kernel:  [<ffffffff810428ae>] ? ksoftirqd+0x0/0x26f
Jul 29 09:36:11 HS21XM-R2RC1 kernel:  [<ffffffff810428ae>] ? ksoftirqd+0x0/0x26f
Jul 29 09:36:11 HS21XM-R2RC1 kernel:  [<ffffffff810513e7>] kthread+0x49/0x76
Jul 29 09:36:11 HS21XM-R2RC1 kernel:  [<ffffffff8100d048>] child_rip+0xa/0x12
Jul 29 09:36:11 HS21XM-R2RC1 kernel:  [<ffffffff8105139e>] ? kthread+0x0/0x76
Jul 29 09:36:11 HS21XM-R2RC1 kernel:  [<ffffffff8100d03e>] ? child_rip+0x0/0x12


I am also seeing these pop up as well:

Jul 28 05:21:50 HS21XM-R2RC1 kernel: sirq-net-rx/0: page allocation failure.
order:0, mode:0x20
Jul 28 05:21:50 HS21XM-R2RC1 kernel: Pid: 8, comm: sirq-net-rx/0 Not tainted
2.6.24.7-68ibmrt2.5 #1
Jul 28 05:21:50 HS21XM-R2RC1 kernel: 
Jul 28 05:21:50 HS21XM-R2RC1 kernel: Call Trace:
Jul 28 05:21:50 HS21XM-R2RC1 kernel:  [<ffffffff8108a3b3>] __alloc_pages+0x2fa/0x312
Jul 28 05:21:50 HS21XM-R2RC1 kernel:  [<ffffffff810a9bd2>] kmem_getpages+0x9d/0x16e
Jul 28 05:21:50 HS21XM-R2RC1 kernel:  [<ffffffff810aa1aa>]
fallback_alloc+0x138/0x1dc
Jul 28 05:21:50 HS21XM-R2RC1 kernel:  [<ffffffff810aa36f>]
____cache_alloc_node+0x121/0x136
Jul 28 05:21:50 HS21XM-R2RC1 kernel:  [<ffffffff810a9d6a>]
kmem_cache_alloc_node+0xc7/0x10f
Jul 28 05:21:50 HS21XM-R2RC1 kernel:  [<ffffffff810a9dd6>] __kmalloc_node+0x24/0x29
Jul 28 05:21:50 HS21XM-R2RC1 kernel:  [<ffffffff8120df57>] __alloc_skb+0x69/0x133
Jul 28 05:21:50 HS21XM-R2RC1 kernel:  [<ffffffff8120ebb6>]
__netdev_alloc_skb+0x31/0x4d
Jul 28 05:21:50 HS21XM-R2RC1 kernel:  [<ffffffff882822b8>]
:bnx2:bnx2_poll+0x824/0xe17
Jul 28 05:21:50 HS21XM-R2RC1 kernel:  [<ffffffff81033431>] ?
try_to_wake_up+0x20d/0x21e
Jul 28 05:21:50 HS21XM-R2RC1 kernel:  [<ffffffff81215402>] net_rx_action+0xab/0x1e9
Jul 28 05:21:50 HS21XM-R2RC1 kernel:  [<ffffffff81042a18>] ksoftirqd+0x16a/0x26f
Jul 28 05:21:50 HS21XM-R2RC1 kernel:  [<ffffffff810428ae>] ? ksoftirqd+0x0/0x26f
Jul 28 05:21:50 HS21XM-R2RC1 kernel:  [<ffffffff810428ae>] ? ksoftirqd+0x0/0x26f
Jul 28 05:21:50 HS21XM-R2RC1 kernel:  [<ffffffff810513e7>] kthread+0x49/0x76
Jul 28 05:21:50 HS21XM-R2RC1 kernel:  [<ffffffff8100d048>] child_rip+0xa/0x12
Jul 28 05:21:50 HS21XM-R2RC1 kernel:  [<ffffffff8105139e>] ? kthread+0x0/0x76
Jul 28 05:21:50 HS21XM-R2RC1 kernel:  [<ffffffff8100d03e>] ? child_rip+0x0/0x12
Jul 28 05:21:50 HS21XM-R2RC1 kernel: 
Jul 28 05:21:50 HS21XM-R2RC1 kernel: Mem-info:
Jul 28 05:21:50 HS21XM-R2RC1 kernel: Node 0 DMA per-cpu:
Jul 28 05:21:50 HS21XM-R2RC1 kernel: CPU    0: Hot: hi:    0, btch:   1 usd:   0
  Cold: hi:    0, btch:   1 usd:   0
Jul 28 05:21:50 HS21XM-R2RC1 kernel: CPU    1: Hot: hi:    0, btch:   1 usd:   0
  Cold: hi:    0, btch:   1 usd:   0
Jul 28 05:21:50 HS21XM-R2RC1 kernel: CPU    2: Hot: hi:    0, btch:   1 usd:   0
  Cold: hi:    0, btch:   1 usd:   0
Jul 28 05:21:50 HS21XM-R2RC1 kernel: CPU    3: Hot: hi:    0, btch:   1 usd:   0
  Cold: hi:    0, btch:   1 usd:   0
Jul 28 05:21:50 HS21XM-R2RC1 kernel: CPU    4: Hot: hi:    0, btch:   1 usd:   0
  Cold: hi:    0, btch:   1 usd:   0
Jul 28 05:21:50 HS21XM-R2RC1 kernel: CPU    5: Hot: hi:    0, btch:   1 usd:   0
  Cold: hi:    0, btch:   1 usd:   0
Jul 28 05:21:50 HS21XM-R2RC1 kernel: CPU    6: Hot: hi:    0, btch:   1 usd:   0
  Cold: hi:    0, btch:   1 usd:   0
Jul 28 05:21:50 HS21XM-R2RC1 kernel: CPU    7: Hot: hi:    0, btch:   1 usd:   0
  Cold: hi:    0, btch:   1 usd:   0
Jul 28 05:21:50 HS21XM-R2RC1 kernel: Node 0 DMA32 per-cpu:
Jul 28 05:21:50 HS21XM-R2RC1 kernel: CPU    0: Hot: hi:  186, btch:  31 usd:  30
  Cold: hi:   62, btch:  15 usd:   1
Jul 28 05:21:50 HS21XM-R2RC1 kernel: CPU    1: Hot: hi:  186, btch:  31 usd: 169
  Cold: hi:   62, btch:  15 usd:   0
Jul 28 05:21:50 HS21XM-R2RC1 kernel: CPU    2: Hot: hi:  186, btch:  31 usd:   2
  Cold: hi:   62, btch:  15 usd:   0
Jul 28 05:21:50 HS21XM-R2RC1 kernel: CPU    3: Hot: hi:  186, btch:  31 usd:  76
  Cold: hi:   62, btch:  15 usd:   0
Jul 28 05:21:50 HS21XM-R2RC1 kernel: CPU    4: Hot: hi:  186, btch:  31 usd:  31
  Cold: hi:   62, btch:  15 usd:  19
Jul 28 05:21:50 HS21XM-R2RC1 kernel: CPU    5: Hot: hi:  186, btch:  31 usd: 161
  Cold: hi:   62, btch:  15 usd:  17
Jul 28 05:21:50 HS21XM-R2RC1 kernel: CPU    6: Hot: hi:  186, btch:  31 usd:  20
  Cold: hi:   62, btch:  15 usd:   0
Jul 28 05:21:50 HS21XM-R2RC1 kernel: CPU    7: Hot: hi:  186, btch:  31 usd:  32
  Cold: hi:   62, btch:  15 usd:   0
Jul 28 05:21:50 HS21XM-R2RC1 kernel: Node 0 Normal per-cpu:
Jul 28 05:21:50 HS21XM-R2RC1 kernel: CPU    0: Hot: hi:  186, btch:  31 usd:  42
  Cold: hi:   62, btch:  15 usd:  42
Jul 28 05:21:50 HS21XM-R2RC1 kernel: CPU    1: Hot: hi:  186, btch:  31 usd: 157
  Cold: hi:   62, btch:  15 usd:  48
Jul 28 05:21:50 HS21XM-R2RC1 kernel: CPU    2: Hot: hi:  186, btch:  31 usd: 131
  Cold: hi:   62, btch:  15 usd:   6
Jul 28 05:21:50 HS21XM-R2RC1 kernel: CPU    3: Hot: hi:  186, btch:  31 usd: 163
  Cold: hi:   62, btch:  15 usd:  52
Jul 28 05:21:50 HS21XM-R2RC1 kernel: CPU    4: Hot: hi:  186, btch:  31 usd: 157
  Cold: hi:   62, btch:  15 usd:  48
Jul 28 05:21:50 HS21XM-R2RC1 kernel: CPU    5: Hot: hi:  186, btch:  31 usd: 184
  Cold: hi:   62, btch:  15 usd:  57
Jul 28 05:21:50 HS21XM-R2RC1 kernel: CPU    6: Hot: hi:  186, btch:  31 usd: 105
  Cold: hi:   62, btch:  15 usd:  57
Jul 28 05:21:50 HS21XM-R2RC1 kernel: CPU    7: Hot: hi:  186, btch:  31 usd: 164
  Cold: hi:   62, btch:  15 usd:  49
Jul 28 05:21:50 HS21XM-R2RC1 kernel: Active:1538732 inactive:205001 dirty:8868
writeback:1263 unstable:4019
Jul 28 05:21:50 HS21XM-R2RC1 kernel:  free:8849 slab:190850 mapped:15042
pagetables:6819 bounce:0
Jul 28 05:21:50 HS21XM-R2RC1 kernel: Node 0 DMA free:10320kB min:12kB low:12kB
high:16kB active:0kB inactive:0kB present:9628kB pages_scanned:0 all_unreclaim
able? yes
Jul 28 05:21:50 HS21XM-R2RC1 kernel: lowmem_reserve[]: 0 2704 7954 7954
Jul 28 05:21:50 HS21XM-R2RC1 kernel: Node 0 DMA32 free:22436kB min:3876kB
low:4844kB high:5812kB active:2305936kB inactive:20kB present:2769008kB pages_scann
ed:4684 all_unreclaimable? no
Jul 28 05:21:50 HS21XM-R2RC1 kernel: lowmem_reserve[]: 0 0 5250 5250
Jul 28 05:21:50 HS21XM-R2RC1 kernel: Node 0 Normal free:2640kB min:7528kB
low:9408kB high:11292kB active:3848992kB inactive:819984kB present:5376000kB pages_
scanned:1196 all_unreclaimable? no
Jul 28 05:21:50 HS21XM-R2RC1 kernel: lowmem_reserve[]: 0 0 0 0
Jul 28 05:21:50 HS21XM-R2RC1 kernel: Node 0 DMA: 4*4kB 2*8kB 5*16kB 3*32kB
4*64kB 3*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 2*4096kB = 10320kB
Jul 28 05:21:50 HS21XM-R2RC1 kernel: Node 0 DMA32: 90*4kB 57*8kB 47*16kB 25*32kB
15*64kB 5*128kB 2*256kB 3*512kB 2*1024kB 3*2048kB 2*4096kB = 22400kB
Jul 28 05:21:50 HS21XM-R2RC1 kernel: Node 0 Normal: 235*4kB 1*8kB 0*16kB 0*32kB
0*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 3124kB
Jul 28 05:21:50 HS21XM-R2RC1 kernel: Swap cache: add 12122, delete 10774, find
35740/35832, race 0+3
Jul 28 05:21:50 HS21XM-R2RC1 kernel: Free swap  = 1985760kB
Jul 28 05:21:50 HS21XM-R2RC1 kernel: Total swap = 2031608kB
Jul 28 05:21:50 HS21XM-R2RC1 kernel: Free swap:       1985760kB

and I'm seeing some from pdflush as well.  I am NOT seeing the APIC failure
messages though.

=Comment: #14=================================================
KEITH MANNTHEY <mannthey.com> - 2008-07-30 15:04 EDT
Thanks for the data point. Looks like the box is nearing oom?  

=Comment: #15=================================================
JOSHUA A. TRIPLETT <josht.com> - 2008-08-13 11:53 EDT
Changing the bug summary; the kernel WARNING message doesn't seem to relate to
the hang.

=Comment: #16=================================================
KEITH MANNTHEY <mannthey.com> - 2008-08-13 11:59 EDT
OK, there have not been many updates to this bug in the last little bit.

At a high level, JV and Josh are driving the bug.

The hang seems related to the kind of IO that "blast" runs. I think it is
something about lots of small files, but I'm not 100% sure about that.

Josh and JV, this is a good place to keep the technical details. It is an easy place for
others to look at.

=Comment: #17=================================================
JOSHUA A. TRIPLETT <josht.com> - 2008-08-13 12:04 EDT
jvrao and I currently have various tests going on trying to narrow down the list
of possible variables here.  We can fairly consistently reproduce the problem on
elm3c24 by running blast on the SAN partitions.  We think we managed to
reproduce the problem on elm3c24 by running iopuser (a simple Python filesystem
test we have run in the past), though it took longer; tweaking iopuser's
parameters to write smaller files and more of them would make it behave more
like blast, which might reproduce the problem faster.  Currently trying to
reproduce the problem on elm3c29.
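
As a rough sketch only (not the actual iopuser or blast code, and with made-up
directory and size parameters), the small-file create/write/read/delete churn
described above looks roughly like this:

#!/usr/bin/env python
# Hypothetical small-file churn in the spirit of blast/iopuser.
# TARGET_DIR, NUM_FILES and FILE_SIZE are illustrative values only.
import os, random

TARGET_DIR = "/mnt/santest/churn"
NUM_FILES = 10000      # many small files
FILE_SIZE = 4096       # small writes

def churn(iterations=100000):
    if not os.path.isdir(TARGET_DIR):
        os.makedirs(TARGET_DIR)
    for _ in range(iterations):
        name = os.path.join(TARGET_DIR, "f%05d" % random.randrange(NUM_FILES))
        op = random.choice(("write", "read", "delete"))
        if op == "write":
            with open(name, "wb") as f:
                f.write(os.urandom(FILE_SIZE))
        elif op == "read" and os.path.exists(name):
            with open(name, "rb") as f:
                f.read()
        elif op == "delete" and os.path.exists(name):
            os.unlink(name)

if __name__ == "__main__":
    churn()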

=Comment: #18=================================================
Venkateswarara Jujjuri <jvrao.com> - 2008-08-13 22:23 EDT
In an effort to reproduce the problem on another machine, Josh started blast on
one SAN disk. This single-threaded blast ran fine overnight.
Later, on Keith's advice, I created 3 more LUNs, added them to the machine, and
started a 4-thread blast. After 20 minutes we observed that the system was hung.

At this point we believed that we *DID* reproduce the problem on elm3c29.

As per the plan, we need to retry the same test on a local disk. We planned to
create 4 dirs on the local disk and restart 4 blast threads. I have been trying to reboot
the system (elm3c29) for the last hour; it is still trying to reboot, and
I can see lots of IO errors on the console. I checked that the DS4k status is fine
from the DS4k point of view.

Will continue to investigate and try to run the blast on local disk.

=Comment: #19=================================================
Venkateswarara Jujjuri <jvrao.com> - 2008-08-14 01:21 EDT
Rebooted elm3c29; it took a long time but finally came up.
Fired off 4 threads of blast onto the local disk.
We just have one local disk available on this machine, so I directed all 4
threads to the same disk, but to different directories. We will know the status
by tomorrow morning.

=Comment: #20=================================================
Venkateswarara Jujjuri <jvrao.com> - 2008-08-14 12:47 EDT
Single LOCAL disk with 4 blast threads ran overnight without any problems.
Machine was responsive in the morning.

Next step:
Just to make an apples-to-apples comparison, I am starting a test on elm3c29 with 1 SAN
disk and 4 blast threads on that disk.

=Comment: #21=================================================
Darren V. Hart <dvhltc.com> - 2008-08-15 18:55 EDT

online/offline test

On a whim I tried looking at the Red Hat reported multipathd bugs, such as:

https://bugzilla.redhat.com/show_bug.cgi?id=428338

I ran the test.sh script in the attachment from that bug (called
san-fault-test.sh attached here).  It seemed to online/offline the drive just
fine for 100 iterations, but when I killed the test the shells all hung with an
ls command (even as prio 99 shells, sshd, interrupt and softirq threads).  Will
run again with a fixed 100 iterations rather than ctrl+c'ing to exit.
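
The script itself is the attachment noted above ("online/offline test"). Purely
as an illustration of the idea, a minimal sketch of such a fixed-iteration loop,
assuming the drive is faulted through the standard SCSI sysfs "state" attribute
(the real san-fault-test.sh may drive the SAN differently; the device name and
sleep times below are made up):

#!/usr/bin/env python
# Sketch of a fixed-iteration online/offline loop.  Run as root.
# DEVICE is hypothetical; adjust to the SAN disk under test.
import time

DEVICE = "sdb"
ITERATIONS = 100
STATE_ATTR = "/sys/block/%s/device/state" % DEVICE

def set_state(state):
    # "offline" takes the device out of service, "running" brings it back
    with open(STATE_ATTR, "w") as f:
        f.write(state + "\n")

for _ in range(ITERATIONS):
    set_state("offline")
    time.sleep(5)
    set_state("running")
    time.sleep(5)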

=Comment: #22=================================================
Darren V. Hart <dvhltc.com> - 2008-08-15 20:42 EDT
I worked with JV and niv and we collected a lot of information from the system
as it slowly became less responsive.  Given the quantity of data I have copied
it all over to kbic here:

http://kernel.beaverton.ibm.com/realtime/kdump_cores/bz46744/elm3c24/

It includes a crashdump core (FINALLY), a sysrq-t trace, and a ps command output.
We had several commands hung at the time of each of these, including some
straced rm commands, an ls, and a sync.  They all appear hung IN the kernel, on
getdents().  We also observed that there were 9MB or so of dirty pages that we
couldn't flush out with sync, according to meminfo and vmstat (assuming that is
why sync never returned).

Plan to get everything uploaded tonight and commence crashdump analysis on Monday.
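
The dirty-page observation above came from meminfo/vmstat. As a small
illustrative helper (not part of the collected data), something like this can be
left running to watch whether a hung sync makes any progress against the Dirty
and Writeback counters:

#!/usr/bin/env python
# Poll /proc/meminfo and print the Dirty and Writeback counters so we can
# see whether a hung sync is making progress.  Illustrative only.
import time

def meminfo(fields=("Dirty", "Writeback")):
    values = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, _, rest = line.partition(":")
            if key in fields:
                values[key] = rest.strip()
    return values

while True:
    print(meminfo())
    time.sleep(5)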

=Comment: #23=================================================
Venkateswarara Jujjuri <jvrao.com> - 2008-08-18 12:27 EDT
Took a dump when the system was about to hang. Will look into the dump.
In parallel, the plan is to run the tests on a non-RT kernel.

=Comment: #24=================================================
Venkateswarara Jujjuri <jvrao.com> - 2008-08-19 00:32 EDT
Keith and I started analyzing the crash. Here are a few observations.

- 3 Blast threads running. 
  2 of them are in __wait_on_bit_lock -> sync_buffer

- Many bash threads; most of them are waiting to get scheduled. But one thread is:
 #0 [ffff8101105a1d48] schedule at ffffffff8128565f
 #1 [ffff8101105a1e00] rt_mutex_slowlock at ffffffff81286dd7
 #2 [ffff8101105a1ed0] rt_mutex_lock at ffffffff812862c6
 #3 [ffff8101105a1ee0] _mutex_lock at ffffffff81287739
 #4 [ffff8101105a1ef0] vfs_readdir at ffffffff810bcbea
 #5 [ffff8101105a1f30] sys_getdents at ffffffff810bce86

It will be interesting to see who is holding the rt_mutex that this thread is blocked on in rt_mutex_slowlock.

The other thing we need to see is who is holding the scheduler, and why so many bash
threads are waiting to get scheduled. It is normal for a stack to show a thread waiting
to get scheduled, but we know that the system was not responding and the shells were
hanging when we took the crash.

* In addition, our attempt to create a vanilla 2.6.24.7 kernel with scsi_dh_rdac was
not successful. We will keep working on creating that kernel so we can try to
reproduce the problem on a non-RT system.

=Comment: #25=================================================
Venkateswarara Jujjuri <jvrao.com> - 2008-08-20 02:01 EDT
Continuing to investigate the problem. This appears to be a tricky and involved
issue.

Analyzed the crash more deeply. Here is what we have learned:

1. Most of the CPUs are idle. This is puzzling, especially when the system is
   not responsive; one would assume the CPUs would be occupied.
2. Most of the threads are blocked on IO. This explains why some bash shells are
   responsive and some are not: any command/operation causing a page fault or
   doing disk access hangs.

Chandra, Darren, Keith, and I spent time on this. Here is our plan of attack.

1. Try to run the same test on a machine with NO SAN.
   With the help of Keith, I installed R2 on elm3b210 and ran blast.
   Blast is continuing to run even after 6 hours. This tells us there is no
   problem with non-SAN RT.

2. Try running SAN on vanilla 2.6.24.
   Ran on this kernel for more than 4 hours. This tells us there is no problem
   in this stack either.

At this point we are sure that the problem exists in the RT + SAN combination.
We believe this problem could have been there for a long time but is getting
exposed now because of the way blast does its testing:
blast creates many small files and keeps reading/writing/deleting/creating small
files. Per Keith and Darren, we have never tested this scenario.

Now the target is to isolate the driver in the SAN IO stack.

We are attempting to take the dm-rdac and dh modules out of the stack.
With the help of Keith I am installing a non-LVM single-path SAN on elm3c29.
This machine just finished installing base RH. Tomorrow we will put an RT kernel on
it and test with blast.

If blast runs fine on this, it tells us that the problem is in the new
dm-rdac/dh modules. If not, we need to continue investigating the IO stack.
Will update with more info as we learn.

=Comment: #26=================================================
TIMOTHY R. CHAVEZ <chavezt.com> - 2008-08-20 16:51 EDT
Apparently elm3b210, which is an LS41 with a local disk configuration, seized
after running the blast test for some time.  The system was still responsive
to SysRq, so I tried to get as much information from the hung machine as I
possibly could.  Unfortunately, I was not able to trigger a crashdump :/  Here's
what I got (I have not analyzed it yet):

SysRq : Show State
init          S [ffff81013fa8b480] ffffffff8129ebe0     0     1      0
 ffff81013fa8d9d8 0000000000000086 0000000000000000 ffff81013fb57540
 ffff81013fa8d968 ffffffff812872bb ffff81013fa8b480 ffff81013fb61580
 ffff81013fa8b7c8 0000000201bbf9ff ffffffffffffffff 0000000101bbf9ff
Call Trace:
 [<ffffffff812872bb>] ? rt_spin_lock+0x9/0xb
 [<ffffffff8128559a>] schedule+0xdf/0xff
 [<ffffffff8128586c>] schedule_timeout+0x8d/0xb4
 [<ffffffff81045994>] ? process_timeout+0x0/0xb
 [<ffffffff81285867>] ? schedule_timeout+0x88/0xb4
 [<ffffffff810bd561>] do_select+0x47f/0x4f2
 [<ffffffff810bda4e>] ? __pollwait+0x0/0xdf
 [<ffffffff81033423>] ? default_wake_function+0x0/0x14
 [<ffffffff8107edca>] ? __rcu_read_unlock+0x5a/0x5c
 [<ffffffff810c18f0>] ? __d_lookup+0x116/0x128
 [<ffffffff810b781d>] ? do_lookup+0x63/0x1b1
 [<ffffffff810c0ddd>] ? dput+0x22/0x11a
 [<ffffffff810b9d8b>] ? __link_path_walk+0xc66/0xdbe
 [<ffffffff812872bb>] ? rt_spin_lock+0x9/0xb
 [<ffffffff810a8bc3>] ? _slab_irq_disable+0x37/0x5c
 [<ffffffff810c5edf>] ? mntput_no_expire+0x1e/0x98
 [<ffffffff810b9fcb>] ? link_path_walk+0xe8/0xfc
 [<ffffffff810bd79a>] core_sys_select+0x1c6/0x275
 [<ffffffff810c5edf>] ? mntput_no_expire+0x1e/0x98
 [<ffffffff810b72e6>] ? path_release+0x2c/0x30
 [<ffffffff810b338c>] ? cp_new_stat+0xf6/0x10f
 [<ffffffff810bdc7b>] sys_select+0x14e/0x176
 [<ffffffff8100c1fe>] system_call_ret+0x0/0x5

kthreadd      S [ffff81013fa8aa60] ffffffff8129ebe0     0     2      0
 ffff81013fa8ff00 0000000000000046 0000000000000000 ffffffff8105df69
 ffffffff814cc260 0000000000000286 ffff81013fa8aa60 ffff81013f063540
 ffff81013fa8ada8 00000005957bdd80 ffffffffffffffff ffff8100957bdd10
Call Trace:
 [<ffffffff8105df69>] ? __rt_mutex_adjust_prio+0x11/0x24
 [<ffffffff8128559a>] schedule+0xdf/0xff
 [<ffffffff8105120c>] kthreadd+0x7b/0x142
 [<ffffffff8100d048>] child_rip+0xa/0x12
 [<ffffffff81051191>] ? kthreadd+0x0/0x142
 [<ffffffff8100d03e>] ? child_rip+0x0/0x12

migration/0   S [ffff81013fa8a040] ffff81000100c158     0     3      2
 ffff81013fa93ed0 0000000000000046 ffff810001016980 0000000000000001
 ffff81013f12c040 ffff810001016980 ffff81013fa8a040 ffff81012e303540
 ffff81013fa8a388 0000000081030e71 0000000000000001 ffff81013f12c040
Call Trace:
 [<ffffffff8128559a>] schedule+0xdf/0xff
 [<ffffffff81037bab>] migration_thread+0x16d/0x20d
 [<ffffffff81037a3e>] ? migration_thread+0x0/0x20d
 [<ffffffff81037a3e>] ? migration_thread+0x0/0x20d
 [<ffffffff81051443>] kthread+0x49/0x76
 [<ffffffff8100d048>] child_rip+0xa/0x12
 [<ffffffff810513fa>] ? kthread+0x0/0x76
 [<ffffffff8100d03e>] ? child_rip+0x0/0x12

posix_cpu_tim S [ffff81013fa954c0] 0000000000000000     0     4      2
 ffff81013fa97ee0 0000000000000046 0000000000000000 ffff81013fa8b480
 ffffffff813aa300 ffff81000100b980 ffff81013fa954c0 ffff81013fa8b480
 ffff81013fa95808 0000000000000000 0000000000000000 ffff81013fa8b480
Call Trace:
 [<ffffffff81053627>] ? posix_cpu_timers_thread+0x0/0x125
 [<ffffffff8128559a>] schedule+0xdf/0xff
 [<ffffffff810536b2>] posix_cpu_timers_thread+0x8b/0x125
 [<ffffffff81053627>] ? posix_cpu_timers_thread+0x0/0x125
 [<ffffffff81051443>] kthread+0x49/0x76
 [<ffffffff8100d048>] child_rip+0xa/0x12
 [<ffffffff810513fa>] ? kthread+0x0/0x76
 [<ffffffff8100d03e>] ? child_rip+0x0/0x12

sirq-high/0   S [ffff81013fa94aa0] ffff810001008240     0     5      2
 ffff81013fa99eb0 0000000000000046 ffff81013fa95278 0000000000000001
 ffffffff8129ebe0 ffff81013fa99e50 ffff81013fa94aa0 ffff81013fa94080
 ffff81013fa94de8 0000000000000001 ffff81013fa99e70 ffffffff8107edca
Call Trace:
 [<ffffffff8107edca>] ? __rcu_read_unlock+0x5a/0x5c
 [<ffffffff81036667>] ? do_sched_setscheduler+0x6f/0x7c
 [<ffffffff8128559a>] schedule+0xdf/0xff
 [<ffffffff810429c6>] ksoftirqd+0xbc/0x26f
 [<ffffffff8104290a>] ? ksoftirqd+0x0/0x26f
 [<ffffffff8104290a>] ? ksoftirqd+0x0/0x26f
 [<ffffffff81051443>] kthread+0x49/0x76
 [<ffffffff8100d048>] child_rip+0xa/0x12
 [<ffffffff810513fa>] ? kthread+0x0/0x76
 [<ffffffff8100d03e>] ? child_rip+0x0/0x12

sirq-timer/0  S [ffff81013fa94080] ffff8100010085e8     0     6      2
 ffff81013fa9deb0 0000000000000046 0000000600000001 ffffffff81545740
 0000000000000000 0000000000000001 ffff81013fa94080 ffff81013fab3580
 ffff81013fa943c8 000000008103982e 0000000600000001 ffff8100010082c0
Call Trace:
 [<ffffffff8128559a>] schedule+0xdf/0xff
 [<ffffffff810429c6>] ksoftirqd+0xbc/0x26f
 [<ffffffff8104290a>] ? ksoftirqd+0x0/0x26f
 [<ffffffff8104290a>] ? ksoftirqd+0x0/0x26f
 [<ffffffff81051443>] kthread+0x49/0x76
 [<ffffffff8100d048>] child_rip+0xa/0x12
 [<ffffffff810513fa>] ? kthread+0x0/0x76
 [<ffffffff8100d03e>] ? child_rip+0x0/0x12

sirq-net-tx/0 S [ffff81013fa9f500] ffff810001008310     0     7      2
 ffff81013faa1eb0 0000000000000046 ffff81013fa9fcd8 0000000000000001
 ffffffff8129ebe0 ffff81013faa1e50 ffff81013fa9f500 ffff81013fa9eae0
 ffff81013fa9f848 0000000000000001 ffff81013faa1e70 ffffffff8107edca
Call Trace:
 [<ffffffff8107edca>] ? __rcu_read_unlock+0x5a/0x5c
 [<ffffffff81036667>] ? do_sched_setscheduler+0x6f/0x7c
 [<ffffffff8128559a>] schedule+0xdf/0xff
 [<ffffffff810429c6>] ksoftirqd+0xbc/0x26f
 [<ffffffff8104290a>] ? ksoftirqd+0x0/0x26f
 [<ffffffff8104290a>] ? ksoftirqd+0x0/0x26f
 [<ffffffff81051443>] kthread+0x49/0x76
 [<ffffffff8100d048>] child_rip+0xa/0x12
 [<ffffffff810513fa>] ? kthread+0x0/0x76
 [<ffffffff8100d03e>] ? child_rip+0x0/0x12

sirq-net-rx/0 S [ffff81013fa9eae0] ffffffff8129ebe0     0     8      2
 ffff81013faa3eb0 0000000000000046 0000000000000000 0000000000000082
 ffff81013faa3e60 00000040b3d2c100 ffff81013fa9eae0 ffffffff813aa7a0
 ffff81013fa9ee28 00000000010094f8 ffffffffffffffff ffff810001008390
Call Trace:
 [<ffffffff8128559a>] schedule+0xdf/0xff
 [<ffffffff810429c6>] ksoftirqd+0xbc/0x26f
 [<ffffffff8104290a>] ? ksoftirqd+0x0/0x26f
 [<ffffffff8104290a>] ? ksoftirqd+0x0/0x26f
 [<ffffffff81051443>] kthread+0x49/0x76
 [<ffffffff8100d048>] child_rip+0xa/0x12
 [<ffffffff810513fa>] ? kthread+0x0/0x76
 [<ffffffff8100d03e>] ? child_rip+0x0/0x12

sirq-block/0  S [ffff81013fa9e0c0] ffff8100010085e8     0     9      2
 ffff81013faa7eb0 0000000000000046 ffff81009569b380 ffff81013d0ce000
 0000000000000005 0000000000000010 ffff81013fa9e0c0 ffff81013b1eb480
 ffff81013fa9e408 000000009569b380 0000000600000001 ffff8100010083f8
Call Trace:
 [<ffffffff8128559a>] schedule+0xdf/0xff
 [<ffffffff810429c6>] ksoftirqd+0xbc/0x26f
 [<ffffffff8104290a>] ? ksoftirqd+0x0/0x26f
 [<ffffffff8104290a>] ? ksoftirqd+0x0/0x26f
 [<ffffffff81051443>] kthread+0x49/0x76
 [<ffffffff8100d048>] child_rip+0xa/0x12
 [<ffffffff810513fa>] ? kthread+0x0/0x76
 [<ffffffff8100d03e>] ? child_rip+0x0/0x12

sirq-tasklet/ S [ffff81013faa9540] ffff8100010085e8     0    10      2
 ffff81013faabeb0 0000000000000046 ffff81013faa9d18 0000000000000001
 ffffffff8129ebe0 ffff81013fa8b480 ffff81013faa9540 ffff81013fa8b480
 ffff81013faa9888 0000000000000001 0000000600000001 ffff810001008460
Call Trace:
 [<ffffffff8128559a>] schedule+0xdf/0xff
 [<ffffffff810429c6>] ksoftirqd+0xbc/0x26f
 [<ffffffff8104290a>] ? ksoftirqd+0x0/0x26f
 [<ffffffff8104290a>] ? ksoftirqd+0x0/0x26f
 [<ffffffff81051443>] kthread+0x49/0x76
 [<ffffffff8100d048>] child_rip+0xa/0x12
 [<ffffffff810513fa>] ? kthread+0x0/0x76
 [<ffffffff8100d03e>] ? child_rip+0x0/0x12

sirq-sched/0  S [ffff81013faa8b20] ffff8100010085e8     0    11      2
 ffff81013faadeb0 0000000000000046 ffff81000100b980 0000000101bbf1da
 000000018129ebe0 0000000000000400 ffff81013faa8b20 ffff81013fa94080
 ffff81013faa8e68 0000000000000000 0000000600000001 ffff8100010084c8
Call Trace:
 [<ffffffff8128559a>] schedule+0xdf/0xff
 [<ffffffff810429c6>] ksoftirqd+0xbc/0x26f
 [<ffffffff8104290a>] ? ksoftirqd+0x0/0x26f
 [<ffffffff8104290a>] ? ksoftirqd+0x0/0x26f
 [<ffffffff81051443>] kthread+0x49/0x76
 [<ffffffff8100d048>] child_rip+0xa/0x12
 [<ffffffff810513fa>] ? kthread+0x0/0x76
 [<ffffffff8100d03e>] ? child_rip+0x0/0x12

sirq-hrtimer/ S [ffff81013faa8100] ffff8100010085e8     0    12      2
 ffff81013fab1eb0 0000000000000046 ffff81013fab1e40 ffffffff8105dac5
 0000000600000001 ffff810001008710 ffff81013faa8100 ffff81013ccc4aa0
 ffff81013faa8448 0000000000000080 0000000600000001 ffff810001008530
Call Trace:
 [<ffffffff8105dac5>] ? rt_rwlock_read_unlock+0xf8/0x25d
 [<ffffffff8128559a>] schedule+0xdf/0xff
 [<ffffffff810429c6>] ksoftirqd+0xbc/0x26f
 [<ffffffff8104290a>] ? ksoftirqd+0x0/0x26f
 [<ffffffff8104290a>] ? ksoftirqd+0x0/0x26f
 [<ffffffff81051443>] kthread+0x49/0x76
 [<ffffffff8100d048>] child_rip+0xa/0x12
 [<ffffffff810513fa>] ? kthread+0x0/0x76
 [<ffffffff8100d03e>] ? child_rip+0x0/0x12

sirq-rcu/0    S [ffff81013fab3580] ffff8100010085e8     0    13      2
 ffff81013fab5eb0 0000000000000046 0000000000000000 ffff8100b68c3870
 ffff8100b68c38f0 ffff81013fa94080 ffff81013fab3580 ffff81013b1eb480
 ffff81013fab38c8 0000000000000000 0000000600000001 ffff810001008598
Call Trace:
 [<ffffffff8128559a>] schedule+0xdf/0xff
 [<ffffffff810429c6>] ksoftirqd+0xbc/0x26f
 [<ffffffff8104290a>] ? ksoftirqd+0x0/0x26f
 [<ffffffff8104290a>] ? ksoftirqd+0x0/0x26f
 [<ffffffff81051443>] kthread+0x49/0x76
 [<ffffffff8100d048>] child_rip+0xa/0x12
 [<ffffffff810513fa>] ? kthread+0x0/0x76
 [<ffffffff8100d03e>] ? child_rip+0x0/0x12

watchdog/0    S [ffff81013fab2b60] 0000000000000000     0    14      2
 ffff81013fab7ef0 0000000000000046 0000000000000000 0000000000000246
 ffff81013fab7ea0 ffffffff8102ff0a ffff81013fab2b60 ffff81013fa8aa60
 ffff81013fab2ea8 00000000814cc260 ffff81013fab7eb0 ffffffff8102ffee
Call Trace:
 [<ffffffff8102ff0a>] ? __update_rq_clock+0x1a/0xe5
 [<ffffffff8102ffee>] ? update_rq_clock+0x19/0x1b
 [<ffffffff8107add4>] ? watchdog+0x0/0x58
 [<ffffffff8128559a>] schedule+0xdf/0xff
 [<ffffffff8107ae1c>] watchdog+0x48/0x58
 [<ffffffff8128559a>] ? schedule+0xdf/0xff
 [<ffffffff81051443>] kthread+0x49/0x76
 [<ffffffff8100d048>] child_rip+0xa/0x12
 [<ffffffff810513fa>] ? kthread+0x0/0x76
 [<ffffffff8100d03e>] ? child_rip+0x0/0x12

desched/0     S [ffff81013fab2140] ffffffff8103bf65     0    15      2
 ffff81013fabbef0 0000000000000046 ffff81013fabbe90 ffff81013fabbeb4
 ffff81013da71540 ffff81013fc290c0 ffff81013fab2140 ffff81012e1c9500
 ffff81013fab2488 00000000810a9052 ffff81013fabbeb0 ffffffff81089c85
Call Trace:
 [<ffffffff81089c85>] ? __free_pages+0x18/0x21
 [<ffffffff8103bf65>] ? desched_thread+0x0/0x146
 [<ffffffff8128559a>] schedule+0xdf/0xff
 [<ffffffff8103c076>] desched_thread+0x111/0x146
 [<ffffffff81051443>] kthread+0x49/0x76
 [<ffffffff8100d048>] child_rip+0xa/0x12
 [<ffffffff810513fa>] ? kthread+0x0/0x76
 [<ffffffff8100d03e>] ? child_rip+0x0/0x12

migration/1   S [ffff81013fabd480] ffff810001017158     0    16      2
 ffff81013fabfed0 0000000000000046 ffff8100010



SysRq : Show Memory
Node 0 DMA per-cpu:
CPU    0: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
CPU    1: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
CPU    2: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
CPU    3: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
CPU    4: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
CPU    5: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
CPU    6: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
CPU    7: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
Node 0 DMA32 per-cpu:
CPU    0: Hot: hi:  186, btch:  31 usd: 172   Cold: hi:   62, btch:  15 usd:  60
CPU    1: Hot: hi:  186, btch:  31 usd:  29   Cold: hi:   62, btch:  15 usd:  52
CPU    2: Hot: hi:  186, btch:  31 usd:  30   Cold: hi:   62, btch:  15 usd:  53
CPU    3: Hot: hi:  186, btch:  31 usd:  51   Cold: hi:   62, btch:  15 usd:  57
CPU    4: Hot: hi:  186, btch:  31 usd:  70   Cold: hi:   62, btch:  15 usd:  55
CPU    5: Hot: hi:  186, btch:  31 usd:  42   Cold: hi:   62, btch:  15 usd:  55
CPU    6: Hot: hi:  186, btch:  31 usd: 165   Cold: hi:   62, btch:  15 usd:  56
CPU    7: Hot: hi:  186, btch:  31 usd:  22   Cold: hi:   62, btch:  15 usd:  61
Node 0 Normal per-cpu:
CPU    0: Hot: hi:  186, btch:  31 usd:  86   Cold: hi:   62, btch:  15 usd:  52
CPU    1: Hot: hi:  186, btch:  31 usd: 153   Cold: hi:   62, btch:  15 usd:  61
CPU    2: Hot: hi:  186, btch:  31 usd:  76   Cold: hi:   62, btch:  15 usd:  59
CPU    3: Hot: hi:  186, btch:  31 usd:  50   Cold: hi:   62, btch:  15 usd:  49
CPU    4: Hot: hi:  186, btch:  31 usd:  29   Cold: hi:   62, btch:  15 usd:  54
CPU    5: Hot: hi:  186, btch:  31 usd:  62   Cold: hi:   62, btch:  15 usd:  47
CPU    6: Hot: hi:  186, btch:  31 usd: 173   Cold: hi:   62, btch:  15 usd:  60
CPU    7: Hot: hi:  186, btch:  31 usd:  49   Cold: hi:   62, btch:  15 usd:  48
Active:124620 inactive:802981 dirty:91167 writeback:697 unstable:0
 free:5928 slab:77502 mapped:2680 pagetables:566 bounce:0
Node 0 DMA free:10388kB min:16kB low:20kB high:24kB active:0kB inactive:0kB
present:9668kB pages_scanned:0 all_unreclaimable? yes
lowmem_reserve[]: 0 2960 3960 3960
Node 0 DMA32 free:11104kB min:6012kB low:7512kB high:9016kB active:342056kB
inactive:2405708kB present:3031040kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 1000 1000
Node 0 Normal free:2220kB min:2028kB low:2532kB high:3040kB active:156424kB
inactive:806216kB present:1024000kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
Node 0 DMA: 3*4kB 3*8kB 3*16kB 4*32kB 3*64kB 2*128kB 2*256kB 0*512kB 1*1024kB
0*2048kB 2*4096kB = 10388kB
Node 0 DMA32: 842*4kB 39*8kB 28*16kB 24*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 1*2048kB 1*4096kB = 11040kB
Node 0 Normal: 54*4kB 14*8kB 11*16kB 1*32kB 3*64kB 4*128kB 1*256kB 1*512kB
0*1024kB 0*2048kB 0*4096kB = 2008kB
Swap cache: add 35, delete 35, find 0/0, race 0+0
Free swap  = 16458444kB
Total swap = 16458584kB


=Comment: #27=================================================
TIMOTHY R. CHAVEZ <chavezt.com> - 2008-08-20 17:01 EDT
Here is a list of all the blocked tasks (generated with sysrq+w; see the short
note below on triggering this).  I've not looked at this data yet...
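
For reference, the same "Show Blocked State" dump can be requested from a shell
instead of the console keyboard, assuming sysrq is enabled (kernel.sysrq) and you
are root; a minimal sketch:

#!/usr/bin/env python
# Ask the kernel for the blocked-task dump (equivalent to sysrq+w).
# Output lands in the kernel log (dmesg / /var/log/messages).
with open("/proc/sysrq-trigger", "w") as f:
    f.write("w")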

SysRq : Show Blocked State
khubd         D [ffff81013f3a9580] ffffffff8129ebe0     0   324      2
 ffff81013ec55d90 0000000000000046 0000000000000000 ffff81013d9b5b60
 ffff81013ec55d20 ffffffff812872bb ffff81013f3a9580 ffff81013f063540
 ffff81013f3a98c8 0000000501bbf1b1 ffffffffffffffff 0000000101bbf1b1
Call Trace:
 [<ffffffff812872bb>] ? rt_spin_lock+0x9/0xb
 [<ffffffff8128559a>] schedule+0xdf/0xff
 [<ffffffff8128586c>] schedule_timeout+0x8d/0xb4
 [<ffffffff81045994>] ? process_timeout+0x0/0xb
 [<ffffffff81285867>] ? schedule_timeout+0x88/0xb4
 [<ffffffff812858ac>] schedule_timeout_uninterruptible+0x19/0x1b
 [<ffffffff8104696b>] msleep+0x14/0x1e
 [<ffffffff811c3968>] hub_thread+0x49e/0xc8f
 [<ffffffff81051567>] ? autoremove_wake_function+0x0/0x38
 [<ffffffff811c34ca>] ? hub_thread+0x0/0xc8f
 [<ffffffff811c34ca>] ? hub_thread+0x0/0xc8f
 [<ffffffff81051443>] kthread+0x49/0x76
 [<ffffffff8100d048>] child_rip+0xa/0x12
 [<ffffffff810513fa>] ? kthread+0x0/0x76
 [<ffffffff8100d03e>] ? child_rip+0x0/0x12

kswapd0       D [ffff81013db20a60] ffffffff8129ebe0     0   436      2
 ffff81013db25910 0000000000000046 0000000000000000 ffff81013db20a60
 ffff81013a405388 ffff81013d1d9838 ffff81013db20a60 ffffffff813aa7a0
 ffff81013db20da8 000000003d1d9818 ffffffffffffffff ffff81013d1d9818
Call Trace:
 [<ffffffff8128559a>] schedule+0xdf/0xff
 [<ffffffff81285789>] io_schedule+0x5d/0x9f
 [<ffffffff8112a020>] get_request_wait+0x122/0x170
 [<ffffffff81051567>] ? autoremove_wake_function+0x0/0x38
 [<ffffffff81126d80>] ? elv_merge+0x1df/0x21f
 [<ffffffff8112b5ae>] __make_request+0x50d/0x5c7
 [<ffffffff81139155>] ? radix_tree_delete+0x5a/0x456
 [<ffffffff810a8bc3>] ? _slab_irq_disable+0x37/0x5c
 [<ffffffff811284be>] generic_make_request+0x332/0x36d
 [<ffffffff81086bdf>] ? mempool_alloc_slab+0x11/0x13
 [<ffffffff810d667e>] ? blkdev_get_block+0x0/0x4d
 [<ffffffff8112861e>] submit_bio+0x125/0x12e
 [<ffffffff810d667e>] ? blkdev_get_block+0x0/0x4d
 [<ffffffff810d0e07>] submit_bh+0xed/0x111
 [<ffffffff810d2a35>] __block_write_full_page+0x1cb/0x2b7
 [<ffffffff810d667e>] ? blkdev_get_block+0x0/0x4d
 [<ffffffff810d667e>] ? blkdev_get_block+0x0/0x4d
 [<ffffffff810d2be2>] block_write_full_page+0xc1/0xcd
 [<ffffffff810d5723>] blkdev_writepage+0x13/0x15
 [<ffffffff8108f383>] shrink_page_list+0x351/0x5bb
 [<ffffffff8108f751>] shrink_inactive_list+0x164/0x4cb
 [<ffffffff8108fbcd>] shrink_zone+0x115/0x13b
 [<ffffffff8109016e>] kswapd+0x347/0x512
 [<ffffffff81051567>] ? autoremove_wake_function+0x0/0x38
 [<ffffffff8108fe27>] ? kswapd+0x0/0x512
 [<ffffffff8108fe27>] ? kswapd+0x0/0x512
 [<ffffffff81051443>] kthread+0x49/0x76
 [<ffffffff8100d048>] child_rip+0xa/0x12
 [<ffffffff810513fa>] ? kthread+0x0/0x76
 [<ffffffff8100d03e>] ? child_rip+0x0/0x12

kjournald     D [ffff81013cec6100] ffff810001037980     0  1037      2
 ffff81013c9bbd10 0000000000000046 0000000007023848 ffff81013f1c87d8
 0000000000000000 ffff81013c9bbdb0 ffff81013cec6100 ffff81013f2c0040
 ffff81013cec6448 0000000481129d76 ffff81013c9bbcd0 ffff81013f1c87d8
Call Trace:
 [<ffffffff81056a4c>] ? getnstimeofday+0x31/0x88
 [<ffffffff8128559a>] schedule+0xdf/0xff
 [<ffffffff81285789>] io_schedule+0x5d/0x9f
 [<ffffffff810d1f55>] sync_buffer+0x3e/0x42
 [<ffffffff81285a01>] __wait_on_bit+0x45/0x77
 [<ffffffff810d1f17>] ? sync_buffer+0x0/0x42
 [<ffffffff810d1f17>] ? sync_buffer+0x0/0x42
 [<ffffffff81285aa1>] out_of_line_wait_on_bit+0x6e/0x7b
 [<ffffffff8105159f>] ? wake_bit_function+0x0/0x2a
 [<ffffffff88024197>] ? :jbd:__journal_file_buffer+0xb5/0x161
 [<ffffffff810d1ed5>] __wait_on_buffer+0x20/0x22
 [<ffffffff880260e7>] :jbd:journal_commit_transaction+0x44c/0xd81
 [<ffffffff812872bb>] ? rt_spin_lock+0x9/0xb
 [<ffffffff81045fe6>] ? try_to_del_timer_sync+0x52/0x5c
 [<ffffffff88029981>] :jbd:kjournald+0xca/0x23f
 [<ffffffff81051567>] ? autoremove_wake_function+0x0/0x38
 [<ffffffff880298b7>] ? :jbd:kjournald+0x0/0x23f
 [<ffffffff81051443>] kthread+0x49/0x76
 [<ffffffff8100d048>] child_rip+0xa/0x12
 [<ffffffff810513fa>] ? kthread+0x0/0x76
 [<ffffffff8100d03e>] ? child_rip+0x0/0x12

syslogd       D [ffff81013b216aa0] ffff81013d12b000     0  2871      1
 ffff81013a4e1d18 0000000000000086 ffff81013a4e1ce8 ffffffff81033412
 ffff81013a4e1de8 000000008103982e ffff81013b216aa0 ffff81013cec6100
 ffff81013b216de8 000000043c9bbee0 0000000600000001 ffff81013d12b1d0
Call Trace:
 [<ffffffff81033412>] ? try_to_wake_up+0x20d/0x21e
 [<ffffffff8128559a>] schedule+0xdf/0xff
 [<ffffffff88028e21>] :jbd:log_wait_commit+0xa9/0xfe
 [<ffffffff81051567>] ? autoremove_wake_function+0x0/0x38
 [<ffffffff8103982e>] ? __wake_up+0x3a/0x5b
 [<ffffffff88024569>] :jbd:journal_stop+0x1a4/0x1d7
 [<ffffffff88024f5f>] :jbd:journal_force_commit+0x23/0x25
 [<ffffffff8803f209>] :ext3:ext3_force_commit+0x26/0x28
 [<ffffffff8803905e>] :ext3:ext3_write_inode+0x39/0x3f
 [<ffffffff810ccf9f>] __writeback_single_inode+0x1e0/0x30a
 [<ffffffff810cd0ed>] sync_inode+0x24/0x39
 [<ffffffff8803543b>] :ext3:ext3_sync_file+0x8f/0xa0
 [<ffffffff810cff1b>] do_fsync+0x54/0xaa
 [<ffffffff810cff9f>] __do_fsync+0x2e/0x44
 [<ffffffff810cffd0>] sys_fsync+0xb/0xd
 [<ffffffff8100c37e>] traceret+0x0/0x5

kjournald     D [ffff81013cf9cb20] ffffffff8129ebe0     0 10317      2
 ffff810123499d10 0000000000000046 0000000000000000 ffffffff8112b61d
 0000000000001000 ffffffff81286cae ffff81013cf9cb20 ffffffff813aa7a0
 ffff81013cf9ce68 0000000081129d69 ffffffffffffffff ffff81013d1d9818
Call Trace:
 [<ffffffff8112b61d>] ? __make_request+0x57c/0x5c7
 [<ffffffff81286cae>] ? rt_spin_lock_slowlock+0x20f/0x235
 [<ffffffff8128559a>] schedule+0xdf/0xff
 [<ffffffff81285789>] io_schedule+0x5d/0x9f
 [<ffffffff810d1f55>] sync_buffer+0x3e/0x42
 [<ffffffff81285a01>] __wait_on_bit+0x45/0x77
 [<ffffffff810d1f17>] ? sync_buffer+0x0/0x42
 [<ffffffff810d1f17>] ? sync_buffer+0x0/0x42
 [<ffffffff81285aa1>] out_of_line_wait_on_bit+0x6e/0x7b
 [<ffffffff8105159f>] ? wake_bit_function+0x0/0x2a
 [<ffffffff810d1ed5>] __wait_on_buffer+0x20/0x22
 [<ffffffff8802649b>] :jbd:journal_commit_transaction+0x800/0xd81
 [<ffffffff81045fe6>] ? try_to_del_timer_sync+0x52/0x5c
 [<ffffffff88029981>] :jbd:kjournald+0xca/0x23f
 [<ffffffff81051567>] ? autoremove_wake_function+0x0/0x38
 [<ffffffff880298b7>] ? :jbd:kjournald+0x0/0x23f
 [<ffffffff81051443>] kthread+0x49/0x76
 [<ffffffff8100d048>] child_rip+0xa/0x12
 [<ffffffff810513fa>] ? kthread+0x0/0x76
 [<ffffffff8100d03e>] ? child_rip+0x0/0x12

blast         D [ffff81013b452100] ffffffff8129ebe0     0 10364  10337
 ffff810123461458 0000000000000086 0000000000000000 000000061ca7bc30
 ffff81013a405388 ffff81013d1d9838 ffff81013b452100 ffff81013f0b00c0
 ffff81013b452448 000000063d1d9818 ffffffffffffffff ffff81013d1d9818
Call Trace:
 [<ffffffff8128559a>] schedule+0xdf/0xff
 [<ffffffff81285789>] io_schedule+0x5d/0x9f
 [<ffffffff8112a020>] get_request_wait+0x122/0x170
 [<ffffffff81051567>] ? autoremove_wake_function+0x0/0x38
 [<ffffffff81126d80>] ? elv_merge+0x1df/0x21f
 [<ffffffff8112b5ae>] __make_request+0x50d/0x5c7
 [<ffffffff810a8bc3>] ? _slab_irq_disable+0x37/0x5c
 [<ffffffff811284be>] generic_make_request+0x332/0x36d
 [<ffffffff81086bdf>] ? mempool_alloc_slab+0x11/0x13
 [<ffffffff810d667e>] ? blkdev_get_block+0x0/0x4d
 [<ffffffff8112861e>] submit_bio+0x125/0x12e
 [<ffffffff810d667e>] ? blkdev_get_block+0x0/0x4d
 [<ffffffff810d0e07>] submit_bh+0xed/0x111
 [<ffffffff810d2a35>] __block_write_full_page+0x1cb/0x2b7
 [<ffffffff810d667e>] ? blkdev_get_block+0x0/0x4d
 [<ffffffff810d667e>] ? blkdev_get_block+0x0/0x4d
 [<ffffffff810d2be2>] block_write_full_page+0xc1/0xcd
 [<ffffffff810d5723>] blkdev_writepage+0x13/0x15
 [<ffffffff8108f383>] shrink_page_list+0x351/0x5bb
 [<ffffffff88036a30>] ? :ext3:ext3_mark_iloc_dirty+0x2bb/0x333
 [<ffffffff8108f751>] shrink_inactive_list+0x164/0x4cb
 [<ffffffff8108fbcd>] shrink_zone+0x115/0x13b
 [<ffffffff81090aa1>] try_to_free_pages+0x1df/0x2f6
 [<ffffffff8108a5b2>] __alloc_pages+0x1ce/0x312
 [<ffffffff812872bb>] ? rt_spin_lock+0x9/0xb
 [<ffffffff810a4466>] alloc_pages_current+0xa8/0xb1
 [<ffffffff810845ad>] __page_cache_alloc+0x8e/0x92
 [<ffffffff81084834>] __grab_cache_page+0x36/0x74
 [<ffffffff880391a1>] :ext3:ext3_write_begin+0x65/0x19e
 [<ffffffff8105f377>] ? rt_mutex_up_read+0xf8/0x260
 [<ffffffff810854e0>] generic_file_buffered_write+0x151/0x5d6
 [<ffffffff81041979>] ? current_fs_time+0x22/0x29
 [<ffffffff88045da2>] ? :ext3:ext3_xattr_security_get+0x21/0x23
 [<ffffffff810c4366>] ? file_update_time+0x30/0xad
 [<ffffffff81085cbe>] __generic_file_aio_write_nolock+0x359/0x3c3
 [<ffffffff810a9052>] ? kmem_cache_free+0x49/0x54
 [<ffffffff8128570e>] ? preempt_schedule_irq+0x52/0x70
 [<ffffffff81085d8c>] generic_file_aio_write+0x64/0xc0
 [<ffffffff880352ae>] :ext3:ext3_file_write+0x1e/0x9e
 [<ffffffff810afb23>] do_sync_write+0xe2/0x126
 [<ffffffff81051567>] ? autoremove_wake_function+0x0/0x38
 [<ffffffff8128570e>] ? preempt_schedule_irq+0x52/0x70
 [<ffffffff8100c846>] ? retint_kernel+0x26/0x30
 [<ffffffff8107781d>] ? audit_syscall_entry+0x148/0x17e
 [<ffffffff810b0369>] vfs_write+0xc7/0x170
 [<ffffffff810b0979>] sys_write+0x4a/0x76
 [<ffffffff8100c37e>] traceret+0x0/0x5

blast         D [ffff810129384040] ffffffff8129ebe0     0 10365  10337
 ffff8101280df458 0000000000000086 0000000000000000 ffff810129384040
 ffff81013a405388 ffff81013d1d9838 ffff810129384040 ffffffff813aa7a0
 ffff810129384388 000000003d1d9818 ffffffffffffffff ffff81013d1d9818
Call Trace:
 [<ffffffff8128559a>] schedule+0xdf/0xff
 [<ffffffff81285789>] io_schedule+0x5d/0x9f
 [<ffffffff8112a020>] get_request_wait+0x122/0x170
 [<ffffffff81051567>] ? autoremove_wake_function+0x0/0x38
 [<ffffffff81126d80>] ? elv_merge+0x1df/0x21f
 [<ffffffff8112b5ae>] __make_request+0x50d/0x5c7
 [<ffffffff81139524>] ? radix_tree_delete+0x429/0x456
 [<ffffffff810a8bc3>] ? _slab_irq_disable+0x37/0x5c
 [<ffffffff811284be>] generic_make_request+0x332/0x36d
 [<ffffffff81086bdf>] ? mempool_alloc_slab+0x11/0x13
 [<ffffffff810d667e>] ? blkdev_get_block+0x0/0x4d
 [<ffffffff8112861e>] submit_bio+0x125/0x12e
 [<ffffffff810d667e>] ? blkdev_get_block+0x0/0x4d
 [<ffffffff810d0e07>] submit_bh+0xed/0x111
 [<ffffffff810d2a35>] __block_write_full_page+0x1cb/0x2b7
 [<ffffffff810d667e>] ? blkdev_get_block+0x0/0x4d
 [<ffffffff810d667e>] ? blkdev_get_block+0x0/0x4d
 [<ffffffff810d2be2>] block_write_full_page+0xc1/0xcd
 [<ffffffff810d5723>] blkdev_writepage+0x13/0x15
 [<ffffffff8108f383>] shrink_page_list+0x351/0x5bb
 [<ffffffff812872bb>] ? rt_spin_lock+0x9/0xb
 [<ffffffff88025864>] ? :jbd:journal_dirty_metadata+0xe8/0xf7
 [<ffffffff8108f751>] shrink_inactive_list+0x164/0x4cb
 [<ffffffff81051482>] ? bit_waitqueue+0x12/0xa6
 [<ffffffff8108fbcd>] shrink_zone+0x115/0x13b
 [<ffffffff81090aa1>] try_to_free_pages+0x1df/0x2f6
 [<ffffffff8108a5b2>] __alloc_pages+0x1ce/0x312
 [<ffffffff812872bb>] ? rt_spin_lock+0x9/0xb
 [<ffffffff810a4466>] alloc_pages_current+0xa8/0xb1
 [<ffffffff810845ad>] __page_cache_alloc+0x8e/0x92
 [<ffffffff81084834>] __grab_cache_page+0x36/0x74
 [<ffffffff880391a1>] :ext3:ext3_write_begin+0x65/0x19e
 [<ffffffff8105f377>] ? rt_mutex_up_read+0xf8/0x260
 [<ffffffff810854e0>] generic_file_buffered_write+0x151/0x5d6
 [<ffffffff81041979>] ? current_fs_time+0x22/0x29
 [<ffffffff88045da2>] ? :ext3:ext3_xattr_security_get+0x21/0x23
 [<ffffffff810c4366>] ? file_update_time+0x30/0xad
 [<ffffffff81085cbe>] __generic_file_aio_write_nolock+0x359/0x3c3
 [<ffffffff810a9052>] ? kmem_cache_free+0x49/0x54
 [<ffffffff8802458d>] ? :jbd:journal_stop+0x1c8/0x1d7
 [<ffffffff81085d8c>] generic_file_aio_write+0x64/0xc0
 [<ffffffff880352ae>] :ext3:ext3_file_write+0x1e/0x9e
 [<ffffffff810afb23>] do_sync_write+0xe2/0x126
 [<ffffffff810b0c14>] ? file_move+0x5e/0xa6
 [<ffffffff81051567>] ? autoremove_wake_function+0x0/0x38
 [<ffffffff810ae55e>] ? nameidata_to_filp+0x2d/0x3f
 [<ffffffff810ae5a6>] ? do_filp_open+0x36/0x46
 [<ffffffff8107781d>] ? audit_syscall_entry+0x148/0x17e
 [<ffffffff810b0369>] vfs_write+0xc7/0x170
 [<ffffffff810b0979>] sys_write+0x4a/0x76
 [<ffffffff8100c37e>] traceret+0x0/0x5

blast         D [ffff81013b300040] ffffffff8129ebe0     0 10366  10337
 ffff81011454d9d8 0000000000000086 0000000000000000 0000000000200020
 ffff81013fa58d00 ffff81011454d984 ffff81013b300040 ffffffff813aa7a0
 ffff81013b300388 0000000081138a45 ffffffffffffffff ffffffff81138714
Call Trace:
 [<ffffffff81138714>] ? radix_valid_always+0x0/0xb
 [<ffffffff8108424e>] ? sync_page+0x0/0x45
 [<ffffffff8128559a>] schedule+0xdf/0xff
 [<ffffffff81285789>] io_schedule+0x5d/0x9f
 [<ffffffff8108428f>] sync_page+0x41/0x45
 [<ffffffff8128590b>] __wait_on_bit_lock+0x42/0x78
 [<ffffffff8108423f>] __lock_page+0xc2/0xc4
 [<ffffffff8105159f>] ? wake_bit_function+0x0/0x2a
 [<ffffffff81084448>] find_lock_page+0x39/0x5d
 [<ffffffff8108481a>

=Comment: #28=================================================
TIMOTHY R. CHAVEZ <chavezt.com> - 2008-08-20 17:12 EDT
This may or may not be helpful, but I'm adding it for the elm3b210 hang...

Sched Debug Version: v0.07, 2.6.24.7-74ibmrt2.5 #1
now at 29394874.636514 msecs
  .sysctl_sched_latency                    : 80.000000
  .sysctl_sched_min_granularity            : 16.000000
  .sysctl_sched_wakeup_granularity         : 40.000000
  .sysctl_sched_batch_wakeup_granularity   : 40.000000
  .sysctl_sched_child_runs_first           : 0.000001
  .sysctl_sched_features                   : 39

cpu#0, 1600.057 MHz
  .nr_running                    : 5
  .load                          : 358116
  .nr_switches                   : 88879043
  .nr_load_updates               : 29394296
  .nr_uninterruptible            : -96802
  .jiffies                       : 4324061591
  .next_balance                  : 4324.061658
  .curr->pid                     : 10368
  .clock                         : 29389828.078136
  .idle_clock                    : 0.000000
  .prev_clock_raw                : 29433077.040565
  .clock_warps                   : 0
  .clock_overflows               : 195920
  .clock_deep_idle_events        : 0
  .clock_max_delta               : 0.999028
  .cpu_load[0]                   : 1024
  .cpu_load[1]                   : 518
  .cpu_load[2]                   : 1234
  .cpu_load[3]                   : 5535
  .cpu_load[4]                   : 9570
  .yld_exp_empty                 : 0
  .yld_act_empty                 : 0
  .yld_both_empty                : 0
  .yld_count                     : 183
  .sched_switch                  : 0
  .sched_count                   : 90154834
  .sched_goidle                  : 28092341
  .ttwu_count                    : 54625679
  .ttwu_local                    : 50002910
  .bkl_count                     : 698
  .rto_schedule                  : 35718833
  .rto_schedule_tail             : 48115
  .rto_wakeup                    : 3824567
  .rto_pulled                    : 24
  .rto_pushed                    : 38054

Comment 1 IBM Bug Proxy 2008-08-28 10:40:37 UTC
Created attachment 315191 [details]
online/offline test

Comment 2 IBM Bug Proxy 2008-08-28 11:43:09 UTC
*** Continuation of the previous comment ***

  .lb_breaks                     : 32332
  .rt.rt_nr_running              : 2
  .rt.rt_nr_uninterruptible      : 4079

cfs_rq
  .exec_clock                    : 4898525.373802
  .MIN_vruntime                  : 34189622.761175
  .min_vruntime                  : 34189702.761175
  .max_vruntime                  : 34189622.761175
  .spread                        : 0.000000
  .spread0                       : 0.000000
  .nr_running                    : 3
  .load                          : 3072
  .bkl_count                     : 698
  .nr_spread_over                : 3069086

runnable tasks:
            task   PID         tree-key  switches  prio     exec-runtime         sum-exec        sum-sleep
----------------------------------------------------------------------------------------------------------
          IRQ-19   746 -12494417.560092  10062067     4 -12494417.560092    143536.971248         0.005545
          IRQ-17  2798  -2244852.954366    181099     4  -2244852.954366      2411.567408         0.020340
            ntpd  4379  34189622.761175     34907   120  34189622.761175      1547.736658  29332812.052997
    avahi-daemon  4598  34189622.761175         5   120  34189622.761175         0.317794  29333393.056235
R          blast 10368  34189702.761175   6477830   120  34189702.761175   3658908.920682  23218972.614414

cpu#1, 1600.057 MHz
  .nr_running                    : 0
  .load                          : 0
  .nr_switches                   : 84617285
  .nr_load_updates               : 29398387
  .nr_uninterruptible            : -12763
  .jiffies                       : 4324061591
  .next_balance                  : 4324.061614
  .curr->pid                     : 0
  .clock                         : 29393919.445091
  .idle_clock                    : 0.000000
  .prev_clock_raw                : 75270672.883051
  .clock_warps                   : 0
  .clock_overflows               : 128687
  .clock_deep_idle_events        : 0
  .clock_max_delta               : 0.996688
  .cpu_load[0]                   : 0
  .cpu_load[1]                   : 0
  .cpu_load[2]                   : 0
  .cpu_load[3]                   : 0
  .cpu_load[4]                   : 0
  .yld_exp_empty                 : 0
  .yld_act_empty                 : 0
  .yld_both_empty                : 0
  .yld_count                     : 416
  .sched_switch                  : 0
  .sched_count                   : 85639915
  .sched_goidle                  : 26518251
  .ttwu_count                    : 51411965
  .ttwu_local                    : 48234063
  .bkl_count                     : 527
  .rto_schedule                  : 33245639
  .rto_schedule_tail             : 27157
  .rto_wakeup                    : 2347621
  .rto_pulled                    : 45
  .rto_pushed                    : 30951
  .lb_breaks                     : 20406
  .rt.rt_nr_running              : 0
  .rt.rt_nr_uninterruptible      : -4098

cfs_rq
  .exec_clock                    : 5381010.622923
  .MIN_vruntime                  : 0.000001
  .min_vruntime                  : 30962011.829487
  .max_vruntime                  : 0.000001
  .spread                        : 0.000000
  .spread0                       : -3227690.931688
  .nr_running                    : 0
  .load                          : 0
  .bkl_count                     : 527
  .nr_spread_over                : 2184127

runnable tasks:
            task   PID         tree-key  switches  prio     exec-runtime       
 sum-exec        sum-sleep
----------------------------------------------------------------------------------------------------------

cpu#2, 1600.057 MHz
  .nr_running                    : 0
  .load                          : 0
  .nr_switches                   : 76753549
  .nr_load_updates               : 29398316
  .nr_uninterruptible            : 63109
  .jiffies                       : 4324061591
  .next_balance                  : 4324.061674
  .curr->pid                     : 0
  .clock                         : 29393848.455904
  .idle_clock                    : 0.000000
  .prev_clock_raw                : 75270685.700782
  .clock_warps                   : 0
  .clock_overflows               : 108048
  .clock_deep_idle_events        : 0
  .clock_max_delta               : 0.996108
  .cpu_load[0]                   : 0
  .cpu_load[1]                   : 0
  .cpu_load[2]                   : 0
  .cpu_load[3]                   : 0
  .cpu_load[4]                   : 0
  .yld_exp_empty                 : 0
  .yld_act_empty                 : 0
  .yld_both_empty                : 0
  .yld_count                     : 445
  .sched_switch                  : 0
  .sched_count                   : 76966943
  .sched_goidle                  : 25457885
  .ttwu_count                    : 45870808
  .ttwu_local                    : 45203771
  .bkl_count                     : 65
  .rto_schedule                  : 29527971
  .rto_schedule_tail             : 955
  .rto_wakeup                    : 31174
  .rto_pulled                    : 18
  .rto_pushed                    : 11261
  .lb_breaks                     : 16155
  .rt.rt_nr_running              : 0
  .rt.rt_nr_uninterruptible      : -210

cfs_rq
  .exec_clock                    : 4532559.248778
  .MIN_vruntime                  : 0.000001
  .min_vruntime                  : 38661343.821066
  .max_vruntime                  : 0.000001
  .spread                        : 0.000000
  .spread0                       : 4471641.059891
  .nr_running                    : 0
  .load                          : 0
  .bkl_count                     : 65
  .nr_spread_over                : 89403

runnable tasks:
            task   PID         tree-key  switches  prio     exec-runtime       
 sum-exec        sum-sleep
----------------------------------------------------------------------------------------------------------

cpu#3, 1600.057 MHz
  .nr_running                    : 0
  .load                          : 0
  .nr_switches                   : 75597502
  .nr_load_updates               : 29398246
  .nr_uninterruptible            : 50434
  .jiffies                       : 4324061591
  .next_balance                  : 4324.061681
  .curr->pid                     : 0
  .clock                         : 29393778.466530
  .idle_clock                    : 0.000000
  .prev_clock_raw                : 75270693.534770
  .clock_warps                   : 0
  .clock_overflows               : 102878
  .clock_deep_idle_events        : 0
  .clock_max_delta               : 0.996048
  .cpu_load[0]                   : 0
  .cpu_load[1]                   : 0
  .cpu_load[2]                   : 0
  .cpu_load[3]                   : 0
  .cpu_load[4]                   : 0
  .yld_exp_empty                 : 0
  .yld_act_empty                 : 0
  .yld_both_empty                : 0
  .yld_count                     : 972
  .sched_switch                  : 0
  .sched_count                   : 75735418
  .sched_goidle                  : 27355032
  .ttwu_count                    : 45267919
  .ttwu_local                    : 44694884
  .bkl_count                     : 44
  .rto_schedule                  : 29463844
  .rto_schedule_tail             : 341
  .rto_wakeup                    : 388
  .rto_pulled                    : 13
  .rto_pushed                    : 6784
  .lb_breaks                     : 20677
  .rt.rt_nr_running              : 0
  .rt.rt_nr_uninterruptible      : -87

cfs_rq
  .exec_clock                    : 2558515.929605
  .MIN_vruntime                  : 0.000001
  .min_vruntime                  : 37206459.647282
  .max_vruntime                  : 0.000001
  .spread                        : 0.000000
  .spread0                       : 3016756.886107
  .nr_running                    : 0
  .load                          : 0
  .bkl_count                     : 44
  .nr_spread_over                : 73282

runnable tasks:
            task   PID         tree-key  switches  prio     exec-runtime       
 sum-exec        sum-sleep
----------------------------------------------------------------------------------------------------------

cpu#4, 1600.057 MHz
  .nr_running                    : 1
  .load                          : 177522
  .nr_switches                   : 75246847
  .nr_load_updates               : 29398175
  .nr_uninterruptible            : 40081
  .jiffies                       : 4324061591
  .next_balance                  : 4324.061595
  .curr->pid                     : 587
  .clock                         : 29393707.511111
  .idle_clock                    : 0.000000
  .prev_clock_raw                : 75418323.767358
  .clock_warps                   : 0
  .clock_overflows               : 138647
  .clock_deep_idle_events        : 0
  .clock_max_delta               : 0.996031
  .cpu_load[0]                   : 0
  .cpu_load[1]                   : 0
  .cpu_load[2]                   : 0
  .cpu_load[3]                   : 0
  .cpu_load[4]                   : 0
  .yld_exp_empty                 : 0
  .yld_act_empty                 : 0
  .yld_both_empty                : 0
  .yld_count                     : 1426
  .sched_switch                  : 0
  .sched_count                   : 75333824
  .sched_goidle                  : 28395180
  .ttwu_count                    : 45180035
  .ttwu_local                    : 44719289
  .bkl_count                     : 29
  .rto_schedule                  : 29460959
  .rto_schedule_tail             : 198
  .rto_wakeup                    : 333
  .rto_pulled                    : 8
  .rto_pushed                    : 3890
  .lb_breaks                     : 23273
  .rt.rt_nr_running              : 1
  .rt.rt_nr_uninterruptible      : -62

cfs_rq
  .exec_clock                    : 1437848.398469
  .MIN_vruntime                  : 0.000001
  .min_vruntime                  : 28071407.036211
  .max_vruntime                  : 0.000001
  .spread                        : 0.000000
  .spread0                       : -6118295.724964
  .nr_running                    : 0
  .load                          : 0
  .bkl_count                     : 29
  .nr_spread_over                : 69496

runnable tasks:
            task   PID         tree-key  switches  prio     exec-runtime         sum-exec        sum-sleep
----------------------------------------------------------------------------------------------------------
R          IRQ-3   587       179.969964       444     4       179.969964         9.079176         0.005109

cpu#5, 1600.057 MHz
  .nr_running                    : 0
  .load                          : 0
  .nr_switches                   : 75376770
  .nr_load_updates               : 29398104
  .nr_uninterruptible            : 26728
  .jiffies                       : 4324061591
  .next_balance                  : 4324.061595
  .curr->pid                     : 0
  .clock                         : 29393636.488096
  .idle_clock                    : 0.000000
  .prev_clock_raw                : 54153304.160724
  .clock_warps                   : 0
  .clock_overflows               : 97144
  .clock_deep_idle_events        : 0
  .clock_max_delta               : 0.998701
  .cpu_load[0]                   : 0
  .cpu_load[1]                   : 0
  .cpu_load[2]                   : 0
  .cpu_load[3]                   : 0
  .cpu_load[4]                   : 0
  .yld_exp_empty                 : 0
  .yld_act_empty                 : 0
  .yld_both_empty                : 0
  .yld_count                     : 1677
  .sched_switch                  : 0
  .sched_count                   : 75458943
  .sched_goidle                  : 28789756
  .ttwu_count                    : 45282745
  .ttwu_local                    : 44965719
  .bkl_count                     : 99
  .rto_schedule                  : 29459623
  .rto_schedule_tail             : 119
  .rto_wakeup                    : 455
  .rto_pulled                    : 8
  .rto_pushed                    : 2446
  .lb_breaks                     : 18517
  .rt.rt_nr_running              : 0
  .rt.rt_nr_uninterruptible      : -41

cfs_rq
  .exec_clock                    : 909053.116367
  .MIN_vruntime                  : 0.000001
  .min_vruntime                  : 25311238.380281
  .max_vruntime                  : 0.000001
  .spread                        : 0.000000
  .spread0                       : -8878464.380894
  .nr_running                    : 0
  .load                          : 0
  .bkl_count                     : 99
  .nr_spread_over                : 69037

runnable tasks:
            task   PID         tree-key  switches  prio     exec-runtime       
 sum-exec        sum-sleep
----------------------------------------------------------------------------------------------------------

cpu#6, 1600.057 MHz
  .nr_running                    : 0
  .load                          : 0
  .nr_switches                   : 89140520
  .nr_load_updates               : 29398032
  .nr_uninterruptible            : -156266
  .jiffies                       : 4324061591
  .next_balance                  : 4324.061751
  .curr->pid                     : 0
  .clock                         : 29393564.500525
  .idle_clock                    : 0.000000
  .prev_clock_raw                : 75270686.526764
  .clock_warps                   : 0
  .clock_overflows               : 167827
  .clock_deep_idle_events        : 0
  .clock_max_delta               : 0.998390
  .cpu_load[0]                   : 0
  .cpu_load[1]                   : 0
  .cpu_load[2]                   : 0
  .cpu_load[3]                   : 0
  .cpu_load[4]                   : 0
  .yld_exp_empty                 : 0
  .yld_act_empty                 : 0
  .yld_both_empty                : 0
  .yld_count                     : 1774
  .sched_switch                  : 0
  .sched_count                   : 90431997
  .sched_goidle                  : 30559564
  .ttwu_count                    : 55066830
  .ttwu_local                    : 54451675
  .bkl_count                     : 636
  .rto_schedule                  : 36481231
  .rto_schedule_tail             : 77343
  .rto_wakeup                    : 4217166
  .rto_pulled                    : 58
  .rto_pushed                    : 27291
  .lb_breaks                     : 5601
  .rt.rt_nr_running              : 0
  .rt.rt_nr_uninterruptible      : -52

cfs_rq
  .exec_clock                    : 2604299.438836
  .MIN_vruntime                  : 0.000001
  .min_vruntime                  : 34814678.176308
  .max_vruntime                  : 0.000001
  .spread                        : 0.000000
  .spread0                       : 624975.415133
  .nr_running                    : 0
  .load                          : 0
  .bkl_count                     : 636
  .nr_spread_over                : 2998327

runnable tasks:
            task   PID         tree-key  switches  prio     exec-runtime       
 sum-exec        sum-sleep
----------------------------------------------------------------------------------------------------------

cpu#7, 1600.057 MHz
  .nr_running                    : 0
  .load                          : 0
  .nr_switches                   : 76842021
  .nr_load_updates               : 29397960
  .nr_uninterruptible            : 85523
  .jiffies                       : 4324061591
  .next_balance                  : 4324.061594
  .curr->pid                     : 0
  .clock                         : 29393492.510016
  .idle_clock                    : 0.000000
  .prev_clock_raw                : 75270695.181804
  .clock_warps                   : 0
  .clock_overflows               : 101562
  .clock_deep_idle_events        : 0
  .clock_max_delta               : 0.995775
  .cpu_load[0]                   : 0
  .cpu_load[1]                   : 0
  .cpu_load[2]                   : 0
  .cpu_load[3]                   : 0
  .cpu_load[4]                   : 0
  .yld_exp_empty                 : 0
  .yld_act_empty                 : 0
  .yld_both_empty                : 0
  .yld_count                     : 2428
  .sched_switch                  : 0
  .sched_count                   : 76994234
  .sched_goidle                  : 27556332
  .ttwu_count                    : 45937928
  .ttwu_local                    : 45374789
  .bkl_count                     : 52
  .rto_schedule                  : 29464909
  .rto_schedule_tail             : 374
  .rto_wakeup                    : 1307
  .rto_pulled                    : 7
  .rto_pushed                    : 4268
  .lb_breaks                     : 101688
  .rt.rt_nr_running              : 0
  .rt.rt_nr_uninterruptible      : -73

cfs_rq
  .exec_clock                    : 2344576.091583
  .MIN_vruntime                  : 0.000001
  .min_vruntime                  : 34734213.666132
  .max_vruntime                  : 0.000001
  .spread                        : 0.000000
  .spread0                       : 544510.904957
  .nr_running                    : 0
  .load                          : 0
  .bkl_count                     : 52
  .nr_spread_over                : 89724

runnable tasks:
            task   PID         tree-key  switches  prio     exec-runtime       
 sum-exec        sum-sleep
----------------------------------------------------------------------------------------------------------

------- Additional Comment #34 From Venkateswarara Jujjuri 2008-08-21 01:48 EDT 
Today I did three experiments.

1. Blast to local disks (no SAN configuration):
  In my previous post I said this configuration was working fine; that statement was premature. I let it run overnight and it hung the system, so this confirms that the problem is not SAN related.

2. Set up elm3c29 without the multipath drivers and started blast on it. I will let it run overnight.

3. elm3c24 with the multipath driver: I started blast and, as usual, after 3 hours it had practically hung the system.
Through the console I took full system backtraces twice, with gaps of 1 hour and 24 hours. The plan is to analyze these two sets of stacks and see which processes are hung and which are making progress; I am hoping this gives us some indication of the lock contention. I will update the defect with my findings.

------- Additional Comment #35 From Sripathi Kodi 2008-08-21 04:46 EDT
Tim, the list of processes in comment #26 seems to be incomplete. Copy-paste
error, by any chance?

------- Additional Comment #36 From Sripathi Kodi 2008-08-21 07:39 EDT [reply] 
I am trying to make sense out of the dump mentioned in comment #22. However, it
is very difficult to get information about function parameters. Looks like we
are not passing them on the stack any more.

------- Additional Comment #37 From Vernon Mauery 2008-08-21 09:11 EDT [reply] 
For some reason I think we went back to register-passed function arguments; even as far back as SR3 we were using register-passed arguments. It sure does make it a bear to debug.
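
(Background, in case it helps anyone else reading these dumps: on x86_64 the first six integer/pointer arguments are passed in the registers %rdi, %rsi, %rdx, %rcx, %r8 and %r9 per the System V AMD64 ABI, so unless the callee happens to spill them, crash's "bt -f" shows only whatever was saved on the stack, not the actual parameter values.)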

------- Additional Comment #38 From TIMOTHY R. CHAVEZ 2008-08-21 11:40 EDT 
(In reply to comment #35)
Yeah, I noticed that last night... I couldn't keep the console alive long enough
to get it all... I'll try again today.  There are some interesting stack traces
for blast and pdflush, for example.

------- Additional Comment #40 From Venkateswarara Jujjuri 2008-08-22 01:03 EDT 
This defect is giving conflicting data points.
From the latest investigation:
- SAN configured with dm_multipath hung in 1 hour.
- SAN configured without dm_multipath ran overnight without any problem.
This may tell us that we are looking at two different problems; the one we have seen on the local disks could be a separate issue.

I took a crashdump on elm3c24 while it was hung. The dump is available on KBIC.

The stacks show that most of the threads have submitted IO and are waiting for it to complete, or are waiting for a buffer in order to submit IO. I could not find anyone that was holding a lock when it got preempted.

This might indicate that the bio_done notification is where the problem lies.

Another interesting point: I took 3 backtraces an hour apart while the system was hung, and then a dump after 10 hours. The stacks in all these instances are identical. This tells us that the stacks in the dump are those of the hung processes, not just a point-in-time snapshot.

The next step is to instrument the dm_scsi module. I will be doing that tomorrow.

------- Additional Comment #41 From Sripathi Kodi 2008-08-22 06:27 EDT [reply] 
Google led me to
http://readlist.com/lists/vger.kernel.org/linux-kernel/88/443749.html which
reports a problem with processes getting stuck in sync_buffer. We see some bash
and blast threads in our dumps which are stuck with similar backtraces.

The next mail on the thread :
http://readlist.com/lists/vger.kernel.org/linux-kernel/88/443762.html says the
problem is fixed by commit 149a051f82d2b3860fe32fa182dbc83a66274894. In fact the
code around this area has changed again since then. This patch is not in MRG
kernels.

Has this problem been recreated with the latest kernels, e.g. 2.6.26-rt1? Could the above be the patch we need?

------- Additional Comment #42 From TIMOTHY R. CHAVEZ 2008-08-25 10:13 EDT
(In reply to comment #41)
I couldn't get elm3b210 (the LS41 with local disk we were able to repro the failure on) to boot 2.6.23.3-rt2 for some reason... blargh... trying with an LS21 this morning.

------- Additional Comment #43 From Venkateswarara Jujjuri 2008-08-25 12:37 EDT 
Over the weekend I booted elm3c24 with 2.6.26.3-rt2 and was able to reproduce the problem.
Configuration of the system:
- AMD, 2 quad-core processors
- FAStT SAN disks
- Single path

I reproduced the problem on this kernel both with and without the multipath driver in the stack, i.e. on mpath devices and on sd devices.

The next plan is:
1. Configure the system with a non-ext3 fs (ReiserFS?) and rerun blast.
2. Walk through the code and instrument it, with the help of Paul McKenney's debugfs.
3. Analyze the dump.

------- Additional Comment #44 From Venkateswarara Jujjuri 2008-08-26 01:04 EDT 
To eliminate the FS type as a variable, I have configured the system with a JFS filesystem and started the blast test on multipath (mpath) devices. The tests have been running for the last 10 hours! The ext3+multipath combination consistently hung the system within 2 hours, so it is interesting that JFS+mpath has been running for 10 hours already. We may have to wait until tomorrow morning before drawing any conclusions.

------- Additional Comment #47 From Venkateswarara Jujjuri 2008-08-26 13:56 EDT 
My testing with JFS+mpath+RT went fine for almost 20 hours, and then blast failed
with "Bus error"; on the terminal:

Message from syslogd@ at Tue Aug 26 10:36:43 2008 ...
elm3c24 kernel: journal commit I/O error

More digging showed that the problem is with the root fs, which runs ext3.
dmesg shows:
EXT3-fs error (device dm-0): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only
__journal_remove_journal_head: freeing b_frozen_data
__journal_remove_journal_head: freeing b_frozen_data
journal commit I/O error
Buffer I/O error on device dm-0, logical block 15466498
lost page write due to I/O error on dm-0
Buffer I/O error on device dm-0, logical block 15466499
lost page write due to I/O error on dm-0
Buffer I/O error on device dm-0, logical block 15892561
lost page write due to I/O error on dm-0
Buffer I/O error on device dm-0, logical block 15892605
lost page write due to I/O error on dm-0
__ratelimit: 103 messages suppressed
Buffer I/O error on device dm-0, logical block 0
lost page write due to I/O error on dm-0
EXT3-fs error (device dm-0): ext3_find_entry: reading directory #13598721 offset 0
EXT3-fs error (device dm-0): ext3_find_entry: reading directory #13598721 offset 0
EXT3-fs error (device dm-0): ext3_get_inode_loc: unable to read inode block -
inode=17465504, block=17465350
EXT3-fs error (device dm-0): ext3_find_entry: reading directory #13598721 offset 0
metapage_write_end_io: I/O error

------- Additional Comment #48 From Venkateswarara Jujjuri 2008-08-26 14:57 EDT 
The ext3 issue reported in comment #47 was caused by a SAN software reset. Please ignore it.

------- Additional Comment #49 From Venkateswarara Jujjuri 2008-08-27 01:19 EDT 
Tried with an ext2 fs and the problem reproduced in around 3 hours.
To make sure, I have started the JFS test again.

Keith and I spent most of the afternoon cscoping and code-walking the IO path.
crash shows a number of IO waits on each processor (nr_iowait in each runqueue, below). We are trying to get to the bio/buffer head to find out the status of the pending IOs.

crash> struct rq ffff810009016980 | grep -e iowait -e counter
  nr_iowait = {
    counter = 6
crash> struct rq ffff8101500a2980 | grep -e iowait -e counter
  nr_iowait = {
    counter = 2
crash> struct rq ffff810009021980 | grep -e iowait -e counter
  nr_iowait = {
    counter = 0
crash> struct rq ffff8101500ad980 | grep -e iowait -e counter
  nr_iowait = {
    counter = 4
crash> struct rq ffff81000902c980 | grep -e iowait -e counter
  nr_iowait = {
    counter = 0
crash> struct rq ffff8101500b8980 | grep -e iowait -e counter
  nr_iowait = {
    counter = 2
crash> struct rq ffff81000900b980 | grep -e iowait -e counter
  nr_iowait = {
    counter = 27
crash> struct task_struct ffff81014d141480 > blast_task_struct
crash> struct task_struct ffff81014d141480 |grep -e blast -e bio
  comm = "blast\000d\000\000\000\000\000\000\000\000", 
  bio_list = 0x0, 
  bio_tail = 0x0, 
crash> foreach struct task_struct |less
foreach: unknown argument: "struct"
foreach: unknown argument: "task_struct"
foreach: no keywords specified
crash> foreach task|less

------- Additional Comment #50 From TIMOTHY R. CHAVEZ 2008-08-27 09:47 EDT
(In reply to comment #49)
Interesting... I left blast running on ext2 for about 19 hours and 20 minutes, roughly 835 loops (I left my notes in the office, so these are approximations), without problems. That said, I did notice I was only testing on 3 mount points and not 4 (doh!). I restarted this morning with ext3 mounted and made sure it was running on 4 mounts.

Is the dumpfile available on kbic?  Were you able to get any other information
about the system via sysrq?

------- Additional Comment #51 From TIMOTHY R. CHAVEZ 2008-08-27 16:41 EDT 
(In reply to comment #41)
I was skimming through LKML and I came across this conversation which reports a
problem with a stack trace that seems awfully similar... I do not yet know what
the conclusion of this dialog was.  Adding link for documented reference...

http://linux.derkeiler.com/Mailing-Lists/Kernel/2008-03/msg02480.html

Comment 3 IBM Bug Proxy 2008-08-28 21:30:55 UTC
A 36-hour run reproduced the problem on JFS too, which moves us away from suspecting the ext3 FS. I am analyzing the dump... so far nothing points to a smoking gun, but one thing does look suspicious.

From the dump:
struct mapped_device {
....
....
/*
* A list of ios that arrived while we were suspended.
*/
atomic_t pending; = 0000000000001fa1
wait_queue_head_t wait;
struct bio_list deferred;  = 0000000000000000 0000000000000000
struct bio_list pushback;  = 0000000000000000 0000000000000000
...
....
}
I am still looking at the code to see whether this is a valid/sane state: the pending count is quite high (0x1fa1 = 8097), yet the deferred and pushback lists are both NULL/empty.
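
For context on why that counter matters, here is a rough sketch of how the pending count is maintained, paraphrased from memory of the 2.6.24-era drivers/md/dm.c (function bodies simplified; treat the details as approximate rather than verbatim):

/* Paraphrased sketch of drivers/md/dm.c internals -- not verbatim source.
 * Relies on dm.c's internal struct dm_io / struct mapped_device definitions.
 * Each bio accepted by dm_request()/__split_bio() bumps md->pending; each
 * completion drops it and wakes md->wait when the count reaches zero. */
static void start_io_acct(struct dm_io *io)
{
	struct mapped_device *md = io->md;

	atomic_inc(&md->pending);		/* I/O now in flight below dm */
}

static void end_io_acct(struct dm_io *io)
{
	struct mapped_device *md = io->md;

	if (atomic_dec_and_test(&md->pending))
		wake_up(&md->wait);		/* dm_suspend() waits on this */
}

If that is roughly right, the state above (pending = 0x1fa1, i.e. 8097 in-flight I/Os, while deferred and pushback are both empty) says the bios were accepted and mapped rather than parked on the suspended-I/O lists, but their completions never came back, which points below device-mapper rather than at dm itself and fits the bio_done suspicion in the earlier comment.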

In the next comment I will add complete crash analysis on how I came to this
data structure.

Comment 4 IBM Bug Proxy 2008-08-28 21:50:52 UTC
Here is the detailed analysis of how I got to the device-mapper data structure.

I took one of the stuck blast threads.

PID: 6840   TASK: ffff81014d141480  CPU: 1   COMMAND: "blast"
#0 [ffff8101330bb898] schedule at ffffffff8128565f
#1 [ffff8101330bb950] io_schedule at ffffffff81285b9c
#2 [ffff8101330bb970] sync_buffer at ffffffff810d21ca
#3 [ffff8101330bb980] __wait_on_bit at ffffffff81285e14
#4 [ffff8101330bb9c0] out_of_line_wait_on_bit at ffffffff81285eb4
#5 [ffff8101330bba30] __wait_on_buffer at ffffffff810d214a
#6 [ffff8101330bba40] sync_dirty_buffer at ffffffff810d2ee7
ffff8101330bba48: ffff81019dd42e98 ffff8102348272d0
ffff8101330bba58: ffff8101330bbaa8 ffffffff88025685
#7 [ffff8101330bba60] journal_dirty_data at ffffffff88025685
ffff8101330bba68: ffff8102348272d0 ffff81014c4cb800
ffff8101330bba78: ffff81019dd42e98 0000000000001000
ffff8101330bba88: ffff8102348272d0 ffff81019dd42df8
ffff8101330bba98: ffff81019dd42df8 ffff81019dd42df8
ffff8101330bbaa8: ffff8101330bbad8 ffffffff8803720c
#8 [ffff8101330bbab0] ext3_journal_dirty_data at ffffffff8803720c
#9 [ffff8101330bbae0] walk_page_buffers at ffffffff8803659b
#10 [ffff8101330bbb40] ext3_ordered_write_end at ffffffff88038dcd
#11 [ffff8101330bbba0] generic_file_buffered_write at ffffffff81085771
#12 [ffff8101330bbca0] __generic_file_aio_write_nolock at ffffffff81085eef
#13 [ffff8101330bbd50] generic_file_aio_write at ffffffff81085fbd
#14 [ffff8101330bbda0] ext3_file_write at ffffffff880352ae
#15 [ffff8101330bbdd0] do_sync_write at ffffffff810afd93
#16 [ffff8101330bbf10] vfs_write at ffffffff810b05d9
#17 [ffff8101330bbf40] sys_write at ffffffff810b0be9

crash> struct buffer_head ffff81019dd42df8
struct buffer_head {
b_state = 16413,
100000000011101 (16413 = 0x401d)
BH_Uptodate|BH_Lock|BH_Req|BH_Mapped
This indicates a locked buffer with a valid disk mapping that has been
submitted for IO.

b_this_page = 0xffff81019dd42df8,
b_page = 0xffffe2000c8bc200,
b_blocknr = 61191,
b_size = 4096,
b_data = 0xffff8102174b0000 "w\003\035",
b_bdev = 0xffff81014f4b99c0,  ==> This is block_device
b_end_io = 0xffffffff810d36f2 <end_buffer_write_sync>,
b_private = 0xffff8101897081f0,
b_assoc_buffers = {
next = 0xffff81019dd42e40,
prev = 0xffff81019dd42e40
},
...
..
}

crash> struct block_device 0xffff81014f4b99c0
struct block_device {
bd_dev = 265289733,
bd_inode = 0xffff81014f4b9ab8,
bd_openers = 1,
....
...
bd_holder = 0xffffffff880505a0,
bd_holders = 2,
bd_holder_list = {
next = 0xffff81014f4b9a60,
prev = 0xffff81014f4b9a60
},
bd_contains = 0xffff81014f4b99c0,
bd_block_size = 4096,
bd_part = 0x0,
bd_part_count = 0,
bd_invalidated = 0,
bd_disk = 0xffff81014e0a2800, ==> This is the gendisk.
bd_list = {
next = 0xffff81014f430258,
prev = 0xffff81014f47a658
},
bd_inode_backing_dev_info = 0x0,
bd_private = 0
}

crash> struct gendisk 0xffff81014e0a2800
struct gendisk {
major = 253,
first_minor = 5,
minors = 1,
disk_name = "dm-5\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\
000\000\000\000\000\000\000\000\000\000\000\000", ==> My disk name.
part = 0x0,
part_uevent_suppress = 0,
fops = 0xffffffff881a87a0,
queue = 0xffff81014e4a1858, ==> This is the request_queue
private_data = 0xffff81014e174400,
capacity = 2097152,
flags = 16,
driverfs_dev = 0x0,
...
...
}

crash> struct request_queue 0xffff81014e4a1858
struct request_queue {
queue_head = {
next = 0x0,
prev = 0x0
},
....
....
make_request_fn = 0xffffffff8819dac3, <dm_request>
prep_rq_fn = 0,
unplug_fn = 0xffffffff8819d080, < dm_unplug_all >
merge_bvec_fn = 0,
prepare_flush_fn = 0,
softirq_done_fn = 0,
end_sector = 0,
boundary_rq = 0x0,
...
...
backing_dev_info = {
ra_pages = 32,
state = 0,
capabilities = 4,
congested_fn = 0xffffffff8819d02d,  <dm_any_congested>
congested_data = 0xffff81014e174400,
unplug_io_fn = 0xffffffff81127e85 <blk_backing_dev_unplug>,
unplug_io_data = 0xffff81014e4a1858,
bdi_stat = {{
lock = {
raw_lock = {
slock = 43433
},
break_lock = 0
},
count = 54880,
list = {
next = 0xffff81024d5a8158,
prev = 0xffff81014e4a1a78
},
counters = 0x7efeb0ecfebf
}, {
lock = {
raw_lock = {
slock = 57054
},
break_lock = 0
},
count = 2688,
list = {
next = 0xffff81014e4a1a50,
prev = 0xffff81014e4a1aa0
},
counters = 0x7efeb172233f
}},
completions = {
events = {
lock = {
raw_lock = {
slock = 39321
},
break_lock = 0
},
count = 169288,
list = {
next = 0xffff81014e4a1a78,
prev = 0xffff81024d95d2b0
},
counters = 0x7efeb1f4bebf
},
shift = 19,
period = 24903680,
lock = {
raw_lock = {
slock = 24158
},
break_lock = 0
}
},
dirty_exceeded = 0
},
queuedata = 0xffff81014e174400, ==> This is mapped_device
bounce_pfn = 2424832,
bounce_gfp = 16,
crash> struct mapped_device 0xffff81014e174400
struct: invalid data structure reference: mapped_device
crash>
crash> rd 0xffff81014e174400 100 // Interpreting it as mapped_device
ffff81014e174400:  0000000000000000 0000000000005e5e   ........^^......
ffff81014e174410:  ffff81014e174410 ffff81014e174410   .D.N.....D.N....
ffff81014e174420:  ffff81014e174420 ffff81014e174420    D.N.... D.N....
ffff81014e174430:  0000000000000000 0000000000000000   ................
ffff81014e174440:  ffff81014e174440 ffff81014e174440   @D.N........
ffff81014e174450:  000000000000008c 0000000000000001   ................
ffff81014e174460:  0000000000000000 ffff81014e174468   ........hD.N....
ffff81014e174470:  ffff81014e174468 ffff81014e174478   hD.N....xD.N....
ffff81014e174480:  ffff81014e174478 0000000000000000   xD.N............
ffff81014e174490:  0000000000000000 ffff81014e174498   .........D.N....
ffff81014e1744a0:  ffff81014e174498 ffff81014e1744a8   .D.N.....D.N....
ffff81014e1744b0:  ffff81014e1744a8 0000000000000000   .D.N............
ffff81014e1744c0:  0000000000000000 0000000000000000   ................
ffff81014e1744d0:  0000000000006969 ffff81014e1744d8   ii.......D.N....
ffff81014e1744e0:  ffff81014e1744d8 ffff81014e1744e8   .D.N.....D.N....
ffff81014e1744f0:  ffff81014e1744e8 0000000000000000   .D.N............
ffff81014e174500:  0000000000000000 ffff81014e174508   .........E.N....
ffff81014e174510:  ffff81014e174508 000000000000008c   .E.N............
ffff81014e174520:  0000000000000000 0000000100000003   ................
ffff81014e174530:  0000000000000000 ffff81014e4a1858   ........X.JN....
^^^^^ request_queue
ffff81014e174540:  ffff81014e0a2800 000000353a333532   .(.N....253:5...
^^^^ gendisk              ^^^^ name
ffff81014e174550:  0000000000000000 ffff81014e0c5b40   ........@[.N....
^^^^^^^^^^^interface_ptr
ffff81014e174560:  0000000000001fa1 0000000000000000   ................
^^^^^^^^ pending?? this is a huge number.
ffff81014e174570:  ffff81014e174570 ffff81014e174570   pE.N....pE.N....
ffff81014e174580:  ffff81014e174580 ffff81014e174580   .E.N.....E.N....
ffff81014e174590:  0000000000000000 0000000000000000   ................
ffff81014e1745a0:  ffff81014e1745a0 ffff81014e1745a0   .E.N.....E.N....
^^^ wait (__wait_queue_head) ends here
ffff81014e1745b0:  0000000000000000 0000000000000000   ................
struct bio_list deferred;
ffff81014e1745c0:  0000000000000000 0000000000000000   ................
struct bio_list pushback;
ffff81014e1745d0:  ffff81014e56d200 ffff81014e99c480   ..VN.......N....
^^^^ map (struct dm_table *map;)
ffff81014e1745e0:  ffff81014e99c080 ffff81014e8dd3c0   ...N.......N....
^^ *bs(bio_set)
ffff81014e1745f0:  0000000000000001 0000000000000000   ................
ffff81014e174600:  ffff81014e174600 ffff81014e174600   .F.N.....F.N....
ffff81014e174610:  ffff81014e174610 ffff81014e174610   .F.N.....F.N....
ffff81014e174620:  0000000000000000 0000000000000000   ................
ffff81014e174630:  ffff81024a9a3c90 ffff81024a9a3c90   .<.J.....<.J....
ffff81014e174640:  0000000000000000 ffff81014e174648   ........HF.N....
ffff81014e174650:  ffff81014e174648 0000000000000000   HF.N............
ffff81014e174660:  ffff81014e174660 ffff81014e174660   `F.N....`F.N....
ffff81014e174670:  ffff81014e174670 ffff81014e174670   pF.N....pF.N....
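
(A possible shortcut, assuming the dm-mod debuginfo for this kernel is available alongside the vmlinux: loading the module symbols into crash with something like "mod -s dm_mod <path to dm-mod object with debuginfo>" should let "struct mapped_device 0xffff81014e174400" resolve directly, instead of decoding the raw memory by hand.)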

Comment 5 IBM Bug Proxy 2008-09-02 14:34:15 UTC
Changed the default IO scheduler from "cfq" to "anticipatory" and started the
blast on mpath devices. It is running for the last 11 hours. In this
configuration with "cfq" hung the machine consistently within 3 hours.

I can see that the machine's response is slow, but it is responsive and blast is running. I will let it run...
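
(For reference, the switch was presumably done per device at run time, e.g. "echo anticipatory > /sys/block/<dev>/queue/scheduler"; the exact devices are not recorded here, and the default can also be set with the elevator= boot parameter.)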


(In reply to comment #60)
> Changed the default IO scheduler from "cfq" to "anticipatory" and started the
> blast on mpath devices. It is running for the last 11 hours. In this
> configuration with "cfq" hung the machine consistently within 3 hours.
>

What was Ming Ming's rationale for switching to the anticipatory I/O scheduler?


(In reply to comment #61)
> (In reply to comment #60)
> > Changed the default IO scheduler from "cfq" to "anticipatory" and started the
> > blast on mpath devices. It is running for the last 11 hours. In this
> > configuration with "cfq" hung the machine consistently within 3 hours.
> >
>
>
> What was Ming Ming's rationale for switching to the anticipatory I/O scheduler?

CFQ is relatively new; it became the default in 2.6.18.
So the idea is to try anticipatory, which was the default before CFQ and is supposed to be stable.

BTW, tests are still running fine.


ext3 on mpath with the AS IO scheduler ran longer, but finally hung the machine after 15 hours of the run.


I am trying to reproduce this bug on a non-rt system. I compiled a 2.6.26-3 kernel, booted it up, and started blast. The tests have been running for 60+ hours.

I will be stopping these tests sometime soon... It looks like the bug exists only on the RT system, or maybe it just is not being exposed on non-rt systems.


Blast on the non-rt kernel ran for 75 hours; I stopped the tests.

As per Mike's request, I will be collecting a dump on the rt kernel tonight.


On the non-rt kernel I tried to stop blast; 1 out of 4 threads did not end properly, and on the console:

INFO: task blast:10604 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ffff81022b54ba28 0000000000000082 0000000000000000 ffff810170e6cf18
ffff81024d17d340 ffff81024faa2b70 ffff81024d17d578 0000000300000001
00000000ffffffff 0000000000000003 0000000000000000 0000000000000000
Call Trace:
[<ffffffff802f6aab>] log_wait_commit+0x9f/0xed
[<ffffffff8023d90a>] autoremove_wake_function+0x0/0x2e
[<ffffffff802f5605>] log_do_checkpoint+0xdf/0x348
[<ffffffff8029c3ae>] __find_get_block+0x148/0x158
[<ffffffff802f2985>] journal_stop+0x181/0x18d
[<ffffffff8034c569>] __next_cpu+0x19/0x26
[<ffffffff80225401>] find_busiest_group+0x25e/0x6c6
[<ffffffff80209df6>] __switch_to+0x2ea/0x2f9
[<ffffffff8055a09f>] thread_return+0x3d/0x9c
[<ffffffff80234ce2>] lock_timer_base+0x26/0x4b
[<ffffffff802f5944>] __log_wait_for_space+0x7d/0xa1
[<ffffffff802f304a>] start_this_handle+0x2c3/0x304
[<ffffffff8023d90a>] autoremove_wake_function+0x0/0x2e
[<ffffffff8029199d>] mntput_no_expire+0x20/0x107
[<ffffffff802f3126>] journal_start+0x9b/0xd2
[<ffffffff802ea2f8>] ext3_create+0x30/0xe9
[<ffffffff8028d77b>] d_alloc+0x15b/0x1a8
[<ffffffff802850fb>] vfs_create+0x75/0xba
[<ffffffff80287a13>] do_filp_open+0x1dd/0x773
[<ffffffff8023d90a>] autoremove_wake_function+0x0/0x2e
[<ffffffff802e2835>] ext3_discard_reservation+0x52/0x61
[<ffffffff8034c5f9>] _atomic_dec_and_lock+0x39/0x54
[<ffffffff8027c4d6>] do_sys_open+0x46/0xca
[<ffffffff8020b07b>] system_call_after_swapgs+0x7b/0x80

After some time blast killed itself on the timeout, with the message:

Waiting for tests to end.
BLAST Ended on /SAN/test2 RC = 0 at  09/01/2008 21:46:38
*=== Run statistics ============================================*
| Elapsed time =  3 days, 3 hours, 38 minutes and 20 seconds
*===============================================================*
BLAST Ended on /SAN/test4 RC = 0 at  09/01/2008 21:46:42
*=== Run statistics ============================================*
| Elapsed time =  3 days, 3 hours, 38 minutes and 13 seconds
*===============================================================*
BLAST Ended on /SAN/test3 RC = 0 at  09/01/2008 21:47:21
*=== Run statistics ============================================*
| Elapsed time =  3 days, 3 hours, 38 minutes and 58 seconds
*===============================================================*
q

Forcing test termination
issuing KILL for thread 1120500032
Killed


Ran blast on the 2.6.24-ibmrt2.6 with the "noop" io scheduler... at some point
/SAN/test1 (pointing at /dev/dm-6) went "read-only"...

device-mapper: multipath: Failing path 8:32.
end_request: I/O error, dev sdh, sector 9910550
device-mapper: multipath: Failing path 8:112.
printk: 8 messages suppressed.
Buffer I/O error on device dm-4, logical block 12991
lost page write due to I/O error on dm-4
Aborting journal on device dm-4.
Buffer I/O error on device dm-4, logical block 841
lost page write due to I/O error on dm-4
Buffer I/O error on device dm-4, logical block 1179668
lost page write due to I/O error on dm-4
Buffer I/O error on device dm-4, logical block 1179669
lost page write due to I/O error on dm-4
Buffer I/O error on device dm-4, logical block 1179670
lost page write due to I/O error on dm-4
Buffer I/O error on device dm-4, logical block 1179671
lost page write due to I/O error on dm-4
ext3_abort called.
EXT3-fs error (device dm-4): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only

... this could be because the system is running without dm-mp enabled...

Comment 6 IBM Bug Proxy 2008-09-02 23:30:40 UTC
I tried configuring bonnie to do something like blast using the following script:

#!/bin/bash
set -x
FILES_X1024=100
NUM_DIRS=1
MAX_SIZE=524288
MIN_SIZE=1024
BONNIE_LOOPS=10
BONNIE_USER="-u rtuser"
BONNIE_ARGS="-x $BONNIE_LOOPS -s 0 -n $FILES_X1024:$MAX_SIZE:$MIN_SIZE:$NUM_DIRS"

rm -rf b{1,2,3,4,5,6,7,8}
mkdir b{1,2,3,4,5,6,7,8}
if [ ! -d logs ]; then
    mkdir logs
fi
chmod a+rwx b{1,2,3,4,5,6,7,8} logs

# create the semaphore the eight backgrounded runs below will wait on (-y)
bonnie++ $BONNIE_USER -p8 -x$BONNIE_LOOPS
for ((i=1;i<9;i++)); do
    bonnie++ -y -d b$i $BONNIE_USER $BONNIE_ARGS \
        > logs/`hostname`-`date +%Y%m%d-%H:%M:%S`-bonnie++-b$i.log &
done
wait
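
If memory serves from the bonnie++ man page: -s 0 skips the throughput tests so only the file create/stat/delete passes run; -n 100:524288:1024:1 asks for 100*1024 files sized between 1 KB and 512 KB in a single directory; -x 10 repeats the run 10 times; and -p8 sets up a semaphore for eight clients that the backgrounded -y runs wait on, so all eight instances start together.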

It ran until the disk was full, but there were no hangs. I didn't have separate partitions. I think I am going to need to repartition to get a /test partition in order to be able to test multiple partitions, or maybe stick the spare drive in my cube into 102... I think I'll try that tomorrow.

Comment 7 IBM Bug Proxy 2008-09-03 01:20:39 UTC
Currently running iozone on elm3c19, using both / and /test.

Comment 8 IBM Bug Proxy 2008-09-03 05:50:38 UTC
Reproduced on elm3c29 on /dev/sd devices. Got the dump. Moving the vmcore to KBIC.

Comment 9 IBM Bug Proxy 2008-09-03 14:41:54 UTC
I have saved the core on KBIC.
The defect was reproduced directly on /dev/sd devices. This should make the debugging process easier, since we are avoiding the multipath layers.

Dump can be accessed/invoked on KBIC in the following way:

jvrao@kernel:~/bz46744/elm3c29$ pwd -P
/home/services/opensource/realtime/kdump_cores/bz46744/elm3c29
jvrao@kernel:~/bz46744/elm3c29$ ./crash ./vmlinux 2008-09-02-22\:25/vmcore

Please contact me if you can't access the dump for any reason.

Comment 10 IBM Bug Proxy 2008-09-03 15:50:56 UTC
Started analyzing the elm3c29 dump. I am recording my findings in the following file:

/home/services/opensource/realtime/kdump_cores/bz46744/elm3c29/jv-bt-all
If anyone is interested, please feel free to look at it.

Comment 11 IBM Bug Proxy 2008-09-04 16:21:47 UTC
I reviewed the 2008-09-02-22:25/vmcore.
It appears that the SCSI host adapter is in a recovery state after a number of IOs timed out. There are still some questions outstanding.
- Why did the IOs time out? Since it was indicated that a failure has been seen on non-SAN configurations, we may need to look at a non-SAN failure vmcore to determine whether the signatures are similar, to help focus on the cause of the timeouts.
- The qla2x00_issue_iocb function is not returning during the abort sequence. It appears to be in a mod_timer call, but that does not match the source I have, so I will need a pointer to the source tree for this vmcore.

I copied a script file and an edited version of the crash info I used to the URL shown below.
http://kernel.beaverton.ibm.com/~andmike/bugs/ibmbz_46744/

Comment 12 IBM Bug Proxy 2008-09-05 16:10:39 UTC
Yesterday I gave Mike access to the code. I placed the source on KBIC:

/home/jvrao/public/linux-2.6.24.7-ibmrt2.6-view

Comment 13 IBM Bug Proxy 2008-09-06 00:50:41 UTC
The box I was running blast on with local disks hung, but I have been unable to
collect a dump via sysrq.  JV pressed the NMI, but the system did not respond.
I wasn't able to raise anyone to press the button late this afternoon.  Given
that, I am planning to reboot the machine and kick the tests off again.

On another note, I was able to crash a SAN machine using bonnie++ instead of blast, with mixed_load.sh as a background load. I collected the crash and uploaded it to KBIC here (IBM only for now; I can share if others are interested):

http://kernel.beaverton.ibm.com/realtime/kdump_cores/bz46744/elm3c29/bonnie-crash/

A README and the script I used to execute bonnie++ are there as well.

Comment 14 IBM Bug Proxy 2008-09-08 06:00:41 UTC
On the live machine, crash confirmed that scsi_eh_0 is not making progress; it is waiting on a semaphore. This analysis shows how I tracked down that semaphore. We still need to figure out who is holding it.

crash> bt 524
PID: 524    TASK: ffff81007eb494c0  CPU: 0   COMMAND: "scsi_eh_0"
#0 [ffff81007ebcfb10] schedule at ffffffff8127a86c
#1 [ffff81007ebcfbc8] __compat_down at ffffffff8127c6d3
#2 [ffff81007ebcfbf8] lock_timer_base at ffffffff81045f7b
#3 [ffff81007ebcfc28] __compat_down_failed at ffffffff8127c329
#4 [ffff81007ebcfc78] qla2x00_mailbox_command at ffffffff880a44fc
#5 [ffff81007ebcfd28] qla2x00_issue_iocb at ffffffff880a5add
#6 [ffff81007ebcfdb8] qla24xx_abort_command at ffffffff880a63e3
#7 [ffff81007ebcfe08] qla2xxx_eh_abort at ffffffff8809d127
#8 [ffff81007ebcfe78] __scsi_try_to_abort_cmd at ffffffff88056f31
#9 [ffff81007ebcfe88] scsi_error_handler at ffffffff8805870d
#10 [ffff81007ebcff28] kthread at ffffffff8105144f
#11 [ffff81007ebcff48] kernel_thread at ffffffff8100d048
crash>
crash> bt -f 524
PID: 524    TASK: ffff81007eb494c0  CPU: 0   COMMAND: "scsi_eh_0"
#0 [ffff81007ebcfb10] schedule at ffffffff8127a86c
...
#3 [ffff81007ebcfc28] __compat_down_failed at ffffffff8127c329
ffff81007ebcfc30: ffff81007ff97e90 0000000000000000
ffff81007ebcfc40: 0000000000000000 000000000000752f
ffff81007ebcfc50: ffff81007eb494c0 0000000000000000
ffff81007ebcfc60: ffff81007eb494c0 ffff81007ebcfca0
ffff81007ebcfc70: ffff81014ece69d0 ffffffff880a44fc
"include/linux/semaphore.h" down((struct compat_semaphore *sem)
#4 [ffff81007ebcfc78] qla2x00_mailbox_command at ffffffff880a44fc
ffff81007ebcfc80: 0000000000000000 0000000000000000
ffff81007ebcfc90: 0000000000000000 0054000100000000
ffff81007ebcfca0: 0000000000000000 0000000000200200
ffff81007ebcfcb0: 0000000102ec7256 ffffffff880a42e8
ffff81007ebcfcc0: ffff81014ece69d0 ffffffff81534700
........................^^^^^^^^^^^ sema
ffff81007ebcfcd0: ffffffff880a44b2 5f68655f69736373
ffff81007ebcfce0: 0000000000000030 000000000000020c
ffff81007ebcfcf0: ffff810009005a40 ffff81014e9d2200
ffff81007ebcfd00: ffff81014ece4610 ffff81014e9d2200
ffff81007ebcfd10: 0000000000000332 ffff81014d0f94c0
ffff81007ebcfd20: ffff81007ebcfdb0 ffffffff880a5add
#5 [ffff81007ebcfd28] qla2x00_issue_iocb at ffffffff880a5add
ffff81007ebcfd30: 00000005000000cf 22004e9d00000054
ffff81007ebcfd40: 000100007ebcfdb0 ffffffff811b6bdb
ffff81007ebcfd50: 0000000100000000 0000000000000000
ffff81007ebcfd60: 0000000000000000 ffffffffffffffff
ffff81007ebcfd70: 0000000000000000 0000000000000000
ffff81007ebcfd80: 0000000000000000 ffff81000000001e
ffff81007ebcfd90: ffff81014e8b8180 ffff81014e9d2200
ffff81007ebcfda0: ffff81014e8b8180 ffff81014ece4610
.......................................^^^^^^^^^^^^^^scsi_qla_host_t
ffff81007ebcfdb0: ffff81007ebcfe00 ffffffff880a63e3

crash> struct scsi_qla_host ffff81014ece4610|grep host_str
host_str = "qla2xxx_0\000\000\000\000\000\000",
crash> struct scsi_qla_host ffff81014ece4610|grep model_number
model_number = "QMI2472\000\000\000\000\000\000\000\000\000",

crash> struct scsi_qla_host ffff81014ece4610|grep parent
parent = 0x0,  // NO parent.

outstanding_cmds = {0x0, 0xffff810148028300, 0xffff81014d0f9cc0,
0xffff8100633e2dc0, 0xffff8100789abe00, 0xffff81014d0f9700, 0xffff8100789aba00,
0xffff81014d0f9480, 0xffff8100633e2900, 0xffff81014d0f9e00, 0xffff8100a95916c0,
0xffff8100a9591640, 0xffff81014e1bc340, 0xffff8101480284c0, 0xffff810148028fc0,
0xffff8100a9591380, 0xffff81014d0f91c0, 0xffff81014d0f9380, 0xffff8100a9591dc0,
0xffff81014e1bc2c0, 0xffff81014e1bc200, 0xffff8100345c1d00, 0xffff8100789ab880,
0xffff8100789abdc0, 0xffff810079056a40, 0xffff8100345c1680, 0xffff8100633e2a80,
0xffff81014d0f96c0, 0xffff8101480286c0, 0xffff81014d0f9980, 0xffff810148028d00,
0xffff810148028900, 0xffff81014d0f9280, 0xffff810148028600, 0xffff81014d0f95c0,
0xffff81014e1bc480, 0xffff8100a95913c0, 0xffff81014d0f9fc0, 0xffff81014d0f9900,
0xffff8100633e2f80, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0...},
current_outstanding_cmd = 39, <= Outstanding Commands.

crash> struct srb 0xffff810148028300
struct srb {
ha = 0xffff81014ece4610,   // Validation: points back to the scsi_qla_host,
// confirming that we are looking at a valid structure.
fcport = 0xffff81014e8b8180,
cmd = 0xffff81014a539940,
flags = 2,
dma_handle = 18446744071579714837,
request_sense_length = 0,
request_sense_ptr = 0x0
}

crash> struct scsi_qla_host.mbx_intr_sem ffff81014ece4610
mbx_intr_sem = {
count = {
counter = 0
},
sleepers = 1,
wait = {
lock = {
lock = {
wait_lock = {
raw_lock = {
slock = 0
},
break_lock = 0
},
wait_list = {
prio_list = {
next = 0xffff81014ece69e0,
prev = 0xffff81014ece69e0
},
node_list = {
next = 0xffff81014ece69f0,
prev = 0xffff81014ece69f0
}
},
owner = 0x0
},
break_lock = 0
},
task_list = {
next = 0xffff81007ebcfbe8,
prev = 0xffff81007ebcfbe8
}
}
},

Apparently only this one process is sleeping on the semaphore. Not sure if there is
any easy way of finding out who is holding it. Any help on how to find that out?


(In reply to comment #77)
> On Live machine/crash confirmed that scsi_eh_0 is not making progress.
> Waiting on the semaphore. This analysis shows how I tracked down that semaphore.
> Still need to figure out who is holding that semaphore.
> [...]

forgot to add:

crash> struct scsi_qla_host.mbx_intr_sem
struct scsi_qla_host {
[9152] struct compat_semaphore mbx_intr_sem;
}
ffff81014ece4610 + 0x23C0 (9152 decimal) = ffff81014ece69d0 -> semaphore location in the structure.
We can see this pointer above on the stack (marked as "sema").
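For reference, the arithmetic above is just base address plus member offset; crash prints the offset in decimal. A self-contained illustration of the same calculation (the struct below is a stand-in, not the real scsi_qla_host):

/* Stand-in struct only; the real offset comes from struct scsi_qla_host. */
#include <stddef.h>
#include <stdio.h>

struct example_host {
        char pad[9152];          /* everything that precedes the member */
        long mbx_intr_sem;       /* stands in for the compat_semaphore */
};

int main(void)
{
        /* prints "offset = 0x23c0", matching the [9152] shown by crash */
        printf("offset = 0x%zx\n", offsetof(struct example_host, mbx_intr_sem));
        return 0;
}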

Comment 15 IBM Bug Proxy 2008-09-09 04:30:38 UTC
crash> struct scsi_qla_host ffff81014ece4610|grep mbx_cmd_flags
mbx_cmd_flags = 4,

unsigned long   mbx_cmd_flags;
#define MBX_INTERRUPT   1
#define MBX_INTR_WAIT   2
#define MBX_UPDATE_FLASH_ACTIVE 3

From the stack trace, MBX_INTR_WAIT should be set, indicating that we are waiting
for the interrupt. But the dump is showing the flag value MBX_UPDATE_FLASH_ACTIVE.

If we look at the interrupt handler:

if (test_bit(MBX_INTR_WAIT, &ha->mbx_cmd_flags) &&
    (status & MBX_INTERRUPT) && ha->flags.mbox_int) {
        set_bit(MBX_INTERRUPT, &ha->mbx_cmd_flags);
        up(&ha->mbx_intr_sem);
}
Hence the interrupt handler won't call up() unless the MBX_INTR_WAIT bit is set.
This explains why io_done is not being called, and why all IOs are waiting.
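For context, the waiting side in qla2x00_mailbox_command() pairs with this handler roughly as follows (an abridged reconstruction from the stack trace and the timer code quoted below, not verbatim driver source):

set_bit(MBX_INTR_WAIT, &ha->mbx_cmd_flags);
/* ... arm tmp_intr_timer and ring the mailbox doorbell ... */
down(&ha->mbx_intr_sem);   /* sleeps until the ISR (or the timer) calls up() */
clear_bit(MBX_INTR_WAIT, &ha->mbx_cmd_flags);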

But we still need to figure out how and why this bit got set.

Another interesting factor is:

Looking at the code, MBX_UPDATE_FLASH_ACTIVE is set in suspend_hba(), but if the
code takes that path, interrupts_on should be set to 0. The dump shows otherwise:
crash> struct scsi_qla_host ffff81014ece4610|grep interrupts_on
interrupts_on = 1 '\001',

So it is puzzling how mbx_cmd_flags got set to MBX_UPDATE_FLASH_ACTIVE.

Another minor mistake (no problem in the functionality) was found in the code:

tmp_intr_timer.data = (unsigned long)&ha->mbx_intr_sem;
tmp_intr_timer.expires = jiffies + mcp->tov * HZ;
tmp_intr_timer.function =
    (void (*)(unsigned long))qla2x00_mbx_sem_timeout;

where mbx_intr_sem is a compat_semaphore, while

static void
qla2x00_mbx_sem_timeout(unsigned long data)
{
        struct semaphore        *sem_ptr = (struct semaphore *)data;

casts data to (struct semaphore *) instead of (struct compat_semaphore *).

Comment 16 IBM Bug Proxy 2008-09-10 06:00:37 UTC
Today's Progress:

crash> struct scsi_qla_host.mbx_intr_sem ffff81007d2a8610
mbx_intr_sem = {
count = {
counter = 0  <<< counter should be < 0 if anyone is waiting.
},
sleepers = 1, <<< Indicates that one process is waiting.
<<< This is inconsistent: either the wakeup was missed,
<<< or there is some kind of race between down() and up().
wait = {
lock = {
lock = {
wait_lock = {
raw_lock = {
slock = 0
},
break_lock = 0
},
wait_list = {
prio_list = {
next = 0xffff81007d2aa9e0,
prev = 0xffff81007d2aa9e0
},
node_list = {
next = 0xffff81007d2aa9f0,
prev = 0xffff81007d2aa9f0
}
},
owner = 0x0
},
break_lock = 0
},
task_list = {
next = 0xffff81014d021be8,
prev = 0xffff81014d021be8
}
}
},

crash> struct __wait_queue.task_list
struct __wait_queue {
[24] struct list_head task_list;
}

0xffff81014d021be8 - 0x18 = 0xffff81014d021bd0

crash> struct __wait_queue ffff81014d021bd0
struct __wait_queue {
flags = 1,
private = 0xffff81014d6423c0, <<< This is the task_struct of the thread
waiting on the semaphore, so the thread in question is scsi_eh_1.

func = 0xffffffff81034f27 <default_wake_function>,
task_list = {
next = 0xffff81007d2aaa10,
prev = 0xffff81007d2aaa10
}
}
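The manual step above (taking wait.task_list.next and subtracting 0x18, the offset of task_list inside __wait_queue) is container_of() done by hand. A minimal sketch of the same walk in kernel C, assuming the 2.6.24 field names:

#include <linux/list.h>
#include <linux/sched.h>
#include <linux/wait.h>

/* Return the task sleeping first on a wait queue head (here: mbx_intr_sem.wait). */
static struct task_struct *first_waiter(wait_queue_head_t *wq)
{
        /* Equivalent to the by-hand step above:
         * (struct __wait_queue *)((char *)wq->task_list.next - 0x18) */
        struct __wait_queue *entry =
                list_entry(wq->task_list.next, struct __wait_queue, task_list);

        return entry->private;  /* the task_struct of the waiter (scsi_eh_1) */
}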

PID: 544    TASK: ffff81014d6423c0  CPU: 3   COMMAND: "scsi_eh_1"
#0 [ffff81014d021b20] schedule at ffffffff8129f4fe
#1 [ffff81014d021bc8] __compat_down at ffffffff812a14c6
#2 [ffff81014d021c28] __compat_down_failed at ffffffff812a108d
#3 [ffff81014d021c78] qla2x00_mailbox_command at ffffffff880a56ae
#4 [ffff81014d021d28] qla2x00_issue_iocb at ffffffff880a6d11
#5 [ffff81014d021db8] qla24xx_abort_command at ffffffff880a7644
#6 [ffff81014d021e08] qla2xxx_eh_abort at ffffffff8809e1e0
#7 [ffff81014d021e78] __scsi_try_to_abort_cmd at ffffffff88058077
#8 [ffff81014d021e88] scsi_error_handler at ffffffff880598da
#9 [ffff81014d021f28] kthread at ffffffff81053ee5
#10 [ffff81014d021f48] kernel_thread at ffffffff8100d418

Comment 17 IBM Bug Proxy 2008-09-11 05:31:10 UTC
(In reply to comment #79)
> [...]
> Another minor mistake (no problem in the functionality) was found in the code:
>
>       tmp_intr_timer.data = (unsigned long)&ha->mbx_intr_sem;
>       tmp_intr_timer.expires = jiffies + mcp->tov * HZ;
>       tmp_intr_timer.function =
>           (void (*)(unsigned long))qla2x00_mbx_sem_timeout;
>
> where mbx_intr_sem is a compat_semaphore, while
>
> static void
> qla2x00_mbx_sem_timeout(unsigned long data)
> {
>         struct semaphore        *sem_ptr = (struct semaphore *)data;
>
> casts data to (struct semaphore *) instead of (struct compat_semaphore *).

As I mentioned in this comment, the minor issue is turning out to be THE issue,
because of the RT switch in include/linux/rt_lock.h: PICK_SEM_OP() picks the
implementation depending on the type of the semaphore passed.

#define up(sem) PICK_SEM_OP(compat_up, rt_up, sem)
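Roughly how that type switch works (an illustrative sketch only; the real macros in include/linux/rt_lock.h are more involved): the selection is made on the static C type of the expression passed in, so a pointer that has been cast to the wrong type silently selects the wrong implementation.

/* Sketch of the idea behind PICK_SEM_OP(); not the actual -rt macro. */
#define SEM_TYPE_EQUAL(sem, type) \
        __builtin_types_compatible_p(typeof(*(sem)), type)

#define PICK_SEM_OP(compat_op, rt_op, sem) do {                         \
        if (SEM_TYPE_EQUAL((sem), struct compat_semaphore))             \
                compat_op((struct compat_semaphore *)(sem));            \
        else                                                            \
                rt_op((struct semaphore *)(sem));                       \
} while (0)

Because qla2x00_mbx_sem_timeout() casts the pointer to struct semaphore *, its up() call is routed to the rtmutex code even though the memory really holds a compat_semaphore.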

So the fix is to correct the typecast to struct compat_semaphore.
Changed it, and tests have been running since this morning.
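A minimal sketch of the corrected callback (the surrounding body is reconstructed; only the changed cast matters, and the actual one-line patch is attached to this bug):

static void
qla2x00_mbx_sem_timeout(unsigned long data)
{
        /* mbx_intr_sem is a compat_semaphore, so the cast must say so;
         * otherwise PICK_SEM_OP() routes up() to the rtmutex code. */
        struct compat_semaphore *sem_ptr = (struct compat_semaphore *)data;

        if (sem_ptr != NULL)
                up(sem_ptr);
}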

Rostedt gave a patch with a magic-number check to catch hidden mistakes like this.
I will be testing with that patch tomorrow.

Comment 18 Steven Rostedt 2008-09-11 20:59:50 UTC
Created attachment 316488 [details]
patch to fix compat semaphore in qla2x00 driver

The qla2x00 driver passes the semaphore to a timeout routine via a pointer. The semaphore that is passed is really a compat_semaphore, but the callback function typecasts it to a semaphore. The up() logic calls the RT code if the type is a semaphore, and calls the old semaphore code if it is a compat_semaphore. Thus, we were calling the wrong function.

This patch is a one-liner that simply makes the type of the semaphore match the original one being passed in.

This should solve the issue.

Comment 19 Luis Claudio R. Goncalves 2008-09-12 17:25:38 UTC
Patch added to 2.6.24.7-80.el5rt.

Comment 20 Steven Rostedt 2008-09-12 19:56:42 UTC
Created attachment 316618 [details]
patch to catch misuses of rtmutexs

This patch adds "magic" numbers to the rtmutexs and these are checked everytime they are used. If a compat_semaphore or raw_spinlock is called to these routines, than a warning will be printed.
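The idea behind the debugging patch, as a rough sketch (hypothetical names and magic value, not the attached patch): stamp every properly initialized rt_mutex with a known value and warn whenever an object without that stamp, such as a compat_semaphore or raw_spinlock cast to the wrong type, reaches the rtmutex code.

#include <linux/kernel.h>       /* WARN_ON_ONCE */

#define RT_MUTEX_MAGIC  0x52544d58UL    /* hypothetical stamp ("RTMX") */

struct rt_mutex_dbg {
        unsigned long   magic;          /* set by rt_mutex_init() */
        /* ... the normal rt_mutex fields follow ... */
};

/* Called at the top of the rtmutex lock/unlock paths. */
static inline void rt_mutex_check_magic(struct rt_mutex_dbg *lock)
{
        /* A compat_semaphore cast to an rt_mutex will not carry the stamp. */
        WARN_ON_ONCE(lock->magic != RT_MUTEX_MAGIC);
}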

Comment 22 David Sommerseth 2008-10-01 17:12:26 UTC
The patch to fix the compat semaphore in the qla2x00 driver was found in mrg-rt.git as commit b95c0db0f90e16c458100256f3c4f0daecb6d4eb.

The patch to catch misuses of rtmutexes was found in mrg-rt.git as commit 12538e0ad48ca5cc5f861863b178c03b7d4d28f5.

Have not yet managed to find a suitable RHTS test system with the right driver available.

Only code review was done. Both patches are included in mrg-rt-2.6.24.7-81.

Comment 23 David Sommerseth 2008-10-02 12:35:27 UTC
Tried to run the "online/offline" test (https://bugzilla.redhat.com/attachment.cgi?id=315191) on veritas2.rhts.bos.redhat.com.

Ran the test with 250 iterations on 2.6.24.7-74rt, without triggering anything close to a kernel crash.  Ran the same test on 2.6.24.7-81rt, with the same results.

Well, I got one result ... the root filesystem got screwed up quite well and the OS had to be reinstalled between the runs ... but that's not so surprising, considering what the test actually does ... and it does it quite brutally.

Not sure if I've found a setup which is close enough to reproduce this bug.

Comment 26 errata-xmlrpc 2008-10-07 19:21:11 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2008-0857.html

Comment 27 IBM Bug Proxy 2008-10-17 12:14:28 UTC
I've verified the compat_sem fix is in -85.