Bug 460455
Summary: [FOCUS][24] R2:SAN:Hang triggered by filesystem testing on SAN
Product: Red Hat Enterprise MRG
Reporter: IBM Bug Proxy <bugproxy>
Component: realtime-kernel
Assignee: Steven Rostedt <srostedt>
Status: CLOSED ERRATA
Severity: urgent
Priority: medium
Version: 1.0
CC: bhu, davids, lgoncalv, pzijlstr, srostedt, tglx, williams
Target Milestone: 1.0.3
Hardware: x86_64
OS: All
Doc Type: Bug Fix
Last Closed: 2008-10-07 19:21:11 UTC
Description
IBM Bug Proxy 2008-08-28 10:40:28 UTC

Created attachment 315191 [details]: online/offline test
*** Continuation of the previous comment ***

(cpu#0, continued)
  .lb_breaks : 32332  .rt.rt_nr_running : 2  .rt.rt_nr_uninterruptible : 4079
cfs_rq
  .exec_clock : 4898525.373802  .MIN_vruntime : 34189622.761175  .min_vruntime : 34189702.761175  .max_vruntime : 34189622.761175
  .spread : 0.000000  .spread0 : 0.000000  .nr_running : 3  .load : 3072  .bkl_count : 698  .nr_spread_over : 3069086
runnable tasks:
          task    PID         tree-key  switches  prio     exec-runtime       sum-exec        sum-sleep
  -----------------------------------------------------------------------------------------------------
        IRQ-19    746 -12494417.560092  10062067     4 -12494417.560092  143536.971248         0.005545
        IRQ-17   2798  -2244852.954366    181099     4  -2244852.954366    2411.567408         0.020340
          ntpd   4379  34189622.761175     34907   120  34189622.761175    1547.736658  29332812.052997
  avahi-daemon   4598  34189622.761175         5   120  34189622.761175       0.317794  29333393.056235
  R      blast  10368  34189702.761175   6477830   120  34189702.761175 3658908.920682  23218972.614414

cpu#1, 1600.057 MHz
  .nr_running : 0  .load : 0  .nr_switches : 84617285  .nr_load_updates : 29398387  .nr_uninterruptible : -12763
  .jiffies : 4324061591  .next_balance : 4324.061614  .curr->pid : 0
  .clock : 29393919.445091  .idle_clock : 0.000000  .prev_clock_raw : 75270672.883051
  .clock_warps : 0  .clock_overflows : 128687  .clock_deep_idle_events : 0  .clock_max_delta : 0.996688
  .cpu_load[0..4] : 0
  .yld_exp_empty : 0  .yld_act_empty : 0  .yld_both_empty : 0  .yld_count : 416
  .sched_switch : 0  .sched_count : 85639915  .sched_goidle : 26518251
  .ttwu_count : 51411965  .ttwu_local : 48234063  .bkl_count : 527
  .rto_schedule : 33245639  .rto_schedule_tail : 27157  .rto_wakeup : 2347621  .rto_pulled : 45  .rto_pushed : 30951  .lb_breaks : 20406
  .rt.rt_nr_running : 0  .rt.rt_nr_uninterruptible : -4098
cfs_rq
  .exec_clock : 5381010.622923  .MIN_vruntime : 0.000001  .min_vruntime : 30962011.829487  .max_vruntime : 0.000001
  .spread : 0.000000  .spread0 : -3227690.931688  .nr_running : 0  .load : 0  .bkl_count : 527  .nr_spread_over : 2184127
runnable tasks: (none)

cpu#2, 1600.057 MHz
  .nr_running : 0  .load : 0  .nr_switches : 76753549  .nr_load_updates : 29398316  .nr_uninterruptible : 63109
  .jiffies : 4324061591  .next_balance : 4324.061674  .curr->pid : 0
  .clock : 29393848.455904  .idle_clock : 0.000000  .prev_clock_raw : 75270685.700782
  .clock_warps : 0  .clock_overflows : 108048  .clock_deep_idle_events : 0  .clock_max_delta : 0.996108
  .cpu_load[0..4] : 0
  .yld_exp_empty : 0  .yld_act_empty : 0  .yld_both_empty : 0  .yld_count : 445
  .sched_switch : 0  .sched_count : 76966943  .sched_goidle : 25457885
  .ttwu_count : 45870808  .ttwu_local : 45203771  .bkl_count : 65
  .rto_schedule : 29527971  .rto_schedule_tail : 955  .rto_wakeup : 31174  .rto_pulled : 18  .rto_pushed : 11261  .lb_breaks : 16155
  .rt.rt_nr_running : 0  .rt.rt_nr_uninterruptible : -210
cfs_rq
  .exec_clock : 4532559.248778  .MIN_vruntime : 0.000001  .min_vruntime : 38661343.821066  .max_vruntime : 0.000001
  .spread : 0.000000  .spread0 : 4471641.059891  .nr_running : 0  .load : 0  .bkl_count : 65  .nr_spread_over : 89403
runnable tasks: (none)

cpu#3, 1600.057 MHz
  .nr_running : 0  .load : 0  .nr_switches : 75597502  .nr_load_updates : 29398246  .nr_uninterruptible : 50434
  .jiffies : 4324061591  .next_balance : 4324.061681  .curr->pid : 0
  .clock : 29393778.466530  .idle_clock : 0.000000  .prev_clock_raw : 75270693.534770
  .clock_warps : 0  .clock_overflows : 102878  .clock_deep_idle_events : 0  .clock_max_delta : 0.996048
  .cpu_load[0..4] : 0
  .yld_exp_empty : 0  .yld_act_empty : 0  .yld_both_empty : 0  .yld_count : 972
  .sched_switch : 0  .sched_count : 75735418  .sched_goidle : 27355032
  .ttwu_count : 45267919  .ttwu_local : 44694884  .bkl_count : 44
  .rto_schedule : 29463844  .rto_schedule_tail : 341  .rto_wakeup : 388  .rto_pulled : 13  .rto_pushed : 6784  .lb_breaks : 20677
  .rt.rt_nr_running : 0  .rt.rt_nr_uninterruptible : -87
cfs_rq
  .exec_clock : 2558515.929605  .MIN_vruntime : 0.000001  .min_vruntime : 37206459.647282  .max_vruntime : 0.000001
  .spread : 0.000000  .spread0 : 3016756.886107  .nr_running : 0  .load : 0  .bkl_count : 44  .nr_spread_over : 73282
runnable tasks: (none)

cpu#4, 1600.057 MHz
  .nr_running : 1  .load : 177522  .nr_switches : 75246847  .nr_load_updates : 29398175  .nr_uninterruptible : 40081
  .jiffies : 4324061591  .next_balance : 4324.061595  .curr->pid : 587
  .clock : 29393707.511111  .idle_clock : 0.000000  .prev_clock_raw : 75418323.767358
  .clock_warps : 0  .clock_overflows : 138647  .clock_deep_idle_events : 0  .clock_max_delta : 0.996031
  .cpu_load[0..4] : 0
  .yld_exp_empty : 0  .yld_act_empty : 0  .yld_both_empty : 0  .yld_count : 1426
  .sched_switch : 0  .sched_count : 75333824  .sched_goidle : 28395180
  .ttwu_count : 45180035  .ttwu_local : 44719289  .bkl_count : 29
  .rto_schedule : 29460959  .rto_schedule_tail : 198  .rto_wakeup : 333  .rto_pulled : 8  .rto_pushed : 3890  .lb_breaks : 23273
  .rt.rt_nr_running : 1  .rt.rt_nr_uninterruptible : -62
cfs_rq
  .exec_clock : 1437848.398469  .MIN_vruntime : 0.000001  .min_vruntime : 28071407.036211  .max_vruntime : 0.000001
  .spread : 0.000000  .spread0 : -6118295.724964  .nr_running : 0  .load : 0  .bkl_count : 29  .nr_spread_over : 69496
runnable tasks:
  R IRQ-3  587  179.969964  444  4  179.969964  9.079176  0.005109

cpu#5, 1600.057 MHz
  .nr_running : 0  .load : 0  .nr_switches : 75376770  .nr_load_updates : 29398104  .nr_uninterruptible : 26728
  .jiffies : 4324061591  .next_balance : 4324.061595  .curr->pid : 0
  .clock : 29393636.488096  .idle_clock : 0.000000  .prev_clock_raw : 54153304.160724
  .clock_warps : 0  .clock_overflows : 97144  .clock_deep_idle_events : 0  .clock_max_delta : 0.998701
  .cpu_load[0..4] : 0
  .yld_exp_empty : 0  .yld_act_empty : 0  .yld_both_empty : 0  .yld_count : 1677
  .sched_switch : 0  .sched_count : 75458943  .sched_goidle : 28789756
  .ttwu_count : 45282745  .ttwu_local : 44965719  .bkl_count : 99
  .rto_schedule : 29459623  .rto_schedule_tail : 119  .rto_wakeup : 455  .rto_pulled : 8  .rto_pushed : 2446  .lb_breaks : 18517
  .rt.rt_nr_running : 0  .rt.rt_nr_uninterruptible : -41
cfs_rq
  .exec_clock : 909053.116367  .MIN_vruntime : 0.000001  .min_vruntime : 25311238.380281  .max_vruntime : 0.000001
  .spread : 0.000000  .spread0 : -8878464.380894  .nr_running : 0  .load : 0  .bkl_count : 99  .nr_spread_over : 69037
runnable tasks: (none)

cpu#6, 1600.057 MHz
  .nr_running : 0  .load : 0  .nr_switches : 89140520  .nr_load_updates : 29398032  .nr_uninterruptible : -156266
  .jiffies : 4324061591  .next_balance : 4324.061751  .curr->pid : 0
  .clock : 29393564.500525  .idle_clock : 0.000000  .prev_clock_raw : 75270686.526764
  .clock_warps : 0  .clock_overflows : 167827  .clock_deep_idle_events : 0  .clock_max_delta : 0.998390
  .cpu_load[0..4] : 0
  .yld_exp_empty : 0  .yld_act_empty : 0  .yld_both_empty : 0  .yld_count : 1774
  .sched_switch : 0  .sched_count : 90431997  .sched_goidle : 30559564
  .ttwu_count : 55066830  .ttwu_local : 54451675  .bkl_count : 636
  .rto_schedule : 36481231  .rto_schedule_tail : 77343  .rto_wakeup : 4217166  .rto_pulled : 58  .rto_pushed : 27291  .lb_breaks : 5601
  .rt.rt_nr_running : 0  .rt.rt_nr_uninterruptible : -52
cfs_rq
  .exec_clock : 2604299.438836  .MIN_vruntime : 0.000001  .min_vruntime : 34814678.176308  .max_vruntime : 0.000001
  .spread : 0.000000  .spread0 : 624975.415133  .nr_running : 0  .load : 0  .bkl_count : 636  .nr_spread_over : 2998327
runnable tasks: (none)

cpu#7, 1600.057 MHz
  .nr_running : 0  .load : 0  .nr_switches : 76842021  .nr_load_updates : 29397960  .nr_uninterruptible : 85523
  .jiffies : 4324061591  .next_balance : 4324.061594  .curr->pid : 0
  .clock : 29393492.510016  .idle_clock : 0.000000  .prev_clock_raw : 75270695.181804
  .clock_warps : 0  .clock_overflows : 101562  .clock_deep_idle_events : 0  .clock_max_delta : 0.995775
  .cpu_load[0..4] : 0
  .yld_exp_empty : 0  .yld_act_empty : 0  .yld_both_empty : 0  .yld_count : 2428
  .sched_switch : 0  .sched_count : 76994234  .sched_goidle : 27556332
  .ttwu_count : 45937928  .ttwu_local : 45374789  .bkl_count : 52
  .rto_schedule : 29464909  .rto_schedule_tail : 374  .rto_wakeup : 1307  .rto_pulled : 7  .rto_pushed : 4268  .lb_breaks : 101688
  .rt.rt_nr_running : 0  .rt.rt_nr_uninterruptible : -73
cfs_rq
  .exec_clock : 2344576.091583  .MIN_vruntime : 0.000001  .min_vruntime : 34734213.666132  .max_vruntime : 0.000001
  .spread : 0.000000  .spread0 : 544510.904957  .nr_running : 0  .load : 0  .bkl_count : 52  .nr_spread_over : 89724
runnable tasks: (none)

------- Additional Comment #34 From Venkateswarara
Jujjuri 2008-08-21 01:48 EDT

Today I did three experiments.

1. Blast to local disks (no SAN configuration): In my previous post I said this configuration was working fine. Well, my statement was too early. I let it run overnight and it hung the system. So this confirms that the problem is not SAN-related.

2. Set up elm3c29 without multipath drivers and started blast on it. I will let it run overnight.

3. On elm3c24 with the multipath driver, I started blast. As usual, after 3 hours it practically hung the system. Through the console I took a system back trace twice, with a 1-hour and a 24-hour gap. The plan is to analyze these two stacks and see which processes got hung and which are making progress. I am hoping this gives us some indication of lock contention. I will update the defect with my findings.

------- Additional Comment #35 From Sripathi Kodi 2008-08-21 04:46 EDT

Tim, the list of processes in comment #26 seems to be incomplete. Copy-paste error, by any chance?

------- Additional Comment #36 From Sripathi Kodi 2008-08-21 07:39 EDT

I am trying to make sense of the dump mentioned in comment #22. However, it is very difficult to get information about function parameters. It looks like we are not passing them on the stack any more.

------- Additional Comment #37 From Vernon Mauery 2008-08-21 09:11 EDT

For some reason, I think we went back to register-passed function arguments. Even as far back as SR3 we were using register-passed arguments. It sure does make it a bear to debug.

------- Additional Comment #38 From TIMOTHY R. CHAVEZ 2008-08-21 11:40 EDT

(In reply to comment #35) Yeah, I noticed that last night... I couldn't keep the console alive long enough to get it all... I'll try again today. There are some interesting stack traces for blast and pdflush, for example.

------- Additional Comment #40 From Venkateswarara Jujjuri 2008-08-22 01:03 EDT

This defect is giving conflicting data points. From the latest investigation:
- SAN configured with dm_multipath hung in 1 hour.
- SAN configured without dm_multipath ran overnight without any problem.

This may tell us that these could be two different problems; the one we have seen on the local disks could be a different issue. I took a crashdump on elm3c24 while it was hung. The dump is available on KBIC. The stacks show that most of the threads have submitted I/O and are waiting for it to complete, or are waiting for a buffer in order to submit I/O. I could not find anyone holding a lock after being preempted. This might indicate that the bio_done notification could be the issue. Another interesting thing: I took 3 back traces an hour apart while the system was hung, and then a dump after 10 hours. The stacks in all three instances are identical. This tells us that the stacks reported by the dump are those of the hung processes, not a point-in-time snapshot. The next step is to instrument the dm_scsi module. I will be doing that tomorrow.

------- Additional Comment #41 From Sripathi Kodi 2008-08-22 06:27 EDT

Google led me to http://readlist.com/lists/vger.kernel.org/linux-kernel/88/443749.html which reports a problem with processes getting stuck in sync_buffer. We see some bash and blast threads in our dumps which are stuck with similar backtraces. The next mail on the thread, http://readlist.com/lists/vger.kernel.org/linux-kernel/88/443762.html, says the problem is fixed by commit 149a051f82d2b3860fe32fa182dbc83a66274894. In fact the code around this area has changed again since then. This patch is not in MRG kernels. Has this problem been recreated with the latest kernels? 2.6.26-rt1? Could the above be the patch we need?

------- Additional Comment #42 From TIMOTHY R. CHAVEZ 2008-08-25 10:13 EDT

(In reply to comment #41) I couldn't get elm3b210 (the LS41 with local disk we were able to repro the failure on) to boot 2.6.23.3-rt2 for some reason... blargh... trying with an LS21 this morning.
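The "identical stacks across repeated back traces" check described in comment #40 can be scripted rather than eyeballed. A minimal sketch; the snapshot files and their contents below are synthetic stand-ins for sysrq-t captures saved an hour apart, not data from this bug:

```shell
#!/bin/sh
# Compare two task-dump snapshots (e.g. sysrq-t output captured an hour
# apart) and report whether the recorded stacks are identical. Identical
# stacks across a long gap mean the threads are hung, not merely busy.
compare_dumps() {
    if diff -q "$1" "$2" > /dev/null; then
        echo "stacks identical: threads are hung, not merely busy"
    else
        echo "stacks changed: threads are making progress"
    fi
}

# Demo with synthetic snapshots standing in for two sysrq-t captures.
printf 'blast D sync_buffer\nblast D io_schedule\n' > bt1.txt
printf 'blast D sync_buffer\nblast D io_schedule\n' > bt2.txt
compare_dumps bt1.txt bt2.txt
```

On the real machine the snapshots would come from `echo t > /proc/sysrq-trigger` followed by saving the dmesg tail; the comparison itself is the same.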
------- Additional Comment #43 From Venkateswarara Jujjuri 2008-08-25 12:37 EDT

Over the weekend I booted elm3c24 with 2.6.26.3-rt2 and was able to reproduce the problem. Configuration of the system:
- AMD, 2 quad-core CPUs
- FAStT SAN disks
- Single path

I reproduced the problem on this kernel with and without the multipath driver in the stack, i.e. on mpath devices and on sd devices. The next plan is:
1. Configure the system with a non-ext3 fs (ReiserFS?) and rerun blast.
2. Walk through the code and instrument it, with the help of Paul McKenney's debugfs.
3. Analyze the dump.

------- Additional Comment #44 From Venkateswarara Jujjuri 2008-08-26 01:04 EDT

To eliminate the FS-type variable, I have configured the system with a JFS filesystem and started the blast test on multipath (mpath) devices. The tests have been running for the last 10 hours! The ext3+multipath combination consistently hung the system within 2 hours, so it is interesting that JFS+mpath has been running for 10 hours. We may have to wait until tomorrow morning before drawing any conclusions.

------- Additional Comment #47 From Venkateswarara Jujjuri 2008-08-26 13:56 EDT

My testing on JFS+mpath+RT went fine for almost 20 hours, and then blast failed with "Bus error" and, on the terminal:

Message from syslogd@ at Tue Aug 26 10:36:43 2008 ...
elm3c24 kernel: journal commit I/O error

More digging showed that the problem is with the root fs running ext3. dmesg shows:

EXT3-fs error (device dm-0): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only
__journal_remove_journal_head: freeing b_frozen_data
__journal_remove_journal_head: freeing b_frozen_data
journal commit I/O error
Buffer I/O error on device dm-0, logical block 15466498
lost page write due to I/O error on dm-0
Buffer I/O error on device dm-0, logical block 15466499
lost page write due to I/O error on dm-0
Buffer I/O error on device dm-0, logical block 15892561
lost page write due to I/O error on dm-0
Buffer I/O error on device dm-0, logical block 15892605
lost page write due to I/O error on dm-0
__ratelimit: 103 messages suppressed
Buffer I/O error on device dm-0, logical block 0
lost page write due to I/O error on dm-0
EXT3-fs error (device dm-0): ext3_find_entry: reading directory #13598721 offset 0
EXT3-fs error (device dm-0): ext3_find_entry: reading directory #13598721 offset 0
EXT3-fs error (device dm-0): ext3_get_inode_loc: unable to read inode block - inode=17465504, block=17465350
EXT3-fs error (device dm-0): ext3_find_entry: reading directory #13598721 offset 0
metapage_write_end_io: I/O error

------- Additional Comment #48 From Venkateswarara Jujjuri 2008-08-26 14:57 EDT

The ext3 issue reported in comment #47 is due to a SAN s/w reset. Please ignore it.

------- Additional Comment #49 From Venkateswarara Jujjuri 2008-08-27 01:19 EDT

Tried on an ext2 fs and reproduced the hang in around 3 hours. To make sure, I have started the JFS test again. Keith and I spent most of the afternoon cscoping and code-walking the I/O path. Crash shows a number of I/O waits on each processor. We are trying to get to the bio buffer head to find out the status of the pending I/Os.
crash> struct rq ffff810009016980 | grep -e iowait -e counter
  nr_iowait = { counter = 6
crash> struct rq ffff8101500a2980 | grep -e iowait -e counter
  nr_iowait = { counter = 2
crash> struct rq ffff810009021980 | grep -e iowait -e counter
  nr_iowait = { counter = 0
crash> struct rq ffff8101500ad980 | grep -e iowait -e counter
  nr_iowait = { counter = 4
crash> struct rq ffff81000902c980 | grep -e iowait -e counter
  nr_iowait = { counter = 0
crash> struct rq ffff8101500b8980 | grep -e iowait -e counter
  nr_iowait = { counter = 2
crash> struct rq ffff81000900b980 | grep -e iowait -e counter
  nr_iowait = { counter = 27
crash> struct task_struct ffff81014d141480 > blast_task_struct
crash> struct task_struct ffff81014d141480 | grep -e blast -e bio
  comm = "blast\000d\000\000\000\000\000\000\000\000",
  bio_list = 0x0,
  bio_tail = 0x0,
crash> foreach struct task_struct | less
foreach: unknown argument: "struct"
foreach: unknown argument: "task_struct"
foreach: no keywords specified
crash> foreach task | less

------- Additional Comment #50 From TIMOTHY R. CHAVEZ 2008-08-27 09:47 EDT

(In reply to comment #49) Interesting... I left blast running for about 19 hours and 20 minutes with ext2, about 835 loops (I left my notes in the office, so these are approximations), without problems. That said, I did notice I was only testing on 3 mount points and not 4 (doh!). I restarted this morning with ext3 mounted and made sure it was running on 4 mounts. Is the dumpfile available on KBIC? Were you able to get any other information about the system via sysrq?

------- Additional Comment #51 From TIMOTHY R. CHAVEZ 2008-08-27 16:41 EDT

(In reply to comment #41) I was skimming through LKML and came across this conversation, which reports a problem with a stack trace that seems awfully similar... I do not yet know what the conclusion of this dialog was. Adding the link for documented reference:
http://linux.derkeiler.com/Mailing-Lists/Kernel/2008-03/msg02480.html

A 36-hour run reproduced the problem on JFS too. This steers us away from suspecting the ext3 FS. I am analyzing the dump... so far nothing points to a smoking gun, but one thing looks suspicious. From the dump:

struct mapped_device {
    ....
    /*
     * A list of ios that arrived while we were suspended.
     */
    atomic_t pending;            = 0000000000001fa1
    wait_queue_head_t wait;
    struct bio_list deferred;    = 0000000000000000 0000000000000000
    struct bio_list pushback;    = 0000000000000000 0000000000000000
    ....
}

I am still checking the code to see whether this is a valid/sane state. While the pending count is quite high, the deferred and pushback lists are NULL. In the next comment I will add the complete crash analysis showing how I arrived at this data structure.

Here is the detailed analysis of how I got to the device-mapper data structure. I took one of the stuck blast threads:

PID: 6840  TASK: ffff81014d141480  CPU: 1  COMMAND: "blast"
 #0 [ffff8101330bb898] schedule at ffffffff8128565f
 #1 [ffff8101330bb950] io_schedule at ffffffff81285b9c
 #2 [ffff8101330bb970] sync_buffer at ffffffff810d21ca
 #3 [ffff8101330bb980] __wait_on_bit at ffffffff81285e14
 #4 [ffff8101330bb9c0] out_of_line_wait_on_bit at ffffffff81285eb4
 #5 [ffff8101330bba30] __wait_on_buffer at ffffffff810d214a
 #6 [ffff8101330bba40] sync_dirty_buffer at ffffffff810d2ee7
    ffff8101330bba48: ffff81019dd42e98 ffff8102348272d0
    ffff8101330bba58: ffff8101330bbaa8 ffffffff88025685
 #7 [ffff8101330bba60] journal_dirty_data at ffffffff88025685
    ffff8101330bba68: ffff8102348272d0 ffff81014c4cb800
    ffff8101330bba78: ffff81019dd42e98 0000000000001000
    ffff8101330bba88: ffff8102348272d0 ffff81019dd42df8
    ffff8101330bba98: ffff81019dd42df8 ffff81019dd42df8
    ffff8101330bbaa8: ffff8101330bbad8 ffffffff8803720c
 #8 [ffff8101330bbab0] ext3_journal_dirty_data at ffffffff8803720c
 #9 [ffff8101330bbae0] walk_page_buffers at ffffffff8803659b
#10 [ffff8101330bbb40] ext3_ordered_write_end at ffffffff88038dcd
#11 [ffff8101330bbba0] generic_file_buffered_write at ffffffff81085771
#12 [ffff8101330bbca0] __generic_file_aio_write_nolock at ffffffff81085eef
#13 [ffff8101330bbd50] generic_file_aio_write at ffffffff81085fbd
#14 [ffff8101330bbda0] ext3_file_write at ffffffff880352ae
#15 [ffff8101330bbdd0] do_sync_write at ffffffff810afd93
#16 [ffff8101330bbf10] vfs_write at ffffffff810b05d9
#17 [ffff8101330bbf40] sys_write at ffffffff810b0be9

crash> struct buffer_head ffff81019dd42df8
struct buffer_head {
  b_state = 16413,
     (100000000011101 = BH_Uptodate|BH_Lock|BH_Req|BH_Mapped: a locked
      buffer with a valid disk mapping that has been submitted for I/O)
  b_this_page = 0xffff81019dd42df8,
  b_page = 0xffffe2000c8bc200,
  b_blocknr = 61191,
  b_size = 4096,
  b_data = 0xffff8102174b0000 "w\003\035",
  b_bdev = 0xffff81014f4b99c0,     ==> This is the block_device
  b_end_io = 0xffffffff810d36f2 <end_buffer_write_sync>,
  b_private = 0xffff8101897081f0,
  b_assoc_buffers = {
    next = 0xffff81019dd42e40,
    prev = 0xffff81019dd42e40
  },
  ...
}

crash> struct block_device 0xffff81014f4b99c0
struct block_device {
  bd_dev = 265289733,
  bd_inode = 0xffff81014f4b9ab8,
  bd_openers = 1,
  ...
  bd_holder = 0xffffffff880505a0,
  bd_holders = 2,
  bd_holder_list = {
    next = 0xffff81014f4b9a60,
    prev = 0xffff81014f4b9a60
  },
  bd_contains = 0xffff81014f4b99c0,
  bd_block_size = 4096,
  bd_part = 0x0,
  bd_part_count = 0,
  bd_invalidated = 0,
  bd_disk = 0xffff81014e0a2800,    ==> This is the gendisk
  bd_list = {
    next = 0xffff81014f430258,
    prev = 0xffff81014f47a658
  },
  bd_inode_backing_dev_info = 0x0,
  bd_private = 0
}

crash> struct gendisk 0xffff81014e0a2800
struct gendisk {
  major = 253,
  first_minor = 5,
  minors = 1,
  disk_name = "dm-5\000...",       ==> My disk name
  part = 0x0,
  part_uevent_suppress = 0,
  fops = 0xffffffff881a87a0,
  queue = 0xffff81014e4a1858,      ==> This is the request_queue
  private_data = 0xffff81014e174400,
  capacity = 2097152,
  flags = 16,
  driverfs_dev = 0x0,
  ...
}

crash> struct request_queue 0xffff81014e4a1858
struct request_queue {
  queue_head = {
    next = 0x0,
    prev = 0x0
  },
  ...
  make_request_fn = 0xffffffff8819dac3,    <dm_request>
  prep_rq_fn = 0,
  unplug_fn = 0xffffffff8819d080,          <dm_unplug_all>
  merge_bvec_fn = 0,
  prepare_flush_fn = 0,
  softirq_done_fn = 0,
  end_sector = 0,
  boundary_rq = 0x0,
  ...
  backing_dev_info = {
    ra_pages = 32,
    state = 0,
    capabilities = 4,
    congested_fn = 0xffffffff8819d02d,     <dm_any_congested>
    congested_data = 0xffff81014e174400,
    unplug_io_fn = 0xffffffff81127e85 <blk_backing_dev_unplug>,
    unplug_io_data = 0xffff81014e4a1858,
    bdi_stat = {{
      lock = { raw_lock = { slock = 43433 }, break_lock = 0 },
      count = 54880,
      list = { next = 0xffff81024d5a8158, prev = 0xffff81014e4a1a78 },
      counters = 0x7efeb0ecfebf
    }, {
      lock = { raw_lock = { slock = 57054 }, break_lock = 0 },
      count = 2688,
      list = { next = 0xffff81014e4a1a50, prev = 0xffff81014e4a1aa0 },
      counters = 0x7efeb172233f
    }},
    completions = {
      events = {
        lock = { raw_lock = { slock = 39321 }, break_lock = 0 },
        count = 169288,
        list = { next = 0xffff81014e4a1a78, prev = 0xffff81024d95d2b0 },
        counters = 0x7efeb1f4bebf
      },
      shift = 19,
      period = 24903680,
      lock = { raw_lock = { slock = 24158 }, break_lock = 0 }
    },
    dirty_exceeded = 0
  },
  queuedata = 0xffff81014e174400,          ==> This is the mapped_device
  bounce_pfn = 2424832,
  bounce_gfp = 16,

crash> struct mapped_device 0xffff81014e174400
struct: invalid data structure reference: mapped_device

crash> rd 0xffff81014e174400 100     // Interpreting it as a mapped_device
ffff81014e174400: 0000000000000000 0000000000005e5e  ........^^......
ffff81014e174410: ffff81014e174410 ffff81014e174410  .D.N.....D.N....
ffff81014e174420: ffff81014e174420 ffff81014e174420  D.N.... D.N....
ffff81014e174430: 0000000000000000 0000000000000000  ................
ffff81014e174440: ffff81014e174440 ffff81014e174440  @D.N........
ffff81014e174450: 000000000000008c 0000000000000001  ................
ffff81014e174460: 0000000000000000 ffff81014e174468  ........hD.N....
ffff81014e174470: ffff81014e174468 ffff81014e174478  hD.N....xD.N....
ffff81014e174480: ffff81014e174478 0000000000000000  xD.N............
ffff81014e174490: 0000000000000000 ffff81014e174498  .........D.N....
ffff81014e1744a0: ffff81014e174498 ffff81014e1744a8  .D.N.....D.N....
ffff81014e1744b0: ffff81014e1744a8 0000000000000000  .D.N............
ffff81014e1744c0: 0000000000000000 0000000000000000  ................
ffff81014e1744d0: 0000000000006969 ffff81014e1744d8  ii.......D.N....
ffff81014e1744e0: ffff81014e1744d8 ffff81014e1744e8  .D.N.....D.N....
ffff81014e1744f0: ffff81014e1744e8 0000000000000000  .D.N............
ffff81014e174500: 0000000000000000 ffff81014e174508  .........E.N....
ffff81014e174510: ffff81014e174508 000000000000008c  .E.N............
ffff81014e174520: 0000000000000000 0000000100000003  ................
ffff81014e174530: 0000000000000000 ffff81014e4a1858  ........X.JN....
   ^^^^^ request_queue
ffff81014e174540: ffff81014e0a2800 000000353a333532  .(.N....253:5...
   ^^^^ gendisk            ^^^^ name
ffff81014e174550: 0000000000000000 ffff81014e0c5b40  ........@[.N....
   ^^^^^^^^^^^ interface_ptr
ffff81014e174560: 0000000000001fa1 0000000000000000  ................
   ^^^^^^^^ pending?? this is a huge number
ffff81014e174570: ffff81014e174570 ffff81014e174570  pE.N....pE.N....
ffff81014e174580: ffff81014e174580 ffff81014e174580  .E.N.....E.N....
ffff81014e174590: 0000000000000000 0000000000000000  ................
ffff81014e1745a0: ffff81014e1745a0 ffff81014e1745a0  .E.N.....E.N....
   ^^^ wait (__wait_queue_head) ends here
ffff81014e1745b0: 0000000000000000 0000000000000000  ................
   ^^^ struct bio_list deferred
ffff81014e1745c0: 0000000000000000 0000000000000000  ................
   ^^^ struct bio_list pushback
ffff81014e1745d0: ffff81014e56d200 ffff81014e99c480  ..VN.......N....
   ^^^^ map (struct dm_table *map)
ffff81014e1745e0: ffff81014e99c080 ffff81014e8dd3c0  ...N.......N....
   ^^ *bs (bio_set)
ffff81014e1745f0: 0000000000000001 0000000000000000  ................
ffff81014e174600: ffff81014e174600 ffff81014e174600  .F.N.....F.N....
ffff81014e174610: ffff81014e174610 ffff81014e174610  .F.N.....F.N....
ffff81014e174620: 0000000000000000 0000000000000000  ................
ffff81014e174630: ffff81024a9a3c90 ffff81024a9a3c90  .<.J.....<.J....
ffff81014e174640: 0000000000000000 ffff81014e174648  ........HF.N....
ffff81014e174650: ffff81014e174648 0000000000000000  HF.N............
ffff81014e174660: ffff81014e174660 ffff81014e174660  `F.N....`F.N....
ffff81014e174670: ffff81014e174670 ffff81014e174670  pF.N....pF.N....

Changed the default I/O scheduler from "cfq" to "anticipatory" and started blast on mpath devices. It has been running for the last 11 hours. In this configuration, "cfq" hung the machine consistently within 3 hours. I can see the machine's response is slow, but it is responsive and blast is running. I will let it run....

(In reply to comment #60)
> Changed the default IO scheduler from "cfq" to "anticipatory" and started the
> blast on mpath devices. It is running for the last 11 hours. In this
> configuration with "cfq" hung the machine consistently within 3 hours.

What was Ming Ming's rationale for switching to the anticipatory I/O scheduler?

(In reply to comment #61)
CFQ is relatively new; it became the default in 2.6.18.
So the idea is to try anticipatory which was the default one before CFQ...and is supposed to be a stable one. BTW, tests are still running fine. ext3 on mpath with AS io scheduler ran longer .. but finally hung the machine after 15 hours of the run. I am trying to reproduce this bug on non-rt system. Compiled 2.6.26-3 kernel; booted-up; and started the blast. Tests are running for 60+ hours. I will be stopping these tests sometime soon...Looks like the bug is only on RT system or.. may be it is not getting exposed on non-rt systems. Blast on non-rt kernel ran for 75 hours. Stopped the tests. As per Mike's request, I will be collecting dump on rt kernel tonight. On non-rt kernel, I tried to stop blast. 1 out of 4 threads did not end properly... and on the console: INFO: task blast:10604 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. ffff81022b54ba28 0000000000000082 0000000000000000 ffff810170e6cf18 ffff81024d17d340 ffff81024faa2b70 ffff81024d17d578 0000000300000001 00000000ffffffff 0000000000000003 0000000000000000 0000000000000000 Call Trace: [<ffffffff802f6aab>] log_wait_commit+0x9f/0xed [<ffffffff8023d90a>] autoremove_wake_function+0x0/0x2e [<ffffffff802f5605>] log_do_checkpoint+0xdf/0x348 [<ffffffff8029c3ae>] __find_get_block+0x148/0x158 [<ffffffff802f2985>] journal_stop+0x181/0x18d [<ffffffff8034c569>] __next_cpu+0x19/0x26 [<ffffffff80225401>] find_busiest_group+0x25e/0x6c6 [<ffffffff80209df6>] __switch_to+0x2ea/0x2f9 [<ffffffff8055a09f>] thread_return+0x3d/0x9c [<ffffffff80234ce2>] lock_timer_base+0x26/0x4b [<ffffffff802f5944>] __log_wait_for_space+0x7d/0xa1 [<ffffffff802f304a>] start_this_handle+0x2c3/0x304 [<ffffffff8023d90a>] autoremove_wake_function+0x0/0x2e [<ffffffff8029199d>] mntput_no_expire+0x20/0x107 [<ffffffff802f3126>] journal_start+0x9b/0xd2 [<ffffffff802ea2f8>] ext3_create+0x30/0xe9 [<ffffffff8028d77b>] d_alloc+0x15b/0x1a8 [<ffffffff802850fb>] vfs_create+0x75/0xba [<ffffffff80287a13>] 
do_filp_open+0x1dd/0x773 [<ffffffff8023d90a>] autoremove_wake_function+0x0/0x2e [<ffffffff802e2835>] ext3_discard_reservation+0x52/0x61 [<ffffffff8034c5f9>] _atomic_dec_and_lock+0x39/0x54 [<ffffffff8027c4d6>] do_sys_open+0x46/0xca [<ffffffff8020b07b>] system_call_after_swapgs+0x7b/0x80 After sometime blast killed itself for the timeout with the message: Waiting for tests to end. BLAST Ended on /SAN/test2 RC = 0 at 09/01/2008 21:46:38 *=== Run statistics ============================================* | Elapsed time = 3 days, 3 hours, 38 minutes and 20 seconds *===============================================================* BLAST Ended on /SAN/test4 RC = 0 at 09/01/2008 21:46:42 *=== Run statistics ============================================* | Elapsed time = 3 days, 3 hours, 38 minutes and 13 seconds *===============================================================* BLAST Ended on /SAN/test3 RC = 0 at 09/01/2008 21:47:21 *=== Run statistics ============================================* | Elapsed time = 3 days, 3 hours, 38 minutes and 58 seconds *===============================================================* q Forcing test termination issuing KILL for thread 1120500032 Killed Ran blast on the 2.6.24-ibmrt2.6 with the "noop" io scheduler... at some point /SAN/test1 (pointing at /dev/dm-6) went "read-only"... device-mapper: multipath: Failing path 8:32. end_request: I/O error, dev sdh, sector 9910550 device-mapper: multipath: Failing path 8:112. printk: 8 messages suppressed. Buffer I/O error on device dm-4, logical block 12991 lost page write due to I/O error on dm-4 Aborting journal on device dm-4. 
Buffer I/O error on device dm-4, logical block 841
lost page write due to I/O error on dm-4
Buffer I/O error on device dm-4, logical block 1179668
lost page write due to I/O error on dm-4
Buffer I/O error on device dm-4, logical block 1179669
lost page write due to I/O error on dm-4
Buffer I/O error on device dm-4, logical block 1179670
lost page write due to I/O error on dm-4
Buffer I/O error on device dm-4, logical block 1179671
lost page write due to I/O error on dm-4
ext3_abort called.
EXT3-fs error (device dm-4): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only

... this could be because the system is running without dm-mp enabled ...

I tried configuring bonnie to do something like blast, using the following script:

#!/bin/sh
set -x

FILES_X1024=100
NUM_DIRS=1
MAX_SIZE=524288
MIN_SIZE=1024
BONNIE_LOOPS=10
BONNIE_USER="-u rtuser"
BONNIE_ARGS="-x $BONNIE_LOOPS -s 0 -n $FILES_X1024:$MAX_SIZE:$MIN_SIZE:$NUM_DIRS"

rm -rf b{1,2,3,4,5,6,7,8}
mkdir b{1,2,3,4,5,6,7,8}
if [ ! -d logs ]; then
    mkdir logs
fi
chmod a+rwx b{1,2,3,4,5,6,7,8} logs

bonnie++ $BONNIE_USER -p8 -x$BONNIE_LOOPS
for ((i=1;i<9;i++)); do
    bonnie++ -y -d b$i $BONNIE_USER $BONNIE_ARGS > logs/`hostname`-`date +%Y%M%d-%H:%M:%S`-bonnie++-b$i.log &
done
wait

It ran until the disk was full, but no hangs. I didn't have separate partitions. I am going to need to repartition a /test partition, I think, in order to be able to test multiple partitions, or maybe stick this spare drive in my cube into 102... I think I'll try that tomorrow. Currently running iozone on elm3c19, using both / and /test.

Reproduced on elm3c29 on /dev/sd devices. Got the dump. Moving the vmcore to KBIC. I have saved the core on KBIC. The defect is reproduced directly on /dev/sd devices. This should make the debugging process easier, as we are avoiding the multipath layers.
The dump can be accessed/invoked on KBIC in the following way:

jvrao@kernel:~/bz46744/elm3c29$ pwd -P
/home/services/opensource/realtime/kdump_cores/bz46744/elm3c29
jvrao@kernel:~/bz46744/elm3c29$ ./crash ./vmlinux 2008-09-02-22\:25/vmcore

Please contact me if you can't access the dump for any reason.

Started analyzing the elm3c29 dump. I am updating my findings to the following file:
/home/services/opensource/realtime/kdump_cores/bz46744/elm3c29/jv-bt-all
If anyone is interested, please feel free to look into it.

I reviewed the 2008-09-02-22:25/vmcore. It appears that the scsi host adapter is in a recovery state after a number of IOs timed out. There are still some questions outstanding.

- Why did the IOs time out? Since it was indicated that a failure has been seen on non-SAN configurations, we may need to look at a non-SAN failure vmcore to determine whether the signatures are similar, to help focus on the cause of the timeouts.
- The qla2x00_issue_iocb function is not returning during the abort sequence. It appears to be in a mod_timer call, but that does not match the source I have, so I will need a pointer to the source tree for this vmcore.

I copied a script file and an edited version of the crash info used to the URL shown below.
http://kernel.beaverton.ibm.com/~andmike/bugs/ibmbz_46744/

Yesterday I gave code access to Mike. I placed the source on KBIC:
/home/jvrao/public/linux-2.6.24.7-ibmrt2.6-view

The box I was running blast on with local disks hung, but I have been unable to collect a dump via sysrq. JV pressed the NMI, but the system did not respond. I wasn't able to raise anyone to press the button late this afternoon. Given that, I am planning to reboot the machine and kick the tests off again. On another note, I was able to crash a SAN machine using bonnie++ instead of blast, with mixed_load.sh as a background load.
I collected the crash and uploaded it to kbic here (IBM only for now, I can share if others are interested):
http://kernel.beaverton.ibm.com/realtime/kdump_cores/bz46744/elm3c29/bonnie-crash/
A README and the script I used to execute bonnie++ are there as well.

On the live machine/crash, confirmed that scsi_eh_0 is not making progress; it is waiting on a semaphore. This analysis shows how I tracked down that semaphore. Still need to figure out who is holding that semaphore.

crash> bt 524
PID: 524  TASK: ffff81007eb494c0  CPU: 0  COMMAND: "scsi_eh_0"
 #0 [ffff81007ebcfb10] schedule at ffffffff8127a86c
 #1 [ffff81007ebcfbc8] __compat_down at ffffffff8127c6d3
 #2 [ffff81007ebcfbf8] lock_timer_base at ffffffff81045f7b
 #3 [ffff81007ebcfc28] __compat_down_failed at ffffffff8127c329
 #4 [ffff81007ebcfc78] qla2x00_mailbox_command at ffffffff880a44fc
 #5 [ffff81007ebcfd28] qla2x00_issue_iocb at ffffffff880a5add
 #6 [ffff81007ebcfdb8] qla24xx_abort_command at ffffffff880a63e3
 #7 [ffff81007ebcfe08] qla2xxx_eh_abort at ffffffff8809d127
 #8 [ffff81007ebcfe78] __scsi_try_to_abort_cmd at ffffffff88056f31
 #9 [ffff81007ebcfe88] scsi_error_handler at ffffffff8805870d
#10 [ffff81007ebcff28] kthread at ffffffff8105144f
#11 [ffff81007ebcff48] kernel_thread at ffffffff8100d048

crash> bt -f 524
PID: 524  TASK: ffff81007eb494c0  CPU: 0  COMMAND: "scsi_eh_0"
 #0 [ffff81007ebcfb10] schedule at ffffffff8127a86c
...
 #3 [ffff81007ebcfc28] __compat_down_failed at ffffffff8127c329
    ffff81007ebcfc30: ffff81007ff97e90 0000000000000000
    ffff81007ebcfc40: 0000000000000000 000000000000752f
    ffff81007ebcfc50: ffff81007eb494c0 0000000000000000
    ffff81007ebcfc60: ffff81007eb494c0 ffff81007ebcfca0
    ffff81007ebcfc70: ffff81014ece69d0 ffffffff880a44fc
    ("include/linux/semaphore.h": down(struct compat_semaphore *sem))
 #4 [ffff81007ebcfc78] qla2x00_mailbox_command at ffffffff880a44fc
    ffff81007ebcfc80: 0000000000000000 0000000000000000
    ffff81007ebcfc90: 0000000000000000 0054000100000000
    ffff81007ebcfca0: 0000000000000000 0000000000200200
    ffff81007ebcfcb0: 0000000102ec7256 ffffffff880a42e8
    ffff81007ebcfcc0: ffff81014ece69d0 ffffffff81534700
                      ^^^^^^^^^^^^^^^^ sema
    ffff81007ebcfcd0: ffffffff880a44b2 5f68655f69736373
    ffff81007ebcfce0: 0000000000000030 000000000000020c
    ffff81007ebcfcf0: ffff810009005a40 ffff81014e9d2200
    ffff81007ebcfd00: ffff81014ece4610 ffff81014e9d2200
    ffff81007ebcfd10: 0000000000000332 ffff81014d0f94c0
    ffff81007ebcfd20: ffff81007ebcfdb0 ffffffff880a5add
 #5 [ffff81007ebcfd28] qla2x00_issue_iocb at ffffffff880a5add
    ffff81007ebcfd30: 00000005000000cf 22004e9d00000054
    ffff81007ebcfd40: 000100007ebcfdb0 ffffffff811b6bdb
    ffff81007ebcfd50: 0000000100000000 0000000000000000
    ffff81007ebcfd60: 0000000000000000 ffffffffffffffff
    ffff81007ebcfd70: 0000000000000000 0000000000000000
    ffff81007ebcfd80: 0000000000000000 ffff81000000001e
    ffff81007ebcfd90: ffff81014e8b8180 ffff81014e9d2200
    ffff81007ebcfda0: ffff81014e8b8180 ffff81014ece4610
                                       ^^^^^^^^^^^^^^^^ scsi_qla_host_t
    ffff81007ebcfdb0: ffff81007ebcfe00 ffffffff880a63e3

crash> struct scsi_qla_host ffff81014ece4610 | grep host_str
  host_str = "qla2xxx_0\000\000\000\000\000\000",
crash> struct scsi_qla_host ffff81014ece4610 | grep model_number
  model_number = "QMI2472\000\000\000\000\000\000\000\000\000",
crash> struct scsi_qla_host ffff81014ece4610 | grep parent
  parent = 0x0,   // NO parent
outstanding_cmds = {0x0, 0xffff810148028300, 0xffff81014d0f9cc0, 0xffff8100633e2dc0, 0xffff8100789abe00, 0xffff81014d0f9700, 0xffff8100789aba00, 0xffff81014d0f9480, 0xffff8100633e2900, 0xffff81014d0f9e00, 0xffff8100a95916c0, 0xffff8100a9591640, 0xffff81014e1bc340, 0xffff8101480284c0, 0xffff810148028fc0, 0xffff8100a9591380, 0xffff81014d0f91c0, 0xffff81014d0f9380, 0xffff8100a9591dc0, 0xffff81014e1bc2c0, 0xffff81014e1bc200, 0xffff8100345c1d00, 0xffff8100789ab880, 0xffff8100789abdc0, 0xffff810079056a40, 0xffff8100345c1680, 0xffff8100633e2a80, 0xffff81014d0f96c0, 0xffff8101480286c0, 0xffff81014d0f9980, 0xffff810148028d00, 0xffff810148028900, 0xffff81014d0f9280, 0xffff810148028600, 0xffff81014d0f95c0, 0xffff81014e1bc480, 0xffff8100a95913c0, 0xffff81014d0f9fc0, 0xffff81014d0f9900, 0xffff8100633e2f80, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0...}, current_outstanding_cmd = 39, <= Outstanding Commands. 
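The outstanding_cmds table above is a fixed-size array of srb pointers, and the in-flight work can be summarized by counting its non-NULL slots. A user-space sketch of that scan (the slot contents and the MAX_OUTSTANDING_COMMANDS size here are modeled on the dump, not taken from the driver source):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table size; the real array lives in scsi_qla_host. */
#define MAX_OUTSTANDING_COMMANDS 1024

/* Count the non-NULL entries in an outstanding_cmds-style table. */
static size_t count_outstanding(void *const cmds[], size_t n)
{
    size_t busy = 0;
    for (size_t i = 0; i < n; i++)
        if (cmds[i] != NULL)
            busy++;
    return busy;
}
```

Each populated slot is an srb that the adapter still owns, which is why the next step below is to pick the first non-NULL entry and validate it.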
crash> struct srb 0xffff810148028300
struct srb {
  ha = 0xffff81014ece4610,   // Validation: points back to the scsi_qla_host,
                             // confirming that we are looking at a valid structure.
  fcport = 0xffff81014e8b8180,
  cmd = 0xffff81014a539940,
  flags = 2,
  dma_handle = 18446744071579714837,
  request_sense_length = 0,
  request_sense_ptr = 0x0
}

crash> struct scsi_qla_host.mbx_intr_sem ffff81014ece4610
mbx_intr_sem = {
  count = {
    counter = 0
  },
  sleepers = 1,
  wait = {
    lock = {
      lock = {
        wait_lock = {
          raw_lock = {
            slock = 0
          },
          break_lock = 0
        },
        wait_list = {
          prio_list = {
            next = 0xffff81014ece69e0,
            prev = 0xffff81014ece69e0
          },
          node_list = {
            next = 0xffff81014ece69f0,
            prev = 0xffff81014ece69f0
          }
        },
        owner = 0x0
      },
      break_lock = 0
    },
    task_list = {
      next = 0xffff81007ebcfbe8,
      prev = 0xffff81007ebcfbe8
    }
  }
},

Apparently this process is sleeping. I am not sure if there is an easy way of finding out who is holding this semaphore. Any help on how to find that out?

(In reply to comment #77)
> On Live machine/crash confirmed that scsi_eh_0 is not making progress.
> Waiting on the semaphore. This analysis shows how I tracked down that semaphore.
> Still need to figure out who is holding that semaphore.
Forgot to add:

crash> struct scsi_qla_host.mbx_intr_sem
struct scsi_qla_host {
  [9152] struct compat_semaphore mbx_intr_sem;
}

ffff81014ece4610 + 0x23C0 = ffff81014ece69d0 -> the semaphore's location in the structure. We can see this pointer above on the stack (marked as "sema").

crash> struct scsi_qla_host ffff81014ece4610 | grep mbx_cmd_flags
  mbx_cmd_flags = 4,

unsigned long mbx_cmd_flags;
#define MBX_INTERRUPT           1
#define MBX_INTR_WAIT           2
#define MBX_UPDATE_FLASH_ACTIVE 3

From the stack trace, MBX_INTR_WAIT should be set, indicating that we are waiting for the interrupt. But the dump is showing the flag value MBX_UPDATE_FLASH_ACTIVE.

If we look at the interrupt handler:

if (test_bit(MBX_INTR_WAIT, &ha->mbx_cmd_flags) &&
    (status & MBX_INTERRUPT) && ha->flags.mbox_int) {
        set_bit(MBX_INTERRUPT, &ha->mbx_cmd_flags);
        up(&ha->mbx_intr_sem);
}

Hence the interrupt handler won't call up() if the MBX_INTR_WAIT bit is not set.
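Note that the MBX_* macros above are bit numbers handed to test_bit()/set_bit(), not mask values. A small user-space model of that gating condition (the bit-op implementations here are simplified stand-ins, not the kernel's real <asm/bitops.h> code):

```c
#include <assert.h>

/* Bit numbers from the driver snippet above -- indexes, not mask values. */
#define MBX_INTERRUPT           1
#define MBX_INTR_WAIT           2
#define MBX_UPDATE_FLASH_ACTIVE 3

/* Simplified stand-in for the kernel's test_bit(). */
static int test_bit(int nr, const unsigned long *addr)
{
    return (int)((*addr >> nr) & 1UL);
}

/* Simplified stand-in for the kernel's set_bit(). */
static void set_bit(int nr, unsigned long *addr)
{
    *addr |= 1UL << nr;
}
```

With the dump's raw value of 4, test_bit(MBX_INTR_WAIT, &flags) returns 1 and test_bit(MBX_INTERRUPT, &flags) returns 0, since a value of 4 is bit 2 set; reading the raw word as if it were one of the define values is easy to get wrong.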
This explains why io_done is not being called, and why all IOs are waiting. But we still need to figure out how and why this bit is set.

Another interesting factor: looking at the code, MBX_UPDATE_FLASH_ACTIVE is set in suspend_hba(), but if the code goes down that path, interrupts_on should be set to 0. The dump tells otherwise:

crash> struct scsi_qla_host ffff81014ece4610 | grep interrupts_on
  interrupts_on = 1 '\001',

So it is puzzling how mbx_cmd_flags ended up set to MBX_UPDATE_FLASH_ACTIVE.

Another minor mistake (no problem in the functionality) found in the code:

tmp_intr_timer.data = (unsigned long)&ha->mbx_intr_sem;
tmp_intr_timer.expires = jiffies + mcp->tov * HZ;
tmp_intr_timer.function = (void (*)(unsigned long))qla2x00_mbx_sem_timeout;

where mbx_intr_sem is a compat_semaphore, but in the callback

static void
qla2x00_mbx_sem_timeout(unsigned long data)
{
        struct semaphore *sem_ptr = (struct semaphore *)data;

the data is typecast to (struct semaphore *).

Today's progress:

crash> struct scsi_qla_host.mbx_intr_sem ffff81007d2a8610
mbx_intr_sem = {
  count = {
    counter = 0    <<< counter should be < 0 if anyone is waiting
  },
  sleepers = 1,    <<< indicates that one process is waiting
                   <<< This is inconsistent: it means either a wakeup was missed,
                   <<< or there is some kind of race between down() and up()
  wait = {
    lock = {
      lock = {
        wait_lock = {
          raw_lock = {
            slock = 0
          },
          break_lock = 0
        },
        wait_list = {
          prio_list = {
            next = 0xffff81007d2aa9e0,
            prev = 0xffff81007d2aa9e0
          },
          node_list = {
            next = 0xffff81007d2aa9f0,
            prev = 0xffff81007d2aa9f0
          }
        },
        owner = 0x0
      },
      break_lock = 0
    },
    task_list = {
      next = 0xffff81014d021be8,
      prev = 0xffff81014d021be8
    }
  }
},

crash> struct __wait_queue.task_list
struct __wait_queue {
  [24] struct list_head task_list;
}

0xffff81014d021be8 - 0x18 = ffff81014d021bd0

crash> struct __wait_queue ffff81014d021bd0
struct __wait_queue {
  flags = 1,
  private = 0xffff81014d6423c0,   <<< This is the task_struct of the thread waiting on the semaphore.
  func = 0xffffffff81034f27 <default_wake_function>,
  task_list = {
    next = 0xffff81007d2aaa10,
    prev = 0xffff81007d2aaa10
  }
}

So the thread in question is scsi_eh_1:

PID: 544  TASK: ffff81014d6423c0  CPU: 3  COMMAND: "scsi_eh_1"
 #0 [ffff81014d021b20] schedule at ffffffff8129f4fe
 #1 [ffff81014d021bc8] __compat_down at ffffffff812a14c6
 #2 [ffff81014d021c28] __compat_down_failed at ffffffff812a108d
 #3 [ffff81014d021c78] qla2x00_mailbox_command at ffffffff880a56ae
 #4 [ffff81014d021d28] qla2x00_issue_iocb at ffffffff880a6d11
 #5 [ffff81014d021db8] qla24xx_abort_command at ffffffff880a7644
 #6 [ffff81014d021e08] qla2xxx_eh_abort at ffffffff8809e1e0
 #7 [ffff81014d021e78] __scsi_try_to_abort_cmd at ffffffff88058077
 #8 [ffff81014d021e88] scsi_error_handler at ffffffff880598da
 #9 [ffff81014d021f28] kthread at ffffffff81053ee5
#10 [ffff81014d021f48] kernel_thread at ffffffff8100d418

(In reply to comment #79)
> Another minor mistake (no problem in the functionality) found in the code:
> ...
> the data is typecast to (struct semaphore *).

As I mentioned in that comment, the "minor issue" is appearing to be THE issue, because of the rt switch in include/linux/rt_lock.h: PICK_SEM_OP() picks the implementation depending on the type of the semaphore passed:

#define up(sem) PICK_SEM_OP(compat_up, rt_up, sem)

So the fix is to set the typecast correctly to compat_semaphore. Changed, and tests have been running since morning. Rostedt gave a patch with a magic check to find hidden mistakes like this. I will be testing with that patch tomorrow.

Created attachment 316488 [details]
patch to fix compat semaphore in qla2x00 driver
The qla2x00 driver passes the semaphore to a timeout routine via a pointer. The semaphore that is passed is really a compat_semaphore, but the callback function typecasts it to a semaphore. The up() logic calls the rt code if the type is a semaphore, and calls the old semaphore code if it is a compat_semaphore. Thus, we were calling the wrong function.
This patch is a one-liner that simply makes the type of the semaphore match the original one being passed in.
This should solve the issue.
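The failure mode can be sketched in user space. In this model, C11 _Generic stands in for the kernel's PICK_SEM_OP() type dispatch, and the struct layouts, counters, and function names are invented for illustration; it is not the driver code. The point is that the timer callback receives an opaque unsigned long, so the cast inside the callback silently decides which up() implementation runs:

```c
/* Two distinct semaphore types, standing in for the -rt kernel's
 * compat_semaphore (classic) and semaphore (rtmutex-based). */
struct compat_semaphore { int count; };
struct semaphore        { int count; };

static int compat_up_calls, rt_up_calls;

static void compat_up(struct compat_semaphore *s) { (void)s; compat_up_calls++; }
static void rt_up(struct semaphore *s)            { (void)s; rt_up_calls++; }

/* Stand-in for PICK_SEM_OP(): dispatch on the static type of the argument. */
#define up(sem) _Generic((sem),                  \
        struct compat_semaphore *: compat_up,    \
        struct semaphore *: rt_up)(sem)

/* The buggy callback: the object behind `data` is a compat_semaphore,
 * but the wrong cast makes up() resolve to the rt implementation. */
static void sem_timeout_buggy(unsigned long data)
{
    struct semaphore *sem = (struct semaphore *)data;  /* wrong cast */
    up(sem);   /* dispatches to rt_up() */
}

/* The one-liner fix: cast to the type the object really has. */
static void sem_timeout_fixed(unsigned long data)
{
    struct compat_semaphore *sem = (struct compat_semaphore *)data;
    up(sem);   /* dispatches to compat_up() */
}
```

Because the dispatch happens on the static type of the pointer, the compiler cannot flag the bad cast, and the waiter blocked in compat_down() is simply never woken, which matches the hang analyzed above.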
Patch added to 2.6.24.7-80.el5rt. Created attachment 316618 [details]
patch to catch misuses of rtmutexs
This patch adds "magic" numbers to the rtmutexes, and these are checked every time they are used. If a compat_semaphore or raw_spinlock is passed to these routines, then a warning will be printed.
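A minimal user-space sketch of the magic-number idea (the magic value, field layout, and function names here are invented; the actual patch instruments the kernel's rtmutex paths):

```c
#include <stdio.h>

#define RT_MUTEX_MAGIC 0x52544d58u  /* arbitrary sentinel, "RTMX" */

struct rt_mutex {
    unsigned int magic;   /* set at init; any other object lacks it */
    int owner;
};

static int rt_mutex_warnings;  /* how many bad objects were caught */

static void rt_mutex_init(struct rt_mutex *m)
{
    m->magic = RT_MUTEX_MAGIC;
    m->owner = 0;
}

/* Called at the top of every rtmutex operation: a compat_semaphore or
 * raw_spinlock cast to rt_mutex will not carry the magic value, so the
 * type confusion is reported instead of silently misbehaving. */
static int rt_mutex_check_magic(struct rt_mutex *m)
{
    if (m->magic != RT_MUTEX_MAGIC) {
        rt_mutex_warnings++;
        fprintf(stderr, "BUG: non-rtmutex object passed to rtmutex op\n");
        return -1;
    }
    return 0;
}
```

This is exactly the class of check that would have caught the qla2x00 typecast at runtime, rather than leaving the error handler silently blocked.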
The "patch to fix compat semaphore in qla2x00 driver" was found in mrg-rt.git as commit b95c0db0f90e16c458100256f3c4f0daecb6d4eb. The "patch to catch misuses of rtmutexs" was found in mrg-rt.git as commit 12538e0ad48ca5cc5f861863b178c03b7d4d28f5.

Have not managed to find a suitable RHTS test system yet with the right driver available. Only code review done. Both patches are found in mrg-rt-2.6.24.7-81.

Tried to run the "online/offline" test (https://bugzilla.redhat.com/attachment.cgi?id=315191) on veritas2.rhts.bos.redhat.com. Ran the test with 250 iterations on 2.6.24.7-74rt, without triggering anything close to a kernel crash. Ran the same test on 2.6.24.7-81rt, with the same results. Well, I got one result: the root filesystem got screwed up quite well, and I had to reinstall the OS between the runs. But that's not so surprising, considering what the test actually does, and it does it quite brutally. Not sure if I've found a setup which is close enough to reproduce this bug.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2008-0857.html

I've verified the compat_sem fix is in -85.