Description of problem:
The CCISS module panics in do_cciss_intr.

CPU: 0
EIP: 0060:[<f8855437>] Tainted: P VLI
EFLAGS: 00010087 (2.6.9-34.0.2.ELsmp)
EIP is at do_cciss_intr+0xdc/0x4b4 [cciss]
eax: 00000000 ebx: 00000004 ecx: 00000004 edx: 00000000
esi: f7400000 edi: 00000000 ebp: c3765800 esp: c03eafbc
ds: 007b es: 007b ss: 0068
Process kjournald (pid: 1629, threadinfo=c03ea000 task=c37fedb0)
Stack: 00000000 00000001 00000001 00000082 f7dd4800 00000001 00000000 c37a4ab8
       c0107472 c37a4a9c c03ea000 c0387900 c37a4000 c01079d2 00000032 c37a4ab8
       f7dd4800
Call Trace:
 [<c0107472>] handle_IRQ_event+0x25/0x4f
 [<c01079d2>] do_IRQ+0x11c/0x1ae
 =======================
 [<c02d304c>] common_interrupt+0x18/0x20
 [<f885510e>] do_cciss_request+0x9e/0x2eb [cciss]
 [<c0142742>] mempool_alloc+0x7b/0x135
 [<c0120291>] autoremove_wake_function+0x0/0x2d
 [<c0142742>] mempool_alloc+0x7b/0x135
 [<c0120291>] autoremove_wake_function+0x0/0x2d
 [<c022a6ce>] __cfq_get_queue+0x91/0xf6
 [<c0120291>] autoremove_wake_function+0x0/0x2d
 [<c022a763>] cfq_get_queue+0x30/0x37
 [<c022aa13>] cfq_set_request+0x33/0x6b
 [<c022a9e0>] cfq_set_request+0x0/0x6b
 [<c0223557>] get_request+0x1de/0x1e8
 [<c012026d>] finish_wait+0x2c/0x50
 [<c0222b9a>] ll_back_merge_fn+0x175/0x1de
 [<c022174b>] elv_merged_request+0x9/0xa
 [<c0224174>] __make_request+0x452/0x46c
 [<c014285c>] mempool_free+0x60/0x64
 [<c022a55a>] cfq_dispatch_requests+0x55/0x80
 [<c022a5a6>] cfq_next_request+0x21/0x35
 [<c0222fa0>] __generic_unplug_device+0x2b/0x2d
 [<c0222fb7>] generic_unplug_device+0x15/0x21
 [<c0222fd2>] blk_backing_dev_unplug+0xf/0x10
 [<c015b3d9>] sync_buffer+0x2c/0x2d
 [<c015b4d7>] __wait_on_buffer+0x67/0x83
 [<c015b384>] bh_wake_function+0x0/0x29
 [<c015e199>] submit_bh+0x15a/0x166
 [<c015b384>] bh_wake_function+0x0/0x29
 [<f8863ac2>] journal_commit_transaction+0x8a7/0xfc1 [jbd]
 [<c0120291>] autoremove_wake_function+0x0/0x2d
 [<c0120291>] autoremove_wake_function+0x0/0x2d
 [<c011dcf7>] find_busiest_group+0xdd/0x2ba
 [<c011e115>] load_balance_newidle+0x56/0x82
 [<c02d05c1>] schedule+0x83d/0x8d3
 [<c02d05f1>] schedule+0x86d/0x8d3
 [<c0129d4a>] del_timer_sync+0x7a/0x9c
 [<f8865e8d>] kjournald+0xc7/0x219 [jbd]
 [<c0120291>] autoremove_wake_function+0x0/0x2d
 [<c0120291>] autoremove_wake_function+0x0/0x2d
 [<c011d549>] schedule_tail+0x31/0xa7
 [<f8865dc0>] commit_timeout+0x0/0x5 [jbd]
 [<f8865dc6>] kjournald+0x0/0x219 [jbd]
 [<c01041f5>] kernel_thread_helper+0x5/0xb
Code: 95 30 03 00 00 74 38 8b 86 3c 02 00 00 39 f0 74 2e 39 b5 30 03 00 00 75 06 89 85 30 03 00 00 8b 86 38 02 00 00 8b 96 3c 02 00 00 <89> 90 3c 02 00 00 8b 96 3c 02 00 00 89 82 38 02 00 00 eb 06 c7

After investigating, it looks like c->prev is NULL.

Version-Release number of selected component (if applicable):
2.6.9-34.0.2

How reproducible:
unknown

It seems that removeQ in cciss.c is where the problem occurs. It doesn't look like this code has changed in more recent EL4 kernels; however, the upstream commit at http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=8a3173de;hp=7c0990c7ee988aa193abbb7da3faeb9279146dbf mentions that it helps detect the spurious case of a command being removed from a queue it doesn't belong to. I think that is the case I'm hitting here.
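To illustrate the failure mode described above, here is a minimal userspace sketch in C of an hlist-style queue whose remove routine refuses to unlink a command that is not on any queue. It is not the driver's code and not the upstream patch: struct hnode, struct hhead, struct cmd, add_q, remove_q and queue_len are hypothetical stand-ins, and the only assumption carried over from the report is that removing an unlinked command is what dereferences a NULL link pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Userspace stand-ins modeled loosely on the kernel's hlist; hypothetical names. */
struct hnode {
    struct hnode *next;
    struct hnode **pprev;   /* address of the pointer that points at this node;
                               NULL while the node is not on any list */
};

struct hhead {
    struct hnode *first;
};

/* Hypothetical command; the real driver structure carries many more fields. */
struct cmd {
    int tag;
    struct hnode list;
};

static void node_init(struct hnode *n)
{
    n->next = NULL;
    n->pprev = NULL;
}

static void add_q(struct hhead *h, struct cmd *c)
{
    struct hnode *n = &c->list;

    n->next = h->first;
    if (h->first)
        h->first->pprev = &n->next;
    h->first = n;
    n->pprev = &h->first;
}

/* Returns 0 on success, -1 if the command was not queued (the spurious case). */
static int remove_q(struct cmd *c)
{
    struct hnode *n = &c->list;

    if (n->pprev == NULL) {
        fprintf(stderr, "cmd %d is not on any queue, ignoring removal\n", c->tag);
        return -1;
    }
    *n->pprev = n->next;            /* predecessor (or head) now skips this node */
    if (n->next)
        n->next->pprev = n->pprev;
    node_init(n);                   /* mark as unlinked so a repeat is detectable */
    return 0;
}

static size_t queue_len(const struct hhead *h)
{
    size_t len = 0;

    for (const struct hnode *n = h->first; n; n = n->next)
        len++;
    return len;
}

/* Small self-check: a second removal of the same command is caught. */
static void remove_q_demo(void)
{
    struct hhead q = { NULL };
    struct cmd a = { 1, { NULL, NULL } };

    add_q(&q, &a);
    assert(remove_q(&a) == 0);
    assert(remove_q(&a) == -1);
    assert(queue_len(&q) == 0);
}
```

Calling remove_q_demo() from a main() exercises the double-removal path. In this sketch the second removal is reported and ignored, whereas an unguarded doubly linked removal would write through the NULL link.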
Does RH need HP to port that change into rhel4.9?
(In reply to comment #2)
> Does RH need HP to port that change into rhel4.9?

I'm not sure if it is still possible for this to go into rhel4.8, but yes, please port it into rhel4.8.
Created attachment 362097
backport

The patch is backported from upstream and is not very complicated; I think we can take it for 4.9.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Posted today.
Committed in 89.37.EL. RPMS are available at http://people.redhat.com/vgoyal/rhel4/
RHEL4 doesn't support kdump. Netdump for ccissp was verified at https://beaker.engineering.redhat.com/recipes/74648

Code reviewed. Patch linux-2.6.9-cciss-switch-to-using-hlist-to-fix-panic.patch was applied into kernel-2.6.9-95.EL.

Sanity only.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0263.html