Bug 479090 - Panic in do_cciss_intr removeQ
Summary: Panic in do_cciss_intr removeQ
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.9
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Tomas Henzl
QA Contact: Storage QE
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2009-01-07 01:45 UTC by Wade Mealing
Modified: 2018-10-20 01:28 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-02-16 15:36:04 UTC
Target Upstream Version:
Embargoed:


Attachments
backport (5.73 KB, patch)
2009-09-22 14:50 UTC, Tomas Henzl


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2011:0263 0 normal SHIPPED_LIVE Important: Red Hat Enterprise Linux 4.9 kernel security and bug fix update 2011-02-16 15:14:55 UTC

Description Wade Mealing 2009-01-07 01:45:05 UTC
Description of problem:

The CCISS module panics in do_cciss_intr.

CPU:    0
EIP:    0060:[<f8855437>]    Tainted: P      VLI
EFLAGS: 00010087   (2.6.9-34.0.2.ELsmp)
EIP is at do_cciss_intr+0xdc/0x4b4 [cciss]
eax: 00000000   ebx: 00000004   ecx: 00000004   edx: 00000000
esi: f7400000   edi: 00000000   ebp: c3765800   esp: c03eafbc
ds: 007b   es: 007b   ss: 0068
Process kjournald (pid: 1629, threadinfo=c03ea000 task=c37fedb0)
Stack: 00000000 00000001 00000001 00000082 f7dd4800 00000001 00000000 c37a4ab8
      c0107472 c37a4a9c c03ea000 c0387900 c37a4000 c01079d2 00000032 c37a4ab8
      f7dd4800
Call Trace:
[<c0107472>] handle_IRQ_event+0x25/0x4f
[<c01079d2>] do_IRQ+0x11c/0x1ae
=======================
[<c02d304c>] common_interrupt+0x18/0x20
[<f885510e>] do_cciss_request+0x9e/0x2eb [cciss]
[<c0142742>] mempool_alloc+0x7b/0x135
[<c0120291>] autoremove_wake_function+0x0/0x2d
[<c0142742>] mempool_alloc+0x7b/0x135
[<c0120291>] autoremove_wake_function+0x0/0x2d
[<c022a6ce>] __cfq_get_queue+0x91/0xf6
[<c0120291>] autoremove_wake_function+0x0/0x2d
[<c022a763>] cfq_get_queue+0x30/0x37
[<c022aa13>] cfq_set_request+0x33/0x6b
[<c022a9e0>] cfq_set_request+0x0/0x6b
[<c0223557>] get_request+0x1de/0x1e8
[<c012026d>] finish_wait+0x2c/0x50
[<c0222b9a>] ll_back_merge_fn+0x175/0x1de
[<c022174b>] elv_merged_request+0x9/0xa
[<c0224174>] __make_request+0x452/0x46c
[<c014285c>] mempool_free+0x60/0x64
[<c022a55a>] cfq_dispatch_requests+0x55/0x80
[<c022a5a6>] cfq_next_request+0x21/0x35
[<c0222fa0>] __generic_unplug_device+0x2b/0x2d
[<c0222fb7>] generic_unplug_device+0x15/0x21
[<c0222fd2>] blk_backing_dev_unplug+0xf/0x10
[<c015b3d9>] sync_buffer+0x2c/0x2d
[<c015b4d7>] __wait_on_buffer+0x67/0x83
[<c015b384>] bh_wake_function+0x0/0x29
[<c015e199>] submit_bh+0x15a/0x166
[<c015b384>] bh_wake_function+0x0/0x29
[<f8863ac2>] journal_commit_transaction+0x8a7/0xfc1 [jbd]
[<c0120291>] autoremove_wake_function+0x0/0x2d
[<c0120291>] autoremove_wake_function+0x0/0x2d
[<c011dcf7>] find_busiest_group+0xdd/0x2ba
[<c011e115>] load_balance_newidle+0x56/0x82
[<c02d05c1>] schedule+0x83d/0x8d3
[<c02d05f1>] schedule+0x86d/0x8d3
[<c0129d4a>] del_timer_sync+0x7a/0x9c
[<f8865e8d>] kjournald+0xc7/0x219 [jbd]
[<c0120291>] autoremove_wake_function+0x0/0x2d
[<c0120291>] autoremove_wake_function+0x0/0x2d
[<c011d549>] schedule_tail+0x31/0xa7
[<f8865dc0>] commit_timeout+0x0/0x5 [jbd]
[<f8865dc6>] kjournald+0x0/0x219 [jbd]
[<c01041f5>] kernel_thread_helper+0x5/0xb
Code: 95 30 03 00 00 74 38 8b 86 3c 02 00 00 39 f0 74 2e 39 b5 30 03 00 00 75 06 89 85 30 03 00 00 8b 86 38 02 00 00 8b 96 3c 02 00 00 <89> 90 3c 02 00 00 8b 96 3c 02 00 00 89 82 38 02 00 00 eb 06 c7


After investigating, it looks like c->prev is NULL.
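
To illustrate the failure mode, here is a minimal userspace sketch. It assumes the command queue is a circular doubly linked list and that the faulting store is the `c->prev->next = c->next` step of the unlink. The names `struct cmd`, `add_q`, `remove_q_checked` and `demo_circular` are hypothetical and this is not the driver's code; the guard only shows how a command with NULL neighbours could be rejected instead of dereferenced.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver's command structure. */
struct cmd {
    struct cmd *prev;
    struct cmd *next;
};

/* Append c to the circular queue whose head pointer is *q. */
static void add_q(struct cmd **q, struct cmd *c)
{
    if (*q == NULL) {
        *q = c;
        c->next = c->prev = c;      /* single element points at itself */
    } else {
        c->prev = (*q)->prev;
        c->next = *q;
        (*q)->prev->next = c;
        (*q)->prev = c;
    }
}

/*
 * Unlink c from the queue. Returns 0 on success, or -1 if c is not
 * linked anywhere (NULL neighbours), which an unguarded unlink would
 * turn into a NULL pointer write.
 */
static int remove_q_checked(struct cmd **q, struct cmd *c)
{
    if (c == NULL || c->prev == NULL || c->next == NULL)
        return -1;
    if (c->next == c) {
        *q = NULL;                  /* c was the only element */
    } else {
        if (*q == c)
            *q = c->next;
        c->prev->next = c->next;
        c->next->prev = c->prev;
    }
    c->prev = c->next = NULL;       /* mark as unlinked */
    return 0;
}

/* Small scenario: a stray command is rejected, queued ones are removed. */
int demo_circular(void)
{
    struct cmd a = {NULL, NULL}, b = {NULL, NULL}, stray = {NULL, NULL};
    struct cmd *q = NULL;

    add_q(&q, &a);
    add_q(&q, &b);
    assert(remove_q_checked(&q, &stray) == -1);
    assert(remove_q_checked(&q, &a) == 0);
    assert(q == &b);
    assert(remove_q_checked(&q, &a) == -1); /* double removal is caught */
    assert(remove_q_checked(&q, &b) == 0);
    assert(q == NULL);
    return 0;
}
```

Note that this guard only catches a command that is not linked at all; it cannot tell whether a linked command sits on a different queue.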

Version-Release number of selected component (if applicable):


2.6.9-34.0.2 

How reproducible:

unknown

The problem appears to be in removeQ in cciss.c. This code does not look like it has changed in more recent EL4 kernels. However, the upstream commit http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=8a3173de;hp=7c0990c7ee988aa193abbb7da3faeb9279146dbf mentions that it helps to detect the spurious case of a command being removed from a queue it doesn't belong to.

I think the panic I'm seeing is an instance of that case.
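
As a rough illustration of how that kind of detection can work, the sketch below models a list node that records the address of the pointer referring to it, in the spirit of the kernel's hlist (`hlist_unhashed()`, `hlist_del_init()`). It is a userspace approximation under my own assumptions, not the upstream patch: `struct node`, `struct head`, `node_add_head`, `node_del_checked` and `demo_hlist` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hlist-like node: pprev is NULL while the node is unlinked. */
struct node {
    struct node *next;
    struct node **pprev;    /* address of the pointer that points at us */
};

struct head {
    struct node *first;
};

/* A node with no back pointer is not on any list. */
static int node_unhashed(const struct node *n)
{
    return n->pprev == NULL;
}

/* Insert n at the front of list h. */
static void node_add_head(struct node *n, struct head *h)
{
    n->next = h->first;
    if (h->first)
        h->first->pprev = &n->next;
    h->first = n;
    n->pprev = &h->first;
}

/*
 * Remove n from whatever list it is on. Returns -1 without touching
 * memory if n is not linked, 0 otherwise. The node is reinitialised so
 * that a second removal is also detected.
 */
static int node_del_checked(struct node *n)
{
    if (node_unhashed(n))
        return -1;
    *n->pprev = n->next;
    if (n->next)
        n->next->pprev = n->pprev;
    n->next = NULL;
    n->pprev = NULL;
    return 0;
}

/* Small scenario: stray and double removals are reported, not executed. */
int demo_hlist(void)
{
    struct head h = {NULL};
    struct node a = {NULL, NULL}, b = {NULL, NULL}, stray = {NULL, NULL};

    node_add_head(&a, &h);
    node_add_head(&b, &h);
    assert(node_del_checked(&stray) == -1);
    assert(node_del_checked(&b) == 0);
    assert(h.first == &a);
    assert(node_del_checked(&b) == -1);
    assert(node_del_checked(&a) == 0);
    assert(h.first == NULL);
    return 0;
}
```

The design point is that removal needs no head pointer argument, and an unlinked node is recognisable from its own state, so a caller can warn and bail out instead of writing through a NULL pointer.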

Comment 2 Mike Miller (OS Dev) 2009-02-11 15:18:27 UTC
Does RH need HP to port that change into rhel4.9?

Comment 3 Tomas Henzl 2009-02-11 15:50:44 UTC
(In reply to comment #2)
> Does RH need HP to port that change into rhel4.9?

I'm not sure if it is still possible for this to go into rhel4.8, but yes please port it into rhel4.8.

Comment 5 Tomas Henzl 2009-09-22 14:50:16 UTC
Created attachment 362097 [details]
backport

The patch is backported from upstream and is not complicated; I think we can take it for 4.9.

Comment 6 RHEL Program Management 2009-09-22 15:02:57 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 7 Tomas Henzl 2009-10-15 14:48:33 UTC
Posted today.

Comment 12 Vivek Goyal 2010-09-23 13:01:16 UTC
Committed in 89.37.EL. RPMs are available at http://people.redhat.com/vgoyal/rhel4/

Comment 17 Gris Ge 2011-01-25 06:46:55 UTC
RHEL4 doesn't support kdump.

Netdump for ccissp was verified at https://beaker.engineering.redhat.com/recipes/74648

Code reviewed. Patch linux-2.6.9-cciss-switch-to-using-hlist-to-fix-panic.patch was applied in kernel-2.6.9-95.EL

Sanity only.

Comment 18 errata-xmlrpc 2011-02-16 15:36:04 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0263.html

