Bug 237635 - kernel panic on cciss when running hpacucli
Status: CLOSED DUPLICATE of bug 196628
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.4
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Assigned To: Tom Coughlan
QA Contact: Martin Jenner
Reported: 2007-04-24 06:46 EDT by Bryn M. Reeves
Modified: 2007-11-16 20:14 EST
CC List: 1 user

Doc Type: Bug Fix
Last Closed: 2007-06-18 03:51:10 EDT


Attachments
minimal fix for hpacucli oops (482 bytes, patch)
2007-04-24 06:55 EDT, Bryn M. Reeves
fix for the hpacucli oops including code reorgs (2.28 KB, patch)
2007-04-24 06:59 EDT, Bryn M. Reeves

Description Bryn M. Reeves 2007-04-24 06:46:09 EDT
Description of problem:
A problem in the cciss driver can lead to an oops when the hpacucli tool is
run. cciss provides an ioctl to re-query the controller's logical drive
configuration. The code behind this ioctl mimics the code in cciss_init_one
(called at init time), except that it omits the call to blk_queue_softirq_done,
leaving the request queue partially initialised. Subsequent I/O through that
queue then triggers an oops in the I/O scheduler (cfq_next_request, called from
elv_next_request):

Unable to handle kernel NULL pointer dereference at virtual address 00000008
printing eip:
*pde = 3415d001
Oops: 0000 [#1]
SMP
Modules linked in: netconsole netdump dlm(U) cman(U) dm_mirror dm_mod button
battery ac uhci_hcd ehci_hcd hw_random tg3 bonding(U) ext3 jbd cciss sd_mod scsi_mod
CPU:    1
EIP:    0060:[<c022bbf2>]    Not tainted VLI
EFLAGS: 00010046   (2.6.9-42.0.10.ELsmp)
EIP is at cfq_next_request+0x7/0x35
eax: f7dd0028   ebx: 00000000   ecx: 00000007   edx: f7dd0028
esi: f7dd0028   edi: 00000000   ebp: f7e4cd80   esp: c03efcf8
ds: 007b   es: 007b   ss: 0068
Process hotplug (pid: 22022, threadinfo=c03ef000 task=f573f930)
Stack: f7dd0028 f7dd0028 c0222ed4 f7dd0028 00000005 00000000 f8864a46 00000000
      00000000 00000000 00000000 00000000 f7e4cd80 f7dd0028 00000000 00000000
      00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Call Trace:
[<c0222ed4>] elv_next_request+0xbe/0xce
[<f8864a46>] do_cciss_request+0x33/0x2eb [cciss]
[<c011de3f>] find_busiest_group+0xdd/0x295
[<c011cd26>] recalc_task_prio+0x128/0x133
[<c011cdb9>] activate_task+0x88/0x95
[<c011d2e4>] try_to_wake_up+0x28e/0x299
[<c011d2e4>] try_to_wake_up+0x28e/0x299
[<c0120502>] autoremove_wake_function+0xd/0x2d
[<c011e7d6>] __wake_up_common+0x36/0x51
[<c011e81a>] __wake_up+0x29/0x3c
[<c014585a>] test_clear_page_writeback+0x84/0xb6
[<c014314c>] mempool_free+0x60/0x64
[<c0224902>] freed_request+0x23/0x72
[<c02245e1>] blk_start_queue+0x23/0x41
[<f886513b>] do_cciss_intr+0x43d/0x4b7 [cciss]
[<c01074ce>] handle_IRQ_event+0x25/0x4f
[<c0107a2e>] do_IRQ+0x11c/0x1ae
=======================
[<c02d52dc>] common_interrupt+0x18/0x20
[<c012ce32>] sigprocmask+0xb0/0xca
[<c012cee4>] sys_rt_sigprocmask+0x98/0x145
[<c02d4903>] syscall_call+0x7/0xb
Code: 01 00 00 00 89 e8 eb b8 8b 04 24 3b 46 24 0f 92 c0 31 d2 85 ff 0f 95 c2 85
d0 75 a0 89 c8 5e 5b 5e 5f 5d c3 56 89 c6 53 8b 58 4c <8b> 43 08 39 00 74 17 8b
43 08 8b 18 8b 53 40 85 d2 74 07 89
f0
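Reading the oops: the Code: line brackets the faulting instruction as
<8b> 43 08, and the seven bytes before it (56 89 c6 53 8b 58 4c) directly follow
the previous function's c3 (ret), which is consistent with
cfq_next_request+0x7. The helper below is a minimal, hypothetical sketch that
decodes only the i386 `8B /r` load with an 8-bit displacement (ModRM mod = 01),
which is the form both of those loads take; it is not a general disassembler,
and the names in it are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* i386 general-purpose registers, in ModRM encoding order. */
enum reg32 { EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI };

/* A decoded "mov r32, [base + disp8]" load (opcode 8B /r, ModRM.mod == 01). */
struct mov_load {
    enum reg32 dst;   /* ModRM.reg: register being loaded */
    enum reg32 base;  /* ModRM.rm: base register of the memory operand */
    int8_t disp8;     /* sign-extended 8-bit displacement */
};

/*
 * Decodes a 3-byte load of the form seen in the oops "Code:" line.
 * Only opcode 0x8B with mod == 01 is handled; rm == 100 (ESP) is
 * rejected because that encoding needs a SIB byte. Returns 0 on success.
 */
static int decode_mov_load(const uint8_t insn[3], struct mov_load *out)
{
    unsigned mod = insn[1] >> 6;
    unsigned reg = (insn[1] >> 3) & 7u;
    unsigned rm = insn[1] & 7u;

    if (insn[0] != 0x8b || mod != 1 || rm == ESP)
        return -1;
    out->dst = (enum reg32)reg;
    out->base = (enum reg32)rm;
    out->disp8 = (int8_t)insn[2];
    return 0;
}

/* Address the load touches: base register value plus signed displacement. */
static uint32_t effective_address(const struct mov_load *m, const uint32_t regs[8])
{
    assert((unsigned)m->base < 8u);
    return regs[m->base] + (uint32_t)(int32_t)m->disp8;
}
```

Applied to the register dump above, 8b 58 4c decodes to mov ebx, [eax+0x4c]
and the faulting 8b 43 08 to mov eax, [ebx+0x8]. With ebx: 00000000 the
effective address is 0x00000008, matching the reported fault address; in other
words, the pointer just loaded from offset 0x4c of the structure at eax
(f7dd0028) was NULL.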

Upstream fixed this in:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=ca1e0484d9fe8a9048ac32b0f9894545f43704e8

Version-Release number of selected component (if applicable):
2.6.9-42.0.10.EL, 2.6.9-48.EL, ...

How reproducible:
Sometimes; running hpacucli in a loop is reported to reproduce this oops reliably.


Steps to Reproduce:
1. Run hpacucli in a tight loop
  
Actual results:
After some time, the oops shown above occurs.

Expected results:
No oops.
Comment 2 Bryn M. Reeves 2007-04-24 06:55:35 EDT
Created attachment 153342 [details]
minimal fix for hpacucli oops

This patch takes only the one-line fix from the upstream commit, which also
included a number of non-functional changes (code reorganisation).
Comment 3 Bryn M. Reeves 2007-04-24 06:59:07 EDT
Created attachment 153343 [details]
fix for the hpacucli oops including code reorgs

This is the same fix as the patch in comment #2, but it also includes the code
reorganisations; including them might make future backports easier.
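The bug pattern both patches address, a second code path that copies the
init-time setup and drifts out of sync with it, can be modelled in user space.
The sketch below is purely illustrative: struct fake_queue, alloc_queue(),
create_queue(), rebuild_queue_buggy() and the callback names are hypothetical
stand-ins, not cciss or block-layer code, and the softirq_done field only
mirrors the role of the hook installed by blk_queue_softirq_done. The minimal
patch corresponds to adding the missing step to the copied path; routing every
path through one setup helper, as create_queue() does here, is one way a
reorganisation can keep the paths from diverging again.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the queue callbacks (not kernel APIs). */
typedef void (*request_fn)(void *queue);
typedef void (*completion_fn)(void *request);

/* Hypothetical, heavily simplified model of a request queue. */
struct fake_queue {
    request_fn request;          /* starts I/O on the queue */
    completion_fn softirq_done;  /* completion hook; NULL until installed */
    unsigned max_sectors;        /* illustrative queue limit */
};

static void fake_request(void *queue) { (void)queue; }
static void fake_completion(void *request) { (void)request; }

/* A queue is usable only once every callback has been installed. */
static int queue_fully_initialised(const struct fake_queue *q)
{
    return q->request != NULL && q->softirq_done != NULL;
}

/* Allocates a queue with every field zeroed (pointers set to NULL). */
static struct fake_queue *alloc_queue(void)
{
    struct fake_queue *q = malloc(sizeof *q);
    if (q != NULL)
        *q = (struct fake_queue){ 0 };
    return q;
}

/* The single place that performs every setup step. */
static void setup_queue(struct fake_queue *q)
{
    q->request = fake_request;
    q->softirq_done = fake_completion;
    q->max_sectors = 512;
    assert(queue_fully_initialised(q));
}

/* Shared constructor: used by both the init-time and the rebuild path. */
static struct fake_queue *create_queue(void)
{
    struct fake_queue *q = alloc_queue();
    if (q != NULL)
        setup_queue(q);
    return q;
}

/* Buggy rebuild path: a hand-copied setup that forgets one step. */
static struct fake_queue *rebuild_queue_buggy(void)
{
    struct fake_queue *q = alloc_queue();
    if (q != NULL) {
        q->request = fake_request;
        q->max_sectors = 512;
        /* softirq_done is never installed, so it stays NULL */
    }
    return q;
}

/* Completes a request; reports a missing hook instead of calling NULL. */
static int complete_request(const struct fake_queue *q, void *request)
{
    if (q->softirq_done == NULL)
        return -1;
    q->softirq_done(request);
    return 0;
}
```

In this model the buggy path starts from a zeroed allocation, so the forgotten
hook is NULL rather than garbage, and nothing goes wrong until the queue is
actually used for I/O.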
Comment 4 Bryn M. Reeves 2007-04-24 13:30:34 EDT
Steps to reproduce:

Running the following commands in a loop is reported to reliably trigger this
problem on a system with more than one logical disk configured in the same CCISS
array:

hpacucli controller ch='EuroMSA500' show
hpacucli controller ch='EuroMSA500' show status
hpacucli controller ch='EuroMSA500' logicaldrive all show
hpacucli controller ch='EuroMSA500' logicaldrive all show status
hpacucli controller ch='EuroMSA500' physicaldrive all show
hpacucli controller ch='EuroMSA500' physicaldrive all show status
hpacucli controller ch='EuroMSA500' array all show
hpacucli controller ch='EuroMSA500' array all show status
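The reproduction loop above can be scripted, for example, with the C sketch
below. It only shells out, via system(), to the exact hpacucli commands listed
in this comment; the selector ch='EuroMSA500' is taken from the report and will
differ on other systems, and build_command() and run_loop() are hypothetical
helper names.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Controller selector from the report; it will differ on other systems. */
#define CONTROLLER "ch='EuroMSA500'"

/* The hpacucli sub-commands from the report, in the order listed. */
static const char *const subcommands[] = {
    "show",
    "show status",
    "logicaldrive all show",
    "logicaldrive all show status",
    "physicaldrive all show",
    "physicaldrive all show status",
    "array all show",
    "array all show status",
};
#define N_SUBCOMMANDS (sizeof subcommands / sizeof subcommands[0])

/*
 * Writes "hpacucli controller <selector> <subcommand>" into buf.
 * Returns the command length, or -1 if it does not fit.
 */
static int build_command(char *buf, size_t len, const char *subcommand)
{
    assert(buf != NULL && subcommand != NULL);
    int n = snprintf(buf, len, "hpacucli controller %s %s", CONTROLLER, subcommand);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/*
 * Runs the whole command list `iterations` times through the shell.
 * Returns how many commands could not be built or exited nonzero.
 */
unsigned run_loop(unsigned iterations)
{
    char cmd[256];
    unsigned failures = 0;

    for (unsigned i = 0; i < iterations; i++) {
        for (size_t j = 0; j < N_SUBCOMMANDS; j++) {
            if (build_command(cmd, sizeof cmd, subcommands[j]) < 0 ||
                system(cmd) != 0)
                failures++;
        }
    }
    return failures;
}
```

A small main calling, say, run_loop(1000) runs the eight commands 1,000 times;
per the report, this should be done on a system with more than one logical
drive configured in the same array.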
Comment 5 RHEL Product and Program Management 2007-05-09 00:38:47 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 7 Issue Tracker 2007-06-18 03:08:47 EDT
Excellent, customer is happy with the new kernel

Internal Status set to 'Resolved'
Status set to: Closed by Client
Resolution set to: 'RHEL 4.5'

This event sent from IssueTracker by marco 
 issue 118946
Comment 8 Bryn M. Reeves 2007-06-18 03:51:10 EDT

*** This bug has been marked as a duplicate of 196628 ***
