Bug 237635

Summary: kernel panic on cciss when running hpacucli
Product: Red Hat Enterprise Linux 4 Reporter: Bryn M. Reeves <bmr>
Component: kernel Assignee: Tom Coughlan <coughlan>
Status: CLOSED DUPLICATE QA Contact: Martin Jenner <mjenner>
Severity: medium Docs Contact:
Priority: medium    
Version: 4.4 CC: tao
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2007-06-18 07:51:10 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
minimal fix for hpacucli oops none
fix for the hpacucli oops including code reorgs none

Description Bryn M. Reeves 2007-04-24 10:46:09 UTC
Description of problem:
A problem exists in the cciss driver that can lead to an oops when the hpacucli
tool is run. cciss provides an ioctl to re-query the controller's logical drive
configuration. This mimics the code in cciss_init_one (called at init time)
except that it omits a call to blk_queue_softirq_done, leaving the queue
partially initialised. This then triggers an oops in elv_next_request when I/O
passes through the queue:

Unable to handle kernel NULL pointer dereference at virtual address 00000008
printing eip:
*pde = 3415d001
Oops: 0000 [#1]
SMP
Modules linked in: netconsole netdump dlm(U) cman(U) dm_mirror dm_mod button
battery ac uhci_hcd ehci_hcd hw_random tg3 bonding(U) ext3 jbd cciss sd_mod scsi_mod
CPU:    1
EIP:    0060:[<c022bbf2>]    Not tainted VLI
EFLAGS: 00010046   (2.6.9-42.0.10.ELsmp)
EIP is at cfq_next_request+0x7/0x35
eax: f7dd0028   ebx: 00000000   ecx: 00000007   edx: f7dd0028
esi: f7dd0028   edi: 00000000   ebp: f7e4cd80   esp: c03efcf8
ds: 007b   es: 007b   ss: 0068
Process hotplug (pid: 22022, threadinfo=c03ef000 task=f573f930)
Stack: f7dd0028 f7dd0028 c0222ed4 f7dd0028 00000005 00000000 f8864a46 00000000
      00000000 00000000 00000000 00000000 f7e4cd80 f7dd0028 00000000 00000000
      00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Call Trace:
[<c0222ed4>] elv_next_request+0xbe/0xce
[<f8864a46>] do_cciss_request+0x33/0x2eb [cciss]
[<c011de3f>] find_busiest_group+0xdd/0x295
[<c011cd26>] recalc_task_prio+0x128/0x133
[<c011cdb9>] activate_task+0x88/0x95
[<c011d2e4>] try_to_wake_up+0x28e/0x299
[<c011d2e4>] try_to_wake_up+0x28e/0x299
[<c0120502>] autoremove_wake_function+0xd/0x2d
[<c011e7d6>] __wake_up_common+0x36/0x51
[<c011e81a>] __wake_up+0x29/0x3c
[<c014585a>] test_clear_page_writeback+0x84/0xb6
[<c014314c>] mempool_free+0x60/0x64
[<c0224902>] freed_request+0x23/0x72
[<c02245e1>] blk_start_queue+0x23/0x41
[<f886513b>] do_cciss_intr+0x43d/0x4b7 [cciss]
[<c01074ce>] handle_IRQ_event+0x25/0x4f
[<c0107a2e>] do_IRQ+0x11c/0x1ae
=======================
[<c02d52dc>] common_interrupt+0x18/0x20
[<c012ce32>] sigprocmask+0xb0/0xca
[<c012cee4>] sys_rt_sigprocmask+0x98/0x145
[<c02d4903>] syscall_call+0x7/0xb
Code: 01 00 00 00 89 e8 eb b8 8b 04 24 3b 46 24 0f 92 c0 31 d2 85 ff 0f 95 c2 85
d0 75 a0 89 c8 5e 5b 5e 5f 5d c3 56 89 c6 53 8b 58 4c <8b> 43 08 39 00 74 17 8b
43 08 8b 18 8b 53 40 85 d2 74 07 89
f0

Upstream fixed this in:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=ca1e0484d9fe8a9048ac32b0f9894545f43704e8

Version-Release number of selected component (if applicable):
2.6.9-42.0.10.EL, 2.6.9-48.EL, ...

How reproducible:
Sometimes; running hpacucli in a loop is reported to reliably reproduce this oops.


Steps to Reproduce:
1. Run hpacucli in a tight loop
  
Actual results:
After some time, the above oops.

Expected results:
No oops.

Comment 2 Bryn M. Reeves 2007-04-24 10:55:35 UTC
Created attachment 153342 [details]
minimal fix for hpacucli oops

This patch takes only the one-line fix from the upstream commit, which also
included a number of non-functional changes (code reorganisation).

Comment 3 Bryn M. Reeves 2007-04-24 10:59:07 UTC
Created attachment 153343 [details]
fix for the hpacucli oops including code reorgs

This is the same fix as the patch in comment #2 but also includes the code
reorganisations; including them might make future backports easier.

Comment 4 Bryn M. Reeves 2007-04-24 17:30:34 UTC
Steps to reproduce:

Running the following commands in a loop is reported to reliably trigger this
problem on a system with more than one logical disk configured in the same CCISS
array:

hpacucli controller ch='EuroMSA500' show
hpacucli controller ch='EuroMSA500' show status
hpacucli controller ch='EuroMSA500' logicaldrive all show
hpacucli controller ch='EuroMSA500' logicaldrive all show status
hpacucli controller ch='EuroMSA500' physicaldrive all show
hpacucli controller ch='EuroMSA500' physicaldrive all show status
hpacucli controller ch='EuroMSA500' array all show
hpacucli controller ch='EuroMSA500' array all show status
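
The command sequence above could be wrapped in a loop along the following
lines. This is only a sketch: the function names run_queries and repro_loop and
the idea of a fixed iteration count are assumptions made for illustration, not
part of the original report, which only says the commands were run in a loop.

```shell
#!/bin/sh
# Sketch of a reproducer loop for the query sequence listed above.
# Assumptions (not from the report): the function names and the
# iteration count argument are illustrative only.

# Run the eight hpacucli queries once against controller "$1".
run_queries() {
    for query in \
        "show" \
        "show status" \
        "logicaldrive all show" \
        "logicaldrive all show status" \
        "physicaldrive all show" \
        "physicaldrive all show status" \
        "array all show" \
        "array all show status"
    do
        # $query is left unquoted on purpose so that it splits into words.
        hpacucli controller "ch=$1" $query >/dev/null
    done
}

# Repeat the query sequence "$2" times against controller "$1".
repro_loop() {
    i=0
    while [ "$i" -lt "$2" ]; do
        run_queries "$1"
        i=$((i + 1))
    done
}
```

For example, `repro_loop EuroMSA500 1000` would run the sequence 1000 times
against the controller named in this report; the count of 1000 is arbitrary.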


Comment 5 RHEL Program Management 2007-05-09 04:38:47 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 7 Issue Tracker 2007-06-18 07:08:47 UTC
Excellent, customer is happy with the new kernel

Internal Status set to 'Resolved'
Status set to: Closed by Client
Resolution set to: 'RHEL 4.5'

This event sent from IssueTracker by marco 
 issue 118946

Comment 8 Bryn M. Reeves 2007-06-18 07:51:10 UTC

*** This bug has been marked as a duplicate of 196628 ***