Bug 240735 - Kernel Oops in aacraid loading SteelEye Lifekeeper
Summary: Kernel Oops in aacraid loading SteelEye Lifekeeper
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.0
Hardware: x86_64
OS: Linux
medium
high
Target Milestone: ---
Assignee: Chip Coldwell
QA Contact: Martin Jenner
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2007-05-21 10:59 UTC by Richard Rudd
Modified: 2007-11-30 22:07 UTC

Fixed In Version: RHBA-2007-0959
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-11-07 19:49:44 UTC
Target Upstream Version:
Embargoed:


Attachments
Upstream commit fixing problem (3.70 KB, patch)
2007-05-22 19:36 UTC, James Bottomley


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2007:0959 0 normal SHIPPED_LIVE Updated kernel packages for Red Hat Enterprise Linux 5 Update 1 2007-11-08 00:47:37 UTC

Description Richard Rudd 2007-05-21 10:59:06 UTC
From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30)

Description of problem:
Starting SteelEye LifeKeeper on a Red Hat Enterprise Linux 5 server causes a kernel Oops.  SteelEye technical support have looked into the kernel crash trace and identified that the issue is with common calls made to the aacraid driver.
LifeKeeper version is 6.1.2-12.

Version-Release number of selected component (if applicable):
kernel-2.6.18-8.1.4.el5

How reproducible:
Always


Steps to Reproduce:
1. Install SteelEye LifeKeeper
2. Run /opt/LifeKeeper/bin/lkstart

Actual Results:
Kernel Oops and system stops responding.

Expected Results:
LifeKeeper should start and read information about attached disks.

Additional info:
LifeKeeper is starting to initialize at Mon May 21 11:09:39 BST 2007
Unable to handle kernel NULL pointer dereference at 0000000000000018 RIP: 
 [<ffffffff800862be>] task_rq_lock+0x26/0x6f
PGD 4a9370067 PUD 4a9371067 PMD 0 
Oops: 0000 [1] SMP 
last sysfs file: /class/scsi_host/host3/proc_name
CPU 2 
Modules linked in: mptctl mptbase autofs4 ipv6 video sbs i2c_ec button battery asus_acpi acpi_memhotplug ac parport_pc lp parport shpchp bnx2 ide_cd serio_raw i2c_i801 sg cdrom pcspkr i2c_core dm_snapshot dm_zero dm_mirror dm_mod qla2400(U) qla2xxx(U) qla2xxx_conf(U) intermodule(U) ata_piix libata aacraid sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
Pid: 0, comm: swapper Tainted: GF     2.6.18-8.1.4.el5 #1
RIP: 0010:[<ffffffff800862be>]  [<ffffffff800862be>] task_rq_lock+0x26/0x6f
RSP: 0018:ffff8104bfd1fe60  EFLAGS: 00010086
RAX: 0000000000000000 RBX: ffffffff803f9400 RCX: ffff81049eadbb48
RDX: 0000000000000000 RSI: ffff8104bfd1fee8 RDI: ffff8104a2d7e7e0
RBP: ffff8104bfd1fe80 R08: 000000000005bea0 R09: ffff810091665000
R10: ffffffff80392180 R11: ffff8104be49a000 R12: ffffffff803f9400
R13: ffff8104bfd1fee8 R14: ffff8104a2d7e7e0 R15: ffffffff803b4220
FS:  0000000000000000(0000) GS:ffff8104bfcd2e40(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000018 CR3: 00000004a8ddf000 CR4: 00000000000006e0
Process swapper (pid: 0, threadinfo ffff8104bfd18000, task ffff8104bfcd3080)
Stack:  000000000000000f ffffffff80092e3a ffff8104a2d7e7e0 0000000000000200
 ffff8104bfd1ff20 ffffffff80044951 ffffffff80392180 0000000000000001
 0000000000000000 0000000000000001 0000000000000000 000000000000373e
Call Trace:
 <IRQ>  [<ffffffff80092e3a>] process_timeout+0x0/0x5
 [<ffffffff80044951>] try_to_wake_up+0x27/0x418
 [<ffffffff80092e3a>] process_timeout+0x0/0x5
 [<ffffffff80092c4a>] run_timer_softirq+0x133/0x1b0
 [<ffffffff80011c19>] __do_softirq+0x5e/0xd5
 [<ffffffff8005c330>] call_softirq+0x1c/0x28
 [<ffffffff8006a312>] do_softirq+0x2c/0x85
 [<ffffffff80054f2e>] mwait_idle+0x0/0x4a
 [<ffffffff8005bcc2>] apic_timer_interrupt+0x66/0x6c
 <EOI>  [<ffffffff80054f64>] mwait_idle+0x36/0x4a
 [<ffffffff80046fb7>] cpu_idle+0x95/0xb8
 [<ffffffff80073bb7>] start_secondary+0x45a/0x469


Code: 8b 40 18 48 8b 04 c5 c0 19 3b 80 4c 03 60 08 4c 89 e7 e8 c0 
RIP  [<ffffffff800862be>] task_rq_lock+0x26/0x6f
 RSP <ffff8104bfd1fe60>
CR2: 0000000000000018
 <0>Kernel panic - not syncing: Fatal exception
Unable to handle kernel paging request at ffffffff82a00000 RIP: 
 [<ffffffff880b1dd0>] :aacraid:aac_internal_transfer+0x9b/0x9e
PGD 203067 PUD 205063 PMD 0 
Oops: 0000 [2] SMP 
last sysfs file: /class/scsi_host/host3/proc_name
CPU 0 
Modules linked in: mptctl mptbase autofs4 ipv6 video sbs i2c_ec button battery asus_acpi acpi_memhotplug ac parport_pc lp parport shpchp bnx2 ide_cd serio_raw i2c_i801 sg cdrom pcspkr i2c_core dm_snapshot dm_zero dm_mirror dm_mod qla2400(U) qla2xxx(U) qla2xxx_conf(U) intermodule(U) ata_piix libata aacraid sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
Pid: 0, comm: swapper Tainted: GF     2.6.18-8.1.4.el5 #1
RIP: 0010:[<ffffffff880b1dd0>]  [<ffffffff880b1dd0>] :aacraid:aac_internal_transfer+0x9b/0x9e
RSP: 0018:ffffffff80402ea0  EFLAGS: 00010083
RAX: 0000000000000008 RBX: ffff8104a2b280c0 RCX: 00000000fda02ea0
RDX: 0000000000000008 RSI: ffffffff82a00000 RDI: ffff8104a4f26168
RBP: ffff8104be012780 R08: ffff8104a2929000 R09: ffff8100010004a0
R10: 0000000000000010 R11: ffff8104a6182cc0 R12: ffff8104be012780
R13: ffff8104be6d4cf8 R14: ffffffff803bbee8 R15: ffffffff803bbee8
FS:  0000000000000000(0000) GS:ffffffff8038a000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: ffffffff82a00000 CR3: 00000004a669a000 CR4: 00000000000006e0
Process swapper (pid: 0, threadinfo ffffffff803ba000, task ffffffff802d1ae0)
Stack:  ffffffff880b2687 315f726f7272694d 2020202020202020 ffff8104be0f3068
 ffff810037e27800 ffff8104be6d4cf8 ffffffff880b6f23 000000000000013c
 ffff8104be6d4cf8 0000000000000000 00000000000000a9 ffffffff803bbee8
Call Trace:
 <IRQ>  [<ffffffff880b2687>] :aacraid:get_container_name_callback+0x8b/0xb5
 [<ffffffff880b6f23>] :aacraid:aac_intr_normal+0x1b3/0x1f9
 [<ffffffff880b7fc3>] :aacraid:aac_rkt_intr+0x37/0x115
 [<ffffffff80010705>] handle_IRQ_event+0x29/0x58
 [<ffffffff800b2fe2>] __do_IRQ+0xa4/0x105
 [<ffffffff8006a195>] do_IRQ+0xe7/0xf5
 [<ffffffff80054f2e>] mwait_idle+0x0/0x4a
 [<ffffffff8005b649>] ret_from_intr+0x0/0xa
 <EOI>  [<ffffffff80054f64>] mwait_idle+0x36/0x4a
 [<ffffffff80046fb7>] cpu_idle+0x95/0xb8
 [<ffffffff803c57f6>] start_kernel+0x220/0x225
 [<ffffffff803c5237>] _sinittext+0x237/0x23e


Code: f3 a4 c3 41 55 41 54 55 48 89 fd 53 48 89 f3 48 83 ec 08 48 
RIP  [<ffffffff880b1dd0>] :aacraid:aac_internal_transfer+0x9b/0x9e
 RSP <ffffffff80402ea0>
CR2: ffffffff82a00000
 <0>Kernel panic - not syncing: Fatal exception
 BUG: warning at drivers/char/vt.c:3359/do_unblank_screen() (Tainted: GF    )

Call Trace:
 <IRQ>  [<ffffffff8018eb09>] do_unblank_screen+0x56/0x132
 [<ffffffff8007c97c>] bust_spinlocks+0x1c/0x46
 [<ffffffff8008b32b>] panic+0x88/0x1f4
 [<ffffffff8018eace>] do_unblank_screen+0x1b/0x132
 [<ffffffff80062d3a>] oops_end+0x51/0x53
 [<ffffffff80064842>] do_page_fault+0x753/0x81d
 [<ffffffff8009b6c2>] autoremove_wake_function+0x9/0x2e
 [<ffffffff800850ed>] __wake_up_common+0x3e/0x68
 [<ffffffff8005be1d>] error_exit+0x0/0x84
 [<ffffffff800862be>] task_rq_lock+0x26/0x6f
 [<ffffffff80092e3a>] process_timeout+0x0/0x5
 [<ffffffff80044951>] try_to_wake_up+0x27/0x418
 [<ffffffff80092e3a>] process_timeout+0x0/0x5
 [<ffffffff80092c4a>] run_timer_softirq+0x133/0x1b0
 [<ffffffff80011c19>] __do_softirq+0x5e/0xd5
 [<ffffffff8005c330>] call_softirq+0x1c/0x28
 [<ffffffff8006a312>] do_softirq+0x2c/0x85
 [<ffffffff80054f2e>] mwait_idle+0x0/0x4a
 [<ffffffff8005bcc2>] apic_timer_interrupt+0x66/0x6c
 <EOI>  [<ffffffff80054f64>] mwait_idle+0x36/0x4a
 [<ffffffff80046fb7>] cpu_idle+0x95/0xb8
 [<ffffffff80073bb7>] start_secondary+0x45a/0x469
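
The second and third words on the "Stack:" line of the aacraid oops above (315f726f7272694d and 2020202020202020) are printable ASCII when read in x86_64 little-endian byte order. The following is a minimal sketch for decoding them; the helper name words_to_ascii is hypothetical and only the two stack words come from this report. Reading the result as a space-padded 16-byte array name is an inference from the dump, not something the report states.

```python
# Stack words copied from the "Stack:" line of the second (aacraid) oops.
STACK_WORDS = [0x315F726F7272694D, 0x2020202020202020]


def words_to_ascii(words):
    """Render 64-bit words as text, using little-endian memory order."""
    raw = b"".join(w.to_bytes(8, "little") for w in words)
    # Show non-printable bytes as dots so pointers do not garble the output.
    return "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in raw)


print(repr(words_to_ascii(STACK_WORDS)))  # 'Mirror_1' followed by 8 spaces
```

This matches the call trace, where get_container_name_callback is the caller of aac_internal_transfer.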

Comment 1 James Bottomley 2007-05-22 19:36:24 UTC
Created attachment 155195
Upstream commit fixing problem

Comment 2 James Bottomley 2007-05-22 19:37:17 UTC
This has been traced to a failure in the aacraid aac_internal_transfer function
when handling INQUIRY commands requesting fewer than 16 bytes of data.

I've attached the upstream commit for this fix.
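
To illustrate why a short INQUIRY buffer can turn into a runaway copy, here is a minimal sketch of the length arithmetic. It is not the upstream patch and not the driver's code: the function names are hypothetical, and it assumes the faulting path writes a 16-byte name at offset 16, which is where the product identification field sits in standard INQUIRY data. The "f3 a4" (rep movsb) bytes at the start of the "Code:" line of the aacraid oops are consistent with a copy that ran past mapped memory.

```python
U64_MASK = (1 << 64) - 1  # models an unsigned 64-bit length


def unchecked_copy_len(buf_len, offset, data_len):
    """Length used when the copy is clamped only to the end of the buffer."""
    end = min(buf_len, offset + data_len)
    return (end - offset) & U64_MASK  # wraps around when offset > buf_len


def checked_copy_len(buf_len, offset, data_len):
    """Same clamp, but copies nothing when the offset is past the buffer."""
    if offset >= buf_len:
        return 0
    return min(buf_len, offset + data_len) - offset


PRODUCT_ID_OFFSET = 16  # assumed destination offset (product identification)
PRODUCT_ID_LEN = 16     # assumed length of the name being copied

for alloc_len in (8, 16, 36):
    print(alloc_len,
          hex(unchecked_copy_len(alloc_len, PRODUCT_ID_OFFSET, PRODUCT_ID_LEN)),
          checked_copy_len(alloc_len, PRODUCT_ID_OFFSET, PRODUCT_ID_LEN))
```

With an 8-byte request the unchecked length wraps to a value near 2^64, while the checked version copies nothing. With a 36-byte request both versions copy 16 bytes.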

Comment 3 Ernie Petrides 2007-09-06 23:02:14 UTC
This problem has been fixed in 2.6.18-27.el5 with the aacraid driver update (in
patch tracking file repost-bz197337-update-aacraid-driver-to-1-1-5-2437.patch).

Comment 4 Chip Coldwell 2007-09-07 17:30:12 UTC
(In reply to comment #0)

> Pid: 0, comm: swapper Tainted: GF     2.6.18-8.1.4.el5 #1

This line indicates that the kernel that crashed was tainted by the forced
loading of a proprietary driver.  Please reproduce this bug with an untainted
kernel and post the oops message here, or close the bug if you cannot.  We do
not have visibility into proprietary drivers.

Chip
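
The taint string in the quoted line is positional, so it can be decoded letter by letter. The following is a minimal sketch, assuming the flag letters and order documented for 2.6-era mainline kernels. The table is an assumption written from that documentation, not taken from this report, and vendor kernels may define additional letters. The name decode_taint is hypothetical.

```python
# One dict per position in the taint string; a blank means "flag not set".
# Assumed 2.6-era order: P/G, F, S, R, M, B.
TAINT_POSITIONS = [
    {"P": "proprietary module loaded",
     "G": "all loaded modules GPL-compatible"},
    {"F": "module was force-loaded"},
    {"S": "SMP kernel on hardware not certified for SMP"},
    {"R": "module was force-unloaded"},
    {"M": "machine check exception occurred"},
    {"B": "bad page state detected"},
]


def decode_taint(flags):
    """Map each positional taint letter to its meaning, skipping blanks."""
    meanings = []
    for letter, table in zip(flags, TAINT_POSITIONS):
        if letter in table:
            meanings.append(table[letter])
    return meanings


for meaning in decode_taint("GF    "):
    print(meaning)
```

Under that assumed table, "GF" decodes to GPL-compatible module licences plus a force-loaded module.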


Comment 5 Chip Coldwell 2007-09-07 17:31:12 UTC
(In reply to comment #4)
> (In reply to comment #0)
> 
> > Pid: 0, comm: swapper Tainted: GF     2.6.18-8.1.4.el5 #1
> 
> This line indicates that the kernel that crashed was tainted by the forced
> loading of a proprietary driver.  Please reproduce this bug with an untainted
> kernel and post the oops message here, or close the bug if you cannot.  We do
> not have visibility into proprietary drivers.

Never mind; already modified.

Chip


Comment 8 errata-xmlrpc 2007-11-07 19:49:44 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0959.html


