Bug 240735 - Kernel Oops in aacraid loading SteelEye Lifekeeper
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.0
Hardware: x86_64 Linux
Priority: medium Severity: high
Assigned To: Chip Coldwell
QA Contact: Martin Jenner
Reported: 2007-05-21 06:59 EDT by Richard Rudd
Modified: 2007-11-30 17:07 EST (History)
Fixed In Version: RHBA-2007-0959
Doc Type: Bug Fix
Last Closed: 2007-11-07 14:49:44 EST

Attachments
Upstream commit fixing problem (3.70 KB, patch)
2007-05-22 15:36 EDT, James Bottomley

Description Richard Rudd 2007-05-21 06:59:06 EDT
Description of problem:
Starting SteelEye LifeKeeper on a Red Hat Enterprise Linux 5 server causes a kernel oops.  SteelEye technical support examined the kernel crash trace and identified the problem as lying in common calls made to the aacraid driver.
LifeKeeper version is 6.1.2-12.

Version-Release number of selected component (if applicable):
kernel-2.6.18-8.1.4.el5

How reproducible:
Always


Steps to Reproduce:
1. Install SteelEye LifeKeeper
2. Run /opt/LifeKeeper/bin/lkstart

Actual Results:
Kernel Oops and system stops responding.

Expected Results:
LifeKeeper should start and read information about attached disks.

Additional info:
LifeKeeper is starting to initialize at Mon May 21 11:09:39 BST 2007
Unable to handle kernel NULL pointer dereference at 0000000000000018 RIP: 
 [<ffffffff800862be>] task_rq_lock+0x26/0x6f
PGD 4a9370067 PUD 4a9371067 PMD 0 
Oops: 0000 [1] SMP 
last sysfs file: /class/scsi_host/host3/proc_name
CPU 2 
Modules linked in: mptctl mptbase autofs4 ipv6 video sbs i2c_ec button battery asus_acpi acpi_memhotplug ac parport_pc lp parport shpchp bnx2 ide_cd serio_raw i2c_i801 sg cdrom pcspkr i2c_core dm_snapshot dm_zero dm_mirror dm_mod qla2400(U) qla2xxx(U) qla2xxx_conf(U) intermodule(U) ata_piix libata aacraid sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
Pid: 0, comm: swapper Tainted: GF     2.6.18-8.1.4.el5 #1
RIP: 0010:[<ffffffff800862be>]  [<ffffffff800862be>] task_rq_lock+0x26/0x6f
RSP: 0018:ffff8104bfd1fe60  EFLAGS: 00010086
RAX: 0000000000000000 RBX: ffffffff803f9400 RCX: ffff81049eadbb48
RDX: 0000000000000000 RSI: ffff8104bfd1fee8 RDI: ffff8104a2d7e7e0
RBP: ffff8104bfd1fe80 R08: 000000000005bea0 R09: ffff810091665000
R10: ffffffff80392180 R11: ffff8104be49a000 R12: ffffffff803f9400
R13: ffff8104bfd1fee8 R14: ffff8104a2d7e7e0 R15: ffffffff803b4220
FS:  0000000000000000(0000) GS:ffff8104bfcd2e40(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000018 CR3: 00000004a8ddf000 CR4: 00000000000006e0
Process swapper (pid: 0, threadinfo ffff8104bfd18000, task ffff8104bfcd3080)
Stack:  000000000000000f ffffffff80092e3a ffff8104a2d7e7e0 0000000000000200
 ffff8104bfd1ff20 ffffffff80044951 ffffffff80392180 0000000000000001
 0000000000000000 0000000000000001 0000000000000000 000000000000373e
Call Trace:
 <IRQ>  [<ffffffff80092e3a>] process_timeout+0x0/0x5
 [<ffffffff80044951>] try_to_wake_up+0x27/0x418
 [<ffffffff80092e3a>] process_timeout+0x0/0x5
 [<ffffffff80092c4a>] run_timer_softirq+0x133/0x1b0
 [<ffffffff80011c19>] __do_softirq+0x5e/0xd5
 [<ffffffff8005c330>] call_softirq+0x1c/0x28
 [<ffffffff8006a312>] do_softirq+0x2c/0x85
 [<ffffffff80054f2e>] mwait_idle+0x0/0x4a
 [<ffffffff8005bcc2>] apic_timer_interrupt+0x66/0x6c
 <EOI>  [<ffffffff80054f64>] mwait_idle+0x36/0x4a
 [<ffffffff80046fb7>] cpu_idle+0x95/0xb8
 [<ffffffff80073bb7>] start_secondary+0x45a/0x469


Code: 8b 40 18 48 8b 04 c5 c0 19 3b 80 4c 03 60 08 4c 89 e7 e8 c0 
RIP  [<ffffffff800862be>] task_rq_lock+0x26/0x6f
 RSP <ffff8104bfd1fe60>
CR2: 0000000000000018
 <0>Kernel panic - not syncing: Fatal exception
Unable to handle kernel paging request at ffffffff82a00000 RIP: 
 [<ffffffff880b1dd0>] :aacraid:aac_internal_transfer+0x9b/0x9e
PGD 203067 PUD 205063 PMD 0 
Oops: 0000 [2] SMP 
last sysfs file: /class/scsi_host/host3/proc_name
CPU 0 
Modules linked in: mptctl mptbase autofs4 ipv6 video sbs i2c_ec button battery asus_acpi acpi_memhotplug ac parport_pc lp parport shpchp bnx2 ide_cd serio_raw i2c_i801 sg cdrom pcspkr i2c_core dm_snapshot dm_zero dm_mirror dm_mod qla2400(U) qla2xxx(U) qla2xxx_conf(U) intermodule(U) ata_piix libata aacraid sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
Pid: 0, comm: swapper Tainted: GF     2.6.18-8.1.4.el5 #1
RIP: 0010:[<ffffffff880b1dd0>]  [<ffffffff880b1dd0>] :aacraid:aac_internal_transfer+0x9b/0x9e
RSP: 0018:ffffffff80402ea0  EFLAGS: 00010083
RAX: 0000000000000008 RBX: ffff8104a2b280c0 RCX: 00000000fda02ea0
RDX: 0000000000000008 RSI: ffffffff82a00000 RDI: ffff8104a4f26168
RBP: ffff8104be012780 R08: ffff8104a2929000 R09: ffff8100010004a0
R10: 0000000000000010 R11: ffff8104a6182cc0 R12: ffff8104be012780
R13: ffff8104be6d4cf8 R14: ffffffff803bbee8 R15: ffffffff803bbee8
FS:  0000000000000000(0000) GS:ffffffff8038a000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: ffffffff82a00000 CR3: 00000004a669a000 CR4: 00000000000006e0
Process swapper (pid: 0, threadinfo ffffffff803ba000, task ffffffff802d1ae0)
Stack:  ffffffff880b2687 315f726f7272694d 2020202020202020 ffff8104be0f3068
 ffff810037e27800 ffff8104be6d4cf8 ffffffff880b6f23 000000000000013c
 ffff8104be6d4cf8 0000000000000000 00000000000000a9 ffffffff803bbee8
Call Trace:
 <IRQ>  [<ffffffff880b2687>] :aacraid:get_container_name_callback+0x8b/0xb5
 [<ffffffff880b6f23>] :aacraid:aac_intr_normal+0x1b3/0x1f9
 [<ffffffff880b7fc3>] :aacraid:aac_rkt_intr+0x37/0x115
 [<ffffffff80010705>] handle_IRQ_event+0x29/0x58
 [<ffffffff800b2fe2>] __do_IRQ+0xa4/0x105
 [<ffffffff8006a195>] do_IRQ+0xe7/0xf5
 [<ffffffff80054f2e>] mwait_idle+0x0/0x4a
 [<ffffffff8005b649>] ret_from_intr+0x0/0xa
 <EOI>  [<ffffffff80054f64>] mwait_idle+0x36/0x4a
 [<ffffffff80046fb7>] cpu_idle+0x95/0xb8
 [<ffffffff803c57f6>] start_kernel+0x220/0x225
 [<ffffffff803c5237>] _sinittext+0x237/0x23e


Code: f3 a4 c3 41 55 41 54 55 48 89 fd 53 48 89 f3 48 83 ec 08 48 
RIP  [<ffffffff880b1dd0>] :aacraid:aac_internal_transfer+0x9b/0x9e
 RSP <ffffffff80402ea0>
CR2: ffffffff82a00000
 <0>Kernel panic - not syncing: Fatal exception
 BUG: warning at drivers/char/vt.c:3359/do_unblank_screen() (Tainted: GF    )

Call Trace:
 <IRQ>  [<ffffffff8018eb09>] do_unblank_screen+0x56/0x132
 [<ffffffff8007c97c>] bust_spinlocks+0x1c/0x46
 [<ffffffff8008b32b>] panic+0x88/0x1f4
 [<ffffffff8018eace>] do_unblank_screen+0x1b/0x132
 [<ffffffff80062d3a>] oops_end+0x51/0x53
 [<ffffffff80064842>] do_page_fault+0x753/0x81d
 [<ffffffff8009b6c2>] autoremove_wake_function+0x9/0x2e
 [<ffffffff800850ed>] __wake_up_common+0x3e/0x68
 [<ffffffff8005be1d>] error_exit+0x0/0x84
 [<ffffffff800862be>] task_rq_lock+0x26/0x6f
 [<ffffffff80092e3a>] process_timeout+0x0/0x5
 [<ffffffff80044951>] try_to_wake_up+0x27/0x418
 [<ffffffff80092e3a>] process_timeout+0x0/0x5
 [<ffffffff80092c4a>] run_timer_softirq+0x133/0x1b0
 [<ffffffff80011c19>] __do_softirq+0x5e/0xd5
 [<ffffffff8005c330>] call_softirq+0x1c/0x28
 [<ffffffff8006a312>] do_softirq+0x2c/0x85
 [<ffffffff80054f2e>] mwait_idle+0x0/0x4a
 [<ffffffff8005bcc2>] apic_timer_interrupt+0x66/0x6c
 <EOI>  [<ffffffff80054f64>] mwait_idle+0x36/0x4a
 [<ffffffff80046fb7>] cpu_idle+0x95/0xb8
 [<ffffffff80073bb7>] start_secondary+0x45a/0x469
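A note for readers of the second oops above: the two stack words following the return address into get_container_name_callback look like ASCII text. The sketch below decodes them as little-endian 64-bit words, which is how x86_64 stores them. The helper name stack_words_to_ascii is hypothetical, and reading the result as the container name being copied at the time of the fault is an interpretation, not something the trace states.

```python
import struct


def stack_words_to_ascii(words):
    """Decode hex stack words (as printed in an oops) into printable ASCII.

    Each word is treated as a little-endian 64-bit value, matching x86_64.
    Non-printable bytes are shown as '.'.
    """
    raw = b"".join(struct.pack("<Q", int(word, 16)) for word in words)
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


if __name__ == "__main__":
    # The two words taken verbatim from the "Stack:" line of the second oops.
    text = stack_words_to_ascii(["315f726f7272694d", "2020202020202020"])
    print(repr(text))
```

The decoded bytes spell "Mirror_1" padded with eight spaces, which fits a space-padded name field being handled by the aacraid callback.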
Comment 1 James Bottomley 2007-05-22 15:36:24 EDT
Created attachment 155195
Upstream commit fixing problem
Comment 2 James Bottomley 2007-05-22 15:37:17 EDT
This has been traced to a failure in the aacraid aac_internal_transfer function
when handling INQUIRY commands requesting fewer than 16 bytes of data.

I've attached the upstream commit for this fix.
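
As an illustration of how a short INQUIRY can turn into a wild copy, here is a minimal sketch of the arithmetic. It is not the driver code and not the attached patch. The function names are hypothetical, and the offset of 16 is an assumption based on the product identification field starting at byte 16 of standard INQUIRY data. The sketch models unsigned 32-bit subtraction to show that a request buffer shorter than the offset makes the copy length wrap around instead of going to zero.

```python
U32_MASK = 0xFFFFFFFF  # model C "unsigned int" arithmetic


def naive_copy_len(buf_len, offset, data_len):
    """Copy length computed without checking that buf_len exceeds offset."""
    transfer_len = min(buf_len, offset + data_len)
    # In unsigned arithmetic a negative result wraps to a huge value.
    return (transfer_len - offset) & U32_MASK


def bounded_copy_len(buf_len, offset, data_len):
    """Copy length clamped to zero when the buffer ends before the offset."""
    if buf_len <= offset:
        return 0
    return min(buf_len, offset + data_len) - offset


if __name__ == "__main__":
    # Hypothetical values: 8-byte request, 16-byte field written at offset 16.
    print(hex(naive_copy_len(8, 16, 16)))
    print(bounded_copy_len(8, 16, 16))
    # A full-size 36-byte request behaves the same in both versions.
    print(naive_copy_len(36, 16, 16), bounded_copy_len(36, 16, 16))
```

A wrapped length of this kind would be consistent with the second oops, where the faulting instruction bytes (f3 a4) are a rep movsb and CR2 matches RSI, that is, a byte copy running off the end of its source.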
Comment 3 Ernie Petrides 2007-09-06 19:02:14 EDT
This problem has been fixed in 2.6.18-27.el5 with the aacraid driver update (in
patch tracking file repost-bz197337-update-aacraid-driver-to-1-1-5-2437.patch).
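
For anyone checking whether a given machine already carries the fix, a rough sketch of comparing the running kernel release against 2.6.18-27.el5 follows. The helpers release_key and has_fix are hypothetical, and the comparison is a simplification that only looks at the numeric parts; it is not rpm's own version comparison, so treat the result as a hint and confirm with the package manager.

```python
import re


def release_key(kernel_release):
    """Split e.g. '2.6.18-8.1.4.el5' into tuples of integers for comparison."""
    version, _, release = kernel_release.partition("-")
    # Drop the distribution suffix (".el5" and anything after it).
    release = re.sub(r"\.el\d+.*$", "", release)

    def to_ints(text):
        return tuple(int(part) for part in text.split(".") if part.isdigit())

    return to_ints(version), to_ints(release)


def has_fix(kernel_release, fixed_in="2.6.18-27.el5"):
    """Return True if kernel_release is at or beyond the fixed build."""
    return release_key(kernel_release) >= release_key(fixed_in)


if __name__ == "__main__":
    # The kernel from the original report predates the fixed build.
    print(has_fix("2.6.18-8.1.4.el5"))
    print(has_fix("2.6.18-27.el5"))
```

On a live system the input string would typically come from uname -r.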
Comment 4 Chip Coldwell 2007-09-07 13:30:12 EDT
(In reply to comment #0)

> Pid: 0, comm: swapper Tainted: GF     2.6.18-8.1.4.el5 #1

This line indicates that the kernel that crashed was tainted by the forced
loading of a proprietary driver.  Please reproduce this bug with an untainted
kernel and post the oops message here, or close the bug if you cannot.  We do
not have visibility into proprietary drivers.

Chip
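
For reference when reading the "Tainted:" field quoted above, here is a small sketch that maps the flag letters to their meanings. The table follows print_tainted() in the upstream 2.6.18 sources as far as I know; RHEL kernels may define extra flags, so treat the table as an assumption. The helper name decode_taint_line is hypothetical. In that scheme G means no proprietary-licensed module has been loaded (P would appear in its place otherwise) and F means a module was force-loaded.

```python
import re

# Flag letters as printed by upstream 2.6.18 (assumed; vendors may add more).
TAINT_FLAGS = {
    "P": "proprietary-licensed module loaded",
    "G": "no proprietary-licensed module loaded",
    "F": "module was force-loaded",
    "S": "SMP kernel on hardware not designed for SMP",
    "R": "module was force-unloaded",
    "M": "machine check exception occurred",
    "B": "bad page state encountered",
}


def decode_taint_line(line):
    """Return (flag, meaning) pairs for the 'Tainted:' field of an oops line."""
    match = re.search(r"Tainted: ([A-Z ]{1,6})", line)
    if match is None:
        return []  # e.g. "Not tainted"
    return [(c, TAINT_FLAGS[c]) for c in match.group(1) if c in TAINT_FLAGS]


if __name__ == "__main__":
    oops_line = "Pid: 0, comm: swapper Tainted: GF     2.6.18-8.1.4.el5 #1"
    for flag, meaning in decode_taint_line(oops_line):
        print(flag, "-", meaning)
```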
Comment 5 Chip Coldwell 2007-09-07 13:31:12 EDT
(In reply to comment #4)
> (In reply to comment #0)
> 
> > Pid: 0, comm: swapper Tainted: GF     2.6.18-8.1.4.el5 #1
> 
> This line indicates that the kernel that crashed was tainted by the forced
> loading of a proprietary driver.  Please reproduce this bug with an untainted
> kernel and post the oops message here, or close the bug if you cannot.  We do
> not have visibility into proprietary drivers.

Never mind; already modified.

Chip
Comment 8 errata-xmlrpc 2007-11-07 14:49:44 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0959.html
