From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30)

Description of problem:
Starting SteelEye LifeKeeper on a Red Hat Enterprise Linux 5 server causes a kernel Oops. SteelEye technical support have looked into the kernel crash trace and identified that the issue is with common calls made to the aacraid driver. LifeKeeper version is 6.1.2-12.

Version-Release number of selected component (if applicable):
kernel-2.6.18-8.1.4.el5

How reproducible:
Always

Steps to Reproduce:
1. Install SteelEye LifeKeeper
2. Run /opt/LifeKeeper/bin/lkstart

Actual Results:
Kernel Oops and system stops responding.

Expected Results:
LifeKeeper should start and read information about attached disks.

Additional info:
LifeKeeper is starting to initialize at Mon May 21 11:09:39 BST 2007
Unable to handle kernel NULL pointer dereference at 0000000000000018 RIP:
 [<ffffffff800862be>] task_rq_lock+0x26/0x6f
PGD 4a9370067 PUD 4a9371067 PMD 0
Oops: 0000 [1] SMP
last sysfs file: /class/scsi_host/host3/proc_name
CPU 2
Modules linked in: mptctl mptbase autofs4 ipv6 video sbs i2c_ec button battery asus_acpi acpi_memhotplug ac parport_pc lp parport shpchp bnx2 ide_cd serio_raw i2c_i801 sg cdrom pcspkr i2c_core dm_snapshot dm_zero dm_mirror dm_mod qla2400(U) qla2xxx(U) qla2xxx_conf(U) intermodule(U) ata_piix libata aacraid sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
Pid: 0, comm: swapper Tainted: GF 2.6.18-8.1.4.el5 #1
RIP: 0010:[<ffffffff800862be>] [<ffffffff800862be>] task_rq_lock+0x26/0x6f
RSP: 0018:ffff8104bfd1fe60 EFLAGS: 00010086
RAX: 0000000000000000 RBX: ffffffff803f9400 RCX: ffff81049eadbb48
RDX: 0000000000000000 RSI: ffff8104bfd1fee8 RDI: ffff8104a2d7e7e0
RBP: ffff8104bfd1fe80 R08: 000000000005bea0 R09: ffff810091665000
R10: ffffffff80392180 R11: ffff8104be49a000 R12: ffffffff803f9400
R13: ffff8104bfd1fee8 R14: ffff8104a2d7e7e0 R15: ffffffff803b4220
FS: 0000000000000000(0000) GS:ffff8104bfcd2e40(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000018 CR3: 00000004a8ddf000 CR4: 00000000000006e0
Process swapper (pid: 0, threadinfo ffff8104bfd18000, task ffff8104bfcd3080)
Stack: 000000000000000f ffffffff80092e3a ffff8104a2d7e7e0 0000000000000200
 ffff8104bfd1ff20 ffffffff80044951 ffffffff80392180 0000000000000001
 0000000000000000 0000000000000001 0000000000000000 000000000000373e
Call Trace:
 <IRQ> [<ffffffff80092e3a>] process_timeout+0x0/0x5
 [<ffffffff80044951>] try_to_wake_up+0x27/0x418
 [<ffffffff80092e3a>] process_timeout+0x0/0x5
 [<ffffffff80092c4a>] run_timer_softirq+0x133/0x1b0
 [<ffffffff80011c19>] __do_softirq+0x5e/0xd5
 [<ffffffff8005c330>] call_softirq+0x1c/0x28
 [<ffffffff8006a312>] do_softirq+0x2c/0x85
 [<ffffffff80054f2e>] mwait_idle+0x0/0x4a
 [<ffffffff8005bcc2>] apic_timer_interrupt+0x66/0x6c
 <EOI> [<ffffffff80054f64>] mwait_idle+0x36/0x4a
 [<ffffffff80046fb7>] cpu_idle+0x95/0xb8
 [<ffffffff80073bb7>] start_secondary+0x45a/0x469

Code: 8b 40 18 48 8b 04 c5 c0 19 3b 80 4c 03 60 08 4c 89 e7 e8 c0
RIP [<ffffffff800862be>] task_rq_lock+0x26/0x6f
 RSP <ffff8104bfd1fe60>
CR2: 0000000000000018
 <0>Kernel panic - not syncing: Fatal exception

Unable to handle kernel paging request at ffffffff82a00000 RIP:
 [<ffffffff880b1dd0>] :aacraid:aac_internal_transfer+0x9b/0x9e
PGD 203067 PUD 205063 PMD 0
Oops: 0000 [2] SMP
last sysfs file: /class/scsi_host/host3/proc_name
CPU 0
Modules linked in: mptctl mptbase autofs4 ipv6 video sbs i2c_ec button battery asus_acpi acpi_memhotplug ac parport_pc lp parport shpchp bnx2 ide_cd serio_raw i2c_i801 sg cdrom pcspkr i2c_core dm_snapshot dm_zero dm_mirror dm_mod qla2400(U) qla2xxx(U) qla2xxx_conf(U) intermodule(U) ata_piix libata aacraid sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
Pid: 0, comm: swapper Tainted: GF 2.6.18-8.1.4.el5 #1
RIP: 0010:[<ffffffff880b1dd0>] [<ffffffff880b1dd0>] :aacraid:aac_internal_transfer+0x9b/0x9e
RSP: 0018:ffffffff80402ea0 EFLAGS: 00010083
RAX: 0000000000000008 RBX: ffff8104a2b280c0 RCX: 00000000fda02ea0
RDX: 0000000000000008 RSI: ffffffff82a00000 RDI: ffff8104a4f26168
RBP: ffff8104be012780 R08: ffff8104a2929000 R09: ffff8100010004a0
R10: 0000000000000010 R11: ffff8104a6182cc0 R12: ffff8104be012780
R13: ffff8104be6d4cf8 R14: ffffffff803bbee8 R15: ffffffff803bbee8
FS: 0000000000000000(0000) GS:ffffffff8038a000(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: ffffffff82a00000 CR3: 00000004a669a000 CR4: 00000000000006e0
Process swapper (pid: 0, threadinfo ffffffff803ba000, task ffffffff802d1ae0)
Stack: ffffffff880b2687 315f726f7272694d 2020202020202020 ffff8104be0f3068
 ffff810037e27800 ffff8104be6d4cf8 ffffffff880b6f23 000000000000013c
 ffff8104be6d4cf8 0000000000000000 00000000000000a9 ffffffff803bbee8
Call Trace:
 <IRQ> [<ffffffff880b2687>] :aacraid:get_container_name_callback+0x8b/0xb5
 [<ffffffff880b6f23>] :aacraid:aac_intr_normal+0x1b3/0x1f9
 [<ffffffff880b7fc3>] :aacraid:aac_rkt_intr+0x37/0x115
 [<ffffffff80010705>] handle_IRQ_event+0x29/0x58
 [<ffffffff800b2fe2>] __do_IRQ+0xa4/0x105
 [<ffffffff8006a195>] do_IRQ+0xe7/0xf5
 [<ffffffff80054f2e>] mwait_idle+0x0/0x4a
 [<ffffffff8005b649>] ret_from_intr+0x0/0xa
 <EOI> [<ffffffff80054f64>] mwait_idle+0x36/0x4a
 [<ffffffff80046fb7>] cpu_idle+0x95/0xb8
 [<ffffffff803c57f6>] start_kernel+0x220/0x225
 [<ffffffff803c5237>] _sinittext+0x237/0x23e

Code: f3 a4 c3 41 55 41 54 55 48 89 fd 53 48 89 f3 48 83 ec 08 48
RIP [<ffffffff880b1dd0>] :aacraid:aac_internal_transfer+0x9b/0x9e
 RSP <ffffffff80402ea0>
CR2: ffffffff82a00000
 <0>Kernel panic - not syncing: Fatal exception
BUG: warning at drivers/char/vt.c:3359/do_unblank_screen() (Tainted: GF )

Call Trace:
 <IRQ> [<ffffffff8018eb09>] do_unblank_screen+0x56/0x132
 [<ffffffff8007c97c>] bust_spinlocks+0x1c/0x46
 [<ffffffff8008b32b>] panic+0x88/0x1f4
 [<ffffffff8018eace>] do_unblank_screen+0x1b/0x132
 [<ffffffff80062d3a>] oops_end+0x51/0x53
 [<ffffffff80064842>] do_page_fault+0x753/0x81d
 [<ffffffff8009b6c2>] autoremove_wake_function+0x9/0x2e
 [<ffffffff800850ed>] __wake_up_common+0x3e/0x68
 [<ffffffff8005be1d>] error_exit+0x0/0x84
 [<ffffffff800862be>] task_rq_lock+0x26/0x6f
 [<ffffffff80092e3a>] process_timeout+0x0/0x5
 [<ffffffff80044951>] try_to_wake_up+0x27/0x418
 [<ffffffff80092e3a>] process_timeout+0x0/0x5
 [<ffffffff80092c4a>] run_timer_softirq+0x133/0x1b0
 [<ffffffff80011c19>] __do_softirq+0x5e/0xd5
 [<ffffffff8005c330>] call_softirq+0x1c/0x28
 [<ffffffff8006a312>] do_softirq+0x2c/0x85
 [<ffffffff80054f2e>] mwait_idle+0x0/0x4a
 [<ffffffff8005bcc2>] apic_timer_interrupt+0x66/0x6c
 <EOI> [<ffffffff80054f64>] mwait_idle+0x36/0x4a
 [<ffffffff80046fb7>] cpu_idle+0x95/0xb8
 [<ffffffff80073bb7>] start_secondary+0x45a/0x469

Created attachment 155195
Upstream commit fixing problem
This has been traced to a failure in the aacraid aac_internal_transfer function when handling INQUIRY commands that request fewer than 16 bytes of data. I've attached the upstream commit for this fix.
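
For readers unfamiliar with this class of bug, here is a minimal sketch of the kind of bounds check that avoids it. It is not the aacraid source and not the attached commit: `bounded_copy_at` and its parameters are hypothetical names chosen for illustration. It assumes the failing path writes a name into the INQUIRY response at byte offset 16 (where the product identification field begins in standard INQUIRY data), so a request buffer shorter than 16 bytes ends before the write even starts.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, NOT the aacraid implementation: copy up to `len`
 * bytes of `src` into `dst` starting at `offset`, never writing past
 * `dst_len`. Returns the number of bytes actually copied.
 */
static size_t bounded_copy_at(void *dst, size_t dst_len, size_t offset,
                              const void *src, size_t len)
{
    assert(dst != NULL && src != NULL);

    /* A short buffer may end before the field begins: copy nothing.
     * Without this check, dst_len - offset would wrap around to a huge
     * unsigned value and the copy would run far out of bounds. */
    if (offset >= dst_len)
        return 0;

    size_t room = dst_len - offset;      /* bytes available after offset */
    size_t n = len < room ? len : room;  /* clamp to what fits */
    memcpy((char *)dst + offset, src, n);
    return n;
}
```

With an 8-byte destination and offset 16 the helper copies nothing; with a 36-byte destination it copies the full 8-byte name; with a 20-byte destination it copies only the 4 bytes that fit.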
This problem has been fixed in 2.6.18-27.el5 with the aacraid driver update (in patch tracking file repost-bz197337-update-aacraid-driver-to-1-1-5-2437.patch).
(In reply to comment #0)
> Pid: 0, comm: swapper Tainted: GF 2.6.18-8.1.4.el5 #1

This line indicates that the kernel that crashed was tainted by the forced loading of a proprietary driver. Please reproduce this bug with an untainted kernel and post the oops message here, or close the bug if you cannot. We do not have visibility into proprietary drivers.

Chip
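
As a reading aid for the taint string quoted above, here is a small sketch of how the two letters can be interpreted. In the kernel's taint string, the first position is 'P' if a proprietary module has been loaded and 'G' otherwise, and 'F' marks a module that was force-loaded. The `taint_info` struct and `parse_taint` function are hypothetical names for illustration only, and the sketch ignores every other taint letter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical summary of the two taint letters discussed here. */
struct taint_info {
    bool proprietary;  /* 'P' present: a proprietary module was loaded */
    bool forced_load;  /* 'F' present: a module was force-loaded */
};

/* Interpret a taint string such as "GF". Other letters are ignored. */
static struct taint_info parse_taint(const char *flags)
{
    assert(flags != NULL);

    struct taint_info t;
    t.proprietary = strchr(flags, 'P') != NULL;
    t.forced_load = strchr(flags, 'F') != NULL;
    return t;
}
```

For the string "GF" from the oops, this reports a forced module load with no 'P' flag set.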
(In reply to comment #4)
> (In reply to comment #0)
> > Pid: 0, comm: swapper Tainted: GF 2.6.18-8.1.4.el5 #1
>
> This line indicates that the kernel that crashed was tainted by the forced
> loading of a proprietary driver. Please reproduce this bug with an untainted
> kernel and post the oops message here, or close the bug if you cannot. We do
> not have visibility into proprietary drivers.

Never mind; already modified.

Chip
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0959.html