467315 – system panics after booting -118 kern and loading qla module

Summary: system panics after booting -118 kern and loading qla module

Keywords:
Status:	CLOSED DUPLICATE of bug 465945
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	5.3
Hardware:	All
OS:	Linux
Priority:	urgent
Severity:	urgent
Target Milestone:	rc
Target Release:	---
Assignee:	Doug Ledford
QA Contact:	Martin Jenner
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2008-10-16 20:07 UTC by Corey Marthaler
Modified:	2008-11-18 13:40 UTC
CC List:	4 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2008-11-18 13:40:23 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Description Corey Marthaler 2008-10-16 20:07:04 UTC

Description of problem:
I installed the -118 kernel on my cluster of x86_64 machines and rebooted; four of the machines hit the following panic:

Starting acpi daemon: [  OK  ]
Starting sshd: [  OK  ]
Starting cups: [  OK  ]
Starting xinetd: [  OK  ]
Starting sendmail: [  OK  ]
Starting sm-client: [  OK  ]
Starting console mouse services: [  OK  ]
Starting crond: [  OK  ]
Starting xfs: [  OK  ]
Starting anacron: [  OK  ]
Starting atd: [  OK  ]
Starting yum-updatesd: [  OK  ]
Starting Avahi daemon... [  OK  ]
Starting HAL daemon: [FAILED]
Starting smartd: [  OK  ]

Red Hat Enterprise Linux Server release 5.2 (Tikanga)
Kernel 2.6.18-118.el5 on an x86_64

taft-01 login: Unable to handle kernel NULL pointer dereference at 0000000000000010 RIP: 
 [<ffffffff881b753c>] :qla2xxx:qla2x00_abort_fcport_cmds+0x12/0x124
PGD 0 
Oops: 0000 [1] SMP 
last sysfs file: /devices/pci0000:00/0000:00:06.0/0000:08:00.2/0000:0b:03.0/0000:0c:06.1/irq
CPU 0 
Modules linked in: autofs4 hidp rfcomm l2cap bluetooth sunrpc ipv6 xfrm_nalgo crypto_api dm_multipath sd
Pid: 1398, comm: fc_wq_1 Not tainted 2.6.18-118.el5 #1
RIP: 0010:[<ffffffff881b753c>]  [<ffffffff881b753c>] :qla2xxx:qla2x00_abort_fcport_cmds+0x12/0x124
RSP: 0018:ffff81021afddda0  EFLAGS: 00010286
RAX: ffff81021af7c4a8 RBX: ffff81021af7c060 RCX: 0000000000000000
RDX: ffff8102120ad450 RSI: 0000000000000246 RDI: 0000000000000000
RBP: ffff81021af7c000 R08: ffff81021afdc000 R09: 000000000000003d
R10: ffff81021aeb58c0 R11: 0000000001382fb5 R12: ffff810219d71800
R13: ffff81021af7c060 R14: ffff81021af7c000 R15: ffffffff88194e03
FS:  0000000000000000(0000) GS:ffffffff803b8000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000010 CR3: 0000000219432000 CR4: 00000000000006e0
Process fc_wq_1 (pid: 1398, threadinfo ffff81021afdc000, task ffff810219e380c0)
Stack:  0000000000000000 ffff81021af7c060 ffff81021af7c000 ffff810219d71800
 ffff81021af7c060 ffff81021af7c000 ffffffff88194e03 ffffffff881cfa73
 ffff81021ba5c178 ffffffff88193204 ffff81021ba5c000 ffff81021ba5c000
Call Trace:
 [<ffffffff88194e03>] :scsi_transport_fc:fc_rport_final_delete+0x0/0xb7
 [<ffffffff881cfa73>] :qla2xxx:qla2x00_terminate_rport_io+0x14/0x1d
 [<ffffffff88193204>] :scsi_transport_fc:fc_terminate_rport_io+0x50/0x60
 [<ffffffff88194e86>] :scsi_transport_fc:fc_rport_final_delete+0x83/0xb7
 [<ffffffff8004d299>] run_workqueue+0x94/0xe4
 [<ffffffff80049b52>] worker_thread+0x0/0x122
 [<ffffffff8009e30e>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80049c42>] worker_thread+0xf0/0x122
 [<ffffffff8008ae6c>] default_wake_function+0x0/0xe
 [<ffffffff8009e30e>] keventd_create_kthread+0x0/0xc4
 [<ffffffff8009e30e>] keventd_create_kthread+0x0/0xc4
 [<ffffffff800324fb>] kthread+0xfe/0x132
 [<ffffffff8005dfb1>] child_rip+0xa/0x11
 [<ffffffff8009e30e>] keventd_create_kthread+0x0/0xc4
 [<ffffffff800323fd>] kthread+0x0/0x132
 [<ffffffff8005dfa7>] child_rip+0x0/0x11


Code: 4c 8b 67 10 4d 8b ac 24 20 26 00 00 4d 85 ed 4d 0f 44 ec 45 
RIP  [<ffffffff881b753c>] :qla2xxx:qla2x00_abort_fcport_cmds+0x12/0x124
 RSP <ffff81021afddda0>
CR2: 0000000000000010
 <0>Kernel panic - not syncing: Fatal exception


Version-Release number of selected component (if applicable):
Kernel 2.6.18-118.el5 on an x86_64

How reproducible:
Every time

Comment 1 Jeff Bastian 2008-10-17 21:05:26 UTC

I've seen this on ia64 too, on hp-bl870c-02.rhts.bos.redhat.com:


...
...
QLogic Fibre Channel HBA Driver
Loading scsi_transport_fc.ko module
Loading qla2xxx.ko module
ACPI: PCI Interrupt 0000:04:00.0[A] -> GSI 49 (level, low) -> IRQ 50
qla2xxx 0000:04:00.0: Found an ISP2432, irq 50, iobase 0xc0000000b0084000
qla2xxx 0000:04:00.0: Configuring PCI space...
qla2xxx 0000:04:00.0: Configure NVRAM parameters...
qla2xxx 0000:04:00.0: Verifying loaded RISC code...
qla2xxx 0000:04:00.0: Allocated (64 KB) for EFT...
qla2xxx 0000:04:00.0: Allocated (1413 KB) for firmware dump...
scsi1 : qla2xxx
qla2xxx 0000:04:00.0: 
 QLogic Fibre Channel HBA Driver: 8.02.00-k5-rhel5.3-01
  QLogic QMH2462 - 
  ISP2432: PCIe (2.5Gb/s x4) @ 0000:04:00.0 hdma+, host#=1, fw=4.04.05 [IP] [Multi-ID] [84XX] 
GSI 50 (level, low) -> CPU 2 (0x0400) vector 64
ACPI: PCI Interrupt 0000:04:00.1[B] -> GSI 50 (level, low) -> IRQ 64
qla2xxx 0000:04:00.1: Found an ISP2432, irq 64, iobase 0xc0000000b0080000
qla2xxx 0000:04:00.1: Configuring PCI space...
qla2xxx 0000:04:00.1: Configure NVRAM parameters...
qla2xxx 0000:04:00.1: Verifying loaded RISC code...
qla2xxx 0000:04:00.1: Allocated (64 KB) for EFT...
qla2xxx 0000:04:00.1: Allocated (1413 KB) for firmware dump...
scsi2 : qla2xxx
qla2xxx 0000:04:00.1: 
 QLogic Fibre Channel HBA Driver: 8.02.00-k5-rhel5.3-01
  QLogic QMH2462 - 
  ISP2432: PCIe (2.5Gb/s x4) @ 0000:04:00.1 hdma+, host#=2, fw=4.04.05 [IP] [Multi-ID] [84XX] 
device-mapper: uevent: version 1.0.3
Loading dm-mod.ko module
device-mapper: ioctl: 4.11.5-ioctl (2007-12-12) initialised: dm-devel
Loading dm-log.ko module
Loading dm-mirror.ko module
Loading dm-zero.ko module
Loading dm-snapshot.ko module
Waiting for driver initialization.
qla2xxx 0000:04:00.1: LIP reset occured (f700).
qla2xxx 0000:04:00.1: LOOP UP detected (4 Gbps).
Unable to handle kernel NULL pointer dereference (address 0000000000000010)
fc_wq_2[621]: Oops 8813272891392 [1]
Modules linked in: dm_snapshot dm_zero dm_mirror dm_log dm_mod qla2xxx scsi_transport_fc mptsas mptscsih mptbase scsi_transport_sas sd_mod scsi_mod raid0 ext3 jbd uhci_hcd ohci_hcd ehci_hcd

Pid: 621, CPU 0, comm:              fc_wq_2
psr : 00001010085a6010 ifs : 800000000000050e ip  : [<a00000020f679140>]    Not tainted
ip is at qla2x00_abort_fcport_cmds+0x20/0x300 [qla2xxx]
unat: 0000000000000000 pfs : 0000000000000205 rsc : 0000000000000003
rnat: 0000000000000000 bsps: 0000000000000000 pr  : 0000000000009a41
ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a70433f
csd : 0000000000000000 ssd : 0000000000000000
b0  : a00000020f6c0250 b6  : a00000020f6c0220 b7  : a00000010000b840
f6  : 0fffbccccccccc8c00000 f7  : 0ffe5c612000000000000
f8  : 1000b9c80000000000000 f9  : 10002a000000000000000
f10 : 10007fa6666666172c000 f11 : 1003e00000000000001f4
r1  : a00000020f729938 r2  : e0000100f3bd8038 r3  : 0000000000000010
r8  : e0000100fec82150 r9  : e0000100fec82000 r10 : e0000100f540c0b8
r11 : a00000020f7b8f80 r12 : e0000100f55dfd00 r13 : e0000100f55d8000
r14 : e0000100f3bd84a8 r15 : a00000020f6c0220 r16 : e0000100fec82150
r17 : a00000020f6c0280 r18 : e0000100f3bd8458 r19 : e0000100f46dc298
r20 : 0000000000000001 r21 : 0000000000000001 r22 : 0000000000000000
r23 : 0000000000000001 r24 : e000000004e10000 r25 : ffffffffffff0028
r26 : 0000000000000006 r27 : 0000000000000000 r28 : e0000100f46dc298
r29 : 0000000000000400 r30 : 0000000000000000 r31 : e0000100f540c050

Call Trace:
 [<a000000100013ba0>] show_stack+0x40/0xa0
                                sp=e0000100f55df890 bsp=e0000100f55d9368
 [<a0000001000144a0>] show_regs+0x840/0x880
                                sp=e0000100f55dfa60 bsp=e0000100f55d9310
 [<a000000100037c00>] die+0x1c0/0x2c0
                                sp=e0000100f55dfa60 bsp=e0000100f55d92c8
 [<a00000010065dc50>] ia64_do_page_fault+0x910/0xa40
                                sp=e0000100f55dfa80 bsp=e0000100f55d9278
 [<a00000010000c040>] __ia64_leave_kernel+0x0/0x280
                                sp=e0000100f55dfb30 bsp=e0000100f55d9278
 [<a00000020f679140>] qla2x00_abort_fcport_cmds+0x20/0x300 [qla2xxx]
                                sp=e0000100f55dfd00 bsp=e0000100f55d9208
 [<a00000020f6c0250>] qla2x00_terminate_rport_io+0x30/0x60 [qla2xxx]
                                sp=e0000100f55dfd00 bsp=e0000100f55d91e0
 [<a00000020f14c5f0>] fc_terminate_rport_io+0x110/0x160 [scsi_transport_fc]
                                sp=e0000100f55dfd00 bsp=e0000100f55d91b8
 [<a00000020f151d30>] fc_rport_final_delete+0x1b0/0x220 [scsi_transport_fc]
                                sp=e0000100f55dfd00 bsp=e0000100f55d9180
 [<a0000001000a5be0>] run_workqueue+0x1c0/0x280
                                sp=e0000100f55dfd00 bsp=e0000100f55d9140
 [<a0000001000a7ac0>] worker_thread+0x1a0/0x240
                                sp=e0000100f55dfd00 bsp=e0000100f55d9110
 [<a0000001000afaf0>] kthread+0x230/0x2c0
                                sp=e0000100f55dfd50 bsp=e0000100f55d90c8
 [<a000000100012270>] kernel_thread_helper+0x30/0x60
                                sp=e0000100f55dfe30 bsp=e0000100f55d90a0
 [<a0000001000090c0>] start_kernel_thread+0x20/0x40
                                sp=e0000100f55dfe30 bsp=e0000100f55d90a0
 <0>Kernel panic - not syncing: Fatal exception
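
The ia64 trace above and the original x86_64 trace use different frame formats, but they can be compared mechanically. The sketch below is a minimal Python parser written only for the two formats shown in this bug, not a general kernel-log parser. It keeps frames that name a module and skips +0x0 entries, which in these x86_64 traces are probably stale function addresses left on the stack rather than real return addresses. The x86_64 sample starts with the faulting-RIP line from that oops header, because its Call Trace section omits the crashing function. module_frames and FRAME_RE are hypothetical names.

```python
import re

# Frame formats seen in this bug:
#   x86_64: [<ffffffff881cfa73>] :qla2xxx:qla2x00_terminate_rport_io+0x14/0x1d
#   ia64:   [<a00000020f6c0250>] qla2x00_terminate_rport_io+0x30/0x60 [qla2xxx]
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?::(?P<mod_x86>\w+):)?"
    r"(?P<func>\w+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:[ \t]+\[(?P<mod_ia64>\w+)\])?"
)


def module_frames(trace: str) -> list[tuple[str, str]]:
    """Return (module, function) pairs for module frames, innermost first."""
    frames = []
    for m in FRAME_RE.finditer(trace):
        module = m.group("mod_x86") or m.group("mod_ia64")
        if module is None:
            continue  # core-kernel frame such as run_workqueue
        if int(m.group("off"), 16) == 0:
            continue  # +0x0 cannot be a return address; likely a stale entry
        frames.append((module, m.group("func")))
    return frames


X86_64_TRACE = """\
 [<ffffffff881b753c>] :qla2xxx:qla2x00_abort_fcport_cmds+0x12/0x124
 [<ffffffff88194e03>] :scsi_transport_fc:fc_rport_final_delete+0x0/0xb7
 [<ffffffff881cfa73>] :qla2xxx:qla2x00_terminate_rport_io+0x14/0x1d
 [<ffffffff88193204>] :scsi_transport_fc:fc_terminate_rport_io+0x50/0x60
 [<ffffffff88194e86>] :scsi_transport_fc:fc_rport_final_delete+0x83/0xb7
 [<ffffffff8004d299>] run_workqueue+0x94/0xe4
"""

IA64_TRACE = """\
 [<a00000020f679140>] qla2x00_abort_fcport_cmds+0x20/0x300 [qla2xxx]
 [<a00000020f6c0250>] qla2x00_terminate_rport_io+0x30/0x60 [qla2xxx]
 [<a00000020f14c5f0>] fc_terminate_rport_io+0x110/0x160 [scsi_transport_fc]
 [<a00000020f151d30>] fc_rport_final_delete+0x1b0/0x220 [scsi_transport_fc]
 [<a0000001000a5be0>] run_workqueue+0x1c0/0x280
"""

x86_chain = module_frames(X86_64_TRACE)
ia64_chain = module_frames(IA64_TRACE)
print(" <- ".join(func for _, func in ia64_chain))
print("same module call chain:", x86_chain == ia64_chain)  # same module call chain: True
```

Both traces reduce to the same path, fc_rport_final_delete -> fc_terminate_rport_io -> qla2x00_terminate_rport_io -> qla2x00_abort_fcport_cmds, which is consistent with the two reports describing one bug.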

Comment 2 Don Zickus 2008-10-20 14:57:28 UTC

I believe this has been fixed in kernel-2.6.18-120.el5 (which will be released soon). Either bz 442946 or bz 465945 resolved the problem.
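
Whether a machine already runs a build that should contain the fix can be checked from its uname -r string. This is a sketch under one assumption taken from the comment above, which is itself phrased as a belief: the fix is in kernel-2.6.18-120.el5 and therefore in every later 2.6.18-N.el5 build with N >= 120. has_fix and FIXED_BUILD are hypothetical names, and z-stream kernels such as 2.6.18-92.1.10.el5 are judged only by their base build number.

```python
import re

# Assumption from Comment 2 (stated there as a belief, not a confirmed fix):
# kernel-2.6.18-120.el5 contains the fix, so later 2.6.18-N.el5 builds do too.
FIXED_BUILD = 120

# Matches e.g. 2.6.18-118.el5, 2.6.18-128.1.1.el5, 2.6.18-120.el5xen.
RELEASE_RE = re.compile(r"^2\.6\.18-(\d+)(?:\.\d+)*\.el5")


def has_fix(release: str, fixed_build: int = FIXED_BUILD) -> bool:
    """Return True if a uname -r string is a RHEL 5 kernel at or past fixed_build."""
    m = RELEASE_RE.match(release)
    if m is None:
        raise ValueError(f"not a RHEL 5 2.6.18 kernel release: {release!r}")
    return int(m.group(1)) >= fixed_build


for rel in ("2.6.18-118.el5", "2.6.18-120.el5"):
    print(rel, has_fix(rel))  # prints False for -118, True for -120
```

On a live host, has_fix(platform.release()) applies the same check to the running kernel; on the machines in this report, which run 2.6.18-118.el5, it returns False.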

Comment 3 Doug Ledford 2008-11-10 15:24:32 UTC

Can someone tell me if this is fixed or if I need to be digging into something?

Comment 4 Peter Martuccelli 2008-11-18 13:40:23 UTC

I cleared the blocker flag. Corey, if there are any remaining issues, please let us know. Marking this as a duplicate of bug 465945.

*** This bug has been marked as a duplicate of bug 465945 ***

Note You need to log in before you can comment on or make changes to this bug.