332911 – qla2xxx does not handle disk failure resulting in system crash [ORA 6153432]

Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat Enterprise Linux 4
Classification:	Red Hat
Component:	kernel
Version:	4.4
Hardware:	All
OS:	Linux
Priority:	low
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	Marcus Barrow
QA Contact:	Martin Jenner

Reported:	2007-10-15 17:53 UTC by John Sobecki
Modified:	2007-11-17 01:14 UTC
CC List:	2 users
Fixed In Version:	4.6
Doc Type:	Bug Fix
Last Closed:	2007-10-19 13:49:29 UTC

Attachments
upstream patch backported to EL4 (813 bytes, patch) 2007-10-15 17:53 UTC, John Sobecki

Description John Sobecki 2007-10-15 17:53:22 UTC

Description of problem:

During disk error conditions, the qla2xxx driver hits an unhandled error
condition that crashes the system.

Version-Release number of selected component (if applicable):

RHEL4 U4 and U5.

How reproducible:

A disk failure is needed to trigger the oops, so it is not easy to reproduce.

Steps to Reproduce:
1. N/A
  
Actual results:

qla2300 0000:22:01.0: qla2xxx_eh_abort: cmd already done sp=0000000000000000
SCSI error : <4 0 0 11> return code = 0x6000000
end_request: I/O error, dev sdch, sector 25687040
end_request: I/O error, dev sdch, sector 25687048
SCSI error : <4 0 1 6> return code = 0x6000000
end_request: I/O error, dev sddl, sector 27078144
end_request: I/O error, dev sddl, sector 27078152
 at 0000000000000008 RIP:
<ffffffffa0061e0e>{:qla2xxx:qla2x00_cmd_timeout+26}
PML4 714dd067 PGD 160126067 PMD 0
Oops: 0000 [1] SMP
CPU 0
Modules linked in: netconsole netdump md5 ipv6 oracleasm(U) sunrpc dm_mod
emcphr(U) emcpmpap(U) emcpmpaa(U) emcpmpc(U) emcpmp(U) emcp(U) emcplib(U)
button battery ac ohci_hcd hw_random tg3 e1000 bond2(U) bond1(U) bond0(U)
floppy st sg ext3 jbd raid1 qla2300 qla2xxx scsi_transport_fc mptscsih mptsas
mptspi mptfc mptscsi mptbase sd_mod scsi_mod
Pid: 0, comm: swapper Tainted: PF     2.6.9-42.0.2.ELsmp
RIP: 0010:[<ffffffffa0061e0e>]
<ffffffffa0061e0e>{:qla2xxx:qla2x00_cmd_timeout+26}
RSP: 0018:ffffffff80456f68  EFLAGS: 00010246
RAX: 00000000ffffffff RBX: 00000103d259ec80 RCX: 00000103d259eca0
RDX: ffffffff80456fa8 RSI: 0000000000000206 RDI: 00000103d259ec80
RBP: 0000010008002be0 R08: 0000000000000097 R09: 0000000000000080
R10: 0000000000000080 R11: 0000000000000080 R12: 00000107e2af03c8
R13: 0000000000000000 R14: 0000000000000206 R15: 0000000000000000
FS:  0000002a959a1da0(0000) GS:ffffffff804e5180(0000) knlGS:00000000f7e5a6c0
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000008 CR3: 0000000000101000 CR4: 00000000000006e0
Process swapper (pid: 0, threadinfo ffffffff804e8000, task ffffffff803d4400)
Stack: 0000000000000080 00000103d259ec80 0000010008002be0 ffffffffa0061df4
       ffffffff80456fa8 0000000000000206 0000000000000000 ffffffff80140085
       <ffffffff80140085>{run_timer_softirq+356}
<ffffffff80133606>{rebalance_tick+504}
       <ffffffff8013c738>{__do_softirq+88} <ffffffff8013c7e1>{do_softirq+49}
       <ffffffff80110bf5>{apic_timer_interrupt+133}  <EOI>
<ffffffff8010e749>{default_idle+0}
       <ffffffff8010e769>{default_idle+32} <ffffffff8010e7dc>{cpu_idle+26}
       <ffffffff804eb67b>{start_kernel+470}
<ffffffff804eb1d5>{_sinittext+469}
    
    
 Code: 49 8b 45 08 48 8b 38 48 8b 43 78 4c 8d b7 c8 03 00 00 48 81
 RIP <ffffffffa0061e0e>{:qla2xxx:qla2x00_cmd_timeout+26} RSP
<ffffffff80456f68>
    CR2: 0000000000000008 
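
Triage note: the register dump and the Code: bytes above appear to be consistent with a NULL pointer held in R13 being dereferenced at offset 8. The following is a minimal Python sketch that checks this arithmetic. It assumes the Code: bytes begin at the faulting RIP (which seems to be the case for this kernel's x86_64 oops format), and it decodes only the single instruction form seen here (REX.W `8b /r` with an 8-bit displacement). The helper names `parse_registers`, `code_bytes`, and `decode_mov_disp8` are hypothetical and exist only for this illustration.

```python
import re

# Excerpt of the oops in this report: registers, CR2, and code bytes.
OOPS = """\
RBP: 0000010008002be0 R08: 0000000000000097 R09: 0000000000000080
R10: 0000000000000080 R11: 0000000000000080 R12: 00000107e2af03c8
R13: 0000000000000000 R14: 0000000000000206 R15: 0000000000000000
CR2: 0000000000000008 CR3: 0000000000101000 CR4: 00000000000006e0
 Code: 49 8b 45 08 48 8b 38 48 8b 43 78 4c 8d b7 c8 03 00 00 48 81
"""

# x86_64 general-purpose registers in hardware encoding order.
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
         "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]


def parse_registers(text):
    """Return a {name: value} dict for the registers printed in the oops."""
    regs = {}
    for name, value in re.findall(r"\b(R[A-Z0-9]{2}|CR\d): ([0-9a-f]{16})", text):
        # The oops prints R08/R09; normalise them to r8/r9.
        key = re.sub(r"^r0(\d)$", r"r\1", name.lower())
        regs[key] = int(value, 16)
    return regs


def code_bytes(text):
    """Return the bytes listed on the 'Code:' line."""
    match = re.search(r"Code: ((?:[0-9a-f]{2} ?)+)", text)
    return bytes.fromhex(match.group(1))


def decode_mov_disp8(code):
    """Decode 'mov r64, [base + disp8]' only; reject anything else."""
    rex, opcode, modrm, disp = code[0], code[1], code[2], code[3]
    if (rex & 0xF8) != 0x48 or opcode != 0x8B:
        raise ValueError("not a REX.W mov r64, r/m64")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 1 or rm == 4:
        raise ValueError("only [base + disp8] without SIB is handled")
    dest = REG64[reg | ((rex & 0x4) << 1)]  # REX.R extends the reg field
    base = REG64[rm | ((rex & 0x1) << 3)]   # REX.B extends the r/m field
    if disp >= 0x80:                        # disp8 is sign-extended
        disp -= 0x100
    return dest, base, disp


regs = parse_registers(OOPS)
dest, base, disp = decode_mov_disp8(code_bytes(OOPS))
fault = (regs[base] + disp) % 2**64

print(f"mov {dest}, [{base}{disp:+#x}]")
print(f"computed fault address {fault:#x}, reported CR2 {regs['cr2']:#x}")
```

If the computed address matches CR2, the faulting access is a read through the pointer in the decoded base register. Which structure that pointer belongs to cannot be determined from the dump alone and would need the driver source.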

Expected results:

No crash.

Additional info:

Patch from upstream qla2xxx attached; please note that Oracle is including this
in EL4.6.

Reference: Oracle Bug 6153432

Comment 1 John Sobecki 2007-10-15 17:53:22 UTC

Created attachment 227831
upstream patch backported to EL4

Comment 2 Marcus Barrow 2007-10-19 13:49:29 UTC

This patch has been present in RHEL 4.6 for some time now.

Thank you for the report and sorry you ran into this...

Marcus Barrow
QLogic Partner engineer
