Bug 332911

Summary: qla2xxx does not handle disk failure resulting in system crash [ORA 6153432]
Product: Red Hat Enterprise Linux 4
Reporter: John Sobecki <john.sobecki>
Component: kernel
Assignee: Marcus Barrow <mbarrow>
Status: CLOSED CURRENTRELEASE
QA Contact: Martin Jenner <mjenner>
Severity: medium
Priority: low
Version: 4.4
CC: dledford, jbaron
Hardware: All
OS: Linux
Fixed In Version: 4.6
Doc Type: Bug Fix
Last Closed: 2007-10-19 13:49:29 UTC
Attachments:
upstream patch backported to EL4 (flags: none)

Description John Sobecki 2007-10-15 17:53:22 UTC
Description of problem:

During disk error conditions, the qla2xxx driver hits an unhandled error
condition that crashes the system.

Version-Release number of selected component (if applicable):

RHEL4 U4 and U5.

How reproducible:

Not easily: a disk failure is needed to trigger the oops.

Steps to Reproduce:
N/A
  
Actual results:

qla2300 0000:22:01.0: qla2xxx_eh_abort: cmd already done sp=0000000000000000
SCSI error : <4 0 0 11> return code = 0x6000000
end_request: I/O error, dev sdch, sector 25687040
end_request: I/O error, dev sdch, sector 25687048
SCSI error : <4 0 1 6> return code = 0x6000000
end_request: I/O error, dev sddl, sector 27078144
end_request: I/O error, dev sddl, sector 27078152
 at 0000000000000008 RIP:
<ffffffffa0061e0e>{:qla2xxx:qla2x00_cmd_timeout+26}
PML4 714dd067 PGD 160126067 PMD 0
Oops: 0000 [1] SMP
CPU 0
Modules linked in: netconsole netdump md5 ipv6 oracleasm(U) sunrpc dm_mod
emcphr(U) emcpmpap(U) emcpmpaa(U) emcpmpc(U) emcpmp(U) emcp(U) emcplib(U)
button battery ac ohci_hcd hw_random tg3 e1000 bond2(U) bond1(U) bond0(U)
floppy st sg ext3 jbd raid1 qla2300 qla2xxx scsi_transport_fc mptscsih mptsas
mptspi mptfc mptscsi mptbase sd_mod scsi_mod
Pid: 0, comm: swapper Tainted: PF     2.6.9-42.0.2.ELsmp
RIP: 0010:[<ffffffffa0061e0e>]
<ffffffffa0061e0e>{:qla2xxx:qla2x00_cmd_timeout+26}
RSP: 0018:ffffffff80456f68  EFLAGS: 00010246
RAX: 00000000ffffffff RBX: 00000103d259ec80 RCX: 00000103d259eca0
RDX: ffffffff80456fa8 RSI: 0000000000000206 RDI: 00000103d259ec80
RBP: 0000010008002be0 R08: 0000000000000097 R09: 0000000000000080
R10: 0000000000000080 R11: 0000000000000080 R12: 00000107e2af03c8
R13: 0000000000000000 R14: 0000000000000206 R15: 0000000000000000
FS:  0000002a959a1da0(0000) GS:ffffffff804e5180(0000) knlGS:00000000f7e5a6c0
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000008 CR3: 0000000000101000 CR4: 00000000000006e0
Process swapper (pid: 0, threadinfo ffffffff804e8000, task ffffffff803d4400)
Stack: 0000000000000080 00000103d259ec80 0000010008002be0 ffffffffa0061df4
       ffffffff80456fa8 0000000000000206 0000000000000000 ffffffff80140085
       <ffffffff80140085>{run_timer_softirq+356}
<ffffffff80133606>{rebalance_tick+504}
       <ffffffff8013c738>{__do_softirq+88} <ffffffff8013c7e1>{do_softirq+49}
       <ffffffff80110bf5>{apic_timer_interrupt+133}  <EOI>
<ffffffff8010e749>{default_idle+0}
       <ffffffff8010e769>{default_idle+32} <ffffffff8010e7dc>{cpu_idle+26}
       <ffffffff804eb67b>{start_kernel+470}
<ffffffff804eb1d5>{_sinittext+469}
    
    
 Code: 49 8b 45 08 48 8b 38 48 8b 43 78 4c 8d b7 c8 03 00 00 48 81
 RIP <ffffffffa0061e0e>{:qla2xxx:qla2x00_cmd_timeout+26} RSP
<ffffffff80456f68>
    CR2: 0000000000000008 
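
Triage note on the trace above: the faulting address (CR2) is 0x8, and the first bytes of the `Code:` line, `49 8b 45 08`, decode to `mov 0x8(%r13),%rax`. With R13 equal to zero, this looks like a read at offset 8 from a NULL pointer inside qla2x00_cmd_timeout. The sketch below only cross-checks the register dump against CR2. The helper names `parse_registers` and `zero_registers` are hypothetical and written for this note; the register values are copied from the oops, and nothing here reflects the contents of the attached patch.

```python
import re

# Register dump and control registers copied from the oops above.
OOPS = """\
RAX: 00000000ffffffff RBX: 00000103d259ec80 RCX: 00000103d259eca0
RDX: ffffffff80456fa8 RSI: 0000000000000206 RDI: 00000103d259ec80
RBP: 0000010008002be0 R08: 0000000000000097 R09: 0000000000000080
R10: 0000000000000080 R11: 0000000000000080 R12: 00000107e2af03c8
R13: 0000000000000000 R14: 0000000000000206 R15: 0000000000000000
CR2: 0000000000000008 CR3: 0000000000101000 CR4: 00000000000006e0
"""

# Matches "NAME: <16 hex digits>" pairs such as "R13: 0000000000000000".
REGISTER_RE = re.compile(r"\b([A-Z][A-Z0-9]{1,2}): ([0-9a-f]{16})\b")


def parse_registers(text):
    """Hypothetical helper: map each register name in text to its integer value."""
    return {name: int(value, 16) for name, value in REGISTER_RE.findall(text)}


def zero_registers(regs):
    """Hypothetical helper: names of general-purpose registers (R*) holding zero."""
    return sorted(name for name, value in regs.items()
                  if name.startswith("R") and value == 0)


if __name__ == "__main__":
    regs = parse_registers(OOPS)
    fault_address = regs["CR2"]
    # Displacement taken from the decoded instruction: mov 0x8(%r13),%rax
    displacement = 0x8
    print("fault address:", hex(fault_address))
    print("zero registers:", zero_registers(regs))
    print("R13 + 0x8 matches CR2:", regs["R13"] + displacement == fault_address)
```

A fault address this small, matching a zeroed base register plus a small displacement, is the usual signature of a structure-member access through a NULL pointer, which is consistent with the `sp=0000000000000000` reported by qla2xxx_eh_abort just before the oops.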

Expected results:

No crash.

Additional info:

Patch from upstream qla2xxx attached; please note that Oracle is including this
in EL4.6.

Reference: Oracle Bug 6153432

Comment 1 John Sobecki 2007-10-15 17:53:22 UTC
Created attachment 227831 [details]
upstream patch backported to EL4

Comment 2 Marcus Barrow 2007-10-19 13:49:29 UTC
This patch has been present in RHEL 4.6 for some time now.

Thank you for the report, and sorry you ran into this.

Marcus Barrow
QLogic Partner engineer