Bug 643135

Summary: [NetApp/QLogic 5.5.z bug] Kernel panic hit on RHEL 5.5 QLogic FC host at qla2x00_abort_fcport_cmds [rhel-5.5.z]
Product: Red Hat Enterprise Linux 5 Reporter: Benjamin Kahn <bkahn>
Component: kernel Assignee: Chad Dupuis (Cavium) <cdupuis>
Status: CLOSED ERRATA QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 5.6 CC: andrew.vasquez, andriusb, bugproxy, bzeranski, cdupuis, dhoward, jjarvis, jpirko, karen.skweres, lalit.chandivade, ltroan, martinez, marting, martin.wilck, mbarrow, mchristi, nobody+PNT0273897, pm-eus, qlogic-redhat-ext, revers, sandy.garza, sbest, syeghiay
Target Milestone: rc Keywords: FutureFeature, OtherQA, ZStream
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Enhancement
Doc Text:
A kernel panic occurred on a Red Hat Enterprise Linux 5.5 FC host with a QLogic 8G FC adapter (QLE2562) while running IO with target controller faults. With this update, the kernel panic no longer occurs in this case.
Story Points: ---
Clone Of: Environment:
Last Closed: 2010-11-09 18:07:00 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 567428    
Bug Blocks:    
Attachments:
Description Flags
qla2xxx: Version updated to 8.03.01.05.05.06-k none
qla2xxx: Correct use-after-free issue in terminate_rport_io callback. none

Description Benjamin Kahn 2010-10-14 18:16:32 UTC
This bug has been copied from bug #567428 and has been proposed
to be backported to 5.5 z-stream (EUS).

Comment 2 Jiri Pirko 2010-10-14 18:18:24 UTC
*** Bug 628583 has been marked as a duplicate of this bug. ***

Comment 3 Jiri Pirko 2010-10-14 18:23:05 UTC
This fixes the issue described in Bug 628583. A 5.5.z patch, taken from the bz567428 driver update, is needed.

Comment 4 Andrius Benokraitis 2010-10-14 18:31:06 UTC
Chad - please post the discrete patch that resolves the defect described in bug 628583, but posted as part of the wholesale 5.6 qla2xxx update in bug 567428, in this bugzilla for 5.5.z.

Comment 5 Chad Dupuis (Cavium) 2010-10-14 19:08:14 UTC
Created attachment 453533 [details]
qla2xxx: Version updated to 8.03.01.05.05.06-k

Comment 6 Chad Dupuis (Cavium) 2010-10-14 19:09:54 UTC
(In reply to comment #4)
> Chad - please post the discrete patch that resolves the defect described in bug
> 628583, but posted as part of the wholesale 5.6 qla2xxx update in bug 567428,
> in this bugzilla for 5.5.z.

The specific patch from 567428 has been attached.  The patch has a few point fixes in it but the one that specifically fixes this issue is "Correct use-after-free issue in terminate_rport_io callback".
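
For readers unfamiliar with this class of defect, the following is a minimal user-space sketch in C of a use-after-free in a teardown callback and one way to guard against it. It is not the attached patch and does not use the qla2xxx data structures: struct fake_port, struct fake_rport, and all three functions are hypothetical names invented for illustration. It is also single-threaded; a real driver would additionally need locking or reference counting around the pointer.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins, NOT the qla2xxx structures. */
struct fake_port {
    int id;
    int aborted_cmds;
};

struct fake_rport {
    struct fake_port *port;   /* NULL once the port has been released */
};

/* Allocate a port and attach it to the rport. Returns 0 on success. */
int fake_port_attach(struct fake_rport *rport, int id)
{
    struct fake_port *port;

    assert(rport != NULL);
    port = calloc(1, sizeof(*port));
    if (port == NULL)
        return -1;
    port->id = id;
    rport->port = port;
    return 0;
}

/* Teardown: clear the back-pointer BEFORE freeing, so that a later
 * callback sees NULL instead of a dangling pointer. */
void fake_port_release(struct fake_rport *rport)
{
    struct fake_port *port;

    assert(rport != NULL);
    port = rport->port;
    rport->port = NULL;
    free(port);               /* free(NULL) is a no-op */
}

/* Callback that may run after teardown. Without the NULL check it
 * would touch freed memory. Returns 0 if work was done, -1 if the
 * port was already gone. */
int fake_terminate_io(struct fake_rport *rport)
{
    struct fake_port *port;

    assert(rport != NULL);
    port = rport->port;
    if (port == NULL)
        return -1;
    port->aborted_cmds++;
    return 0;
}
```

In this sketch, calling fake_terminate_io() after fake_port_release() returns -1 rather than dereferencing freed memory.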

Comment 7 Andrius Benokraitis 2010-10-14 19:13:11 UTC
Jiri/Don - Does QLogic need anything else to provide for this?

Comment 8 Jiri Pirko 2010-10-14 19:16:59 UTC
(In reply to comment #6)
> (In reply to comment #4)
> > Chad - please post the discrete patch that resolves the defect described in bug
> > 628583, but posted as part of the wholesale 5.6 qla2xxx update in bug 567428,
> > in this bugzilla for 5.5.z.
> 
> The specific patch from 567428 has been attached.  The patch has a few point
> fixes in it but the one that specifically fixes this issue is "Correct
> use-after-free issue in terminate_rport_io callback".

Ok, can you please isolate this minimal fixing patch and post it to RHKL under this BZnum? Thanks!

Comment 9 Chad Dupuis (Cavium) 2010-10-14 19:24:05 UTC
Created attachment 453536 [details]
qla2xxx: Correct use-after-free issue in terminate_rport_io callback.

Comment 10 Chad Dupuis (Cavium) 2010-10-14 19:24:54 UTC
> Ok, can you please isolate this minimal fixing patch and post it to RHKL under
> this BZnum? Thanks!

I've posted the minimal fixing patch.

Comment 11 Jiri Pirko 2010-10-14 20:07:12 UTC
(In reply to comment #10)
> > Ok, can you please isolate this minimal fixing patch and post it to RHKL under
> > this BZnum? Thanks!
> 
> I've posted the minimal fixing patch.

I do not see it anywhere in rhkernel-list

Comment 12 Andrius Benokraitis 2010-10-14 20:09:05 UTC
(In reply to comment #11)
> (In reply to comment #10)
> > > Ok, can you please isolate this minimal fixing patch and post it to RHKL under
> > > this BZnum? Thanks!
> > 
> > I've posted the minimal fixing patch.
> 
> I do not see it anywhere in rhkernel-list

Jiri, I believe he meant he posted it "here" in the BZ, but will post to rhkl in the morning. Is this too late?

Comment 13 Don Howard 2010-10-14 20:24:16 UTC
Hi Chad, Andrius -

Please post the minimal patch to rhkernel-list for review asap. Jiri needs to start the 5.5.z build tomorrow - the patch needs to be reviewed *today*.

Comment 14 Chad Dupuis (Cavium) 2010-10-14 20:27:14 UTC
(In reply to comment #13)
> Hi Chad, Andrius -
> 
> Please post the minimal patch to rhkernel-list for review asap. Jiri needs to
> start the 5.5.z build tomorrow - the patch needs to be reviewed *today*.

I just posted it for review.

Comment 16 Jiri Pirko 2010-10-16 09:05:14 UTC
in kernel 2.6.18-194.21.1.el5

linux-2.6-scsi-qla2xxx-correct-use-after-free-issue-in-terminate_rport_io-callback.patch

Comment 18 Martin George 2010-10-20 10:49:09 UTC
Chad,

After patching the RHEL 5.5.z host with your fix above (and with Mike Christie's reverted block state patch for resolving the RHEL5 regression bug 632195), I hit another panic due to a NULL pointer dereference at qla24xx_queuecommand:

Unable to handle kernel NULL pointer dereference at 0000000000000060 RIP: 
 [<ffffffff880ce477>] :qla2xxx:qla24xx_queuecommand+0x1be/0x1dd
PGD 0 
Oops: 0000 [1] SMP 
last sysfs file: /class/fc_remote_ports/rport-1:0-1/scsi_target_id
CPU 2 
Modules linked in: nfs fscache nfs_acl autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc be2iscsi ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_addr iscsi_tcp bnx2i cnic ipv6 xfrm_
Pid: 433, comm: scsi_wq_0 Not tainted 2.6.18-194.11.1.el5.oct14.unblock.ver3 #1
RIP: 0010:[<ffffffff880ce477>]  [<ffffffff880ce477>] :qla2xxx:qla24xx_queuecommand+0x1be/0x1dd
RSP: 0000:ffff81007e0eda50  EFLAGS: 00010002
RAX: 0000000000000002 RBX: ffff8100056ee080 RCX: 0000000000000190
RDX: ffff81007e0d8000 RSI: ffffffff880755a6 RDI: ffff81007e0d8060
RBP: ffff81007e5984f8 R08: 0000000000000286 R09: 0000000000000000
R10: ffff8100056ee140 R11: 0000000000000060 R12: ffff8100056ee080
R13: ffff81007e5984f8 R14: 0000000000000000 R15: ffffffff880755a6
FS:  0000000000000000(0000) GS:ffff81007ff1dec0(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000060 CR3: 0000000030267000 CR4: 00000000000006e0
Process scsi_wq_0 (pid: 433, threadinfo ffff81007e0ec000, task ffff810037c1a100)
Stack:  ffff8100763f6048 ffff8100056ee080 ffff81007e598000 0000000000000287
 ffff8100763f6048 ffff810074b94178 ffff8100763f6048 ffffffff88075c61
 ffff810027f8e1d8 ffff8100056ee080 ffff810027f8e000 ffff81007e598000
Call Trace:
 [<ffffffff88075c61>] :scsi_mod:scsi_dispatch_cmd+0x26e/0x2ff
 [<ffffffff8807b260>] :scsi_mod:scsi_request_fn+0x2c1/0x390
 [<ffffffff80144fb3>] blk_execute_rq_nowait+0x86/0x9a
 [<ffffffff80145057>] blk_execute_rq+0x90/0xc0
 [<ffffffff8807aca5>] :scsi_mod:scsi_execute+0xd1/0xea
 [<ffffffff8807ad64>] :scsi_mod:scsi_execute_req+0xa6/0xcf
 [<ffffffff8807c05a>] :scsi_mod:scsi_probe_and_add_lun+0x207/0x9c9
 [<ffffffff8807ad37>] :scsi_mod:scsi_execute_req+0x79/0xcf
 [<ffffffff8807d275>] :scsi_mod:__scsi_scan_target+0x58a/0x5c7
 [<ffffffff8008c78b>] dequeue_task+0x18/0x37
 [<ffffffff8807d55b>] :

Is this a new issue? Do you want me to file a separate bug for this?

Comment 19 Chad Dupuis (Cavium) 2010-10-20 12:47:35 UTC
> 
> Is this a new issue? Do you want me to file a separate bug for this?

Yes please, the signature of this bug looks completely different.  The stack trace indicates that this occurs during LUN scanning.
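
As background on reading the oops in comment 18: a faulting address of 0x60 (also shown in CR2) is consistent with accessing a structure member through a NULL base pointer, although the trace alone does not say which structure or member was involved. A minimal sketch in C of that arithmetic follows; struct fake_ctx and its layout are hypothetical and chosen only so that the member lands at offset 0x60.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout, invented so that 'state' sits at offset 0x60.
 * It does not correspond to any real qla2xxx or SCSI structure. */
struct fake_ctx {
    char pad[0x60];
    int state;
};

/* Address the CPU would access for a member at 'offset' inside a
 * structure located at 'base'. With base == 0 (a NULL pointer), the
 * result is just the member offset, which is what shows up as the
 * faulting address. */
uintptr_t member_address(uintptr_t base, size_t offset)
{
    return base + offset;
}
```

Under these assumptions, member_address(0, offsetof(struct fake_ctx, state)) evaluates to 0x60, matching the form of the address reported in the oops.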

Comment 20 Martin George 2010-10-20 13:35:56 UTC
(In reply to comment #19)
> > 
> > Is this a new issue? Do you want me to file a separate bug for this?
> 
> Yes please, the signature of this bug looks completely different.  The stack
> trace indicates that this occurs during LUN scanning.

Done. Filed bug 644863 for the same.

Comment 23 errata-xmlrpc 2010-11-09 18:07:00 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0839.html

Comment 24 Martin Prpič 2010-11-11 14:03:45 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
A kernel panic occurred on a Red Hat Enterprise Linux 5.5 FC host with a QLogic 8G FC adapter (QLE2562) while running IO with target controller faults. With this update, the kernel panic no longer occurs in this case.