Bug 643135 - [NetApp/QLogic 5.5.z bug] Kernel panic hit on RHEL 5.5 QLogic FC host at qla2x00_abort_fcport_cmds [rhel-5.5.z]
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.6
Hardware: All
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Chad Dupuis (Cavium)
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Duplicates: 628583
Depends On: 567428
Blocks:
Reported: 2010-10-14 18:16 UTC by Benjamin Kahn
Modified: 2010-11-11 14:03 UTC (History)
CC List: 23 users

Fixed In Version:
Doc Type: Enhancement
Doc Text:
A kernel panic occurred on a Red Hat Enterprise Linux 5.5 FC host with a QLogic 8G FC adapter (QLE2562) while running I/O with target controller faults. With this update, the kernel panic no longer occurs in this case.
Clone Of:
Environment:
Last Closed: 2010-11-09 18:07:00 UTC
Target Upstream Version:
Embargoed:


Attachments
qla2xxx: Version updated to 8.03.01.05.05.06-k (23.90 KB, patch)
2010-10-14 19:08 UTC, Chad Dupuis (Cavium)
qla2xxx: Correct use-after-free issue in terminate_rport_io callback. (3.15 KB, patch)
2010-10-14 19:24 UTC, Chad Dupuis (Cavium)


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHSA-2010:0839 | 0 | normal | SHIPPED_LIVE | Moderate: kernel security and bug fix update | 2010-11-09 18:06:20 UTC

Description Benjamin Kahn 2010-10-14 18:16:32 UTC
This bug has been copied from bug #567428 and has been proposed
to be backported to 5.5 z-stream (EUS).

Comment 2 Jiri Pirko 2010-10-14 18:18:24 UTC
*** Bug 628583 has been marked as a duplicate of this bug. ***

Comment 3 Jiri Pirko 2010-10-14 18:23:05 UTC
This fixes the issue described in Bug 628583. A 5.5.z patch, consisting of the relevant part of the bz567428 driver update, is needed.

Comment 4 Andrius Benokraitis 2010-10-14 18:31:06 UTC
Chad - please post the discrete patch that resolves the defect described in bug 628583, but posted as part of the wholesale 5.6 qla2xxx update in bug 567428, in this bugzilla for 5.5.z.

Comment 5 Chad Dupuis (Cavium) 2010-10-14 19:08:14 UTC
Created attachment 453533
qla2xxx: Version updated to 8.03.01.05.05.06-k

Comment 6 Chad Dupuis (Cavium) 2010-10-14 19:09:54 UTC
(In reply to comment #4)
> Chad - please post the discrete patch that resolves the defect described in bug
> 628583, but posted as part of the wholesale 5.6 qla2xxx update in bug 567428,
> in this bugzilla for 5.5.z.

The specific patch from 567428 has been attached.  The patch has a few point fixes in it but the one that specifically fixes this issue is "Correct use-after-free issue in terminate_rport_io callback".

Comment 7 Andrius Benokraitis 2010-10-14 19:13:11 UTC
Jiri/Don - Does QLogic need anything else to provide for this?

Comment 8 Jiri Pirko 2010-10-14 19:16:59 UTC
(In reply to comment #6)
> (In reply to comment #4)
> > Chad - please post the discrete patch that resolves the defect described in bug
> > 628583, but posted as part of the wholesale 5.6 qla2xxx update in bug 567428,
> > in this bugzilla for 5.5.z.
> 
> The specific patch from 567428 has been attached.  The patch has a few point
> fixes in it but the one that specifically fixes this issue is "Correct
> use-after-free issue in terminate_rport_io callback".

Ok, can you please isolate this minimal fixing patch and post it to RHKL under this BZnum? Thanks!

Comment 9 Chad Dupuis (Cavium) 2010-10-14 19:24:05 UTC
Created attachment 453536
qla2xxx: Correct use-after-free issue in terminate_rport_io callback.

Comment 10 Chad Dupuis (Cavium) 2010-10-14 19:24:54 UTC
> Ok, can you please isolate this minimal fixing patch and post it to RHKL under
> this BZnum? Thanks!

I've posted the minimal fixing patch.

Comment 11 Jiri Pirko 2010-10-14 20:07:12 UTC
(In reply to comment #10)
> > Ok, can you please isolate this minimal fixing patch and post it to RHKL under
> > this BZnum? Thanks!
> 
> I've posted the minimal fixing patch.

I do not see it anywhere in rhkernel-list

Comment 12 Andrius Benokraitis 2010-10-14 20:09:05 UTC
(In reply to comment #11)
> (In reply to comment #10)
> > > Ok, can you please isolate this minimal fixing patch and post it to RHKL under
> > > this BZnum? Thanks!
> > 
> > I've posted the minimal fixing patch.
> 
> I do not see it anywhere in rhkernel-list

Jiri, I believe he meant he posted it "here" in the BZ, but will post to rhkl in the morning. Is this too late?

Comment 13 Don Howard 2010-10-14 20:24:16 UTC
Hi Chad, Andrius -

Please post the minimal patch to rhkernel-list for review asap. Jiri needs to start the 5.5.z build tomorrow - the patch needs to be reviewed *today*.

Comment 14 Chad Dupuis (Cavium) 2010-10-14 20:27:14 UTC
(In reply to comment #13)
> Hi Chad, Andrius -
> 
> Please post the minimal patch to rhkernel-list for review asap. Jiri needs to
> start the 5.5.z build tomorrow - the patch needs to be reviewed *today*.

I just posted it for review.

Comment 16 Jiri Pirko 2010-10-16 09:05:14 UTC
in kernel 2.6.18-194.21.1.el5

linux-2.6-scsi-qla2xxx-correct-use-after-free-issue-in-terminate_rport_io-callback.patch

Comment 18 Martin George 2010-10-20 10:49:09 UTC
Chad,

After patching the RHEL 5.5.z host with your fix above (and with Mike Christie's reverted block state patch for resolving the RHEL5 regression bug 632195), I hit another panic due to a NULL pointer dereference at qla24xx_queuecommand:

Unable to handle kernel NULL pointer dereference at 0000000000000060 RIP: 
 [<ffffffff880ce477>] :qla2xxx:qla24xx_queuecommand+0x1be/0x1dd
PGD 0 
Oops: 0000 [1] SMP 
last sysfs file: /class/fc_remote_ports/rport-1:0-1/scsi_target_id
CPU 2 
Modules linked in: nfs fscache nfs_acl autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc be2iscsi ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_addr iscsi_tcp bnx2i cnic ipv6 xfrm_
Pid: 433, comm: scsi_wq_0 Not tainted 2.6.18-194.11.1.el5.oct14.unblock.ver3 #1
RIP: 0010:[<ffffffff880ce477>]  [<ffffffff880ce477>] :qla2xxx:qla24xx_queuecommand+0x1be/0x1dd
RSP: 0000:ffff81007e0eda50  EFLAGS: 00010002
RAX: 0000000000000002 RBX: ffff8100056ee080 RCX: 0000000000000190
RDX: ffff81007e0d8000 RSI: ffffffff880755a6 RDI: ffff81007e0d8060
RBP: ffff81007e5984f8 R08: 0000000000000286 R09: 0000000000000000
R10: ffff8100056ee140 R11: 0000000000000060 R12: ffff8100056ee080
R13: ffff81007e5984f8 R14: 0000000000000000 R15: ffffffff880755a6
FS:  0000000000000000(0000) GS:ffff81007ff1dec0(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000060 CR3: 0000000030267000 CR4: 00000000000006e0
Process scsi_wq_0 (pid: 433, threadinfo ffff81007e0ec000, task ffff810037c1a100)
Stack:  ffff8100763f6048 ffff8100056ee080 ffff81007e598000 0000000000000287
 ffff8100763f6048 ffff810074b94178 ffff8100763f6048 ffffffff88075c61
 ffff810027f8e1d8 ffff8100056ee080 ffff810027f8e000 ffff81007e598000
Call Trace:
 [<ffffffff88075c61>] :scsi_mod:scsi_dispatch_cmd+0x26e/0x2ff
 [<ffffffff8807b260>] :scsi_mod:scsi_request_fn+0x2c1/0x390
 [<ffffffff80144fb3>] blk_execute_rq_nowait+0x86/0x9a
 [<ffffffff80145057>] blk_execute_rq+0x90/0xc0
 [<ffffffff8807aca5>] :scsi_mod:scsi_execute+0xd1/0xea
 [<ffffffff8807ad64>] :scsi_mod:scsi_execute_req+0xa6/0xcf
 [<ffffffff8807c05a>] :scsi_mod:scsi_probe_and_add_lun+0x207/0x9c9
 [<ffffffff8807ad37>] :scsi_mod:scsi_execute_req+0x79/0xcf
 [<ffffffff8807d275>] :scsi_mod:__scsi_scan_target+0x58a/0x5c7
 [<ffffffff8008c78b>] dequeue_task+0x18/0x37
 [<ffffffff8807d55b>] :

Is this a new issue? Do you want me to file a separate bug for this?

Comment 19 Chad Dupuis (Cavium) 2010-10-20 12:47:35 UTC
> 
> Is this a new issue? Do you want me to file a separate bug for this?

Yes please, the signature of this bug looks completely different.  The stack trace indicates that this occurs during LUN scanning.
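
Editorial illustration (not part of the original comment): the reading of the trace in comment 18 can be reproduced mechanically. The sketch below is a minimal, hypothetical helper, not a Red Hat or QLogic tool; the names FRAME_RE and parse_frames are assumptions introduced here. It parses the frame lines exactly as they were pasted, skips the truncated last line, and lists the frames that belong to the SCSI scan path. It also converts the faulting address from the oops (CR2) to decimal; a small address like this is typical of a member access through a NULL pointer, but which structure was involved is not stated in this report.

```python
import re

# Frame lines copied verbatim from the oops in comment 18,
# including the truncated final line.
TRACE = """\
 [<ffffffff88075c61>] :scsi_mod:scsi_dispatch_cmd+0x26e/0x2ff
 [<ffffffff8807b260>] :scsi_mod:scsi_request_fn+0x2c1/0x390
 [<ffffffff80144fb3>] blk_execute_rq_nowait+0x86/0x9a
 [<ffffffff80145057>] blk_execute_rq+0x90/0xc0
 [<ffffffff8807aca5>] :scsi_mod:scsi_execute+0xd1/0xea
 [<ffffffff8807ad64>] :scsi_mod:scsi_execute_req+0xa6/0xcf
 [<ffffffff8807c05a>] :scsi_mod:scsi_probe_and_add_lun+0x207/0x9c9
 [<ffffffff8807ad37>] :scsi_mod:scsi_execute_req+0x79/0xcf
 [<ffffffff8807d275>] :scsi_mod:__scsi_scan_target+0x58a/0x5c7
 [<ffffffff8008c78b>] dequeue_task+0x18/0x37
 [<ffffffff8807d55b>] :
"""

# Hypothetical pattern for the 2.6.18-era frame format:
#   [<address>] [:module:]symbol+0xoffset/0xsize
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]{16})>\]\s+"
    r"(?::(?P<module>\w+):)?"
    r"(?P<symbol>\w+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_frames(text):
    """Return one dict per complete frame line; incomplete lines are skipped."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.search(line)
        if match is None:
            continue  # truncated or non-frame line
        frames.append({
            "module": match.group("module") or "kernel",
            "symbol": match.group("symbol"),
            "offset": int(match.group("offset"), 16),
            "size": int(match.group("size"), 16),
        })
    return frames


frames = parse_frames(TRACE)

# Frames whose names point at target/LUN scanning.
scan_frames = [f["symbol"] for f in frames
               if "scan" in f["symbol"] or "probe" in f["symbol"]]

# Faulting address reported in CR2, as a byte offset from NULL.
fault_offset = int("0000000000000060", 16)

print(scan_frames)
print(fault_offset)
```

Run against the pasted trace, this lists scsi_probe_and_add_lun and __scsi_scan_target as the scan-path frames, which matches the observation that the panic occurs during LUN scanning.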

Comment 20 Martin George 2010-10-20 13:35:56 UTC
(In reply to comment #19)
> > 
> > Is this a new issue? Do you want me to file a separate bug for this?
> 
> Yes please, the signature of this bug looks completely different.  The stack
> trace indicates that this occurs during LUN scanning.

Done. Filed bug 644863 for the same.

Comment 23 errata-xmlrpc 2010-11-09 18:07:00 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0839.html

Comment 24 Martin Prpič 2010-11-11 14:03:45 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
A kernel panic occurred on a Red Hat Enterprise Linux 5.5 FC host with a QLogic 8G FC adapter (QLE2562) while running I/O with target controller faults. With this update, the kernel panic no longer occurs in this case.

