Bug 658854

Summary: [NetApp 6.1 bug] RHEL6.0 FC host hits kernel panic at scsi_error_handler [rhel-6.0.z]
Product: Red Hat Enterprise Linux 6
Reporter: RHEL Program Management <pm-rhel>
Component: kernel
Assignee: Frantisek Hrbata <fhrbata>
Status: CLOSED ERRATA
QA Contact: Gris Ge <fge>
Severity: urgent
Priority: urgent
Version: 6.0
CC: bdonahue, cdupuis, coughlan, dhoward, fge, jmalanik, jwest, marting, mchristi, msnitzer, pm-eus, rajashekhar.a, revers, xdl-redhat-bugzilla, yi.zou
Target Milestone: rc
Keywords: OtherQA, ZStream
Hardware: All
OS: Linux
Fixed In Version: kernel-2.6.32-71.16.1.el6
Doc Type: Bug Fix
Doc Text:
A Red Hat Enterprise Linux 6.0 host (with root on a local disk) with dm-multipath configured on multiple LUNs (Logical Unit Numbers) could hit a kernel panic in scsi_error_handler when target controller faults occurred during I/O to the dm-multipath devices. This was caused by multipath calling the blk_abort_queue() function to lower the latency of path deactivation. That call proved unsafe because of a race between blk_abort_queue() and scsi_request_fn(). With this update, the race has been resolved and the kernel panic no longer occurs on Red Hat Enterprise Linux 6.0 hosts.
Last Closed: 2011-02-22 17:38:44 UTC
Bug Depends On: 636771    
Bug Blocks: 580566, 683532    

Description RHEL Program Management 2010-12-01 14:10:04 UTC
This bug has been copied from bug #636771 and has been proposed
to be backported to 6.0 z-stream (EUS).

Comment 7 Gris Ge 2011-02-16 06:41:03 UTC
Frantisek,

I could not find the upstream commit 224cb3e981f1b2f9f93dbd49eaef505d17d894c2, mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=636771#c36, in 2.6.32-71.18.1.el6.

static void deactivate_path(struct work_struct *work) is missing.

Comment 9 Gris Ge 2011-02-16 12:30:40 UTC
OK.
This patch just reverts 224cb3e981f1b2f9f93dbd49eaef505d17d894c2.

Code reviewed. Kernel 2.6.32-71.18.1.el6 has reverted the commit.

Set as Sanity Only.

Comment 10 Mike Snitzer 2011-02-16 14:47:57 UTC
(In reply to comment #9)
> OK.
> This patch just revert 224cb3e981f1b2f9f93dbd49eaef505d17d894c2.
> 
> Code reviewed. kernel 2.6.32-71.18.1.el6 has reverted the commit.
> 
> Set as Sanity Only.

You may have only done "Sanity Only", but FYI: Barry Donahue and I have done exhaustive testing (using NetApp's test scripts) on NetApp storage in Westford.

Comment 11 errata-xmlrpc 2011-02-22 17:38:44 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0283.html

Comment 12 Martin Prpič 2011-02-23 15:04:26 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
A Red Hat Enterprise Linux 6.0 host (with root on a local disk) with dm-multipath configured on multiple LUNs (Logical Unit Numbers) could hit a kernel panic in scsi_error_handler when target controller faults occurred during I/O to the dm-multipath devices. This was caused by multipath calling the blk_abort_queue() function to lower the latency of path deactivation. That call proved unsafe because of a race between blk_abort_queue() and scsi_request_fn(). With this update, the race has been resolved and the kernel panic no longer occurs on Red Hat Enterprise Linux 6.0 hosts.

Comment 13 Don Howard 2011-04-13 17:20:08 UTC
*** Bug 692670 has been marked as a duplicate of this bug. ***