Bug 643237

Summary: [NetApp 6.1 bug] regression: allow offlined devs to be set to running
Product: Red Hat Enterprise Linux 6
Reporter: Mike Christie <mchristi>
Component: kernel
Assignee: Mike Christie <mchristi>
Status: CLOSED ERRATA
QA Contact: Gris Ge <fge>
Severity: high
Docs Contact:
Priority: high
Version: 6.0
CC: bdonahue, coughlan, dhoward, fge, jwest, marting, xdl-redhat-bugzilla
Target Milestone: rc
Keywords: ZStream
Target Release: 6.1
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version: kernel-2.6.32-91.el6
Doc Type: Bug Fix
Doc Text:
Prior to this update, when using Red Hat Enterprise Linux 6 with a qla4xxx driver and FC (Fibre Channel) drivers using the fc class, a device might have been put in the offline state due to a transport problem. Once the transport problem was resolved, the device was not usable until a user manually corrected the state. This update enables the transition from the offline state to the running state, which fixes the problem.
Story Points: ---
Clone Of: 641193
Environment:
Last Closed: 2011-05-19 12:21:14 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 641193    
Bug Blocks: 660590    
Attachments:
Description: revert patch that prevents changing device state from offline
Flags: none

Comment 2 RHEL Program Management 2010-10-15 03:28:38 UTC
Thank you for your bug report. This issue was evaluated for inclusion
in the current release of Red Hat Enterprise Linux. Unfortunately, we
are unable to address this request in the current release. Because we
are in the final stage of Red Hat Enterprise Linux 6 development, only
significant, release-blocking issues involving serious regressions and
data corruption can be considered.

If you believe this issue meets the release blocking criteria as
defined and communicated to you by your Red Hat Support representative,
please ask your representative to file this issue as a blocker for the
current release. Otherwise, ask that it be evaluated for inclusion in
the next minor release of Red Hat Enterprise Linux.

Comment 3 Mike Christie 2010-10-20 21:03:39 UTC
Created attachment 454673 [details]
revert patch that prevents changing device state from offline

For the IO-stuck-in-a-queue problem: if the device was offlined by the scsi
layer and the fc class then tried to delete it due to dev_loss_tmo, the IO
could get stuck in the scsi/block layer queue. The device could have gone from
blocked->offline, and from there it should go to the cancel and then the device
delete state. Due to a bug in the scsi layer it would not, so the queues would
not get started again and IO in the queue would be stuck with no timer running
on it to unjam us.

Attached is a patch that reverts the patch that caused the problem (note it is
slightly different than the RHEL 5 one).
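
The state sequence described above can be modeled with a small transition table. This is only a sketch for illustration, not the kernel's scsi_device_set_state(): the function names, the reduced set of states, and the exact table are assumptions, chosen so that "offline" has no outgoing transitions while the problem patch is applied.

```python
# Simplified, hypothetical model of the device states named in this comment.
# The table is an assumption made for illustration; it is not kernel code.

def build_transitions(offline_is_sticky):
    """Return a map of state -> set of states it may move to.

    offline_is_sticky=True models the problem patch: nothing can leave
    the offline state. False models the behavior after the revert.
    """
    table = {
        "running": {"blocked", "offline", "cancel"},
        "blocked": {"running", "offline", "cancel"},
        "offline": {"running", "cancel"},
        "cancel": {"deleted"},
        "deleted": set(),
    }
    if offline_is_sticky:
        table["offline"] = set()
    return table


def walk(table, start, path):
    """Apply each requested state in order and stop at the first refusal.

    Returns the state the device ends up in.
    """
    state = start
    for target in path:
        if target not in table[state]:
            break
        state = target
    return state


if __name__ == "__main__":
    removal = ["offline", "cancel", "deleted"]
    print(walk(build_transitions(True), "blocked", removal))
    print(walk(build_transitions(False), "blocked", removal))
```

In this model, a device that is stuck in "offline" never reaches "cancel" or "deleted", which corresponds to the case where the queues are not restarted and queued IO is never flushed.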

Comment 4 RHEL Program Management 2010-10-20 23:09:44 UTC
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux maintenance release. Product Management has 
requested further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed 
products. This request is not yet committed for inclusion in an Update release.

Comment 6 Andrius Benokraitis 2010-12-02 15:32:47 UTC
Mike - any idea when this is getting POSTed?

Comment 7 Andrius Benokraitis 2010-12-02 17:01:05 UTC
Requesting 6.0.z because this is blocking NetApp's 6.0 GA cert.

Comment 9 Mike Christie 2010-12-03 03:22:19 UTC
(In reply to comment #6)
> Mike - any idea when this is getting POSTed?

I was trying to work on a proper upstream fix first. I think I can send the patch in this bz that just reverts the bad patch now and then open a new bz for the proper fix. So tomorrow/this-weekend.

Comment 10 Andrius Benokraitis 2010-12-03 15:58:36 UTC
(In reply to comment #9)
> (In reply to comment #6)
> > Mike - any idea when this is getting POSTed?
> 
> I was trying to work on a proper upstream fix first. I think I can send the
> patch in this bz that just reverts the bad patch now and then open a new bz for
> the proper fix. So tomorrow/this-weekend.

Yeah, I don't think NetApp can wait until the whole upstream set is ready for 6.1, since this is blocking 6.0 cert (and needs 6.0.z). If you could POST just the patch that fixes this ASAP, that would be great.

Comment 12 Aristeu Rozanski 2010-12-15 16:07:47 UTC
Patch(es) available on kernel-2.6.32-91.el6

Comment 15 Martin Prpič 2011-02-23 15:05:20 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Prior to this update, when using Red Hat Enterprise Linux 6 with a qla4xxx driver and FC (Fibre Channel) drivers using the fc class, a device might have been put in the offline state due to a transport problem. Once the transport problem was resolved, the device was not usable until a user manually corrected the state. This update enables the transition from the offline state to the running state, which fixes the problem.
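
For reference, the manual correction mentioned in the note is typically done by writing "running" to the device's sysfs state attribute. The following is a minimal sketch, assuming the attribute is at /sys/block/<dev>/device/state; the function name and the sysfs_root parameter are hypothetical and exist only so the example can be exercised against a scratch directory. On a real system it needs root privileges.

```python
from pathlib import Path


def set_running_if_offline(dev, sysfs_root="/sys/block"):
    """Write "running" to the device state file if it currently reads "offline".

    Returns True if the state was changed, False if the device was not offline.
    The path layout is an assumption; adjust it for the system at hand.
    """
    state_file = Path(sysfs_root) / dev / "device" / "state"
    current = state_file.read_text().strip()
    if current != "offline":
        return False
    state_file.write_text("running\n")
    return True
```

A call such as set_running_if_offline("sdb") would only touch a device that reports "offline"; "sdb" here is a placeholder device name.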

Comment 18 Gris Ge 2011-03-11 06:23:13 UTC
NetApp and Qlogic verified this patch.

Code reviewed. Patch has been applied in kernel-2.6.32-120.

Set as Sanity only.

Comment 19 errata-xmlrpc 2011-05-19 12:21:14 UTC
An advisory has been issued which should resolve the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0542.html