Red Hat Bugzilla – Bug 643237
[NetApp 6.1 bug] regression: allow offlined devs to be set to running
Last modified: 2011-07-06 11:12:58 EDT
Thank you for your bug report. This issue was evaluated for inclusion in the current release of Red Hat Enterprise Linux. Unfortunately, we are unable to address this request in the current release. Because we are in the final stage of Red Hat Enterprise Linux 6 development, only significant, release-blocking issues involving serious regressions and data corruption can be considered. If you believe this issue meets the release blocking criteria as defined and communicated to you by your Red Hat Support representative, please ask your representative to file this issue as a blocker for the current release. Otherwise, ask that it be evaluated for inclusion in the next minor release of Red Hat Enterprise Linux.
Created attachment 454673
revert patch that prevents changing device state from offline
On the IO-stuck-in-a-queue problem: this can happen if the device was offlined by the scsi layer and the fc class then tried to delete it due to dev_loss_tmo. The IO could then get stuck in the scsi/block layer queue. The device would have gone blocked -> offline, and from there it should go to the cancel and device delete states. Due to a bug in the scsi layer it would not, so the queues would not get started again, and IO in the queue would be stuck with no timer running on it to unjam us.
Attached is a patch that reverts the patch that caused the problem (note it is slightly different from the RHEL 5 one).
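The stuck path described in this comment can be sketched as a toy state machine. This is only an illustrative model under stated assumptions, not the kernel's transition table: the `State` names, `transition_table`, and `follow` are hypothetical, and the only behaviour taken from this report is that the regressing patch blocks transitions out of offline while the revert allows them again.

```python
from enum import Enum


class State(Enum):
    RUNNING = "running"
    BLOCKED = "blocked"
    OFFLINE = "offline"
    CANCEL = "cancel"
    DELETED = "deleted"


def transition_table(allow_leaving_offline):
    """Map each target state to the set of states it may be entered from.

    Hypothetical, simplified table for illustration only.
    """
    table = {
        State.RUNNING: {State.BLOCKED},
        State.BLOCKED: {State.RUNNING},
        State.OFFLINE: {State.RUNNING, State.BLOCKED},
        State.CANCEL: {State.RUNNING, State.BLOCKED},
        State.DELETED: {State.CANCEL},
    }
    if allow_leaving_offline:
        # Behaviour restored by the revert: offline is no longer a dead end.
        table[State.RUNNING].add(State.OFFLINE)
        table[State.CANCEL].add(State.OFFLINE)
    return table


def follow(start, requests, table):
    """Apply the requested states in order; stop at the first refused one."""
    state = start
    for target in requests:
        if state not in table.get(target, set()):
            break
        state = target
    return state
```

In this model, a blocked device asked to go offline, then cancel, then deleted stays in `State.OFFLINE` when `transition_table(False)` is used, and reaches `State.DELETED` with `transition_table(True)`.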
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Mike - any idea when this is getting POSTed?
Requesting 6.0.z because this is blocking NetApp's 6.0 GA cert.
(In reply to comment #6)
> Mike - any idea when this is getting POSTed?
I was trying to work on a proper upstream fix first. I think I can send the patch in this bz that just reverts the bad patch now and then open a new bz for the proper fix. So tomorrow/this-weekend.
(In reply to comment #9)
> (In reply to comment #6)
> > Mike - any idea when this is getting POSTed?
>
> I was trying to work on a proper upstream fix first. I think I can send the
> patch in this bz that just reverts the bad patch now and then open a new bz for
> the proper fix. So tomorrow/this-weekend.
Yeah, I don't think NetApp can wait until the whole upstream set is ready for 6.1, since this is blocking 6.0 cert (and needs 6.0.z). If you could POST just the patch that fixes this ASAP, that would be great.
Patch(es) available on kernel-2.6.32-91.el6
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.
New Contents:
Prior to this update, when using Red Hat Enterprise Linux 6 with a qla4xxx driver and FC (Fibre Channel) drivers using the fc class, a device might have been put in the offline state due to a transport problem. Once the transport problem was resolved, the device was not usable until a user manually corrected the state. This update enables the transition from the offline state to the running state, thus fixing the problem.
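For reference, the manual correction mentioned in the note is usually done by writing "running" to the device's sysfs state file. The following is a minimal sketch, assuming the device exposes the usual /sys/block/<dev>/device/state attribute; the function name `set_device_running` and the `sysfs_root` parameter are hypothetical and exist only to make the example testable. On a real system it needs root privileges.

```python
from pathlib import Path


def set_device_running(dev, sysfs_root="/sys"):
    """Set an offline SCSI device back to running; return the previous state.

    `dev` is a block device name such as "sdb". `sysfs_root` is an
    illustrative parameter (an assumption, not part of any real API)
    so the function can be pointed at a scratch directory.
    """
    state_file = Path(sysfs_root) / "block" / dev / "device" / "state"
    previous = state_file.read_text().strip()
    if previous == "offline":
        # Only touch devices that are actually offline.
        state_file.write_text("running\n")
    return previous
```

Devices that are not offline are left alone, so the call is safe to repeat.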
NetApp and Qlogic verified this patch. Code reviewed. Patch has been applied into kernel-2.6.32-120. Set as Sanity only.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2011-0542.html