Bug 231319
Summary: | [QLogic 4.6 bug] Qlogic driver handles RSCN updates in a problematic way
---|---
Product: | Red Hat Enterprise Linux 4
Component: | kernel
Version: | 4.0
Hardware: | All
OS: | Linux
Status: | CLOSED ERRATA
Severity: | medium
Priority: | medium
Reporter: | Josef Bacik <jbacik>
Assignee: | Marcus Barrow <mbarrow>
QA Contact: | Martin Jenner <mjenner>
CC: | andrew.vasquez, andriusb, casmith, coughlan, dmair, emcnabb, hgarcia, jbaron, mbarrow, mceci, mchristi, michael.hagmann, pan_haifeng, poelstra, qlogic-redhat-ext
Keywords: | OtherQA
Fixed In Version: | RHBA-2007-0791
Doc Type: | Bug Fix
Last Closed: | 2007-11-15 16:21:30 UTC
Bug Blocks: | 217099
Description
Josef Bacik
2007-03-07 18:33:31 UTC
Created attachment 149474
time based failover for dm multipath
This is the time-based failover patch that Mike Christie suggested on RHKL in
reference to this problem.
(In reply to comment #0)
> Now in DM multipath there are things that you can do to get around this, ie
> the queue if no path option, but AFAIK there is no such thing for EMC.
> Just my 2 cents on this.

If EMC does not have the exact same thing, it is because how you handle errors is implementation specific. If we can get some traces from EMC, they have their own path testing and failback scheme. DM decided to handle the problem partially in userspace.

Also, the problem of errors being propagated back to the FS layer when there are no paths is not limited to the qla2xxx RSCN problem. It occurs with any driver and any transport if there is a single point of failure and the multipath layer decides to fail IO to the FS layer instead of retrying it. For iSCSI we have the same problem: if you put all your cables through one switch and reboot the switch, you will get errors on all paths, and then, if no path retry is set to fail the IO, the IO will fail when there are no paths.

Created attachment 149734
use did imm retry instead of did bus busy
Here is the patch from Andrew Vasquez.
From Andrew:
Essentially it's a backport of changes done in our standard driver which swap
DID_BUS_BUSY statuses for DID_IMM_RETRY statuses in 'select' logic
paths -- those where the driver uses command recycling during topology
disruptions.
Of course the usage of DID_IMM_RETRY implies some care, as to avoid infinite
retries. But, given the use of qla2xxx's own internal dev-loss-tmo timers,
command recycling will not proceed ad infinitum.
I'd suggest RH seriously consider this for their RHEL4 qla2xxx driver.
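
The bounded-retry argument above can be sketched as a small model. This is not the qla2xxx code or the attached patch: the `HS_*` names, `struct port_model`, `select_status` and `recycle_command` are hypothetical stand-ins, and the one-tick-per-retry timing is an assumption made only to show why an immediate-retry status cannot loop forever once a dev-loss timer is counting down.

```c
/* Hypothetical stand-ins for the host statuses discussed above. */
enum host_status { HS_OK, HS_IMM_RETRY, HS_NO_CONNECT };

/* Hypothetical per-port state, not the driver's real structure. */
struct port_model {
    int online;        /* 1 once the topology change has been resolved */
    int dev_loss_left; /* ticks remaining on the dev-loss timer */
};

/* Pick the status to hand back for a command issued to this port. */
static enum host_status select_status(const struct port_model *p)
{
    if (p->online)
        return HS_OK;         /* port usable, command can be sent */
    if (p->dev_loss_left > 0)
        return HS_IMM_RETRY;  /* topology in flux, recycle the command */
    return HS_NO_CONNECT;     /* timer expired, stop recycling */
}

/*
 * Recycle one command until it is either accepted or the dev-loss timer
 * runs out. The port comes back after recover_after ticks. Each retry
 * consumes one tick, so the loop is bounded by the initial timer value.
 */
static enum host_status recycle_command(struct port_model *p,
                                        int recover_after, int *retries)
{
    int elapsed = 0;

    *retries = 0;
    for (;;) {
        enum host_status s;

        if (elapsed >= recover_after)
            p->online = 1;
        s = select_status(p);
        if (s != HS_IMM_RETRY)
            return s;
        (*retries)++;
        elapsed++;
        p->dev_loss_left--;
    }
}
```

In this model a port that recovers before the timer expires ends with `HS_OK`, while one that never recovers ends with `HS_NO_CONNECT` after at most `dev_loss_left` retries.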
I think this patch should be fine because, as Andrew pointed out, the driver has timers so the command is not retried forever, and he stated that:

RSCN processing is typically very fast. The worst-case fabric timeout one must worry about for any type of extended-link-service fabric command is 2 * R_A_TOV, where R_A_TOV is typically 10 seconds.

So commands would not sit too long.

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

Marcus,
Is this in your queue for 4.6? If not, please consider it a high priority.
Tom

I will put it in the queue. QLogic was not sure if the opinion fell in favor of including this.

Internal Status set to 'Resolved'
Status set to: Closed by Client
Resolution set to: 'Closed by Client'
This event sent from IssueTracker by robert.wehner
issue 119734

The use imm retry patch was submitted to RHEL4.6.

A patch addressing this issue has been included in kernel-2.6.9-55.19.EL.

The reason that kernel package isn't signed is that it is an unofficial build on the way to RHEL 4.6 Beta. If you require an officially supported kernel with this fix prior to RHEL 4.6, please request a hotfix.

*** Bug 180212 has been marked as a duplicate of this bug. ***

A fix for this issue should have been included in the packages contained in the RHEL4.6 Beta released on RHN (also available at partners.redhat.com).

Requested action: Please verify that your issue is fixed to ensure that it is included in this update release.

After you (Red Hat Partner) have verified that this issue has been addressed, please perform the following:
1) Change the *status* of this bug to VERIFIED.
2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified)

If this issue is not fixed, please add a comment describing the most recent symptoms of the problem you are having and change the status of the bug to FAILS_QA.

If you cannot access bugzilla, please reply with a message to Issue Tracker and I will change the status for you. If you need assistance accessing ftp://partners.redhat.com, please contact your Partner Manager.

Thanks for your update.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.
http://rhn.redhat.com/errata/RHBA-2007-0791.html