Bug 576709
| Summary: | [Cisco 5.6 bug] fnic: flush Tx queue bug fix | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Abhijeet Joglekar <abjoglek> |
| Component: | kernel | Assignee: | Mike Christie <mchristi> |
| Status: | CLOSED ERRATA | QA Contact: | Red Hat Kernel QE team <kernel-qe> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | | |
| Version: | 5.6 | CC: | andriusb, coughlan, cward, james.brown, jwest, mchristi, mgahagan, savbu-lnx-drivers, Stuart.Kirk, tao, vbhamidi |
| Target Milestone: | rc | Keywords: | OtherQA, ZStream |
| Target Release: | 5.6 | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | A host could crash during a SAN (storage area network) installation when using the Cisco 'fnic' driver. During driver initialization, an error in the 'fnic' driver caused it to flush the wrong queue. The flush code could then incorrectly access memory and crash the host. With this update, the error in the 'fnic' driver has been fixed and crashes no longer occur. | | |
| Story Points: | --- | | |
| Clone Of: | | Environment: | |
| Last Closed: | 2011-01-13 21:21:31 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 578328 | | |
| Bug Blocks: | 557597, 580828, 580829 | | |
| Attachments: | | | |

Description
Abhijeet Joglekar
2010-03-24 21:31:36 UTC
This fix should also be included in 5.5-z series kernel. Will upload patch soon, and also submit to upstream.

(In reply to comment #1)
> This fix should also be included in 5.5-z series kernel. Will upload patch
> soon, and also submit to upstream.

Okay, since it can cause a crash, I will propose it for .z.

Mike, it would be great if this gets committed to the 5.6 tree very early (right when it opens?) so that we have enough time to get this in the first 5.5.z. How is your workload to get this POSTed in the next week or two once we get it?

Is this upstream (in James's tree at least)? If so could you send a git commit link?

We need this for RHEL6 too right?

If so then we should ping Rob since he has the fnic RHEL6.0 update (I traded that bug with him for some fc class one).

Right, Cisco: Assuming this fix isn't in bug 570693 since this isn't upstream yet.

(In reply to comment #5)
> We need this for RHEL6 too right?
>
> If so then we should ping Rob since he has the fnic RHEL6.0 update

Ah shoot. Rob sent this patch already. I guess we need to clone this for RHEL6.

(In reply to comment #4)
> Is this upstream (in James's tree at least)? If so could you send a git commit
> link?

You can also at the very least send me a link to the posting on linux-scsi.

Created attachment 402956
Patch for fnic flush transmit queue issue

Adding links to the submission so I can track it later.
http://www.open-fcoe.org/pipermail/devel/2010-March/010116.html
http://www.open-fcoe.org/pipermail/devel/2010-March/010117.html

Sorry, didn't get chance to reply to this earlier. Yes, we need to include in 6.0 too. Will create a bugzilla for that. thanks.

Created 6.0 bug 578328

Symptom: Customer may see a host crash during SAN install using fnic driver

Problem: ELS frames generated by libfc are queued by the fnic driver in a TX queue, until the fabric login is done and the adapter is set up in the correct mode. Once fabric login completes, and the FCID received from the fabric is programmed, the TX queue should be flushed and frames sent out on the wire. The issue is that the driver is incorrectly flushing another queue instead of the TX queue. The buffers in these queues are aligned differently and so the flush code can access memory incorrectly and crash the host.

Fix: Flush the Tx queue instead of the other queue.

This fix will be present in 5.5-z series. It will also be provided by Cisco on a driver disk to replace the in-kernel driver during SAN install if customer does not want to upgrade to the errata kernel.

Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Ryan, this is for the on-line 5.5 release notes.

This is a shorter version of comment 16.

Symptom: Possible host crash during SAN install using Cisco fnic driver.

Problem: During driver initialization an error in the fnic driver causes it to flush the wrong queue. The flush code can access memory incorrectly and crash the host.

Fix: There is a plan to ship a fixed fnic driver in a RHEL 5.5 errata. This fix will also be provided by Cisco on a driver disk, if needed.

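The Problem/Fix description above (frames held on a TX queue until fabric login completes, then flushed to the wire) can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the fnic driver code or the attached patch: `struct port`, `other_queue`, `port_send()` and `port_login_done()` are hypothetical names invented for illustration, and the real driver uses kernel queue types rather than a hand-rolled list.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model only; none of these names come from the fnic source. */
struct frame {
    struct frame *next;
    char payload[32];
};

struct frame_queue {
    struct frame *head;
    struct frame *tail;
    int len;
};

struct port {
    int fcid_programmed;            /* set once fabric login has completed */
    struct frame_queue tx_queue;    /* ELS frames held until login completes */
    struct frame_queue other_queue; /* unrelated queue; must not be flushed here */
    int frames_on_wire;             /* stand-in for "sent out on the wire" */
};

static void queue_push(struct frame_queue *q, const char *payload)
{
    struct frame *f = calloc(1, sizeof(*f));

    if (!f)
        abort();
    /* calloc zeroed the buffer, so the copy stays NUL-terminated. */
    strncpy(f->payload, payload, sizeof(f->payload) - 1);
    if (q->tail)
        q->tail->next = f;
    else
        q->head = f;
    q->tail = f;
    q->len++;
}

static struct frame *queue_pop(struct frame_queue *q)
{
    struct frame *f = q->head;

    if (!f)
        return NULL;
    q->head = f->next;
    if (!q->head)
        q->tail = NULL;
    q->len--;
    return f;
}

/* Send a frame, or hold it on the TX queue until the FCID is programmed. */
static void port_send(struct port *p, const char *payload)
{
    if (!p->fcid_programmed) {
        queue_push(&p->tx_queue, payload);
        return;
    }
    p->frames_on_wire++;
}

/*
 * Called when fabric login completes. The point of the fix described in
 * this bug: drain tx_queue here, not some other queue whose buffers have
 * a different layout.
 */
static int port_login_done(struct port *p)
{
    struct frame *f;
    int flushed = 0;

    p->fcid_programmed = 1;
    while ((f = queue_pop(&p->tx_queue)) != NULL) {
        p->frames_on_wire++;
        free(f);
        flushed++;
    }
    assert(p->tx_queue.len == 0);
    return flushed;
}
```

In this model, calling `port_send()` twice before login leaves two frames on `tx_queue`; `port_login_done()` then returns 2 and leaves `other_queue` untouched.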
in kernel-2.6.18-197.el5

You can download this test kernel from http://people.redhat.com/jwilson/el5

Please update the appropriate value in the Verified field (cf_verified) to indicate this fix has been successfully verified. Include a comment with verification details.

Abhijeet - URGENT - please test ASAP, your results are blocking the inclusion of this into RHEL 5.5.z. Thanks!

Already forwarded the kernel link to QA yesterday, will forward them the urgent timeline requirement. Thanks!

This fix was verified by QA. They ran the following test:

1) Install RHEL 5.5 GA bits (kernel 194) on a SAN array. Use a RHEL 5.5 driver disk that has the fnic driver 1.4.0.145 with the bug fix. Install goes through fine, driver goes into updates/fnic/fnic.ko and takes precedence over the inbox driver in kernel 194 (which had the tx queue flush bug)

2) Install kernel rpms 197. Then reboot the system in 197; the in-box driver in 197 now shows up as 1.4.0.145 from drivers/scsi/fnic/fnic.ko. Continue testing with this driver.

Technical note updated. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1,10 +1 @@
-Ryan, this is for the on-line 5.5 release notes.
+A host could crash during a SAN (storage area network) installation when using the Cisco 'fnic' driver. During driver initialization, an error in the 'fnic' driver caused it to flush the wrong queue. The flush code could then incorrectly access memory and crash the host. With this update, the error in the 'fnic' driver has been fixed and crashes no longer occur.
-
-This is a shorter version of comment 16.
-
-Symptom: Possible host crash during SAN install using Cisco fnic driver.
-
-Problem: During driver initialization an error in the fnic driver causes it to flush the wrong queue. The flush
-code can access memory incorrectly and crash the host.
-
-Fix: There is a plan to ship a fixed fnic driver in a RHEL 5.5 errata. This fix will also be provided by Cisco on a driver disk, if needed.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0017.html