Bug 531784
| Summary: | ipoib: null tx/rx_ring skb pointers on free | |||
|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Issue Tracker <tao> | |
| Component: | kernel | Assignee: | Doug Ledford <dledford> | |
| Status: | CLOSED ERRATA | QA Contact: | Red Hat Kernel QE team <kernel-qe> | |
| Severity: | medium | Docs Contact: | ||
| Priority: | medium | |||
| Version: | 5.4 | CC: | cward, jtluka, kbaxley, peterm, tao | |
| Target Milestone: | rc | Keywords: | OtherQA | |
| Target Release: | 5.5 | |||
| Hardware: | All | |||
| OS: | Linux | |||
| Whiteboard: | ||||
| Fixed In Version: | Doc Type: | Bug Fix | ||
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 537153 (view as bug list) | Environment: | ||
| Last Closed: | 2010-03-30 07:31:00 UTC | Type: | --- | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 537153 | |||
|
Description
Issue Tracker
2009-10-29 14:00:54 UTC
Event posted on 10-28-2009 06:14pm EDT by woodard

Clone of https://enterprise.redhat.com/issue-tracker/?module=issues&action=view&tid=355110

Description of problem:
There is a very difficult to reproduce bug with IPoIB in which the data structures which keep track of the xmit/recv buffers get corrupted. This is apparently because when the sk_buffs are freed, a stale pointer is left in the rx/tx rings, and somehow those stale pointers are being used to free the sk_buffs again. (This explanation seems to invoke a firmware problem - i.e., some problem with the work completions seems necessary to make the observed crashes fit with the available data.)

See:
http://lists.openfabrics.org/pipermail/general/2008-May/050196.html
http://lists.openfabrics.org/pipermail/general/2008-October/054824.html

How reproducible:
Very hard.

Steps to Reproduce:
We only ever got a couple of crashdumps.

Actual results:
Kernel panic.

Expected results:

Additional info:
The following patch was tested at LLNL; they reported that the patch fixed their problem and we haven't seen the same bug since we began using the patch, either.
http://lists.openfabrics.org/pipermail/general/2008-November/055242.html

I am working on a patch for RHEL. This is also a concern for RHEL6.

--------------

On Wed, 28 Oct 2009 11:50:27 -0700 Al Chu <chu11> wrote:

> > I don't think so ... Ira??
> >
> > If we didn't, maybe we can bug Ben to open??

I did not bother RedHat as I thought there was a fix in the upstream OFED and that RHEL would have taken it from there. Perhaps this did not make it in the upstream kernel, but only OFED? <sigh>

Arthur, could you tell us the RHEL bug # which is filed.

Ben we need to make sure this issue is fixed in RHEL 5.4.

Ira

> >
> > Al
> >
> > On Tue, 2009-10-27 at 17:55 -0700, Arthur Kepner wrote:
>> > > Hi Ira, Al,
>> > >
>> > > This is a long shot, but with respect to the bug mentioned here:
>> > >
>> > > http://*lists.openfabrics.org/pipermail/general/2008-November/055242.html
>> > >
>> > > Did you folks happen to open a bug with RedHat? And, if so, can you
>> > > let me know the bug number?
>> > >
>> > > In case you are wondering about this odd request - it's because I'm
>> > > trying to convince RedHat to take this patch. If anyone else has
>> > > opened a similar RedHat bug it will improve my chances.
>> > >
>> > > Thanks for any info.
>> > >
> > --
> > Albert Chu
> > chu11
> > Computer Scientist
> > High Performance Systems Division
> > Lawrence Livermore National Laboratory
> >

--
Ira Weiny
Math Programmer/Computer Scientist
Lawrence Livermore National Lab
925-423-8008
weiny2

This event sent from IssueTracker by kbaxley [LLNL (HPC)] issue 359586

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

@GSS We need to confirm that there is third-party commitment to test for the resolution of this request during the RHEL 5.5 Beta Test Phase before we can approve it for acceptance into the release. RHEL 5.5 Beta Test Phase is expected to begin around February 2010. In order to avoid any unnecessary delays, please post a confirmation as soon as possible, including the contact information for testing engineers. Any additional information about alternative testing variations we could use to reproduce this issue in-house would be appreciated.

in kernel-2.6.18-178.el5

You can download this test kernel from http://people.redhat.com/dzickus/el5

Please update the appropriate value in the Verified field (cf_verified) to indicate this fix has been successfully verified. Include a comment with verification details.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0178.html |