
Bug 531784

Summary: ipoib: null tx/rx_ring skb pointers on free
Product: Red Hat Enterprise Linux 5
Reporter: Issue Tracker <tao>
Component: kernel
Assignee: Doug Ledford <dledford>
Status: CLOSED ERRATA
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: medium
Priority: medium
Version: 5.4
CC: cward, jtluka, kbaxley, peterm, tao
Target Milestone: rc
Keywords: OtherQA
Target Release: 5.5
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2010-03-30 07:31:00 UTC
Bug Blocks: 537153

Description Issue Tracker 2009-10-29 14:00:54 UTC
Escalated to Bugzilla from IssueTracker

Comment 1 Issue Tracker 2009-10-29 14:00:56 UTC
Event posted on 10-28-2009 06:14pm EDT by woodard

Clone of https://enterprise.redhat.com/issue-tracker/?module=issues&action=view&tid=355110

Description of problem:
 There is a very hard-to-reproduce bug in IPoIB
 in which the data structures that keep track of the
 xmit/recv buffers get corrupted. This is apparently
 because when the sk_buffs are freed, a stale pointer
 is left in the rx/tx rings, and somehow those stale
 pointers are then used to free the sk_buffs again.
 (This explanation seems to imply a firmware problem,
 i.e., some problem with the work completions seems
 necessary to make the observed crashes fit with the
 available data.)

 See:
   http://lists.openfabrics.org/pipermail/general/2008-May/050196.html
   http://lists.openfabrics.org/pipermail/general/2008-October/054824.html

How reproducible:
 Very hard.

Steps to Reproduce:
 We only ever got a couple of crashdumps.

Actual results:
 Kernel panic.

Expected results:

Additional info:
 The following patch was tested at LLNL; they reported
 that the patch fixed their problem and we haven't seen
 the same bug since we began using the patch, either.

http://lists.openfabrics.org/pipermail/general/2008-November/055242.html

I am working on a patch for RHEL.

This is also a concern for RHEL6.

--------------

On Wed, 28 Oct 2009 11:50:27 -0700
Al Chu <chu11> wrote:

> > I don't think so ...  Ira??
> > 
> > If we didn't, maybe we can bug Ben to open??

I did not bother Red Hat, as I thought there was a fix in the upstream OFED and
that RHEL would have taken it from there.  Perhaps this did not make it into the
upstream kernel, but only OFED?  <sigh>

Arthur, could you tell us the RHEL bug # that was filed?  Ben, we need to make
sure this issue is fixed in RHEL 5.4.

Ira

> > 
> > Al
> > 
> > On Tue, 2009-10-27 at 17:55 -0700, Arthur Kepner wrote:
> > > > Hi Ira, Al,
> > > >
> > > > This is a long shot, but with respect to the bug mentioned here:
> > > >
> > > > http://lists.openfabrics.org/pipermail/general/2008-November/055242.html
> > > >
> > > > Did you folks happen to open a bug with RedHat? And, if so, can you
> > > > let me know the bug number?
> > > >
> > > > In case you are wondering about this odd request - it's because I'm
> > > > trying to convince RedHat to take this patch. If anyone else has
> > > > opened a similar RedHat bug it will improve my chances.
> > > >
> > > > Thanks for any info.
> > > >
> > -- 
> > Albert Chu
> > chu11
> > Computer Scientist
> > High Performance Systems Division
> > Lawrence Livermore National Laboratory
> > 


-- 
Ira Weiny
Math Programmer/Computer Scientist
Lawrence Livermore National Lab
925-423-8008
weiny2

This event sent from IssueTracker by kbaxley  [LLNL (HPC)]
 issue 359586

Comment 2 RHEL Program Management 2009-10-30 19:31:44 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 4 Chris Ward 2009-11-19 15:57:29 UTC
@GSS

We need to confirm that there is third-party commitment to 
test for the resolution of this request during the RHEL 5.5 
Beta Test Phase before we can approve it for acceptance 
into the release.

RHEL 5.5 Beta Test Phase is expected to begin around February
2010.

In order to avoid any unnecessary delays, please post a 
confirmation as soon as possible, including the contact 
information for testing engineers.

Any additional information about alternative testing variations we 
could use to reproduce this issue in-house would be appreciated.

Comment 8 Don Zickus 2009-12-09 18:12:09 UTC
The fix is in kernel-2.6.18-178.el5.
You can download this test kernel from http://people.redhat.com/dzickus/el5

Please update the appropriate value in the Verified field
(cf_verified) to indicate this fix has been successfully
verified. Include a comment with verification details.

Comment 11 errata-xmlrpc 2010-03-30 07:31:00 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0178.html