Bug 477202 - oops in net_rx_action on double free of dev->poll_list
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.8
Hardware: All
OS: Linux
Priority: urgent
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Neil Horman
QA Contact: Martin Jenner
Keywords: ZStream
Depends On: 463815
Blocks: 479681
Reported: 2008-12-19 11:52 EST by Neil Horman
Modified: 2010-10-23 02:40 EDT (History)

Doc Type: Bug Fix
Last Closed: 2009-05-18 15:10:29 EDT

Attachments
netpoll-fix.patch (746 bytes, patch)
2008-12-19 12:16 EST, Andy Gospodarek
new version of same patch (1.48 KB, patch)
2008-12-19 13:15 EST, Neil Horman

Description Neil Horman 2008-12-19 11:52:47 EST
Description of problem:
If dev->quota in net_rx_action becomes zero or less and dev->poll() calls netif_rx_complete, we will try to do another list_del back in net_rx_action, leading to an oops on the poisoned pointers in dev->poll_list.
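
To illustrate why a second list_del oopses, here is a minimal userspace sketch. It is not the kernel source: struct fake_dev, checked_list_del and simulate_double_del are hypothetical names, and the poison constants are only modeled on the kernel's LIST_POISON1/LIST_POISON2. The checked delete reports the repeat removal instead of dereferencing the poisoned pointers as the real list_del would.

```c
#include <assert.h>

/* Userspace model of the kernel's doubly linked list. */
struct list_head { struct list_head *next, *prev; };

/* Poison values modeled on LIST_POISON1 / LIST_POISON2. */
#define POISON1 ((struct list_head *)0x00100100)
#define POISON2 ((struct list_head *)0x00200200)

static void list_init(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

static void list_add_tail(struct list_head *n, struct list_head *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

/* Like list_del(), but returns -1 on a repeat deletion instead of
 * writing through the poisoned pointers (where the kernel would oops). */
static int checked_list_del(struct list_head *e)
{
    if (e->next == POISON1 || e->prev == POISON2)
        return -1;
    e->next->prev = e->prev;
    e->prev->next = e->next;
    e->next = POISON1;
    e->prev = POISON2;
    return 0;
}

/* Hypothetical stand-in for a device sitting on the poll list. */
struct fake_dev { struct list_head poll_list; int quota; };

/* Replays the sequence from the description and returns the result
 * of the second delete. */
static int simulate_double_del(void)
{
    struct list_head queue;
    struct fake_dev dev = { .quota = 0 };
    int first;

    list_init(&queue);
    list_add_tail(&dev.poll_list, &queue);

    /* 1. dev->poll() completes its work: first delete, entry is poisoned. */
    first = checked_list_del(&dev.poll_list);
    assert(first == 0);

    /* 2. Back in the rx action, quota <= 0 triggers another delete. */
    if (dev.quota <= 0)
        return checked_list_del(&dev.poll_list);
    return 0;
}
```

In this model simulate_double_del() returns -1, which marks the point where the kernel would instead fault on the poisoned next/prev pointers.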

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.
  
Actual results:
oops

Expected results:
no oops

Additional info:
Comment 1 Andy Gospodarek 2008-12-19 12:16:28 EST
Created attachment 327468
netpoll-fix.patch

This, on top of what's currently in RHEL4, seems like a reasonable fix.
Comment 2 Neil Horman 2008-12-19 13:15:35 EST
Created attachment 327476
new version of same patch

Quick update to Andy's patch. I forgot that my original version included a chunk to do the __LINK_STATE_NETPOLL check in __netif_rx_complete as well as netif_rx_complete. This saves us from having to fix up dozens of drivers, so we want to keep that one chunk. Everything else is unaltered.
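
The attached patch is not reproduced in this report, so the following is only a userspace sketch of the idea of guarding both completion helpers. Everything in it is hypothetical: FAKE_STATE_NETPOLL stands in for __LINK_STATE_NETPOLL, fake_rx_complete and fake_rx_complete_irqoff stand in for netif_rx_complete and __netif_rx_complete, and it assumes the check simply skips the list removal while the flag is set, which may not match the real patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the __LINK_STATE_NETPOLL state bit. */
#define FAKE_STATE_NETPOLL 0x1u

struct fake_dev {
    unsigned int state;  /* state bits */
    int list_dels;       /* how many times the device was unlinked */
};

/* Shared guard. Assumption: while the flag is set, another path owns
 * the poll list entry, so the completion helpers must leave it alone. */
static bool may_unlink(const struct fake_dev *dev)
{
    return !(dev->state & FAKE_STATE_NETPOLL);
}

/* Models netif_rx_complete(): the variant that handles irq state itself. */
static bool fake_rx_complete(struct fake_dev *dev)
{
    if (!may_unlink(dev))
        return false;
    dev->list_dels++;
    return true;
}

/* Models __netif_rx_complete(): the variant drivers call with irqs
 * already disabled. It needs the same guard, otherwise every driver
 * calling it directly would need its own fix. */
static bool fake_rx_complete_irqoff(struct fake_dev *dev)
{
    if (!may_unlink(dev))
        return false;
    dev->list_dels++;
    return true;
}

/* Returns the number of unlinks after calling both helpers once. */
static int unlinks_with_state(unsigned int state)
{
    struct fake_dev dev = { .state = state, .list_dels = 0 };

    fake_rx_complete(&dev);
    fake_rx_complete_irqoff(&dev);
    return dev.list_dels;
}
```

With the flag set, neither helper touches the list in this model, whichever one a driver happens to call.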
Comment 3 Andy Gospodarek 2008-12-19 13:41:06 EST
Looks good, Neil.
Comment 4 Neil Horman 2008-12-22 09:38:57 EST
Thanks gospo, posted for review.
Comment 5 RHEL Product and Program Management 2008-12-23 11:19:17 EST
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 6 Vivek Goyal 2009-01-05 09:20:34 EST
Committed in 78.23.EL. RPMs are available at http://people.redhat.com/vgoyal/rhel4/
Comment 7 Bryn M. Reeves 2009-01-06 11:14:16 EST
Neil, any feeling for how reproducible this is or steps to trigger it?
Comment 8 Andy Gospodarek 2009-01-06 11:24:55 EST
This problem would happen every time you processed exactly the number of frames that were in your quota.  Vivek seemed to be able to reproduce it pretty easily in RHTS after we added some new code that protected concurrent calls to poll_napi.  I think he was doing a connectathon test with netconsole enabled.
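
A rough way to see why the exact-quota case is special is the toy model below. It is a sketch under stated assumptions, not the RHEL4 code: model_rx_pass and struct poll_result are hypothetical, and it assumes the requeue decision is taken on the quota left after dev->poll() returns, as the description of this bug suggests.

```c
#include <assert.h>

/* Hypothetical result of one pass: frames handled and unlink count. */
struct poll_result {
    int processed;
    int list_dels;
};

/* Models one rx pass over a device with `pending` frames queued
 * and `quota` frames of budget left. */
static struct poll_result model_rx_pass(int pending, int quota)
{
    struct poll_result r = { 0, 0 };

    assert(pending > 0 && quota > 0);

    /* dev->poll(): handle as many frames as the quota allows. */
    r.processed = pending < quota ? pending : quota;
    quota -= r.processed;
    pending -= r.processed;

    /* Ring drained: the driver completes, unlinking the device. */
    if (pending == 0)
        r.list_dels++;

    /* Quota exhausted: the rx action unlinks and requeues the device. */
    if (quota <= 0)
        r.list_dels++;

    return r;
}
```

In this model, model_rx_pass(64, 64) reports two unlinks, while model_rx_pass(10, 64) and model_rx_pass(100, 64) each report one. Only the case where the frame count equals the quota hits both paths.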
Comment 9 Neil Horman 2009-01-06 11:52:34 EST
Yes, that's right. Andy hit the nail on the head. Not sure what it is about connectathon, but that did seem to have a 100% reproduction rate.
Comment 10 Bryn M. Reeves 2009-01-06 12:12:01 EST
Thanks Andy & Neil - looks like connectathon tickles this particularly well then (we've had a test kernel running for a few months with all the earlier netpoll/bond patches included but have yet to see an occurrence of this oops).
Comment 16 Rick Beldin 2009-02-13 09:10:40 EST
Would it be possible to get a sample stack trace of this failure?   It might be easier to match up to see if customers are encountering this problem.  

Thanks,

Rick
Comment 20 errata-xmlrpc 2009-05-18 15:10:29 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1024.html
