Red Hat Bugzilla – Bug 477202
oops in net_rx_action on double free of dev->poll_list
Last modified: 2010-10-23 02:40:39 EDT
Description of problem:
If dev->quota in net_rx_action drops to zero or below and dev->poll() calls netif_rx_complete, net_rx_action then performs a second list_del on the same entry, which oopses on the poisoned pointers in dev->poll_list.
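The failure mode described above can be sketched in userspace. This is a simplified model, not the RHEL4 kernel code: `model_dev`, `model_poll`, `model_rx_action` and `list_del_checked` are hypothetical names, and the poison constants are only illustrative stand-ins for the kernel's list poison values. Where the kernel would dereference a poisoned pointer and oops, the model returns -1 instead.

```c
#include <assert.h>
#include <stddef.h>

struct list_node { struct list_node *next, *prev; };

/* Illustrative poison markers (assumption: stand-ins for the kernel's). */
#define POISON1 ((struct list_node *)0x00100100)
#define POISON2 ((struct list_node *)0x00200200)

static void list_init(struct list_node *head)
{
    head->next = head;
    head->prev = head;
}

static void list_add_tail(struct list_node *node, struct list_node *head)
{
    assert(node != NULL && head != NULL);
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/* Unlink and poison the node. Returns -1 if the node was already
 * unlinked, which is the point where the real kernel would oops. */
static int list_del_checked(struct list_node *node)
{
    if (node->next == POISON1 || node->prev == POISON2)
        return -1;
    node->prev->next = node->next;
    node->next->prev = node->prev;
    node->next = POISON1;
    node->prev = POISON2;
    return 0;
}

/* Hypothetical stand-in for a net device on the poll list. */
struct model_dev {
    struct list_node poll_list;
    int quota;
};

/* Models a driver poll() that consumes `frames` of quota and then
 * completes, i.e. removes the device from the poll list itself. */
static int model_poll(struct model_dev *dev, int frames)
{
    dev->quota -= frames;
    return list_del_checked(&dev->poll_list);
}

/* Models the caller: if the quota is exhausted after polling, it tries
 * to unlink the device again before requeueing it. */
static int model_rx_action(struct model_dev *dev, int frames)
{
    int rc = model_poll(dev, frames);
    if (rc)
        return rc;
    if (dev->quota <= 0)
        return list_del_checked(&dev->poll_list); /* second delete */
    return 0;
}
```

In this model, calling `model_rx_action(&dev, 16)` on a device queued with a quota of 16 hits the second delete and returns -1, while processing fewer frames than the quota returns 0.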
Version-Release number of selected component (if applicable):
Steps to Reproduce:
Created attachment 327468
This, on top of what's currently in RHEL4, seems like a reasonable fix.
Created attachment 327476
new version of same patch
Quick update to Andy's patch. I forgot that my original version included a chunk that does the __LINK_STATE_NETPOLL check in __netif_rx_complete as well as in netif_rx_complete. This saves us from having to fix up dozens of drivers, so we want to keep that one chunk. Everything else is unaltered.
Looks good, Neil.
Thanks gospo, posted for review.
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release. Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products. This request is not yet committed for inclusion in an Update release.
Committed in 78.23.EL. RPMs are available at http://people.redhat.com/vgoyal/rhel4/
Neil, any feeling for how reproducible this is or steps to trigger it?
This problem would happen every time you processed exactly the number of frames that were in your quota. Vivek seemed to be able to reproduce it pretty easily in RHTS after we added some new code that protected concurrent calls to poll_napi. I think he was doing a connectathon test with netconsole enabled.
Yes, that's right. Andy hit the nail on the head. Not sure what it is about connectathon, but that did seem to have a 100% reproduction rate.
Thanks Andy & Neil - looks like connectathon tickles this particularly well then (we've had a test kernel running for a few months with all the earlier netpoll/bond patches included but have yet to see an occurrence of this oops).
Would it be possible to get a sample stack trace of this failure? It might be easier to match up to see if customers are encountering this problem.
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.