Bug 617268
Summary: | kernel crash in br_nf_pre_routing_finish | |
---|---|---|---
Product: | Red Hat Enterprise Linux 5 | Reporter: | Sachin Prabhu <sprabhu>
Component: | kernel | Assignee: | Jiri Pirko <jpirko>
Status: | CLOSED ERRATA | QA Contact: | Network QE <network-qe>
Severity: | high | Priority: | urgent
Version: | 5.5 | CC: | bturner, cward, dag, dhoward, dtian, grimme, herbert.xu, hjia, jhughes, jpirko, lzheng, nhorman, rkhan, tao
Target Milestone: | rc | Doc Type: | Bug Fix
Hardware: | All | OS: | Linux
Last Closed: | 2011-01-13 21:44:28 UTC | |

Description
Sachin Prabhu
2010-07-22 16:00:03 UTC
This is fixed by upstream commit e94c67436efa22af7d8b7d19c885863246042543, I think. I'll work on a backport in the AM.

To answer your question, turning off that sysctl value will cause any iptables rules that are attached to a bridge interface to get skipped with an NF_ACCEPT code. So basically, by turning that value to 0, you bypass any iptables rules for frames arriving into the system via br0. I'll backport your patch in the AM and have something for you to test soon.

I'm sorry, I take that last comment back. I misread; that doesn't fix this.

Hey, while I'm looking at this, I note that you have the e1000e and bnx2 modules loaded on this system. Can you tell me if GRO or LRO is enabled on any of those interfaces? If it is, please turn it off via the available module options for those modules and see if the problem reproduces.

OK, I think this might be an upstream bug. It would appear that what has happened is that a bridge fragment has been reassembled with a non-bridge fragment, thus causing the bridge to see a packet with nf_bridge == NULL. Again, this just goes to show how the current bridge netfilter is broken by design. I will take this to netdev.

Yes, one fragment came in on eth6 and one on eth7. Please confirm this by giving us the network topology of the box.

http://brewweb.devel.redhat.com/brew/taskinfo?taskID=2620776

Also, here's a debug kernel build. If the LRO thing doesn't solve the problem, you can run this. It should cause the kernel to panic in the event that any context attempts to free or null the nf_bridge pointer of an skb while we're traversing the iptables rules in that path. br_nf_pre_routing calls skb_share_check, which should ensure that we are the only user of this skb. My guess is that something in the iptables code is returning accept on that skb while at the same time using it for something else, resulting in a null nf_bridge. This patch should more directly point to that culprit. Please try the LRO settings first, and if that continues to fail, move on to this build. Thanks!

Herbert, you seem to have figured this out way ahead of me. I'll just pass this over to you. Thanks!

commit 8fa9ff6849bb86c59cc2ea9faadf3cb2d5223497
Author: Patrick McHardy <kaber>
Date: Tue Dec 15 16:59:59 2009 +0100

netfilter: fix crashes in bridge netfilter caused by fragment jumps

should fix the problem. We'll need to back port it.

Created attachment 437645 [details]
first proposed patch
Created attachment 438988 [details]
second proposed patch
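
The failure mode described in the comments above (a bridged fragment reassembled together with a non-bridged one, so the bridge hook sees nf_bridge == NULL) can be illustrated with a small user-space model. This is only a sketch, not kernel code and not either of the attached patches: `Fragment`, `reassemble`, `defrag`, `pre_routing_finish` and the `separate_by_origin` switch are hypothetical names invented for illustration, and the model ignores whether a reassembled datagram is complete.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Fragment:
    ident: int                 # IP identification shared by fragments of one datagram
    offset: int                # fragment offset in bytes
    payload: bytes
    nf_bridge: Optional[str]   # bridge metadata; None if it did not arrive via a bridge


def reassemble(fragments: List[Fragment]) -> Fragment:
    """Join fragments by offset; the result inherits metadata from the first one."""
    ordered = sorted(fragments, key=lambda f: f.offset)
    head = ordered[0]
    data = b"".join(f.payload for f in ordered)
    return Fragment(head.ident, 0, data, head.nf_bridge)


def defrag(fragments: List[Fragment], separate_by_origin: bool) -> List[Fragment]:
    """Group fragments into queues and reassemble each queue.

    Hypothetical switch: when separate_by_origin is True, bridged and
    non-bridged fragments are kept in different queues and never merged.
    """
    queues = {}
    for frag in fragments:
        origin = (frag.nf_bridge is not None) if separate_by_origin else None
        queues.setdefault((frag.ident, origin), []).append(frag)
    return [reassemble(queue) for queue in queues.values()]


def pre_routing_finish(pkt: Fragment) -> str:
    """Stand-in for the bridge hook, which needs bridge metadata to continue."""
    if pkt.nf_bridge is None:
        # Models the NULL dereference that shows up as the kernel oops.
        raise RuntimeError("nf_bridge is NULL")
    return f"forwarded via {pkt.nf_bridge}"


if __name__ == "__main__":
    frags = [
        Fragment(ident=7, offset=0, payload=b"AAAA", nf_bridge=None),   # non-bridged
        Fragment(ident=7, offset=4, payload=b"BBBB", nf_bridge="br0"),  # bridged
    ]
    mixed = defrag(frags, separate_by_origin=False)[0]
    try:
        pre_routing_finish(mixed)
    except RuntimeError as exc:
        print("mixed queue:", exc)
    for pkt in defrag(frags, separate_by_origin=True):
        if pkt.nf_bridge is not None:
            print("separate queues:", pre_routing_finish(pkt))
```

In the mixed case the reassembled packet takes its metadata from the non-bridged head fragment, which is why the stand-in hook fails. Keeping the two origins in separate queues is one way to avoid that in this model.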
I have what seems to be the same kernel oops on an x86_64 VM server running on a Dell PowerEdge R710. The machine seems to oops every couple of days since I upgraded to the 5.5 kernels. I am also a CentOS Developer and I have rebuilt the 2.6.18-194.11.1.el5 x86_64 kernel with the patch from comment #30. I will provide feedback in a couple of days as to whether or not I continue to get the kernel oops.

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

in kernel-2.6.18-219.el5

You can download this test kernel from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcomed.

I have not tested kernel-2.6.18-219.el5, but I did want to report that I have had no kernel oops since adding the patch in comment #30 to 2.6.18-194.11.1.el5 on our Dell PowerEdge R710.

I have encountered the same problem on an IBM BladeCenter HS22. It happened a few times even when the system was seemingly not doing anything in particular, so we can reproduce this very easily. The workaround from the Red Hat Knowledgebase at https://access.redhat.com/kb/docs/DOC-44616 hasn't caused a kernel panic yet. More information when it is available :-)

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0017.html
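
For anyone checking whether the sysctl workaround discussed earlier in this report is active on a host, a minimal sketch follows. The thread never names the sysctl, so the path `net/bridge/bridge-nf-call-iptables` is an assumption of this sketch, and `bridge_iptables_enabled` is a hypothetical helper name. The `proc_sys` parameter exists only so the function can be pointed at a test directory.

```python
from pathlib import Path
from typing import Optional

# Assumption: the report only says "that sysctl value"; this path is a guess
# at the setting that makes bridged frames traverse iptables.
BRIDGE_NF_SYSCTL = "net/bridge/bridge-nf-call-iptables"


def bridge_iptables_enabled(proc_sys: str = "/proc/sys") -> Optional[bool]:
    """Return True/False for the sysctl value, or None if it cannot be read."""
    path = Path(proc_sys) / BRIDGE_NF_SYSCTL
    try:
        return path.read_text().strip() != "0"
    except OSError:
        # Bridge module not loaded, or not running on Linux.
        return None


if __name__ == "__main__":
    state = bridge_iptables_enabled()
    if state is None:
        print("bridge netfilter sysctl not available")
    elif state:
        print("iptables rules are applied to bridged frames")
    else:
        print("iptables rules are bypassed for bridged frames")
```

As noted in the comments above, setting the value to 0 bypasses iptables rules for frames arriving via the bridge, so this is a check of the workaround's state rather than a fix.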