Description of problem:
Since updating to the RHEL5.7-based OpenVZ kernel, the server crashes when using iptables -j REJECT with a device connected to a bridge.

br_dev_xmit() saves the address of the network bridge in the IPCB,
br_nf_local_out() does not clear it before returning to the IP stack,
ip_options_echo(), called from icmp_send(), uses the "dirty" IPCB and corrupts the stack.

Similar issues were fixed in mainline kernels by Herbert Xu, and they were backported correctly:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=patch;h=17762060c25590bfddd68cc1131f28ec720f405f
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=patch;h=6b1e960fdbd75dcd9bcc3ba5ff8898ff1ad30b6e
However, in this case the deprecated hooks are used.

ip_conntrack version 2.4 (8192 buckets, 65536 max) - 312 bytes per conntrack
physdev match: using --physdev-out in the OUTPUT, FORWARD and POSTROUTING chains for non-bridged traffic is deprecated and breaks other things, it will be removed in January 2007. See Documentation/feature-removal-schedule.txt for details. This doesn't affect you in case you're using it for purely bridged traffic.
general protection fault: 0000 [1] SMP
last sysfs file:
CPU: 6
Modules linked in: ipt_MASQUERADE(U) xt_physdev(U) iptable_filter(U) iptable_nat(U) ip_nat(U) ip_conntrack(U) nfnetlink(U) ip_tables(U) ipt_REJECT(U) x_tables(U) xfrm_nalgo(U) crypto_api(U) vznetdev(U) vzmon(U) vzdev(U) bridge(U) loop(U) dm_multipath(U) scsi_dh(U) video(U) backlight(U) sbs(U) power_meter(U) hwmon(U) i2c_ec(U) i2c_core(U) dell_wmi(U) wmi(U) button(U) battery(U) asus_acpi(U) acpi_memhotplug(U) ac(U) parport_pc(U) lp(U) parport(U) igb(U) 8021q(U) i7core_edac(U) pcspkr(U) edac_mc(U) dca(U) dm_raid45(U) dm_message(U) dm_region_hash(U) dm_mem_cache(U) dm_snapshot(U) dm_zero(U) dm_mirror(U) dm_log(U) dm_mod(U) usb_storage(U) shpchp(U) cciss(U) sd_mod(U) scsi_mod(U) ext3(U) jbd(U) uhci_hcd(U) ohci_hcd(U) ehci_hcd(U)
Pid: 8693, comm: telnet Tainted: P --- 2.6.18-274.3.1.el5.028stab094.3debug #1 028stab094
RIP: 0060:[<ffffffff80278a34>]  [<ffffffff80278a34>] icmp_send+0x760/0x761
RSP: 0068:ffff81033a43b7d0  EFLAGS: 00010296
RAX: ffff81033a43bfd8 RBX: 5a5a5a5a5a5a5a5a RCX: 0000000000000001
RDX: ffff81033c9d3000 RSI: 0000000000000001 RDI: ffffffff80278a03
RBP: 5a5a5a5a5a5a5a5a R08: 0000000000000286 R09: ffff81033da0c188
R10: ffff8101bd9dd5f8 R11: 0000000000000040 R12: 5a5a5a5a5a5a5a5a
R13: 5a5a5a5a5a5a5a5a R14: 5a5a5a5a5a5a5a5a R15: 5a5a5a5a5a5a5a5a
FS:  00002ac635bb06e0(0000) GS:ffff81033dc84560(0000) knlGS:0000000000000000
CS:  0060 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00002ac635728330 CR3: 000000033dd83000 CR4: 00000000000006a0
Process telnet (pid: 8693, veid=0, threadinfo ffff81033a43a000, task ffff81033c9d3000)
Stack:
 5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a
 5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a 5a5a5a5a5a04575a 5a5a5a5a5a5a5a5a
 5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a
Call Trace:
 [<ffffffff88315117>] :ip_tables:ipt_do_table+0x293/0x2f8
 [<ffffffff800e2e07>] zone_statistics+0x3e/0x6d
 [<ffffffff80037578>] nf_iterate+0x41/0x7d
 [<ffffffff8836ce42>] :bridge:br_nf_local_out_finish+0x0/0xb0
 [<ffffffff8005bec8>] nf_hook_slow+0x78/0xdd
 [<ffffffff8836ce42>] :bridge:br_nf_local_out_finish+0x0/0xb0
 [<ffffffff8836df4c>] :bridge:br_nf_local_out+0x269/0x28b
 [<ffffffff80032469>] dev_queue_xmit+0x0/0x4b3
 [<ffffffff80037578>] nf_iterate+0x41/0x7d
 [<ffffffff88368217>] :bridge:br_forward_finish+0x0/0x69
 [<ffffffff8005bec8>] nf_hook_slow+0x78/0xdd
 [<ffffffff88368217>] :bridge:br_forward_finish+0x0/0x69
 [<ffffffff8836838b>] :bridge:__br_deliver+0x61/0x74
 [<ffffffff88367317>] :bridge:br_dev_xmit+0xd3/0xe7
 [<ffffffff80240d6d>] dev_hard_start_xmit+0x1b7/0x28a
 [<ffffffff80032847>] dev_queue_xmit+0x3de/0x4b3
 [<ffffffff80035123>] ip_output+0x31f/0x368
 [<ffffffff80037b8f>] ip_queue_xmit+0x594/0x5fd
 [<ffffffff800636f8>] restore_args+0x0/0x30
 [<ffffffff80023ee9>] tcp_transmit_skb+0x730/0x768
 [<ffffffff80059cf7>] tcp_connect+0x373/0x410
 [<ffffffff802711a9>] tcp_v4_connect+0x7c1/0x9be
 [<ffffffff8005ebbc>] inet_stream_connect+0x94/0x23f
 [<ffffffff80237bee>] sys_connect+0x6c/0x9c
 [<ffffffff80264b90>] ip_setsockopt+0x22/0x78
 [<ffffffff80069508>] trace_hardirqs_on_thunk+0x35/0x37
 [<ffffffff80033b40>] release_sock+0x2f/0xed
 [<ffffffff80063166>] system_call+0x7e/0x83
Code: c3 41 56 41 55 41 54 55 48 89 fd 53 8b 87 88 00 00 00 89 c2
RIP  [<ffffffff80278a34>] icmp_send+0x760/0x761

Version-Release number of selected component (if applicable):
Kernels after 2.6.18-274 are affected.

Steps to Reproduce:
1. Create a bridge with 2 eth interfaces.
2. Add a rule like "iptables -A OUTPUT -o vzbr0 -j REJECT".
3. Generate some traffic to trigger this rule.

Note: the behavior depends on the memory address of the network bridge. In the bad case it can corrupt the stack and lead to a host crash. The issue was in fact reproduced on an OpenVZ kernel; however, I think it affects RHEL5 kernels too.

Actual results:
In the bad case it can lead to a host crash.

Expected results:
ICMP reply.

Additional info:
See details in the attached log of the crash dump session and in http://bugzilla.openvz.org/show_bug.cgi?id=2047
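To make the control-block aliasing concrete, here is a minimal sketch (illustrative only; the function names below are hypothetical and this is not the actual bridge or ICMP code): the bridge path and the IP stack interpret the same skb->cb bytes through two different structures.

#include <linux/kernel.h>
#include <linux/netdevice.h>
#include <linux/skbuff.h>
#include <net/ip.h>	/* IPCB(skb) == (struct inet_skb_parm *)skb->cb */

/* Hypothetical illustration of the bug: the bridge transmit path treats
 * skb->cb as its own scratch area and leaves a net_device pointer there. */
static void bridge_path_stores_private_data(struct sk_buff *skb,
					    struct net_device *br)
{
	*(struct net_device **)skb->cb = br;
}

/* icmp_send()/ip_options_echo() interpret the very same bytes as
 * struct inet_skb_parm, whose first member is struct ip_options.
 * The stale pointer is read as option lengths and offsets, so copying
 * the "options" overruns the on-stack buffer in icmp_send() and
 * corrupts the stack. */
static void icmp_path_reads_same_bytes(struct sk_buff *skb)
{
	struct ip_options *opt = &IPCB(skb)->opt;

	printk(KERN_INFO "bogus optlen=%u srr=%u\n", opt->optlen, opt->srr);
}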
Created attachment 530673 [details]
log of crash dump session

The log of the crash dump session contains the original oops message and a detailed dump of the stack of the crashed process. I then found the address of the incoming skb on the stack (ffff81033da0c188). Its "sc" field contains the address of the bridge net_device struct (0xffff8101b4f66000). However, ip_options_echo() uses this field as a struct ip_options and corrupts the stack.
Just to add that we see similar-looking panics on one machine acting as a firewall with bridged interfaces. It panics frequently when running 2.6.18-274.7.1.el5 and is fine if we back off to 2.6.18-238.19.1.el5. I don't have the saved kernel errors to hand, but they certainly mentioned icmp_send() in the RIP line from the log. None of our other systems currently use (much) in the way of bridges and they seem to be OK with 2.6.18-274.7.1.el5.
Do you need more details of the setup we have which causes the panics when using 2.6.18-274.7.1.el5? In particular we are not using any iptables --physdev* matches, so the problem we see may not be quite the same as the original issue.

We are using ebtables - in about the most trivial (if evil) way possible:

# ebtables-save
# Generated by ebtables-save v1.0 on Fri Nov 25 20:45:53 GMT 2011
*filter
:INPUT ACCEPT
:FORWARD DROP
:OUTPUT ACCEPT
-A FORWARD -j DROP

And we are using a set of iptables rules which are altered dynamically according to client status etc...

# iptables-save | wc -l
978

(I won't include the full thing here since it contains sensitive data.)

We are bridging about 22 VLAN sub-interfaces together as 4 bridges, which are then routed and filtered using iptables along with other physical interfaces. (The reason for bridging stuff in a way which doesn't forward via the bridge is client isolation; I could explain in detail but it might take several pages.)

With 2.6.18-274.7.1.el5 the system was fine in testing, and worked for about 6 hours before it froze - and then did the same after 2 or 3 hours, and again after another 2 hours (during the busiest time of day, with ~300 client machines connected) - so not every icmp_send is triggering a panic!

Switching back to 2.6.18-238.19.1.el5, the system has been stable for 10 days, and before the update it had been happy (ignoring planned reboots) for about a year (we had a hardware problem on a RAID controller last December, but that is unrelated)...

-- Jon
Created attachment 536689 [details] backport of missing mainline commit 17762060c25590bfddd68cc1131f28ec720f405f
Created attachment 536690 [details] backport of missing mainline commit 87f94b4e91dc042620c527f3c30c37e5127ef757
Created attachment 536691 [details] backport of missing mainline commit 462fb2af9788a82a534f8184abfde31574e1cfa0
Created attachment 536692 [details] backport of missing mainline commit 6b1e960fdbd75dcd9bcc3ba5ff8898ff1ad30b6e
Created attachment 536693 [details] backport of missing mainline commit f8e9881c2aef1e982e5abc25c046820cd0b7cf64
Created attachment 536699 [details] backport of missing mainline commit cb68552858c64db302771469b1202ea09e696329
Created attachment 536700 [details] restore IPCB before return to IP stack via deferred physdev hooks
I expect the 7 attached patches to completely resolve the reported problem: the first 6 patches are backports of missing mainline commits, and the last one is my patch that restores the IPCB before returning to the IP stack via the deferred physdev hooks.
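For readers following along, the idea behind that last patch can be sketched roughly as follows (hypothetical helper names, not the attached patch itself): copy the IP control block aside before the deferred bridge-netfilter physdev hooks run, and copy it back before the skb is handed to the IP stack again.

#include <linux/skbuff.h>
#include <linux/string.h>
#include <net/ip.h>

/* Sketch only: preserve the IP control block across the deferred
 * physdev hooks and restore it before the skb re-enters the IP stack. */
static inline void ipcb_save(const struct sk_buff *skb,
			     struct inet_skb_parm *saved)
{
	memcpy(saved, IPCB(skb), sizeof(*saved));
}

static inline void ipcb_restore(struct sk_buff *skb,
				const struct inet_skb_parm *saved)
{
	memcpy(IPCB(skb), saved, sizeof(*saved));
}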
I'm guessing that the fixes would not have made it into kernel-2.6.18-274.12.1.el5 - certainly the changelog doesn't seem to mention this issue so I don't want to risk using it on any production systems...
For affected people: as a workaround you can try the OpenVZ kernel 2.6.18-028stab096.1:
http://wiki.openvz.org/Download/kernel/rhel5-testing/028stab096.1
We expect it to include all required fixes.
Might the el6 bugzilla https://bugzilla.redhat.com/show_bug.cgi?id=770709 be related to this one? (I forgot to add a link in that direction so anyone finding one of these bugs would discover the other).
Seems so, Jonathan. Also for BZ #717407. Thanks.
Created attachment 552000 [details]
bridge: Reset IPCB when entering IP stack on NF_FORWARD

This is a backport of the upstream commit:

commit 6b1e960fdbd75dcd9bcc3ba5ff8898ff1ad30b6e
Author: Herbert Xu <herbert.org.au>
Date:   Fri Mar 18 05:27:28 2011 +0000

    bridge: Reset IPCB when entering IP stack on NF_FORWARD

Please let me know whether it fixes the crash. Thanks!
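For reference, the effect of the "Reset IPCB" change can be sketched like this (a simplified illustration modelled on the upstream commits, not the exact backport; the helper name is made up):

#include <linux/skbuff.h>
#include <linux/string.h>
#include <net/ip.h>

/* Sketch: zero the IP control block before the skb is handed from
 * bridge netfilter to the IP stack, so ip_options_echo() sees empty
 * options instead of whatever the bridge code left in skb->cb. */
static inline void nf_bridge_reset_ipcb(struct sk_buff *skb)
{
	memset(IPCB(skb), 0, sizeof(struct inet_skb_parm));
}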
Dear Herbert,

your patch seems to fix BZ #770709, but it does not fix this bug. In this case we crashed without entering br_nf_forward_ip():

br_dev_xmit
 __br_deliver
  br_nf_local_out
   reject() and icmp_send() in ipt_REJECT

Also, I would like to draw your attention to the fact that br_nf_pre_routing() and br_nf_dev_queue_xmit() are not patched, and that can lead to stack corruption too.

Could you please take a look at my attachments in this bug? I believe I've backported all required fixes (including your last patch).

thank you,
Vasily Averin
Created attachment 552022 [details]
bridge: Reset IPCB when entering IP stack

Thanks Vasily. I have updated the patch to include local_out.
Dear Herbert,

I think you need to patch at least br_nf_pre_routing() and br_nf_dev_queue_xmit() too:
https://bugzilla.redhat.com/attachment.cgi?id=536689&action=edit
https://bugzilla.redhat.com/attachment.cgi?id=536690&action=edit

Or do you prefer to wait until it is reproduced?

thank you,
Vasily Averin
Created attachment 552041 [details]
bridge: Reset IPCB when entering IP stack

Thanks Vasily, I missed those as I was looking at the RHEL6 source, where they'd already been patched. Here is the updated patch.
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. New Contents: If the IP stack proper is accessed from bridge netfilter, the socket buffer needs to be in a form the IP stack expects. Previously, the entry point on the NF_FORWARD hook did not meet the requirements of the IP stack. Consequently, hosts could terminate unexpectedly. A backported upstream patch has been provided to address this issue and the crashes no longer occur in the described scenario.
Reproduced in https://bugzilla.redhat.com/show_bug.cgi?id=804721#c6.

Verified on kernel-2.6.18-338.el5:

[root@hp-dl320g5-03 ~]# cat brloop.sh
#!/bin/bash
for i in {1..100}; do
    ifconfig eth0 0
    ifconfig eth0 up
    ifconfig eth1 up
    brctl addbr br0
    brctl addif br0 eth0
    brctl addif br0 eth1
    pkill -KILL dhclient
    dhclient br0 && ping 10.66.12.192 -I br0 -c 5
    iptables -A OUTPUT -o br0 -j REJECT
    ping 10.66.12.192 -I br0 -c 5
    iptables -F
    ifconfig br0 down
    brctl delbr br0
    ifconfig eth0 up
    pkill -KILL dhclient && dhclient eth0
done

and no call trace is found. Set Verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-0006.html