Bug 749813 - 2.6.18-274 stack corruption in icmp_send() (bridge/iptables)
Summary: 2.6.18-274 stack corruption in icmp_send() (bridge/iptables)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.7
Hardware: All
OS: Linux
Priority: urgent
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Herbert Xu
QA Contact: Weibing Zhang
URL:
Whiteboard:
Depends On:
Blocks: 743405 804721
Reported: 2011-10-28 14:27 UTC by Vasily Averin
Modified: 2018-11-29 21:36 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
If the IP stack proper is accessed from bridge netfilter, the socket buffer needs to be in a form the IP stack expects. Previously, the entry point on the NF_FORWARD hook did not meet the requirements of the IP stack. Consequently, hosts could terminate unexpectedly. A backported upstream patch has been provided to address this issue and the crashes no longer occur in the described scenario.
Clone Of:
Environment:
Last Closed: 2013-01-08 04:17:27 UTC
Target Upstream Version:
Embargoed:


Attachments
log of crash dump session (16.00 KB, text/plain)
2011-10-28 14:38 UTC, Vasily Averin
backport of missing mainline commit 17762060c25590bfddd68cc1131f28ec720f405f (1.42 KB, patch)
2011-11-26 11:39 UTC, Vasily Averin
backport of missing mainline commit 87f94b4e91dc042620c527f3c30c37e5127ef757 (1.35 KB, patch)
2011-11-26 11:40 UTC, Vasily Averin
backport of missing mainline commit 462fb2af9788a82a534f8184abfde31574e1cfa0 (5.28 KB, patch)
2011-11-26 11:42 UTC, Vasily Averin
backport of missing mainline commit 6b1e960fdbd75dcd9bcc3ba5ff8898ff1ad30b6e (1.30 KB, patch)
2011-11-26 11:43 UTC, Vasily Averin
backport of missing mainline commit f8e9881c2aef1e982e5abc25c046820cd0b7cf64 (1.53 KB, patch)
2011-11-26 11:45 UTC, Vasily Averin
backport of missing mainline commit cb68552858c64db302771469b1202ea09e696329 (1.22 KB, patch)
2011-11-26 11:47 UTC, Vasily Averin
restore IPCB before return to IP stack via deferred physdev hooks (1.13 KB, patch)
2011-11-26 11:49 UTC, Vasily Averin
bridge: Reset IPCB when entering IP stack on NF_FORWARD (526 bytes, patch)
2012-01-11 04:31 UTC, Herbert Xu
bridge: Reset IPCB when entering IP stack (951 bytes, patch)
2012-01-11 07:13 UTC, Herbert Xu
bridge: Reset IPCB when entering IP stack (1.66 KB, patch)
2012-01-11 08:49 UTC, Herbert Xu


Links
System: Red Hat Product Errata
ID: RHBA-2013:0006
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Red Hat Enterprise Linux 5.9 kernel update
Last Updated: 2013-01-08 08:48:56 UTC

Description Vasily Averin 2011-10-28 14:27:52 UTC
Description of problem:
Since updating to a RHEL 5.7-based OpenVZ kernel, the server crashes when using iptables -j REJECT with a device connected to a bridge.

br_dev_xmit() saves the address of the network bridge in IPCB
br_nf_local_out() does not clear it before returning to the IP stack
ip_options_echo(), called from icmp_send(), uses the "dirty" IPCB and corrupts the stack

Similar issues were fixed in mainline kernels by Herbert Xu, and those fixes were backported correctly.
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=patch;h=17762060c25590bfddd68cc1131f28ec720f405f
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=patch;h=6b1e960fdbd75dcd9bcc3ba5ff8898ff1ad30b6e

However, in this case deprecated hooks are used.

ip_conntrack version 2.4 (8192 buckets, 65536 max) - 312 bytes per conntrack
physdev match: using --physdev-out in the OUTPUT, FORWARD and POSTROUTING chains for non-bridged traffic is deprecated and breaks other things, it will be removed in January 2007. See Documentation/feature-removal-schedule.txt for details. This doesn't affect you in case you're using it for purely bridged traffic.
general protection fault: 0000 [1] SMP
last sysfs file:
CPU: 6
Modules linked in: ipt_MASQUERADE(U) xt_physdev(U) iptable_filter(U) iptable_nat(U) ip_nat(U) ip_conntrack(U) nfnetlink(U) ip_tables(U) ipt_REJECT(U) x_tables(U) xfrm_nalgo(U) crypto_api(U) vznetdev(U) vzmon(U) vzdev(U) bridge(U) loop(U) dm_multipath(U) scsi_dh(U) video(U) backlight(U) sbs(U) power_meter(U) hwmon(U) i2c_ec(U) i2c_core(U) dell_wmi(U) wmi(U) button(U) battery(U) asus_acpi(U) acpi_memhotplug(U) ac(U) parport_pc(U) lp(U) parport(U) igb(U) 8021q(U) i7core_edac(U) pcspkr(U) edac_mc(U) dca(U) dm_raid45(U) dm_message(U) dm_region_hash(U) dm_mem_cache(U) dm_snapshot(U) dm_zero(U) dm_mirror(U) dm_log(U) dm_mod(U) usb_storage(U) shpchp(U) cciss(U) sd_mod(U) scsi_mod(U) ext3(U) jbd(U) uhci_hcd(U) ohci_hcd(U) ehci_hcd(U)
Pid: 8693, comm: telnet Tainted:  P     --- 2.6.18-274.3.1.el5.028stab094.3debug #1 028stab094
RIP: 0060:[<ffffffff80278a34>]  [<ffffffff80278a34>] icmp_send+0x760/0x761
RSP: 0068:ffff81033a43b7d0  EFLAGS: 00010296
RAX: ffff81033a43bfd8 RBX: 5a5a5a5a5a5a5a5a RCX: 0000000000000001
RDX: ffff81033c9d3000 RSI: 0000000000000001 RDI: ffffffff80278a03
RBP: 5a5a5a5a5a5a5a5a R08: 0000000000000286 R09: ffff81033da0c188
R10: ffff8101bd9dd5f8 R11: 0000000000000040 R12: 5a5a5a5a5a5a5a5a
R13: 5a5a5a5a5a5a5a5a R14: 5a5a5a5a5a5a5a5a R15: 5a5a5a5a5a5a5a5a
FS:  00002ac635bb06e0(0000) GS:ffff81033dc84560(0000) knlGS:0000000000000000
CS:  0060 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00002ac635728330 CR3: 000000033dd83000 CR4: 00000000000006a0
Process telnet (pid: 8693, veid=0, threadinfo ffff81033a43a000, task ffff81033c9d3000)
Stack:  5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a
 5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a 5a5a5a5a5a04575a 5a5a5a5a5a5a5a5a
 5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a
Call Trace:
 [<ffffffff88315117>] :ip_tables:ipt_do_table+0x293/0x2f8
 [<ffffffff800e2e07>] zone_statistics+0x3e/0x6d
 [<ffffffff80037578>] nf_iterate+0x41/0x7d
 [<ffffffff8836ce42>] :bridge:br_nf_local_out_finish+0x0/0xb0
 [<ffffffff8005bec8>] nf_hook_slow+0x78/0xdd
 [<ffffffff8836ce42>] :bridge:br_nf_local_out_finish+0x0/0xb0
 [<ffffffff8836df4c>] :bridge:br_nf_local_out+0x269/0x28b
 [<ffffffff80032469>] dev_queue_xmit+0x0/0x4b3
 [<ffffffff80037578>] nf_iterate+0x41/0x7d
 [<ffffffff88368217>] :bridge:br_forward_finish+0x0/0x69
 [<ffffffff8005bec8>] nf_hook_slow+0x78/0xdd
 [<ffffffff88368217>] :bridge:br_forward_finish+0x0/0x69
 [<ffffffff8836838b>] :bridge:__br_deliver+0x61/0x74
 [<ffffffff88367317>] :bridge:br_dev_xmit+0xd3/0xe7
 [<ffffffff80240d6d>] dev_hard_start_xmit+0x1b7/0x28a
 [<ffffffff80032847>] dev_queue_xmit+0x3de/0x4b3
 [<ffffffff80035123>] ip_output+0x31f/0x368
 [<ffffffff80037b8f>] ip_queue_xmit+0x594/0x5fd
 [<ffffffff800636f8>] restore_args+0x0/0x30
 [<ffffffff80023ee9>] tcp_transmit_skb+0x730/0x768
 [<ffffffff80059cf7>] tcp_connect+0x373/0x410
 [<ffffffff802711a9>] tcp_v4_connect+0x7c1/0x9be
 [<ffffffff8005ebbc>] inet_stream_connect+0x94/0x23f
 [<ffffffff80237bee>] sys_connect+0x6c/0x9c
 [<ffffffff80264b90>] ip_setsockopt+0x22/0x78
 [<ffffffff80069508>] trace_hardirqs_on_thunk+0x35/0x37
 [<ffffffff80033b40>] release_sock+0x2f/0xed
 [<ffffffff80063166>] system_call+0x7e/0x83


Code: c3 41 56 41 55 41 54 55 48 89 fd 53 8b 87 88 00 00 00 89 c2
RIP  [<ffffffff80278a34>] icmp_send+0x760/0x761

Version-Release number of selected component (if applicable):
kernels after 2.6.18-274 are affected.

Steps to Reproduce:
1. create bridge with 2 eth interfaces 
2. add rule like "iptables -A OUTPUT -o vzbr0 -j REJECT"
3. generate some traffic to trigger this rule
Note: the behavior depends on the memory address of the network bridge. In the bad case it can corrupt the stack and crash the host.

The issue was actually reproduced on an OpenVZ kernel; however, I think it affects RHEL5 kernels too.

Actual results:
In the bad case it can lead to a host crash.

Expected results:
icmp reply

Additional info: see details in attached log of crash dumps session and in 
http://bugzilla.openvz.org/show_bug.cgi?id=2047

Comment 1 Vasily Averin 2011-10-28 14:38:08 UTC
Created attachment 530673 [details]
log of crash dump session

The log of the crash dump session contains the original oops message
and a detailed dump of the stack of the crashed process.
I then found the address of the incoming skb on the stack (ffff81033da0c188).
Its "cb" field contains the address of the bridge net_device struct (0xffff8101b4f66000).
However, ip_options_echo() uses this field as struct ip_options and corrupts the stack.
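
To illustrate the aliasing described in this comment, here is a minimal userspace sketch in C. It is not kernel code. The names model_skb, model_ip_opts, model_bridge_store() and model_ip_view() are hypothetical, the scratch-area size of 48 bytes is an assumption, and the field order (a 4-byte faddr followed by optlen, srr, rr and ts) is an assumed simplification of the start of struct ip_options. The address is written in little-endian byte order to mimic x86_64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CB_SIZE 48  /* assumed size of the per-packet scratch area */

/* Hypothetical stand-in for struct sk_buff; only the scratch area matters. */
struct model_skb {
    unsigned char cb[CB_SIZE];
};

/* Assumed, simplified view of the bookkeeping bytes that follow faddr. */
struct model_ip_opts {
    unsigned char optlen;
    unsigned char srr;
    unsigned char rr;
    unsigned char ts;
};

/* Bridge side: leave a device address in cb, least significant byte first. */
static void model_bridge_store(struct model_skb *skb, uint64_t dev_addr)
{
    int i;

    assert(skb != NULL);
    for (i = 0; i < 8; i++)
        skb->cb[i] = (unsigned char)(dev_addr >> (8 * i));
}

/* IP side: read the same bytes as option state (bytes 0-3 would be faddr). */
static struct model_ip_opts model_ip_view(const struct model_skb *skb)
{
    struct model_ip_opts o;

    assert(skb != NULL);
    o.optlen = skb->cb[4];
    o.srr    = skb->cb[5];
    o.rr     = skb->cb[6];
    o.ts     = skb->cb[7];
    return o;
}
```

With the bridge address reported above (0xffff8101b4f66000), model_ip_view() returns optlen = 0x01, srr = 0x81, rr = 0xff and ts = 0xff. In this model the IP side would therefore believe the packet carries options at offsets unrelated to its real header, which is consistent with the observation that the outcome depends on the bridge's memory address.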

Comment 2 Jonathan Peatfield 2011-11-18 16:03:02 UTC
Just to add that we see similar-looking panics on one machine acting as a firewall with bridged interfaces.  It panics frequently when running 2.6.18-274.7.1.el5 and is fine if we back off to 2.6.18-238.19.1.el5.

I don't have the saved kernel errors to hand but they certainly mentioned icmp_send() in the RIP line from the log.  None of our other systems currently use (much) in the way of bridges and they seem to be ok with 2.6.18-274.7.1.el5.

Comment 3 Jonathan Peatfield 2011-11-25 20:59:21 UTC
Do you need more details of the setup we have which causes the panics when using 2.6.18-274.7.1.el5 ?

In particular we are not using any iptables --physdev* matches so the problem we see may not be quite the same as the original issue.

We are using ebtables - in about the most trivial (if evil) way possible:

# ebtables-save 
# Generated by ebtables-save v1.0 on Fri Nov 25 20:45:53 GMT 2011
*filter
:INPUT ACCEPT
:FORWARD DROP
:OUTPUT ACCEPT
-A FORWARD -j DROP

And we are using a set of iptables rules which are altered dynamically according to client status etc...

# iptables-save | wc -l
978

(I won't include the full thing here since it contains sensitive data)

We are bridging about 22 VLAN sub-interfaces together as 4 bridges which are then routed and filtered using iptables along with other physical interfaces. (the reason for bridging stuff in a way which doesn't forward via the bridge is for client isolation, I could explain in detail but it might take several pages).

With 2.6.18-274.7.1.el5 the system was fine in testing, and worked for about 6 hours before it froze - and then did the same after 2 or 3 hours and again after another 2 hours (during the busiest time of day with ~300 client machines connected) - so not every icmp_send is triggering a panic!

Switching back to 2.6.18-238.19.1.el5 the system has been stable for 10 days, and before the update it had been happy (ignoring planned reboots) for about a year (we had a hardware problem on a RAID controller last December but that is unrelated)...

 -- Jon

Comment 4 Vasily Averin 2011-11-26 11:39:45 UTC
Created attachment 536689 [details]
backport of missing mainline commit 17762060c25590bfddd68cc1131f28ec720f405f

Comment 5 Vasily Averin 2011-11-26 11:40:54 UTC
Created attachment 536690 [details]
backport of missing mainline commit 87f94b4e91dc042620c527f3c30c37e5127ef757

Comment 6 Vasily Averin 2011-11-26 11:42:27 UTC
Created attachment 536691 [details]
backport of missing mainline commit 462fb2af9788a82a534f8184abfde31574e1cfa0

Comment 7 Vasily Averin 2011-11-26 11:43:46 UTC
Created attachment 536692 [details]
backport of missing mainline commit 6b1e960fdbd75dcd9bcc3ba5ff8898ff1ad30b6e

Comment 8 Vasily Averin 2011-11-26 11:45:11 UTC
Created attachment 536693 [details]
backport of missing mainline commit f8e9881c2aef1e982e5abc25c046820cd0b7cf64

Comment 9 Vasily Averin 2011-11-26 11:47:18 UTC
Created attachment 536699 [details]
backport of missing mainline commit cb68552858c64db302771469b1202ea09e696329

Comment 10 Vasily Averin 2011-11-26 11:49:24 UTC
Created attachment 536700 [details]
restore IPCB before return to IP stack via deferred physdev hooks

Comment 11 Vasily Averin 2011-11-26 11:53:34 UTC
I expect the 7 attached patches to completely resolve the reported problem:
the first 6 patches are backports of missing mainline commits; the last one is my patch that restores IPCB before returning to the IP stack via deferred physdev hooks.

Comment 12 Jonathan Peatfield 2011-12-01 17:58:05 UTC
I'm guessing that the fixes would not have made it into kernel-2.6.18-274.12.1.el5 - certainly the changelog doesn't seem to mention this issue so I don't want to risk using it on any production systems...

Comment 13 Vasily Averin 2011-12-06 08:39:22 UTC
For affected people:
As a workaround, you can try the OpenVZ kernel 2.6.18-028stab096.1
http://wiki.openvz.org/Download/kernel/rhel5-testing/028stab096.1

We expect it includes all required fixes.

Comment 15 Jonathan Peatfield 2012-01-03 17:58:10 UTC
Might the el6 bugzilla https://bugzilla.redhat.com/show_bug.cgi?id=770709 be related to this one? (I forgot to add a link in that direction so anyone finding one of these bugs would discover the other).

Comment 16 Marcelo Ricardo Leitner 2012-01-03 18:39:37 UTC
Seems so, Jonathan. Also for BZ #717407. Thanks.

Comment 17 Herbert Xu 2012-01-11 04:31:20 UTC
Created attachment 552000 [details]
bridge: Reset IPCB when entering IP stack on NF_FORWARD

This is a backport of upstream commit:

commit 6b1e960fdbd75dcd9bcc3ba5ff8898ff1ad30b6e
Author: Herbert Xu <herbert.org.au>
Date:   Fri Mar 18 05:27:28 2011 +0000

    bridge: Reset IPCB when entering IP stack on NF_FORWARD

Please let me know whether it fixes the crash.

Thanks!

Comment 18 Vasily Averin 2012-01-11 06:22:29 UTC
Dear Herbert,
your patch seems to fix BZ #770709, but it does not fix this bug.
In this case we crashed without entering br_nf_forward_ip().
br_dev_xmit
 __br_deliver
  br_nf_local_out
   reject and ipt_send in ipt_REJECT

I would also like to draw your attention to the fact that br_nf_pre_routing() and br_nf_dev_queue_xmit() are not patched, which can lead to stack corruption too.
Could you please take a look at my attachments in this bug? I believe I've backported all required fixes (including your last patch).

thank you,
Vasily Averin

Comment 19 Herbert Xu 2012-01-11 07:13:35 UTC
Created attachment 552022 [details]
bridge: Reset IPCB when entering IP stack

Thanks Vasily.  I have updated the patch to include local_out.

Comment 20 Vasily Averin 2012-01-11 08:42:52 UTC
Dear Herbert,
I think you need to patch at least br_nf_pre_routing() and br_nf_dev_queue_xmit() too.

https://bugzilla.redhat.com/attachment.cgi?id=536689&action=edit
https://bugzilla.redhat.com/attachment.cgi?id=536690&action=edit

Would you prefer to wait until it is reproduced?

thank you,
Vasily Averin

Comment 21 Herbert Xu 2012-01-11 08:49:35 UTC
Created attachment 552041 [details]
bridge: Reset IPCB when entering IP stack

Thanks Vasily, I missed them as I was looking at the RHEL6 source, where they'd already been patched.  Here is the updated patch.

Comment 31 Tomas Capek 2012-04-18 12:43:30 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
If the IP stack proper is accessed from bridge netfilter, the socket buffer needs to be in a form the IP stack expects. Previously, the entry point on the NF_FORWARD hook did not meet the requirements of the IP stack. Consequently, hosts could terminate unexpectedly. A backported upstream patch has been provided to address this issue and the crashes no longer occur in the described scenario.
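
The idea of putting the socket buffer into the form the IP stack expects can be sketched in userspace C as follows. This is only a model and not the actual patch: model_skb, model_reset_ipcb() and model_ip_sees_options() are hypothetical names, and the 48-byte scratch area and the offset of the option-length byte are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CB_SIZE 48       /* assumed size of the per-packet scratch area */
#define OPTLEN_OFFSET 4  /* assumed offset of optlen in the IP view of cb */

/* Hypothetical stand-in for struct sk_buff; only the scratch area matters. */
struct model_skb {
    unsigned char cb[CB_SIZE];
};

/* Wipe whatever a previous layer left behind before IP-layer code runs. */
static void model_reset_ipcb(struct model_skb *skb)
{
    assert(skb != NULL);
    memset(skb->cb, 0, sizeof(skb->cb));
}

/* IP-side check: a non-zero optlen means "this packet has options to echo". */
static int model_ip_sees_options(const struct model_skb *skb)
{
    assert(skb != NULL);
    return skb->cb[OPTLEN_OFFSET] != 0;
}
```

In this model, a buffer whose scratch area still holds stale bytes reports options that do not exist, while the same buffer after model_reset_ipcb() reports none. As the discussion in this bug suggests, such a reset is only effective if it is applied at every point where bridge netfilter hands a packet to IP-layer code, not just on one hook.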

Comment 33 Weibing Zhang 2012-09-05 06:17:28 UTC
reproduced in https://bugzilla.redhat.com/show_bug.cgi?id=804721#c6.
Verified on kernel-2.6.18-338.el5.
[root@hp-dl320g5-03 ~]# cat brloop.sh 
#!/bin/bash
for i in {1..100};
do
	ifconfig eth0 0
	ifconfig eth0 up
	ifconfig eth1 up
	brctl addbr br0
	brctl addif br0 eth0
	brctl addif br0 eth1
	pkill -KILL dhclient
	dhclient br0 && ping 10.66.12.192 -I br0 -c 5
	iptables -A OUTPUT -o br0 -j REJECT
	ping 10.66.12.192 -I br0 -c 5
	iptables -F
	ifconfig br0 down
	brctl delbr br0

	ifconfig eth0 up
	pkill -KILL dhclient && dhclient eth0
done

and no call trace is found.
Set Verified.

Comment 35 errata-xmlrpc 2013-01-08 04:17:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-0006.html

