Bug 724862 - Bridging behavior is different than older versions; promisc mode fixes one problem and creates another problem.
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 14
Hardware: All
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-07-22 04:32 UTC by Greg Scott
Modified: 2012-08-16 13:51 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-08-16 13:51:49 UTC
Type: Bug


Description Greg Scott 2011-07-22 04:32:44 UTC
Description of problem:

Running Fedora 14 with kernel  2.6.35.6-48.fc14.i686.PAE.  I have a complex iptables script developed several years ago that generates a firewall ruleset.  

This particular site is bridged.  Physical eth0 connects to the Internet, physical eth1 connects to the LAN.  Bridge br0 bridges physical eth0 and eth1.  The site needs to be bridged because some devices in the LAN need real, public IP Addresses and proxy ARP is not a desirable solution.  Bridging is just cleaner. 

Some internal users need to access internally hosted websites using external, public IP Addresses.  This is important because I need to provide internal users the same experience as the rest of the world when accessing the organization's website.  To make this work, an internal user sends a web request to aaa.bbb.115.147 on TCP port 80.  I have an iptables PREROUTING rule to DNAT this to 192.168.10.2 and a POSTROUTING rule to MASQUERADE the request.  This essentially sets up a "router on a stick" environment for my internal users.  
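
For illustration, a DNAT/MASQUERADE rule pair of the kind described above might look like the following sketch.  It only prints the commands rather than applying them.  The helper name hairpin_rules and the use of -A are assumptions made for this example, not taken from the actual firewall script, and the public address is the obfuscated placeholder used in this report.  POSTROUTING sees the packet after DNAT, so the sketch matches on the translated destination.

```shell
#!/bin/sh
# Sketch only: print hairpin-NAT ("router on a stick") rules, do not apply them.
# hairpin_rules is a hypothetical helper, not part of the reporter's script.
# Usage: hairpin_rules PUBLIC_IP INTERNAL_IP LAN_CIDR
hairpin_rules() {
    pub_ip=$1
    web_ip=$2
    lan=$3
    # Rewrite the public destination to the internal web server.
    echo "iptables -t nat -A PREROUTING -d $pub_ip -p tcp --dport 80 -j DNAT --to-destination $web_ip"
    # Masquerade LAN clients so the server replies via the firewall
    # instead of answering the client directly.
    echo "iptables -t nat -A POSTROUTING -s $lan -d $web_ip -p tcp --dport 80 -j MASQUERADE"
}

# aaa.bbb.115.147 is the obfuscated address from the report; substitute a real one.
hairpin_rules aaa.bbb.115.147 192.168.10.2 192.168.10.0/24
```

Without the MASQUERADE step the web server would reply straight to the LAN client, which would not recognize the reply because it was expecting an answer from the public address.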

I recently upgraded this site to a system based on Fedora 14 and this broke my rules above.  Troubleshooting, I noticed that the above rules worked as expected only when I was watching with tcpdump.  When I stopped watching with tcpdump, the rules broke.  Sure enough, using tcpdump -p (don't set the interface to promiscuous mode), the packets died.  They came in on br0/eth1, but never came back out again.  

Setting:

ip link set br0 promisc on

worked around the problem.  

Unfortunately, this created another problem - it broke my NAT entries for PPTP VPNs, and possibly other NATed services I offer.  For the PPTP VPNs, I use ip_conntrack_pptp and ip_nat_pptp.  In POSTROUTING, I SNAT anything going out with IP Protocol 47 to my PPTP server.  And in PREROUTING, I DNAT anything for TCP 1723 and IP Protocol 47 to my PPTP server.  
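
As a rough sketch, the PPTP NAT entries described here could be generated along the following lines.  The rule shapes follow the "iptables -L -n -t nat" listing quoted later in this report, including the mark match on 0x1.  The helper name pptp_nat_rules and the use of -A are assumptions for this example, and the sketch assumes the PPTP conntrack and NAT helper modules are already loaded.  It prints the commands instead of applying them.

```shell
#!/bin/sh
# Sketch only: print PPTP NAT rules, do not apply them.
# pptp_nat_rules is a hypothetical helper, not part of the reporter's script.
# Usage: pptp_nat_rules PUBLIC_IP PPTP_SERVER_IP
pptp_nat_rules() {
    pub_ip=$1
    srv_ip=$2
    # Inbound: PPTP control channel (TCP 1723) and GRE (IP protocol 47),
    # restricted to packets carrying mark 0x1 (arrived on eth0).
    echo "iptables -t nat -A PREROUTING -d $pub_ip -p tcp --dport 1723 -m mark --mark 0x1 -j DNAT --to-destination $srv_ip"
    echo "iptables -t nat -A PREROUTING -d $pub_ip -p 47 -m mark --mark 0x1 -j DNAT --to-destination $srv_ip"
    # Outbound: traffic from the PPTP server leaves with the public address.
    echo "iptables -t nat -A POSTROUTING -s $srv_ip -p 47 -j SNAT --to-source $pub_ip"
    echo "iptables -t nat -A POSTROUTING -s $srv_ip -p tcp --dport 1723 -j SNAT --to-source $pub_ip"
}

# aaa.bbb.115.147 is the obfuscated address from the report; substitute a real one.
pptp_nat_rules aaa.bbb.115.147 192.168.10.2
```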

Whenever anyone tries an inbound PPTP connection, watching with tcpdump, I see a storm of packets flying everywhere, but the user only sees a very very very long timeout waiting for authentication.  I suspect the bridge is forwarding frames out the wrong physical ethnnn interface.  

When I do this:

ip link set br0 promisc off

then my PPTP VPNs work as expected.  But this breaks my "router on a stick" rules.  


How reproducible:

At will.

Steps to Reproduce:
1.  Set up bridge br0 bridging eth0 and eth1.  
2.  Set up an internal website and an internal PPTP server.  These can be on the same server platform.  
3.  Generate appropriate iptables rules to SNAT/DNAT PPTP traffic and HTTP traffic.  See the relevant ruleset extract below.
4.  ip link set br0 promisc off
5.  Try to access the internal website using its external IP Address.  Try an inbound PPTP VPN.  The website access will fail, inbound PPTP VPN will work.
6.  ip link set br0 promisc on
7.  Try the internal website and inbound PPTP VPN again.  Now the website access will work and the inbound PPTP VPN will fail.
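
Steps 1, 4 and 6 above can be sketched as a small script.  This is a minimal dry run that only prints the commands.  It assumes bridge-utils (brctl) is installed and that eth0 and eth1 are already up, and the helper names bridge_setup and set_promisc are made up for this example.

```shell
#!/bin/sh
# Sketch only: print the bridge setup and promisc toggle commands.
# Both helpers are hypothetical names introduced for this example.

# Usage: bridge_setup BRIDGE PORT...
bridge_setup() {
    br=$1
    shift
    echo "brctl addbr $br"
    for port in "$@"; do
        echo "brctl addif $br $port"
    done
    echo "ip link set $br up"
}

# Usage: set_promisc BRIDGE on|off
set_promisc() {
    case $2 in
        on|off) echo "ip link set $1 promisc $2" ;;
        *) echo "usage: set_promisc BRIDGE on|off" >&2; return 1 ;;
    esac
}

bridge_setup br0 eth0 eth1   # step 1
set_promisc br0 off          # step 4
set_promisc br0 on           # step 6
```

Piping the output to a root shell would apply the commands; the website and PPTP tests in steps 5 and 7 still have to be run by hand between the two toggles.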


Actual results:

Turning on promisc mode fixes the "router on a stick" problem and breaks inbound PPTP VPNs.  Turning off promisc mode fixes inbound PPTP and breaks the "router on a stick" rules.

Expected results:

It should all just work the same as it did with earlier versions. Tinkering with promisc mode by hand was a troubleshooting step.


Additional info:

Here are the relevant iptables rules, edited to obfuscate the customer site:

I mark all packets coming in on eth0 with 1 (bit 0 set).  These are from the Internet.  I mark all packets on eth1 with a 2 (bit 1 set).  These are from the internal LAN.

[root@ehac-fw2011 firewall-scripts]# /sbin/ebtables -t broute -L --Ln --Lc
Bridge table: broute

Bridge chain: BROUTING, entries: 2, policy: ACCEPT
1. -i eth0 -j mark --mark-set 0x1 --mark-target CONTINUE, pcnt = 4974258 -- bcnt = 3247704031
2. -i eth1 -j mark --mark-set 0x2 --mark-target CONTINUE, pcnt = 6945040 -- bcnt = 943150968
[root@ehac-fw2011 firewall-scripts]#
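
Commands along the following lines would recreate the two BROUTING mark rules in the listing above.  This is a sketch that prints the ebtables commands rather than running them; the helper name mark_rules and its IFACE=MARK argument format are assumptions for this example.

```shell
#!/bin/sh
# Sketch only: print ebtables commands that would recreate the two
# BROUTING mark rules shown in the listing. mark_rules is a hypothetical helper.
# Usage: mark_rules IFACE=MARK...
mark_rules() {
    for pair in "$@"; do
        iface=${pair%%=*}   # text before the first "="
        mark=${pair#*=}     # text after the first "="
        echo "ebtables -t broute -A BROUTING -i $iface -j mark --mark-set $mark --mark-target CONTINUE"
    done
}

# 0x1 = arrived from the Internet side, 0x2 = arrived from the LAN side.
mark_rules eth0=0x1 eth1=0x2
```

The iptables rules further down then select on these values with the mark match, shown as "mark match 0x1" and "mark match 0x2" in the listings.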

There are no ebtables filter rules.
[root@ehac-fw2011 firewall-scripts]# /sbin/ebtables -t filter -L --Ln --Lc
Bridge table: filter

Bridge chain: INPUT, entries: 0, policy: ACCEPT

Bridge chain: FORWARD, entries: 0, policy: ACCEPT

Bridge chain: OUTPUT, entries: 0, policy: ACCEPT
[root@ehac-fw2011 firewall-scripts]#


Here are the relevant iptables NAT rules.

[root@ehac-fw2011 firewall-scripts]# iptables -L -n -t nat
Chain PREROUTING (policy ACCEPT)
target     prot opt source               destination
.
.
.
DNAT       tcp  --  0.0.0.0/0            aaa.bbb.115.147      mark match 0x1 tcp dpt:1723 to:192.168.10.2
DNAT       47   --  0.0.0.0/0            aaa.bbb.115.147      mark match 0x1 to:192.168.10.2
.
.
.
DNAT       tcp  --  0.0.0.0/0            aaa.bbb.115.151      tcp dpt:80 to:192.168.10.8

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

Chain POSTROUTING (policy ACCEPT)
target     prot opt source               destination
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0           mark match 0x1
.
.
.
SNAT       47   --  192.168.10.2         0.0.0.0/0           to:aaa.bbb.115.147
SNAT       tcp  --  192.168.10.2         0.0.0.0/0           tcp dpt:1723 to:aaa.bbb.115.147
.
.
.
MASQUERADE  tcp  --  192.168.10.0/24      aaa.bbb.115.151      tcp dpt:80
MASQUERADE  all  --  192.168.10.0/24      0.0.0.0/0


And relevant iptables filtering rules:

[root@ehac-fw2011 firewall-scripts]# iptables -L -n -v
Chain INPUT (policy DROP 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination
.
.
.
Chain FORWARD (policy DROP 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination
    0     0 REJECT     all  --  *      *       0.0.0.0/0            aaa.bbb.115.159      reject-with icmp-port-unreachable
71010 6714K REJECT     all  --  *      *       0.0.0.0/0            192.168.10.255      reject-with icmp-port-unreachable
 3226 1069K REJECT     all  --  *      *       0.0.0.0/0            255.255.255.255     reject-with icmp-port-unreachable
  16M 7293M ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0           state RELATED,ESTABLISHED
 436K   33M ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0           mark match 0x2
.
.
.
    0     0 allowed    tcp  --  *      *       0.0.0.0/0            192.168.10.2        tcp dpt:1723
    0     0 ACCEPT     47   --  *      *       0.0.0.0/0            192.168.10.2
.
.
.
 4921  274K allowed    tcp  --  *      *       0.0.0.0/0            192.168.10.8        tcp dpt:80
 9324  492K allowed    tcp  --  *      *       0.0.0.0/0            .
.
.
    0     0 LOG        all  --  *      *       0.0.0.0/0            0.0.0.0/0           LOG flags 0 level 4
    0     0 DROP       all  --  *      *       0.0.0.0/0            0.0.0.0/0

Chain OUTPUT (policy DROP 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination
.
.
.
Chain allowed (19 references)
 pkts bytes target     prot opt in     out     source               destination
29279 1614K ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0           tcp flags:0x17/0x02
  164 24630 LOG        tcp  --  *      *       0.0.0.0/0            0.0.0.0/0           LOG flags 0 level 4 prefix `Malformed packet! '
  164 24630 DROP       tcp  --  *      *       0.0.0.0/0            0.0.0.0/0

Comment 1 Soren Hansen 2012-05-22 20:22:08 UTC
I'm having this same problem. Have you found any other references for it?

Comment 2 Greg Scott 2012-05-22 21:19:47 UTC
No - no other references, just total deafening silence on this issue.  I've also tried posting it to the netdev email forum but the best feedback from there is, use proxy ARP instead of bridging.

FWIW, thanks - you made me feel better because now I'm not the only one seeing the problem.

Comment 3 Soren Hansen 2012-05-22 21:21:31 UTC
FWIW, after posting my comment here, I found this:

  http://marc.info/?l=linux-bridge&m=129719385113107&w=4

I've e-mailed the author of that e-mail to ask him if he worked anything out. I'll update this bug if I get a response.

Comment 4 Fedora End Of Life 2012-08-16 13:51:57 UTC
This message is a notice that Fedora 14 is now at end of life. Fedora 
has stopped maintaining and issuing updates for Fedora 14. It is 
Fedora's policy to close all bug reports from releases that are no 
longer maintained.  At this time, all open bugs with a Fedora 'version'
of '14' have been closed as WONTFIX.

(Please note: Our normal process is to give advanced warning of this 
occurring, but we forgot to do that. A thousand apologies.)

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, feel free to reopen 
this bug and simply change the 'version' to a later Fedora version.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we were unable to fix it before Fedora 14 reached end of life. If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora, you are encouraged to click on 
"Clone This Bug" (top right of this page) and open it against that 
version of Fedora.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping
