From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20020809

Description of problem:
I have a CIPE tunnel connecting my home gateway to a machine at the university, which I use, among many other things, to get transparent access to the news server there.

The hardware setup is as follows:

My desktop is connected to my home gateway through 100 Mbps Ethernet, with a switching hub in between. It runs (null) with kernel 2.4.18-12.2.

My home gateway connects to the outside world through PPPoE ADSL. It has a CIPE tunnel to the server of the lab I run at the uni, and it routes all traffic to IP addresses within the uni through this channel (except traffic to the server itself). It currently runs Valhalla with kernel 2.4.18-10, but I first noticed the problem with kernel 2.4.18-5.

The lab server at the uni masquerades the packets it forwards from the CIPE channel. It is on a 100 Mbps network, plugged into a switch that is part of the routing infrastructure of the institute. It runs Valhalla, still with kernel 2.4.18-5. When it ran Enigma, the problem didn't show up.

There are at least two such switches between the lab server and the news server, and possibly a router, but they don't show up in traceroute.

The news server runs some version of Red Hat Linux, probably Enigma, and probably not with all the most recent patches, but that doesn't matter too much.

The root of the problem is that the switching network drops PMTUD packets. I see with tcpdump that the lab server receives non-fragmentable packets that are too big to go through the CIPE channel, and it replies with ICMP packets indicating the MTU of the CIPE channel, but these packets never get back to the news server, which has no packet filtering enabled. Unfortunately, IS at the institute can't seem to find any setting in the routing hardware that lets such packets through, so I have to fix the problem elsewhere.
Back when my home network ran Valhalla but the lab server ran Enigma, everything worked just fine. As soon as I upgraded the lab server to Valhalla, I started getting dead connections to the news server. Searching the internet, I found out that setting the MSS of the route to a smaller value would fix the problem and, indeed, limiting to 1440 the MSS of the route from my home gateway to the other end of the CIPE channel fixed it.

Until I upgraded my desktop to limbo2, that is. At that point, it started getting dead connections again, and no amount of limiting the MSS seemed to help. I ended up limiting the desktop's own MTU to 1442, but I don't understand why the MSS workaround no longer works; it would let my local network communicate more efficiently while still fixing connections through the CIPE channel.

Any ideas? Is the route MSS not supposed to limit the TCP segment size of all connections going through it? Why would upgrading one of the ends of the connection make any difference?

Version-Release number of selected component (if applicable):

How reproducible:
Always

Steps to Reproduce:
1. Connect one machine running Enigma (lab server) to another running, say, Enigma (news server), such that ICMP PMTUD packets don't reach the news server.
2. Connect the lab server to a machine running Valhalla (home gateway), and set up a CIPE connection between them, such that traffic to and from the news server goes through the CIPE channel.
3. Connect the home gateway to yet another machine running Valhalla (desktop).
4. Set up a server on the news server that, when connected to, sends a large packet.
5. Connect to that server from the desktop.
6. Watch the network traffic between the news server and the lab server (everything works).
7. Upgrade the lab server to Valhalla.
8. Connect again.
9. Watch the network traffic, and notice that the large packet doesn't get through.
10. Change the route settings on the home gateway so that the route to the news server has an MSS of 1440.
11. Connect again.
12. Watch the network traffic, and see that it works.
13. Upgrade the desktop to limbo2 or (null) (perhaps limbo 1 too; I didn't try that).
14. Connect once again.
15. Watch the network traffic, and see that the packet doesn't get through.
16. Set the MTU of the desktop to 1442.
17. Connect again.
18. It works again, but the local network is less efficient.

Actual Results:
Already summarized above.

Expected Results:
I'd have hoped the MSS setting on the gateway would have taken care of the problem. Why did it stop working when the desktop was upgraded to limbo2?

Additional info:
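For reference, the relationship between the MTU and MSS values mentioned in this report can be sketched as below. This is only an illustration, assuming IPv4 with no IP or TCP options (20 bytes of header each); the function names are hypothetical and not part of any tool mentioned here.

```python
# Sketch of the MTU <-> MSS arithmetic, assuming IPv4 and no header options.
IPV4_HEADER = 20  # bytes, assumption: no IP options
TCP_HEADER = 20   # bytes, assumption: no TCP options (e.g. no timestamps)


def mss_for_mtu(mtu: int) -> int:
    """Largest TCP payload that fits in one unfragmented packet of `mtu` bytes."""
    if mtu <= IPV4_HEADER + TCP_HEADER:
        raise ValueError("MTU too small to carry any TCP payload")
    return mtu - IPV4_HEADER - TCP_HEADER


def mtu_for_mss(mss: int) -> int:
    """Packet size produced by a full-sized segment carrying `mss` payload bytes."""
    return mss + IPV4_HEADER + TCP_HEADER


if __name__ == "__main__":
    print(mss_for_mtu(1500))  # plain Ethernet
    print(mtu_for_mss(1440))  # the route MSS used on the home gateway
    print(mss_for_mtu(1442))  # the MTU eventually set on the desktop
```

Under these assumptions, a route MSS of 1440 allows packets of up to 1480 bytes, whereas a 1442-byte MTU on the desktop implies an MSS of 1402.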
A new bit of info just came in: the problem seems to be caused by ICMP "need to fragment" packets not being masqueraded. The IP address of the target machine remains unchanged instead of being masqueraded, so the sender of the big packet has no way to tell on which connections to lower the MTU.
And iptables' TCPMSS rules seem to be able to "fix" the problem for me. I'll only be able to tell for sure after the cached PMTUs have expired everywhere, but it looks like exactly the right solution for the problem. Unless a connection is set up before the MTU is discovered, in which case there's not much that can be done...
I can confirm that adding a line such as this to my gateway's iptables configuration file fixes the problem:

    [0:0] -A FORWARD -o cipcb0 -p tcp -m tcp --syn -j TCPMSS --clamp-mss-to-pmtu
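As a rough illustration of why such a rule helps: it rewrites the MSS option in forwarded SYN packets, so the two ends never send segments larger than the outgoing path can carry and the lost ICMP messages no longer matter. The sketch below shows only the clamping arithmetic. It assumes IPv4 without options and assumes the advertised value is only ever lowered, never raised; `clamp_mss` and the example path MTU of 1442 are illustrative and are not taken from the kernel source or from the actual cipcb0 configuration.

```python
# Sketch of MSS clamping on a forwarded SYN, assuming IPv4 without options.
HEADERS = 40  # assumption: 20-byte IPv4 header + 20-byte TCP header


def clamp_mss(advertised_mss: int, path_mtu: int) -> int:
    """Return the MSS a forwarded SYN would carry after clamping to path_mtu.

    Assumption: the value is only lowered, never raised.
    """
    return min(advertised_mss, path_mtu - HEADERS)


if __name__ == "__main__":
    # Hypothetical values: an Ethernet host advertising 1460 over a 1442-byte path.
    print(clamp_mss(1460, 1442))
    # A host that already advertises a small MSS is left alone.
    print(clamp_mss(536, 1442))
```

Because the clamp applies to the SYN, it can only affect connections opened after the rule is in place, which matches the caveat about connections set up beforehand.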
Thanks for the bug report. However, Red Hat no longer maintains this version of the product. Please upgrade to the latest version and open a new bug if the problem persists. The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, and if you believe this bug is interesting to them, please report the problem in the bug tracker at: http://bugzilla.fedora.us/