Description of problem:
TCP connections stall after several megabytes. I monitored the traffic via ethereal on the outgoing interface and saw that sometimes TCP packets were sent that were larger than the interface MTU (e.g. a 2800-byte Ethernet frame being sent out of an interface with an MTU of 1500).

Version-Release number of selected component (if applicable):
2.6.17-1.2141_FC4, but it seems to depend on the version at each end of the TCP connection and on which end opened the TCP connection (but not on which way the data is flowing). Also observed with 2.6.17-1.2145_FC5.

How reproducible:
Always

Steps to Reproduce:
1. Create a large file, e.g. bigfile
2. scp bigfile othermachine:/dev/null
3. Watch it transfer a few megabytes and then enter a mode in which packets above the MTU are sent but not acknowledged. Retransmissions occur, but they are usually futile. The connection locks up.

Actual results:
TCP connection stalls.

Expected results:
TCP connection should keep transferring (on my net this typically means about 500 kbyte/second to 2 Mbyte/second).

Additional info:
I watched the connections via ethereal (I'll try to get some traces and attach them to this tomorrow). Even though the interface MTU was set at 1500, packets of over 2800 bytes were being sent (which, if they got onto the wire at all, were subsequently lost as they tried to traverse a switch that was not capable of handling jumbograms).

TCP MTU probing was off (net.ipv4.tcp_mtu_probing = 0).

The network was typical 10 and 100 Mbit full-duplex Ethernet with consumer-grade switches (i.e. most can't do packets above about 1500 bytes) and Cisco 2621 routers. The problem also occurred when running across the internet.

I reverted to either 2.6.16-1.2115_FC4 or 2.6.16-1.2111_FC4smp and the problem went away.

Sorry for being so vague.
The problem only shows up on large transfers (several tens of megabytes at least), through a sequence of routers and switches, and only when at least one of the ends is running a release later than 2.6.16-1.2111 or 2.6.16-1.2115. Some of the machines that exhibited the problem were running FC5 (2.6.17-1.2145_FC5). This showed up with scp and cvs updates. It smells like a TCP stack issue rather than an application issue. Courtesy of a blown circuit breaker, there was a full power cycle of all of the equipment involved, and the problem remained. The workaround for me was to revert to older kernels.
Created attachment 132571
Ethereal trace of stalling TCP connection with oversize packets (format is libpcap)

This attachment is an ethereal dump of a TCP connection that stalls because of oversize packets. TCP sends oversize packets starting at frame 48. The interface MTU is 1500. The physical interface is an Intel e1000 that can send packets up to about 16100 bytes. The intervening network uses a variety of gear, most of which is not willing to accommodate packets bigger than the typical ~1500-byte variety found on most Ethernets. The OS version on 192.202.17.213 is 2.6.17-1.2145_FC5. The OS version on 71.132.98.41 is 2.6.17-1.2142_FC4. The capture was made on machine 192.202.17.213.
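
For anyone who wants to locate the oversize frames in a libpcap capture without opening it in ethereal, here is a rough sketch in Python. It assumes the classic libpcap file layout (24-byte global header, 16-byte per-record headers) and untagged Ethernet framing (14-byte header, no VLAN tag). The function name oversize_frames and the constant ETHERNET_HEADER_LEN are made up for this example and are not part of any tool.

```python
import struct

ETHERNET_HEADER_LEN = 14  # dst MAC + src MAC + EtherType; assumes no VLAN tag


def oversize_frames(pcap_bytes, mtu=1500):
    """Return (frame_number, wire_length) for frames longer than mtu + Ethernet header.

    Frame numbers start at 1, like the frame numbers shown by ethereal.
    """
    if len(pcap_bytes) < 24:
        raise ValueError("too short to be a libpcap file")

    # The magic number 0xa1b2c3d4 tells us the byte order of the writer.
    magic = pcap_bytes[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"
    else:
        raise ValueError("not a classic libpcap file")

    limit = mtu + ETHERNET_HEADER_LEN
    found = []
    offset = 24  # skip the global header
    frame_number = 0
    while offset + 16 <= len(pcap_bytes):
        # Record header: ts_sec, ts_usec, captured length, original length.
        _ts_sec, _ts_usec, incl_len, orig_len = struct.unpack_from(
            endian + "IIII", pcap_bytes, offset
        )
        frame_number += 1
        if orig_len > limit:
            found.append((frame_number, orig_len))
        offset += 16 + incl_len  # jump over the captured bytes
    return found
```

To use it, read the capture in binary mode and pass the bytes in, e.g. oversize_frames(open("trace.pcap", "rb").read()), where trace.pcap is a placeholder file name. With the default MTU of 1500 it lists every frame whose recorded length is above 1514 bytes.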
It is possible that this bug is not a bug. Rather, it may be that I was being fooled by the TCP segmentation offload (TSO) mechanism supported by the Intel Pro/1000 NICs in the machines I use. With TSO the kernel hands the NIC segments larger than the MTU and the NIC splits them, so a capture taken on the sending host shows the oversize segments rather than what actually goes onto the wire. When I run "ethtool -k" it tells me that segmentation offloading is "on". However, there does seem to be some interaction between recent TCP kernel code and the inspectors in Cisco IOS-based firewall code, but I haven't been able to isolate it.
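
A quick way to check the offload setting from a script is to parse the text printed by "ethtool -k". The sketch below is only an illustration: the helper name tso_enabled is invented here, and it assumes the setting appears on a line labelled either "tcp segmentation offload" (as older ethtool versions print it) or "tcp-segmentation-offload" (as newer ones do).

```python
def tso_enabled(ethtool_k_output):
    """Return True/False for the TSO setting, or None if no such line is found."""
    for line in ethtool_k_output.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue
        # Normalise hyphens so both the old and the new label spelling match.
        key = name.strip().lower().replace("-", " ")
        if key == "tcp segmentation offload":
            # The value may carry a suffix such as "[fixed]"; look at the first word.
            words = value.split()
            return bool(words) and words[0] == "on"
    return None
```

The input can come from something like subprocess.run(["ethtool", "-k", "eth0"], capture_output=True, text=True).stdout, where eth0 is a placeholder interface name. If it reports True, running "ethtool -K eth0 tso off" before capturing should make a trace taken on the sender show MTU-sized segments again, which would help tell a capture artifact from a real stack problem.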
[This comment added as part of a mass-update to all open FC4 kernel bugs] FC4 has now transitioned to the Fedora legacy project, which will continue to release security related updates for the kernel. As this bug is not security related, it is unlikely to be fixed in an update for FC4, and has been migrated to FC5. Please retest with Fedora Core 5. Thank you.
A new kernel update has been released (Version: 2.6.18-1.2200.fc5) based upon a new upstream kernel release. Please retest against this new kernel, as a large number of patches go into each upstream release, possibly including changes that may address this problem.

This bug has been placed in NEEDINFO state. Due to the large volume of inactive bugs in bugzilla, if this bug is still in this state in two weeks' time, it will be closed. Should this bug still be relevant after this period, the reporter can reopen the bug at any time. Any other users on the Cc: list of this bug can request that the bug be reopened by adding a comment to the bug.

In the last few updates, some users upgrading from FC4->FC5 have reported that installing a kernel update has left their systems unbootable. If you have been affected by this problem, please check that you have only one version of device-mapper & lvm2 installed. See bug 207474 for further details.

If this bug is a problem preventing you from installing the release this version is filed against, please see bug 169613.

If this bug has been fixed, but you are now experiencing a different problem, please file a separate bug for the new problem. Thank you.
This bug has been mass-closed along with all other bugs that have been in NEEDINFO state for several months. Due to the large volume of inactive bugs in bugzilla, this is the only method we have of cleaning out stale bug reports where the reporter has disappeared. If you can reproduce this bug after installing all the current updates, please reopen this bug. If you are not the reporter, you can add a comment requesting it be reopened, and someone will get to it asap. Thank you.