Red Hat Bugzilla – Bug 198602
TCP connections stall, packets larger than MTU being sent.
Last modified: 2015-01-04 17:27:53 EST
Description of problem:
TCP connections stall after several megabytes.
I monitored the traffic via ethereal on the outgoing interface and saw that
sometimes TCP packets were sent that were larger than the interface MTU (e.g.
a 2800-byte Ethernet frame being sent out of an interface with an MTU of 1500).
Version-Release number of selected component (if applicable):
2.6.17-1.2141_FC4, but it seems to depend on the kernel version at each end of the
TCP connection, and on which end opened the TCP connection (but not on
which way the data is flowing).
(Also observed with 2.6.17-1.2145_FC5.)
Steps to Reproduce:
1. Create a large file, e.g. bigfile
2. scp bigfile othermachine:/dev/null
3. Watch it transfer a few megabytes and then enter a mode in which packets
above MTU are sent but not acknowledged. Retransmissions occur, but they are
usually futile. Connection locks up.
TCP connection stalls.
TCP connection should keep transferring (on my network this typically means
about 500 kbyte/second to 2 Mbyte/second).
I watched the connections via ethereal (I'll try to get some traces and attach
'em to this tomorrow.) Even though interface MTU was set at 1500, packets of
over 2800 bytes were being sent (which, if they got onto the wire at all were
subsequently lost as they tried to traverse a switch that was not capable of
TCP MTU probing was off (net.ipv4.tcp_mtu_probing = 0)
Network was typical 10 and 100 Mbit full-duplex Ethernet with consumer-grade
switches (i.e. most can't do packets above about 1500 bytes) and Cisco 2621
routers. Problem also occurred when running across the internet.
I reverted back to either 2.6.16-1.2115_FC4 or 2.6.16-1.2111_FC4smp and the
problem went away.
Sorry for being so vague. The problem only shows up on large transfers (several
tens of megabytes at least), through a sequence of routers and switches, and
only when at least one of the ends is a release later than 2.6.16-1.2111 or
Some of the machines that exhibited the problem were running FC5
This showed up with scp and cvs updates. It smells like a TCP stack issue
rather than an application issue.
Courtesy of a blown circuit breaker, there was a full power cycle of all of the
equipment involved - and the problem remained.
The workaround for me was to revert to older kernels.
Created attachment 132571 [details]
Ethereal trace of stalling TCP connection with oversize packets (format is libpcap)
This attachment is an ethereal dump of a TCP connection that stalls because
TCP sends oversize packets starting at frame 48.
The interface MTU is 1500. The physical interface is an Intel e1000 that can
send packets up to about 16100 bytes. The intervening network uses a variety
of gear, most of which is not willing to accommodate packets bigger than the
typical ~1500 byte variety found on most ethernets.
The OS version on 18.104.22.168 is 2.6.17-1.2145_FC5.
The OS version on 22.214.171.124 is 2.6.17-1.2142_FC4.
The capture was made on machine 126.96.36.199.
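For anyone who wants to check a trace like this without opening it in ethereal, here is a minimal Python sketch that lists the frames longer than the interface MTU allows. It makes several assumptions that are not from this report: the file is classic libpcap format (not pcapng, and not the nanosecond-timestamp variant), the link type is Ethernet, and frames carry a plain 14-byte header with no VLAN tag. The function name oversize_frames is made up for illustration.

```python
import struct

ETH_HEADER_LEN = 14  # assumption: untagged Ethernet II frames (no VLAN tag)

def oversize_frames(pcap_bytes, mtu=1500):
    """Return (frame_number, wire_length) for each frame longer than mtu + 14."""
    magic = pcap_bytes[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"   # file written on a little-endian host
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"   # file written on a big-endian host
    else:
        raise ValueError("not a classic libpcap file")

    limit = mtu + ETH_HEADER_LEN
    offset = 24        # the libpcap global header is 24 bytes
    frame_no = 0
    hits = []
    while offset + 16 <= len(pcap_bytes):
        # Per-record header: ts_sec, ts_usec, captured length, original length
        _sec, _usec, incl_len, orig_len = struct.unpack_from(
            endian + "IIII", pcap_bytes, offset)
        frame_no += 1  # frame numbers start at 1, as in ethereal
        if orig_len > limit:
            hits.append((frame_no, orig_len))
        offset += 16 + incl_len
    return hits
```

To use it, read the capture in binary mode and pass the bytes in, e.g. oversize_frames(open("trace.pcap", "rb").read()), where trace.pcap is a placeholder file name.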
It is possible that this bug is not a bug.
Rather, it may be that I was being fooled by the TCP segment offloading
mechanisms that are supported by the Intel Pro 1000 NICs in the machines I use.
When I do "ethtool -k" it tells me that segment offloading is "on".
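As a rough aid for repeating this check on several machines, the following Python sketch pulls the segmentation offload setting out of "ethtool -k" output. It is only a sketch: the function name tso_enabled is invented here, and it assumes the setting is printed on a line labelled either "tcp segmentation offload" (older ethtool) or "tcp-segmentation-offload" (newer ethtool).

```python
def tso_enabled(ethtool_output):
    """Return True or False for the TSO setting, or None if no such line exists."""
    for line in ethtool_output.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue
        # Normalise "tcp segmentation offload" and "tcp-segmentation-offload"
        key = name.strip().lower().replace(" ", "-")
        if key == "tcp-segmentation-offload":
            words = value.split()  # value may look like "on" or "off [fixed]"
            return bool(words) and words[0] == "on"
    return None
```

The text to parse can be obtained with subprocess.run(["ethtool", "-k", "eth0"], capture_output=True, text=True).stdout, where eth0 is an assumed interface name. If the result is True, a capture taken on the sending host may show segments larger than the MTU that the NIC splits before they reach the wire. Running "ethtool -K eth0 tso off" before capturing should make the trace reflect the actual on-wire frame sizes.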
However, there does seem to be some interaction between recent TCP kernel code
and the inspectors in Cisco IOS-based firewall code. But I haven't been able to
[This comment added as part of a mass-update to all open FC4 kernel bugs]
FC4 has now transitioned to the Fedora legacy project, which will continue to
release security related updates for the kernel. As this bug is not security
related, it is unlikely to be fixed in an update for FC4, and has been migrated
Please retest with Fedora Core 5.
A new kernel update has been released (Version: 2.6.18-1.2200.fc5)
based upon a new upstream kernel release.
Please retest against this new kernel, as a large number of patches
go into each upstream release, possibly including changes that
may address this problem.
This bug has been placed in NEEDINFO state.
Due to the large volume of inactive bugs in bugzilla, if this bug is
still in this state in two weeks time, it will be closed.
Should this bug still be relevant after this period, the reporter
can reopen the bug at any time. Any other users on the Cc: list
of this bug can request that the bug be reopened by adding a
comment to the bug.
In the last few updates, some users upgrading from FC4->FC5
have reported that installing a kernel update has left their
systems unbootable. If you have been affected by this problem
please check you only have one version of device-mapper & lvm2
installed. See bug 207474 for further details.
If this bug is a problem preventing you from installing the
release this version is filed against, please see bug 169613.
If this bug has been fixed, but you are now experiencing a different
problem, please file a separate bug for the new problem.
This bug has been mass-closed along with all other bugs that
have been in NEEDINFO state for several months.
Due to the large volume of inactive bugs in bugzilla, this
is the only method we have of cleaning out stale bug reports
where the reporter has disappeared.
If you can reproduce this bug after installing all the
current updates, please reopen this bug.
If you are not the reporter, you can add a comment requesting
it be reopened, and someone will get to it asap.