Bug 168329 - Dropped packets with direct socket, but not with NAT packets
Summary: Dropped packets with direct socket, but not with NAT packets
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 3
Hardware: athlon
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: David Miller
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2005-09-15 00:41 UTC by Jonathan Larmour
Modified: 2007-11-30 22:11 UTC
CC List: 2 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2005-10-21 03:38:26 UTC
Type: ---
Embargoed:


Attachments
tcpdump showing packet loss from direct connection to remote host (18.46 KB, text/plain)
2005-09-15 00:45 UTC, Jonathan Larmour
tcpdump showing no real packet loss from connection behind firewall to remote host (7.05 KB, text/plain)
2005-09-15 00:47 UTC, Jonathan Larmour

Description Jonathan Larmour 2005-09-15 00:41:12 UTC
Description of problem:

I set up my FC3 machine as my firewalling gateway when a previous machine died.
Network throughput seemed poor, and some sites as good as stopped working
altogether. Examining a tcpdump of the poor connections showed something very
odd: incoming packets were frequently getting dropped. But if I connect from a
machine _behind_ the firewall, the incoming packets don't get dropped. I have a
reliable broadband connection.

TCP pushes seem to get through okay. It appears to be when we've got past slow
start, and the remote host tries to send multiple packets at once, that the
packets get dropped. Here's a sample tcpdump of a connection to a remote news
server, requesting the news active file using just "telnet
news.individual.net nntp". The push at the top is me sending the "list active" line:

00:51:19.644839 IP cpc4-cmbg5-3-0-cust166.cmbg.cable.ntl.com.58145 > individual.net.nntp: P 2552681086:2552681099(13) ack 3452392115 win 6432 <nop,nop,timestamp 273176760 3387426>
00:51:19.714611 IP individual.net.nntp > cpc4-cmbg5-3-0-cust166.cmbg.cable.ntl.com.58145: . ack 13 win 49152 <nop,nop,timestamp 3387447 273176760>
00:51:19.718526 IP individual.net.nntp > cpc4-cmbg5-3-0-cust166.cmbg.cable.ntl.com.58145: . 1:1449(1448) ack 13 win 49152 <nop,nop,timestamp 3387447 273176760>
00:51:19.718652 IP cpc4-cmbg5-3-0-cust166.cmbg.cable.ntl.com.58145 > individual.net.nntp: . ack 1449 win 8688 <nop,nop,timestamp 273176834 3387447>
00:51:19.727559 IP individual.net.nntp > cpc4-cmbg5-3-0-cust166.cmbg.cable.ntl.com.58145: . 1449:2897(1448) ack 13 win 49152 <nop,nop,timestamp 3387447 273176760>
00:51:19.727690 IP cpc4-cmbg5-3-0-cust166.cmbg.cable.ntl.com.58145 > individual.net.nntp: . ack 2897 win 11584 <nop,nop,timestamp 273176843 3387447>
00:51:19.730451 IP individual.net.nntp > cpc4-cmbg5-3-0-cust166.cmbg.cable.ntl.com.58145: . 4097:5545(1448) ack 13 win 49152 <nop,nop,timestamp 3387447 273176760>
00:51:19.730566 IP cpc4-cmbg5-3-0-cust166.cmbg.cable.ntl.com.58145 > individual.net.nntp: . ack 2897 win 11584 <nop,nop,timestamp 273176846 3387447,nop,nop,sack sack 1 {4097:5545} >
00:51:19.772992 IP individual.net.nntp > cpc4-cmbg5-3-0-cust166.cmbg.cable.ntl.com.58145: . 5545:6993(1448) ack 13 win 49152 <nop,nop,timestamp 3387447 273176760>
00:51:19.773116 IP cpc4-cmbg5-3-0-cust166.cmbg.cable.ntl.com.58145 > individual.net.nntp: . ack 2897 win 11584 <nop,nop,timestamp 273176888 3387447,nop,nop,sack sack 1 {4097:6993} >
00:51:19.778491 IP individual.net.nntp > cpc4-cmbg5-3-0-cust166.cmbg.cable.ntl.com.58145: . 8441:9889(1448) ack 13 win 49152 <nop,nop,timestamp 3387447 273176760>
00:51:19.778618 IP cpc4-cmbg5-3-0-cust166.cmbg.cable.ntl.com.58145 > individual.net.nntp: . ack 2897 win 11584 <nop,nop,timestamp 273176894 3387447,nop,nop,sack sack 2 {8441:9889}{4097:6993} >
00:51:19.794663 IP individual.net.nntp > cpc4-cmbg5-3-0-cust166.cmbg.cable.ntl.com.58145: . 9889:11337(1448) ack 13 win 49152 <nop,nop,timestamp 3387447 273176760>
00:51:19.794764 IP cpc4-cmbg5-3-0-cust166.cmbg.cable.ntl.com.58145 > individual.net.nntp: . ack 2897 win 11584 <nop,nop,timestamp 273176910 3387447,nop,nop,sack sack 2 {8441:11337}{4097:6993} >
00:51:19.803763 IP individual.net.nntp > cpc4-cmbg5-3-0-cust166.cmbg.cable.ntl.com.58145: . 11337:12785(1448) ack 13 win 49152 <nop,nop,timestamp 3387447 273176760>
00:51:19.803809 IP cpc4-cmbg5-3-0-cust166.cmbg.cable.ntl.com.58145 > individual.net.nntp: . ack 2897 win 11584 <nop,nop,timestamp 273176919 3387447,nop,nop,sack sack 2 {8441:12785}{4097:6993} >
00:51:19.853043 IP individual.net.nntp > cpc4-cmbg5-3-0-cust166.cmbg.cable.ntl.com.58145: . 2897:4345(1448) ack 13 win 49152 <nop,nop,timestamp 3387447 273176760>
00:51:19.853102 IP cpc4-cmbg5-3-0-cust166.cmbg.cable.ntl.com.58145 > individual.net.nntp: . ack 6993 win 14480 <nop,nop,timestamp 273176968 3387447,nop,nop,sack sack 2 {4097:4345}{8441:12785} >

And so on. This particular remote server plays very poorly with TCP retransmits
later on, and assumes the worst with congestion, and eventually the 250KiB or so
active file download times out after 15 minutes. This site is worse than most,
but from what I can see, it affects all connections.
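The gaps in the first trace can be cross-checked mechanically. The Python sketch below is not part of the original report, and the function name receiver_view is hypothetical. It merges the data segments that reached the gateway, using the relative sequence numbers tcpdump prints, and derives the cumulative ACK, the missing byte ranges, and the out-of-order blocks that SACK reports. It does not model duplicate-SACK (RFC 2883), so the {4097:4345} block in the last ACK of the trace, which appears to report the overlap of the retransmission with already-received data, is not reproduced.

```python
from typing import List, Tuple

# Half-open byte range [start, end) in tcpdump's relative sequence numbers.
Range = Tuple[int, int]


def receiver_view(segments: List[Range], first_seq: int = 1):
    """Merge received data segments and return (cumulative_ack, holes, sacked).

    cumulative_ack: next byte expected in order (the value of the ACK field).
    holes: byte ranges still missing below the highest byte received.
    sacked: out-of-order blocks above cumulative_ack (what SACK can report).
    """
    merged: List[List[int]] = []
    for start, end in sorted(segments):
        if merged and start <= merged[-1][1]:
            # Overlapping or adjacent segment: extend the current block.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])

    # In-order data ends where the block that starts at first_seq ends.
    ack = first_seq
    if merged and merged[0][0] <= first_seq:
        ack = max(ack, merged[0][1])

    sacked = [(s, e) for s, e in merged if s > ack]
    holes = []
    prev_end = ack
    for s, e in sacked:
        holes.append((prev_end, s))
        prev_end = e
    return ack, holes, sacked


if __name__ == "__main__":
    # Data segments from the first trace, in arrival order.
    first = [(1, 1449), (1449, 2897), (4097, 5545), (5545, 6993),
             (8441, 9889), (9889, 11337), (11337, 12785)]
    print(receiver_view(first))
    # (2897, [(2897, 4097), (6993, 8441)], [(4097, 6993), (8441, 12785)])
    # After the retransmitted 2897:4345 arrives, the ACK advances to 6993.
    print(receiver_view(first + [(2897, 4345)]))
```

The computed values agree with the trace: the gateway keeps acknowledging 2897 while SACKing 4097:6993 and 8441:12785, and jumps to 6993 once 2897:4345 is retransmitted, so bytes 2897:4097 and 6993:8441 never arrived the first time.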

Here is the same operation, performed by a (Windows) machine behind the same
gateway with no configuration changes:

00:53:22.108688 IP cpc4-cmbg5-3-0-cust166.cmbg.cable.ntl.com.4720 > individual.net.nntp: P 11:13(2) ack 1 win 17265
00:53:22.151819 IP individual.net.nntp > cpc4-cmbg5-3-0-cust166.cmbg.cable.ntl.com.4720: . ack 13 win 49152
00:53:22.155356 IP individual.net.nntp > cpc4-cmbg5-3-0-cust166.cmbg.cable.ntl.com.4720: . 1:1461(1460) ack 13 win 49152
00:53:22.156509 IP individual.net.nntp > cpc4-cmbg5-3-0-cust166.cmbg.cable.ntl.com.4720: P 2921:4097(1176) ack 13 win 49152
00:53:22.158246 IP cpc4-cmbg5-3-0-cust166.cmbg.cable.ntl.com.4720 > individual.net.nntp: . ack 1461 win 17520 <nop,nop,sack sack 1 {2921:4097} >
00:53:22.161174 IP individual.net.nntp > cpc4-cmbg5-3-0-cust166.cmbg.cable.ntl.com.4720: . 1461:2921(1460) ack 13 win 49152
00:53:22.163170 IP cpc4-cmbg5-3-0-cust166.cmbg.cable.ntl.com.4720 > individual.net.nntp: . ack 4097 win 17520
00:53:22.166279 IP individual.net.nntp > cpc4-cmbg5-3-0-cust166.cmbg.cable.ntl.com.4720: . 4097:5557(1460) ack 13 win 49152
00:53:22.172699 IP individual.net.nntp > cpc4-cmbg5-3-0-cust166.cmbg.cable.ntl.com.4720: . 5557:7017(1460) ack 13 win 49152
00:53:22.174601 IP cpc4-cmbg5-3-0-cust166.cmbg.cable.ntl.com.4720 > individual.net.nntp: . ack 7017 win 17520
00:53:22.179170 IP individual.net.nntp > cpc4-cmbg5-3-0-cust166.cmbg.cable.ntl.com.4720: . 7017:8477(1460) ack 13 win 49152
00:53:22.184477 IP individual.net.nntp > cpc4-cmbg5-3-0-cust166.cmbg.cable.ntl.com.4720: . 8477:9937(1460) ack 13 win 49152
00:53:22.186498 IP cpc4-cmbg5-3-0-cust166.cmbg.cable.ntl.com.4720 > individual.net.nntp: . ack 9937 win 17520
00:53:22.204593 IP individual.net.nntp > cpc4-cmbg5-3-0-cust166.cmbg.cable.ntl.com.4720: . 9937:11397(1460) ack 13 win 49152
00:53:22.206028 IP individual.net.nntp > cpc4-cmbg5-3-0-cust166.cmbg.cable.ntl.com.4720: . 11397:12857(1460) ack 13 win 49152
00:53:22.208011 IP cpc4-cmbg5-3-0-cust166.cmbg.cable.ntl.com.4720 > individual.net.nntp: . ack 12857 win 17520
00:53:22.210327 IP individual.net.nntp > cpc4-cmbg5-3-0-cust166.cmbg.cable.ntl.com.4720: . 12857:14317(1460) ack 13 win 49152
00:53:22.215131 IP individual.net.nntp > cpc4-cmbg5-3-0-cust166.cmbg.cable.ntl.com.4720: . 14317:15777(1460) ack 13 win 49152
00:53:22.217085 IP cpc4-cmbg5-3-0-cust166.cmbg.cable.ntl.com.4720 > individual.net.nntp: . ack 15777 win 16900
00:53:22.221728 IP individual.net.nntp > cpc4-cmbg5-3-0-cust166.cmbg.cable.ntl.com.4720: . 15777:17237(1460) ack 13 win 49152
00:53:22.224848 IP individual.net.nntp > cpc4-cmbg5-3-0-cust166.cmbg.cable.ntl.com.4720: . 17237:18697(1460) ack 13 win 49152
00:53:22.226741 IP cpc4-cmbg5-3-0-cust166.cmbg.cable.ntl.com.4720 > individual.net.nntp: . ack 18697 win 13980
00:53:22.232569 IP individual.net.nntp > cpc4-cmbg5-3-0-cust166.cmbg.cable.ntl.com.4720: . 18697:20157(1460) ack 13 win 49152
00:53:22.237553 IP individual.net.nntp > cpc4-cmbg5-3-0-cust166.cmbg.cable.ntl.com.4720: . 20157:21617(1460) ack 13 win 49152
00:53:22.239545 IP cpc4-cmbg5-3-0-cust166.cmbg.cable.ntl.com.4720 > individual.net.nntp: . ack 21617 win 11060

As you can see, there's one packet lost, but that's just an aberrant glitch. The
download proceeds happily with no further loss. Note that things go wrong in the
first dump at the same point where, in the second dump, the remote host starts
sending two packets back to back.

Disabling SACKs makes no difference.

As an experiment, I tried adding a blanket allowance for news.individual.net:

  iptables -I INPUT 1 -m tcp -p tcp -s individual.net -j ACCEPT

but that made no difference.

I can send my /etc/sysconfig/iptables if it would be helpful, although I am
reluctant to do that for security reasons. I even briefly disabled my iptables
rules completely to see if that helped, but it didn't, and lsmod confirmed that
the ipt* modules had been unloaded.

Just ask if there is more I can do to debug the problem, although I don't have
any easy means to spy externally on what gets sent between the gateway machine
and the firewall router as I have no spare hub. I may be able to get one. I
would be surprised if it didn't show the packets coming in though.
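One check that needs no spare hub is the set of receive error counters the kernel keeps for each interface: frames that a NIC rejects (for example because of a bad CRC) are typically counted there and never reach tcpdump. A minimal Python sketch, assuming a Linux kernel with sysfs mounted at /sys; the interface name eth0 and the function name rx_error_counters are hypothetical, and not every driver fills in every counter.

```python
from pathlib import Path

# Counter files found under /sys/class/net/<iface>/statistics/ on Linux.
COUNTERS = ("rx_packets", "rx_errors", "rx_crc_errors", "rx_frame_errors", "rx_dropped")


def rx_error_counters(iface: str, sysfs_root: str = "/sys/class/net") -> dict:
    """Return the receive counters that exist for iface, as integers."""
    stats_dir = Path(sysfs_root) / iface / "statistics"
    result = {}
    for name in COUNTERS:
        path = stats_dir / name
        if path.exists():  # skip counters this kernel/driver does not expose
            result[name] = int(path.read_text().strip())
    return result


if __name__ == "__main__":
    # "eth0" is an assumed name for the gateway's external interface.
    print(rx_error_counters("eth0"))
```

Taking two readings a few minutes apart while a slow download runs would show whether rx_crc_errors is climbing on the external interface.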

I haven't seen any reports of anything similar, which seems all the odder.

Version-Release number of selected component (if applicable):

2.6.12-1.1376_FC3 for athlon

Comment 1 Jonathan Larmour 2005-09-15 00:45:01 UTC
Created attachment 118829 [details]
tcpdump showing packet loss from direct connection to remote host

Comment 2 Jonathan Larmour 2005-09-15 00:47:35 UTC
Created attachment 118830 [details]
tcpdump showing no real packet loss from connection behind firewall to remote host

Comment 3 Jonathan Larmour 2005-09-15 00:48:29 UTC
Those tcpdumps formatted really badly inline. I have attached them as files to
make them easier to read.


Comment 4 Jonathan Larmour 2005-10-21 03:38:26 UTC
I have now determined that the NIC was getting CRC errors, but it was a
card-specific issue (faulty hardware). Sigh. I'm not sure why it manifested
differently with and without NAT, but switching to another NIC fixed the issue.
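For background on the diagnosis: Ethernet frames end with a 32-bit frame check sequence computed with the IEEE 802.3 CRC-32, and the receiving NIC recomputes it and normally discards any frame that does not match, so the damaged frames simply vanish from the host's point of view. Python's zlib.crc32 implements the same CRC-32 polynomial, so the sketch below illustrates the check. It ignores how the FCS bytes are ordered on the wire, and the names fcs and frame_ok are hypothetical.

```python
import zlib


def fcs(frame: bytes) -> int:
    """CRC-32 (IEEE 802.3 polynomial) of the frame contents, as zlib computes it."""
    return zlib.crc32(frame) & 0xFFFFFFFF


def frame_ok(frame: bytes, received_fcs: int) -> bool:
    """Receiver-side check: recompute the CRC and compare it with the one sent."""
    return fcs(frame) == received_fcs


if __name__ == "__main__":
    frame = b"example frame payload"  # hypothetical frame contents
    sent = fcs(frame)
    # Flip a single bit, as a faulty NIC might.
    damaged = bytes([frame[0] ^ 0x01]) + frame[1:]
    print(frame_ok(frame, sent), frame_ok(damaged, sent))  # True False
```

CRC-32 is guaranteed to catch every single-bit error and every burst of up to 32 bits, so a frame corrupted like this is dropped rather than delivered with bad data.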

Closing.
