Bug 161898
| Summary: | network connections stalled with kernel 2.6 and standard iptables |
|---|---|
| Product: | Red Hat Enterprise Linux 4 |
| Component: | kernel |
| Version: | 4.0 |
| Hardware: | All |
| OS: | Linux |
| Status: | CLOSED CANTFIX |
| Severity: | medium |
| Priority: | medium |
| Reporter: | Jose Traver <traverj> |
| Assignee: | Thomas Graf <tgraf> |
| QA Contact: | Brian Brock <bbrock> |
| CC: | caronc, davej, jbaron, rkhan, ssnodgra, villapla |
| Target Milestone: | --- |
| Target Release: | --- |
| Doc Type: | Bug Fix |
| Last Closed: | 2008-11-03 12:55:10 UTC |
Description
Jose Traver
2005-06-28 10:39:24 UTC
Created attachment 116051 [details]: ethereal capture of a stalled connection
Iptables is used to set up, maintain, and inspect the tables of IP packet filter rules in the Linux kernel. Assigning to kernel. Is this bug still occurring? I'm closing this bugzilla as there was no answer to my ping. Feel free to reopen the bug if the problem still occurs.

For about 2 weeks now, we've been trying to figure out why we can't deliver files larger than 600KB across our network before the transfer stalls. We experience the same symptoms testing with both FTP and SFTP. Using Wireshark, we can see that just prior to the stall we receive 40 to 60 duplicate ACKs. According to http://ssfnet.org/Exchange/tcp/tcpTutorialNotes.html (specifically the Congestion Control section), this implies that a packet was lost due to congestion and that the remaining packets were therefore delivered out of sequence. The multiple duplicate ACKs (>= 3; in our case 50 to 60) indicate that a packet was lost. But here is where Linux stalls: nothing is retransmitted. After 1 to 2 minutes the connection sometimes resumes, but otherwise it stays dormant. We have firewall configurations that kill stale connections with no data flow (I realize I can work around this in SSH with keep-alives, but that doesn't solve the FTP case, which suffers the same fate).

The transfer takes place from our internal LAN over a small 1MB pipe (WAN) that connects us across the country to another endpoint. So, according to the same article I posted above, this is exactly where congestion will occur: the duplicate ACKs tell the server to adjust its sliding window (sometimes shrinking it by almost half) to accommodate the traffic. When I disable iptables on the sending end (the source), I can transfer the file without a problem. In fact, the same 40 to 60 duplicate ACKs are received, but with iptables disabled Linux acknowledges them, immediately recovers, and delivers the file in sequence with virtually no interruption (there is obviously some performance cost, but the transfer continues uninterrupted).
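For anyone trying to reproduce this, the duplicate-ACK burst described above can be watched live from the sending host with tshark. This is only a sketch: the interface name `eth0` and port 22 (an SFTP transfer) are assumptions to adjust for your environment.

```shell
# Show duplicate ACKs and any SACK options arriving during the transfer.
# eth0 and port 22 are assumptions; adjust for your environment.
tshark -i eth0 -f "tcp port 22" \
  -Y "tcp.analysis.duplicate_ack || tcp.options.sack_le" \
  -T fields -e frame.number -e ip.src -e tcp.analysis.duplicate_ack_num
```

If the stall is the one described here, you should see a burst of duplicate ACKs (many carrying SACK blocks) immediately before traffic stops.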
Then I stumbled across this Bugzilla report, which was closed back in 2008 and which is EXACTLY the issue I'm experiencing. Effectively, adding this entry to my iptables configuration (and restarting the firewall) resolves the problem:

```
-A INPUT -m state --state INVALID -m tcp -p tcp --dport 22 -j ACCEPT
```

The report appears to have been closed because no one could solve the issue and things seemed to work for everyone else. To further my testing, I delivered a file via SFTP to a server I have at home (with iptables enabled and without the workaround mentioned above). It delivered perfectly, without any issues at all. I then stumbled across this: http://www.experts-exchange.com/Software/Server_Software/Web_Servers/Apache/Q_27997038.html

That article illustrates the EXACT same problem, but their solution was to disable tcp_sack instead of accepting 'INVALID' duplicate ACKs. So I tried that next, and it also worked perfectly for us in our production environment: I could send and have the duplicate ACKs acknowledged without the firewall workaround.

As it turned out, the reason I was able to deliver files successfully to an SFTP server on the internet (as opposed to our WAN) was that our company's outside firewall (the last one before traffic leaves for the vast internet) disables Selective Acknowledgments (SACK) to mitigate a DoS exploit (http://www.iss.net/security_center/reference/vuln/TCP_Malformed_SACK_DoS.htm). It therefore strips the extra SACK options off the packets, which keeps iptables from categorizing the duplicate ACKs as 'INVALID'.

This is currently happening on all of our Red Hat 5.8 and 6.4 servers deployed today. Since this ticket focuses on Red Hat 4, I have to assume the problem is still present in all 2.6 kernels in general (I'm not sure whether it extends to other distributions or not).

The issue clearly has four workarounds when using iptables:

1. Disable SACK at the router level so the packets are not considered 'INVALID' and remain 'ESTABLISHED'. There is then no reason why the following shouldn't work (SSH as an example, in /etc/sysconfig/iptables):

   ```
   -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
   -A INPUT -m state --state NEW -m tcp -p tcp --dport 22 -j ACCEPT
   -A INPUT -j DROP
   ```

2. Disable SACK at the kernel level on each source (sending) machine so it processes the duplicate SACKs as plain duplicate ACKs instead:

   ```
   echo 0 > /proc/sys/net/ipv4/tcp_sack
   # or:
   sysctl -w net.ipv4.tcp_sack=0
   ```

3. Following the Bugzilla mentioned above, add the following iptables rule to allow the SACK responses through:

   ```
   -A INPUT -m state --state INVALID -m tcp -p tcp --dport 22 -j ACCEPT
   # or, even more dangerous, but covering FTP and all other transport protocols:
   -A INPUT -m state --state INVALID -m tcp -p tcp -j ACCEPT
   ```

4. Eliminate any possible source of network congestion on your network so duplicate ACKs are never issued.

But I think the real answer is: update the kernel (specifically the iptables connection-tracking module) to handle TCP duplicate selective ACK packets. These should still flow in the 'ESTABLISHED' category, not 'INVALID'.

In our environment we will temporarily go with workaround #2 as defined above until this bug is resolved. I REALLY need to push for this to be backported to Red Hat 5 (as well as 6), since it affects many distributed systems across Canada right now running both versions. But in the meantime I'm content with my workaround. Do you guys see any problems with workaround #2 for now? Perhaps you can offer a better solution I haven't stumbled across yet?
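A sketch of applying and persisting workaround #2, plus a related conntrack knob that may be worth testing: the kernel exposes a "be liberal" setting that stops connection tracking from flagging out-of-window TCP packets as INVALID in the first place. The sysctl name differs between older ip_conntrack and newer nf_conntrack kernels, and whether it helps on these RHEL kernels is an assumption to verify, not something confirmed in this report.

```shell
# Inspect the current SACK setting (1 = enabled, the kernel default).
cat /proc/sys/net/ipv4/tcp_sack

# Workaround #2: disable SACK now (requires root) ...
sysctl -w net.ipv4.tcp_sack=0
# ... and persist it across reboots (RHEL reads /etc/sysctl.conf at boot).
echo "net.ipv4.tcp_sack = 0" >> /etc/sysctl.conf

# Possible alternative (assumption to verify on your kernel): tell conntrack
# to be liberal about out-of-window packets instead of marking them INVALID.
sysctl -w net.ipv4.netfilter.ip_conntrack_tcp_be_liberal=1   # older ip_conntrack
sysctl -w net.netfilter.nf_conntrack_tcp_be_liberal=1        # newer nf_conntrack
```

The be-liberal route keeps SACK enabled end to end, so it avoids the throughput cost of disabling SACK on lossy links.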