Bug 236872 - TCP connection breaks by ignoring acknowledge packets?
Status: CLOSED CANTFIX
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.3
Hardware: x86_64 Linux
Priority: medium, Severity: medium
Assigned To: Thomas Graf
QA Contact: Martin Jenner
Reported: 2007-04-18 00:38 EDT by Gary Shi
Modified: 2014-06-18 04:29 EDT

Doc Type: Bug Fix
Last Closed: 2012-05-10 08:51:56 EDT


Attachments:
sample tcpdump output (343.19 KB, application/octet-stream)
2007-04-18 00:38 EDT, Gary Shi

Description Gary Shi 2007-04-18 00:38:47 EDT
Description of problem:
The problem occurs in an environment where administrators log in from a RHEL4U4
workstation (kernel 2.6.9-42.ELsmp) to hundreds of RHEL4U3 servers (kernel
2.6.9-34.ELsmp) over the Internet. Sometimes an ssh session stops responding,
and we found a problem in the tcpdump data.

Version-Release number of selected component (if applicable):
kernel-2.6.9-34.ELsmp

How reproducible:
rarely

Steps to Reproduce:
1. Make hundreds of concurrent ssh connections.
2. Type some commands in a terminal, especially commands that generate a large
amount of terminal output.
  
Actual results:
Sometimes the terminal hangs: output from the last command stops, and typed
characters get no response.

Expected results:
The session continues to operate.

Additional info:
The attachment is a full-session tcpdump from the client side. We don't have
server-side tcpdump data for this connection because the problem happens rarely
and it's very difficult to capture packets on hundreds of running production
servers, but earlier captures from the server side show no massive packet loss.

We masked the address part of each packet: '>>>' for client-to-server packets
and spaces for server-to-client packets. We also filtered out the nop and
timestamp options to make the output easier to read.

The interesting part starts at line 39502 (starred below), where the server
retransmits segment 2300959925:2300961373 to the client. Just before that, the
client had acknowledged sequence 2300966901, which is several packets past
2300961373.

  03:56:41.491259     P 2300965717:2300966901(1184) ack 2379603710 win 8840
  03:56:41.491430 >>> . ack 2300966901 win 16022
  03:56:41.492200 >>> P 2379603710:2379603758(48) ack 2300966901 win 16022
* 03:56:41.735891     . 2300959925:2300961373(1448) ack 2379603710 win 8840
  03:56:41.735907 >>> . ack 2300966901 win 16022
  03:56:41.739594 >>> P 2379603710:2379603758(48) ack 2300966901 win 16022
  03:56:42.221765     . 2300959925:2300961373(1448) ack 2379603710 win 8840
  03:56:42.221784 >>> . ack 2300966901 win 16022
  03:56:42.233519 >>> P 2379603710:2379603758(48) ack 2300966901 win 16022
  03:56:43.193557     . 2300959925:2300961373(1448) ack 2379603710 win 8840
  03:56:43.193573 >>> . ack 2300966901 win 16022
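
The pattern in the excerpt can be checked mechanically. The following is a minimal sketch, assuming the masked line format used above ('>>>' for client-to-server, blank for server-to-client) and ignoring sequence-number wraparound. The names `LINE_RE` and `stale_retransmissions` are hypothetical helpers, not part of any tool used in this report. It flags server segments whose end sequence is at or below the highest ack the client has already sent.

```python
import re

# Lines copied from the excerpt above, in the masked format described in the report.
EXCERPT = """\
  03:56:41.491259     P 2300965717:2300966901(1184) ack 2379603710 win 8840
  03:56:41.491430 >>> . ack 2300966901 win 16022
  03:56:41.492200 >>> P 2379603710:2379603758(48) ack 2300966901 win 16022
* 03:56:41.735891     . 2300959925:2300961373(1448) ack 2379603710 win 8840
  03:56:41.735907 >>> . ack 2300966901 win 16022
  03:56:41.739594 >>> P 2379603710:2379603758(48) ack 2300966901 win 16022
  03:56:42.221765     . 2300959925:2300961373(1448) ack 2379603710 win 8840
  03:56:42.221784 >>> . ack 2300966901 win 16022
  03:56:42.233519 >>> P 2379603710:2379603758(48) ack 2300966901 win 16022
  03:56:43.193557     . 2300959925:2300961373(1448) ack 2379603710 win 8840
  03:56:43.193573 >>> . ack 2300966901 win 16022
"""

# Optional '*' marker, timestamp, optional '>>>' direction mask, TCP flags,
# optional "start:end(len)" data range, then the ack number.
LINE_RE = re.compile(
    r"^\*?\s*(?P<ts>\d\d:\d\d:\d\d\.\d+)\s+(?P<dir>>>>)?\s*\S+"
    r"(?:\s+(?P<start>\d+):(?P<end>\d+)\(\d+\))?"
    r"\s+ack\s+(?P<ack>\d+)"
)


def stale_retransmissions(text):
    """Return server segments that lie entirely below the client's highest ack."""
    highest_client_ack = None
    stale = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m is None:
            continue
        if m.group("dir"):
            # Client-to-server packet: remember the highest ack sent so far.
            ack = int(m.group("ack"))
            if highest_client_ack is None or ack > highest_client_ack:
                highest_client_ack = ack
        elif m.group("start") and highest_client_ack is not None:
            # Server-to-client data: stale if the client already acked past it.
            if int(m.group("end")) <= highest_client_ack:
                stale.append((m.group("ts"), int(m.group("start")), int(m.group("end"))))
    return stale


for ts, start, end in stale_retransmissions(EXCERPT):
    print(ts, f"{start}:{end}", "already acknowledged by the client")
```

Run against the excerpt, this reports the three server packets carrying 2300959925:2300961373, each of which arrives after the client has acknowledged 2300966901.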

When the client receives the retransmission, it repeats the ack of 2300966901 to
the server. But after a while (about 0.5 seconds here) the server retransmits
the segment again, as if it never got the ack packet. This behaviour repeats for
about 15 minutes, until the client side decides the connection is lost and the
read syscall returns -1 with an ETIMEDOUT error, as shown by strace of the ssh
process:

04:12:47.562502 read(3, 0xbffc2ac0, 8192) = -1 ETIMEDOUT (Connection timed out)
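
The timing can be worked out from the timestamps quoted above. This is a small sketch that assumes the tcpdump and strace timestamps come from the same clock on the same day; `seconds_between` is a hypothetical helper.

```python
from datetime import datetime

FMT = "%H:%M:%S.%f"


def seconds_between(start, end):
    """Elapsed seconds between two same-day HH:MM:SS.ffffff timestamps."""
    return (datetime.strptime(end, FMT) - datetime.strptime(start, FMT)).total_seconds()


# Timestamps of the three server retransmissions in the excerpt.
retransmits = ["03:56:41.735891", "03:56:42.221765", "03:56:43.193557"]

# Gap between consecutive retransmissions.
gaps = [seconds_between(a, b) for a, b in zip(retransmits, retransmits[1:])]

# Time from the first stale retransmission to the ETIMEDOUT seen in strace.
stall = seconds_between(retransmits[0], "04:12:47.562502")

print("gaps between retransmissions (s):", gaps)
print("stall before ETIMEDOUT: %d min %d s" % divmod(round(stall), 60))
```

The gaps come out at roughly 0.49 s and 0.97 s, so the interval doubles between attempts, which is what exponential backoff of the sender's retransmission timer would look like. The stall lasts about 16 minutes, in line with the "about 15 minutes" estimate.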

During this time the network is in good condition: dozens of other ssh sessions
are working well, and we receive every retransmitted packet from the server.
Earlier manual tcpdumps on the server side show that the server also receives
the client's repeated acknowledgement packets.

All servers have iptables settings like the following. I think this might be
related to a problem in the ip_conntrack module:

/sbin/iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
/sbin/iptables -A INPUT -s 192.168.6.0/24 -p tcp -m state --state NEW -m tcp --dport 22 -j ACCEPT
...
/sbin/iptables -A INPUT -j DROP
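
To illustrate how such a ruleset could discard a valid acknowledgement, here is a simplified first-match model of the chain above. It is only a sketch under stated assumptions. The rules elided by "..." are not modelled. The packet fields and the source address 192.168.6.10 are hypothetical. That conntrack classifies the client's repeated ack as INVALID is an assumption, not something this report confirms. The stateless rule in `with_stateless_accept` is likewise a hypothetical example of an ACCEPT rule placed before the stateful ones.

```python
import ipaddress

ADMIN_NET = ipaddress.ip_network("192.168.6.0/24")


def stateful_chain(pkt):
    """First-match evaluation of the INPUT rules listed above (elided rules omitted)."""
    if pkt["state"] in ("ESTABLISHED", "RELATED"):
        return "ACCEPT"
    if (pkt["state"] == "NEW" and pkt["proto"] == "tcp" and pkt["dport"] == 22
            and ipaddress.ip_address(pkt["src"]) in ADMIN_NET):
        return "ACCEPT"
    return "DROP"  # final catch-all rule


def with_stateless_accept(pkt):
    """Same chain with a hypothetical stateless ACCEPT for tcp/22 inserted first."""
    if pkt["proto"] == "tcp" and pkt["dport"] == 22:
        return "ACCEPT"
    return stateful_chain(pkt)


# ASSUMPTION: conntrack marks the repeated ack as INVALID (not confirmed here).
ack = {"src": "192.168.6.10", "proto": "tcp", "dport": 22, "state": "INVALID"}

print("stateful rules only:    ", stateful_chain(ack))
print("stateless ACCEPT first: ", with_stateless_accept(ack))
```

In this model a packet in any state other than ESTABLISHED, RELATED, or NEW falls through to the final DROP, while a stateless ACCEPT placed ahead of the state matches lets it through regardless of how conntrack classified it.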
Comment 1 Gary Shi 2007-04-18 00:38:47 EDT
Created attachment 152869
sample tcpdump output
Comment 2 Gary Shi 2007-05-10 02:04:47 EDT
This problem is confirmed. When I add LOG rules before the iptables DROP rule, I
see matches counted and packets logged in /var/log/messages while the session is
locked up. When I then insert an ACCEPT rule before the stateful rules, the
locked session continues.
Comment 3 Thomas Graf 2008-06-13 16:56:37 EDT
Are you still experiencing the problem?
Comment 4 Gary Shi 2008-06-16 08:30:00 EDT
Yes. We had the customer add non-stateful iptables rules to avoid this problem.
But if any server (RHEL4 kernel, not sure of the patch level) is still running
the stateful firewall, the problem does occur. It is easy to reproduce: open a
dozen SSH connections to the server, perform several operations, and leave them
alone for 10-30 minutes. Then go back and type a command that produces a lot of
output (for example, for i in `seq 10`; do dmesg; done). It is very likely that
the output will block until you add a non-stateful iptables rule to accept the
packets, or until the connection times out. This means the firewall still
recognizes the connection after the idle period, but suddenly starts dropping
packets a while later.
Comment 5 Thomas Graf 2012-05-10 08:51:56 EDT
RHEL4 has entered the Extended Life Phase. There will be no more minor releases.

I'm closing this bug due to inactivity.

Please reopen and provide an explanation if you need this issue to be addressed in RHEL4. Please note that only security and critical bugfixes are considered at this point.
