Bug 39923

Summary: dropped ack packet and 2.2.19-6.2.1
Product: [Retired] Red Hat Linux
Reporter: Andre Delafontaine <andre.delafontaine>
Component: kernel
Assignee: David Miller <davem>
Status: CLOSED WONTFIX
QA Contact: Brock Organ <borgan>
Severity: medium
Priority: medium    
Version: 6.2   
Hardware: i686   
OS: Linux   
Last Closed: 2004-03-06 07:12:15 UTC
Attachments:
.config used to compile the 2.2.19-6.2.1 kernel (flags: none)
List of RPMs and their versions installed on the host used to compile the kernel. Basically, a RedHat6.2 with all available patches from updates.redhat.com (flags: none)

Description Andre Delafontaine 2001-05-09 17:46:17 UTC
From Bugzilla Helper:
User-Agent: Mozilla/4.72 [en] (X11; U; Linux 2.2.16-1 i686)

Description of problem:
Red Hat's 2.2.19-6.2.1 kernel has trouble talking TCP to a Tru64 host when
the ack of the syn-ack in the 3-way handshake gets dropped.

At that point the two hosts deadlock: the TCP connection stays open, but
no data can be sent over it. After 20 minutes, the client process on Tru64
is still trying to connect.

The same scenario has been tested extensively on a 2.2.17-14 kernel without
problems.

The 2.2.19-6.2.1 kernel has no additional patches, but it has been recompiled.

How reproducible:
Always

Steps to Reproduce:
1. Establish a TCP connection from a client to a server running
2.2.19-6.2.1.

2. Somehow get the ack of the syn-ack in the 3-way handshake to be
dropped (this occurs more often on WAN connections).

Actual Results:  The client resends the syn-ack several times, but
the server doesn't respond at all.

After a number of syn-ack resends, the client sends a reset, which is
also ignored.

The TCP connection stays half-open for at least 20 minutes. Only killing
the client-side process clears the connection (I didn't try killing the
server process).

Fresh TCP connections are still accepted while the deadlock persists.

Expected Results:  The 2.2.17-14 kernel resends the ack when it receives a
second copy of the syn-ack packet. The connection then becomes fully
established and works fine.
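
For reference, the expected behaviour lines up with the segment acceptability check in RFC 793: a duplicate syn-ack arriving on an already established connection falls outside the receive window, and an unacceptable segment should be answered with an ack unless it carries RST. The Python sketch below only models that rule. It is not the kernel's code, the Endpoint and Segment names are hypothetical, and sequence-number wraparound is ignored for brevity. The numbers are taken from the 2.2.19 trace in this report.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    seq: int            # sequence number of the segment
    length: int = 0     # segment length; SYN and FIN each count as 1
    ack_no: int = 0     # acknowledgment number (meaningful if ack is set)
    syn: bool = False
    ack: bool = False
    rst: bool = False


@dataclass
class Endpoint:
    """Hypothetical, heavily simplified TCP endpoint in ESTABLISHED state."""
    snd_nxt: int   # next sequence number we will send
    rcv_nxt: int   # next sequence number we expect from the peer
    rcv_wnd: int   # our receive window

    def acceptable(self, seg: Segment) -> bool:
        # RFC 793 acceptability test (wraparound ignored in this sketch).
        lo, hi = self.rcv_nxt, self.rcv_nxt + self.rcv_wnd
        if seg.length == 0:
            if self.rcv_wnd == 0:
                return seg.seq == self.rcv_nxt
            return lo <= seg.seq < hi
        if self.rcv_wnd == 0:
            return False
        last = seg.seq + seg.length - 1
        return lo <= seg.seq < hi or lo <= last < hi

    def on_segment(self, seg: Segment) -> Optional[Segment]:
        if not self.acceptable(seg):
            if seg.rst:
                return None  # unacceptable RST: drop silently
            # Unacceptable segment: reply with an ack of what we expect.
            return Segment(seq=self.snd_nxt, ack_no=self.rcv_nxt, ack=True)
        return None  # handling of in-window segments is omitted here


if __name__ == "__main__":
    # linux-host after the handshake: its ISN + 1, and the peer's ISN + 1.
    linux = Endpoint(snd_nxt=2181164546, rcv_nxt=1985791701, rcv_wnd=32120)
    dup_synack = Segment(seq=1985791700, length=1, ack_no=2181164546,
                         syn=True, ack=True)
    print(linux.on_segment(dup_synack))
```

Under this model the retransmitted syn-ack draws a fresh ack with sequence number 2181164546 acknowledging 1985791701, which is what the 2.2.17-14 kernel is reported to do.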

Additional info:

tcpdump trace with Red Hat's 2.2.19 kernel; linux-host is running Linux
2.2.19-6.2.1 and tru64-host is running Compaq's Tru64 4.0d:

10:24:42.007945 > linux-host.1016 > tru64-host.1022: S
2181164545:2181164545(0) win 32120 <mss 1460,sackOK,timestamp 68814196
0,nop,wscale 0> (DF)

10:24:42.049098 < tru64-host.1022 > linux-host.1016: S
1985791700:1985791700(0) ack 2181164546 win 33580 <mss 1460,nop,wscale 0>
(DF)

*** this ack gets lost ***
10:24:42.049198 > linux-host.1016 > tru64-host.1022: . 1:1(0) ack 1 win
32120 (DF)

*** tru64-host retransmits syn-ack ***
10:24:48.505996 < tru64-host.1022 > linux-host.1016: S
1985791700:1985791700(0) ack 2181164546 win 33580 <mss 1460,nop,wscale 0>
(DF)

*** tru64-host retransmits syn-ack ***
10:25:13.007212 < tru64-host.1022 > linux-host.1016: S
1985791700:1985791700(0) ack 2181164546 win 33580 <mss 1460,nop,wscale 0>
(DF)

*** tru64-host resets link, but even then Linux does not respond ***
10:25:57.507858 < tru64-host.1022 > linux-host.1016: R 0:0(0) ack 1 win
33580 (DF)

20 minutes later, the connection has still not timed out. The client-side
process is still running on tru64-host.
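
The pattern in the trace above (an inbound syn-ack followed by another inbound packet with no outbound reply in between) can be checked mechanically. The Python sketch below is only an illustration: the function name is hypothetical, it assumes each packet has been joined onto one line, and it assumes that `>` after the timestamp marks an outbound packet and `<` an inbound one, as the trace suggests.

```python
import re

# timestamp, direction marker, "src > dst:", flags, rest of the line
PACKET = re.compile(r"^(\d\d:\d\d:\d\d\.\d+) ([<>]) \S+ > \S+: (\S+)(.*)$")


def unanswered_synacks(lines):
    """Return timestamps of inbound syn-acks that were followed by another
    inbound packet before any outbound packet was seen."""
    unanswered = []
    pending = None  # timestamp of the last inbound syn-ack not yet answered
    for line in lines:
        m = PACKET.match(line.strip())
        if not m:
            continue  # annotation or blank line
        stamp, direction, flags, rest = m.groups()
        if direction == ">":
            pending = None  # we sent something: count it as a reply
            continue
        if pending is not None:
            unanswered.append(pending)
        is_synack = "S" in flags and " ack " in rest
        pending = stamp if is_synack else None
    return unanswered


# The 2.2.19 trace from this report, one packet per line, options trimmed.
TRACE_2_2_19 = [
    "10:24:42.007945 > linux-host.1016 > tru64-host.1022: S 2181164545:2181164545(0) win 32120 (DF)",
    "10:24:42.049098 < tru64-host.1022 > linux-host.1016: S 1985791700:1985791700(0) ack 2181164546 win 33580 (DF)",
    "10:24:42.049198 > linux-host.1016 > tru64-host.1022: . 1:1(0) ack 1 win 32120 (DF)",
    "10:24:48.505996 < tru64-host.1022 > linux-host.1016: S 1985791700:1985791700(0) ack 2181164546 win 33580 (DF)",
    "10:25:13.007212 < tru64-host.1022 > linux-host.1016: S 1985791700:1985791700(0) ack 2181164546 win 33580 (DF)",
    "10:25:57.507858 < tru64-host.1022 > linux-host.1016: R 0:0(0) ack 1 win 33580 (DF)",
]

if __name__ == "__main__":
    for stamp in unanswered_synacks(TRACE_2_2_19):
        print("syn-ack with no reply:", stamp)
```

A syn-ack that is the last packet in a capture is deliberately not flagged, since the capture may simply have ended before the reply.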



The problem does not occur with Red Hat's 2.2.17-14:

13:29:55.168486 > linux-host.1016 > tru64-host.1019: S
1023562949:1023562949(0) win 32120 <mss 1460,sackOK,timestamp 663579
0,nop,wscale 0> (DF)

13:29:55.213289 < tru64-host.1019 > linux-host.1016: S
129268475:129268475(0) ack 1023562950 win 33580 <mss 1460,nop,wscale 0>
(DF)

*** this ack gets lost ***
13:29:55.213360 > linux-host.1016 > tru64-host.1019: . 1:1(0) ack 1 win
32120 (DF)

*** tru64-host retransmits syn-ack ***
13:30:01.275229 < tru64-host.1019 > linux-host.1016: S
129268475:129268475(0) ack 1023562950 win 33580 <mss 1460,nop,wscale 0>
(DF)

Comment 1 David Miller 2001-05-09 21:09:11 UTC
Does the bug occur without you recompiling our kernel?
I only ask because this is the first I've ever heard of
such obviously bogus behavior.

Comment 2 Andre Delafontaine 2001-05-09 21:47:27 UTC
I have not yet tried the original RedHat kernel but will be setting up a test
case soon.

In case it is something to do with kernel configuration or compiler version, I
will be attaching the .config used as well as the list of RPMs installed on the
host used to compile the kernel.

I will keep you updated.

Andre

Comment 3 Andre Delafontaine 2001-05-09 21:49:42 UTC
Created attachment 17947
.config used to compile the 2.2.19-6.2.1 kernel

Comment 4 Andre Delafontaine 2001-05-09 21:58:37 UTC
Created attachment 17948
list of RPMs and their versions installed on the host used to compile the kernel. Basically, a RedHat6.2 with all available patches from updates.redhat.com

Comment 5 David Miller 2004-03-06 07:12:15 UTC
Way past EOL product, therefore this is unlikely to ever be fixed.