Good example: establishing an FTP connection from a SunOS5.4 client to an RHL6.2 server (kernel-2.2.19-6.2.1, wu-ftpd-2.6.0-14.6x). Path MTU discovery is turned off on the Solaris side. The FTP server's banner ("220 ... ready.") is 89 characters long. As soon as I see this response, I press Ctrl+C on the client. The client is approximately 20 hops away from the server.

00:51:04.521363 < sunos54.51881 > rhl62.ftp: S 3451928585:3451928585(0) win 64240 <mss 1460>
00:51:04.521486 > rhl62.ftp > sunos54.51881: S 4201831394:4201831394(0) ack 3451928586 win 32120 <mss 1460> (DF)
00:51:04.730219 < sunos54.51881 > rhl62.ftp: . 1:1(0) ack 1 win 64240
00:51:04.778800 > rhl62.ftp > sunos54.51881: P 1:92(91) ack 1 win 32120 (DF) [tos 0x10]
00:51:05.032062 < sunos54.51881 > rhl62.ftp: . 1:1(0) ack 92 win 64240
00:51:06.882869 < sunos54.51881 > rhl62.ftp: F 1:1(0) ack 92 win 64240

As soon as the rhl62 server received the packet from the remote side confirming a successful TCP handshake, it [rhl62] sent FTP response 220, which is 89 chars + cr + lf, 91 bytes in total.

Bad example: the mainframe tries to establish an FTP connection to rhl62, but the operator [human] can't see the FTP login prompt. Then in /var/log/messages I see the following:

Jun 26 23:29:36 rhl62 ftpd[3908]: lost connection to mframe [mframe]

23:24:08.150047 < mframe.4434 > rhl62.ftp: S 2569289217:2569289217(0) win 12288 <mss 1460>
23:24:08.150506 > rhl62.ftp > mframe.4434: S 2977640474:2977640474(0) ack 2569289218 win 32120 <mss 1460> (DF)
23:24:08.174624 < mframe.4434 > rhl62.ftp: . 1:1(0) ack 1 win 12288
23:29:08.188922 < mframe.4434 > rhl62.ftp: R 2569289218:2569289218(0) win 32120

I think the symptoms are the same as mentioned in Bug#13282. BTW, mframe's IP address is not reverse-resolvable, and it takes between 15 and 20 seconds to get a negative result using 'nslookup' or 'host'. But I see the same 'lost connection' records in /var/log/messages mentioning perfectly resolvable FQDNs too. bind-utils-8.2.2_P3-1

Also, very close to Bug#39923.
And my mainframe client is also behind a T1 link. But some other clients (recorded in /var/log/messages as those 'lost connection' entries) are very close: behind one cisco router and two or three foundry switches, everything on 100base-T ethernet. The 2.2.19-6.2.1 kernel was installed on 01-Jun-2001, but I saw the same 'lost connection' messages in old log files too (when rhl62 ran the 2.2.17-14 kernel). All kernels are from updates.redhat.com.

Another good example: after a reboot of the rhl62 server, the mainframe user successfully connected to the FTP server:

12:39:10.733959 < mframe.4575 > rhl62.ftp: S 97114625:97114625(0) win 12288 <mss 1460> (ttl 27, id 29705)
12:39:10.734420 > rhl62.ftp > mframe.4575: S 1809826268:1809826268(0) ack 97114626 win 32120 <mss 1460> (DF) (ttl 64, id 4664)
12:39:10.783162 < mframe.4575 > rhl62.ftp: . 1:1(0) ack 1 win 12288 (ttl 27, id 29706)
12:40:28.863661 > rhl62.ftp > mframe.4575: P 1:92(91) ack 1 win 32120 (DF) [tos 0x10] (ttl 64, id 4680)
12:40:28.979833 < mframe.4575 > rhl62.ftp: . 1:1(0) ack 92 win 12288 (ttl 27, id 29932)
12:40:55.181347 < mframe.4575 > rhl62.ftp: P 1:13(12) ack 92 win 12288 (ttl 27, id 29982)
[...and so on...]

In the last packet shown, the FTP client sent the "USER uname" command, which is 10 + cr + lf = 12 characters long. The whole FTP session was successful.

In the tcpdump output I see unsuccessful attempts to establish an FTP connection with all DF flags set too, so path MTU discovery is not the problem. And all FTP connections after the reboot are [still] successful.
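To make the three traces easier to compare, here is a minimal Python sketch that measures the gap between the client's handshake-completing ACK and the next packet on the connection. The timestamps are copied from the traces above; the helper name gap_seconds and the TRACES table are illustrative only and not part of any tool used in this report.

```python
from datetime import datetime

FMT = "%H:%M:%S.%f"  # tcpdump's default timestamp layout


def gap_seconds(earlier: str, later: str) -> float:
    """Seconds between two tcpdump timestamps taken on the same day."""
    delta = datetime.strptime(later, FMT) - datetime.strptime(earlier, FMT)
    return delta.total_seconds()


# (label, client's handshake ACK, next packet seen on the connection)
TRACES = [
    ("sunos54, banner sent", "00:51:04.730219", "00:51:04.778800"),
    ("mframe, reset without banner", "23:24:08.174624", "23:29:08.188922"),
    ("mframe after reboot, banner sent", "12:39:10.783162", "12:40:28.863661"),
]

if __name__ == "__main__":
    for label, ack, following in TRACES:
        print(f"{label}: {gap_seconds(ack, following):.6f} s")
```

By this measure the SunOS client got its banner after about 0.05 s, the failed mainframe attempt sat for about 300 s before the reset, and the post-reboot mainframe session, although successful, still waited about 78 s for the banner.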
Surely looks odd, and I can't reproduce it, so it seems to be something very specific to your clients or setup. It does look like 39923... I don't think it's a wu-ftpd problem because it doesn't do anything non-obvious before login and because it seems to work after a reboot. Looks more like a general TCP/IP problem (either on the server or on the client). Arjan, any ideas?
In the "bad" example, there is nothing even interesting going on. The connection setup is all that happens. Then "mframe" sends a reset, probably because the "operator" aborted the ftp client. The RH6.2 machine does not send the welcome banner to "mframe" for several minutes. I doubt this is a kernel issue from what I see in the trace, at least not a TCP issue, because the parameters negotiated by both mframe and the RH6.2 machine are completely normal. It would be more useful to run "strace" on the ftp daemon to see why it is not sending even the greeting message to "mframe" for all this time.
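Alongside strace, one cheap check is to time a reverse lookup of the client address from the server itself, since the report mentions negative answers taking 15 to 20 seconds. The following is only a sketch: whether the daemon actually waits on such a lookup before sending the banner is an assumption that the trace neither confirms nor rules out, and the helper name time_reverse_lookup is made up for illustration. Addresses to test are passed on the command line.

```python
import socket
import sys
import time


def time_reverse_lookup(ip, resolver=socket.gethostbyaddr):
    """Return (elapsed_seconds, hostname or None) for a reverse lookup of ip.

    The resolver argument exists so the timing logic can be exercised
    without touching the network.
    """
    start = time.monotonic()
    try:
        name = resolver(ip)[0]  # first tuple element is the primary host name
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        name = None
    return time.monotonic() - start, name


if __name__ == "__main__":
    # Pass the client addresses to test as arguments; none are assumed here.
    for ip in sys.argv[1:]:
        elapsed, name = time_reverse_lookup(ip)
        print(f"{ip}: {name or 'no PTR record'} after {elapsed:.2f} s")
```

If the lookup for a failing client returns quickly, this explanation can be set aside and the strace output becomes the main lead.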
Thanks for the bug report. However, Red Hat no longer maintains this version of the product. Please upgrade to the latest version and open a new bug if the problem persists. The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, and if you believe this bug is interesting to them, please report the problem in the bug tracker at: http://bugzilla.fedora.us/