Bug 64808 - kernel TCP error
Summary: kernel TCP error
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: vsftpd
Version: 7.2
Hardware: i386
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: David Miller
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2002-05-13 02:48 UTC by Need Real Name
Modified: 2007-04-18 16:42 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2003-07-15 01:27:45 UTC
Embargoed:


Attachments
recvmsg files from the first host (64.22 KB, text/plain)
2002-05-17 15:24 UTC, Need Real Name
recvmsg from the second host (826 bytes, text/plain)
2002-05-17 15:25 UTC, Need Real Name
Fix for Andrew's TCP bug. (1.21 KB, patch)
2002-05-21 03:50 UTC, David Miller
Fix for MSG_PEEK in parallel with normal recvmsg race in TCP. (401 bytes, patch)
2002-05-21 22:19 UTC, David Miller

Description Need Real Name 2002-05-13 02:48:05 UTC
Description of Problem:

I noticed the following in /var/log/messages:

recvmsg bug: copied 39AA115 seq 39AA116

Version-Release number of selected component (if applicable):

2.4.9-31smp

How Reproducible:

This has been noticed on 2 machines so far.  One logged a short burst that
lasted ~6 seconds; the other logged these messages over a span of 5 minutes.

Steps to Reproduce:

Unknown.  Both of these servers push 100Mb/s or more sustained.

Comment 1 David Miller 2002-05-17 02:58:55 UTC
Can you extract all of the "recvmsg bug" messages from your
logs and attach them to this bugzilla entry?

Also, are you using TUX by chance on this machine?
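
One way to pull the requested messages out of a log is sketched below. This
is only an illustration: the function name is made up here, and it assumes
the two values in each message are hexadecimal, as they appear to be in the
sample from the description. The last field is the difference between the
two values, wrapped to 32 bits.

```python
import re

# Matches lines such as: "recvmsg bug: copied 39AA115 seq 39AA116"
# Assumption: both values are hexadecimal, as in the reported sample.
RECVMSG_BUG = re.compile(r"recvmsg bug: copied ([0-9A-Fa-f]+) seq ([0-9A-Fa-f]+)")


def parse_recvmsg_bug(lines):
    """Return (copied, seq, seq - copied) for every matching log line."""
    results = []
    for line in lines:
        match = RECVMSG_BUG.search(line)
        if match is None:
            continue  # unrelated log line
        copied = int(match.group(1), 16)
        seq = int(match.group(2), 16)
        # Wrap the difference to 32 bits, like TCP sequence arithmetic.
        results.append((copied, seq, (seq - copied) & 0xFFFFFFFF))
    return results


if __name__ == "__main__":
    with open("/var/log/messages", errors="replace") as log:
        for copied, seq, delta in parse_recvmsg_bug(log):
            print(f"copied={copied:X} seq={seq:X} delta={delta}")
```

For the message quoted in the description, the two values differ by one.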


Comment 2 Need Real Name 2002-05-17 15:24:25 UTC
Created attachment 57744 [details]
recvmsg files from the first host

Comment 3 Need Real Name 2002-05-17 15:25:29 UTC
Created attachment 57745 [details]
recvmsg from the second host

Comment 4 Need Real Name 2002-05-17 15:26:23 UTC
No, we are not using TUX, but we are using vsftpd, which makes heavy use of
sendfile() if that makes any difference.

Comment 5 David Miller 2002-05-21 03:34:08 UTC
Thanks for the log files and information; they confirmed my
suspicion about the cause of this bug.  I think I know what
is wrong: when we copy directly into userspace from the TCP
input packet processing, we mishandle the sequence numbers
if the packet also happens to be the FIN packet.


Comment 6 David Miller 2002-05-21 03:50:17 UTC
Created attachment 58064 [details]
Fix for Andrew's TCP bug.

Comment 7 David Miller 2002-05-21 04:43:22 UTC
Please ignore the patch I posted; it turns out to be buggy.
Still searching.


Comment 8 David Miller 2002-05-21 22:18:52 UTC
Good news is that we know what is happening.  Bad news is that vsftpd
is a buggy pile of crud.

From the perspective of the kernel, the situation is harmless.  The messages
are printed, but the kernel is actually fine.  What the application
is doing is allowing multiple threads to read from the same socket, one of which
is using MSG_PEEK.  This is racy and bad.
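
A minimal sketch of why this pattern is racy, assuming a local stream socket
pair stands in for the TCP connection and the two readers are interleaved by
hand in one thread so the outcome is deterministic (the function name and the
payload are made up for illustration; this is not vsftpd's code):

```python
import socket


def peek_then_read_interleaved():
    """Show that peeked data can be consumed by another reader first."""
    writer, reader = socket.socketpair()
    try:
        writer.sendall(b"ABOR\r\nDATA")

        # Reader 1 peeks: the bytes stay in the receive queue.
        peeked = reader.recv(4, socket.MSG_PEEK)

        # Reader 2 (another thread in the racy case) consumes those bytes.
        stolen = reader.recv(4)

        # Reader 1 now reads, expecting the bytes it peeked at, but gets
        # whatever follows them in the stream instead.
        later = reader.recv(4)
        return peeked, stolen, later
    finally:
        writer.close()
        reader.close()


if __name__ == "__main__":
    print(peek_then_read_interleaved())
```

The reader that peeked ends up acting on bytes it never inspected, which is
the kind of confusion a peek in parallel with a normal read invites.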

Secondly, the default recommended configuration of vsftpd has 'async_abor_enable'
off.  This is VERY BAD; it will confuse every FTP client out there.  In fact, it
completely breaks the FTP ABOR (abort) command.  With this default, vsftpd
tries to complete the data transmission and then reads garbage instead of
recognizing the ABOR command (clients use URG data to implement asynchronous
commands; vsftpd interprets the ABOR as part of the data stream and locks up).

So I am going to recommend two things.  First, edit the vsftpd config on these
systems and set 'async_abor_enable' to 'YES'.  Second, let's get Arjan to build
a kernel with the patch I am about to attach.  It will print a rate-limited
debugging message when applications use MSG_PEEK and try to call read on the
same socket in parallel like this.
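
A quick way to check whether a given vsftpd config already has the setting
turned on is sketched below. It is a simplification under stated
assumptions: the function name is hypothetical, only the value YES (in any
case) is treated as enabled, the last occurrence of the option is taken to
win, and the config path varies between installs.

```python
def async_abor_enabled(conf_text):
    """Return True if the config text sets async_abor_enable to YES."""
    enabled = False  # the option is off unless the config turns it on
    for raw_line in conf_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep and key == "async_abor_enable":
            # Assumption: the last occurrence of the option wins.
            enabled = value.strip().upper() == "YES"
    return enabled


if __name__ == "__main__":
    # Assumed path; some systems use /etc/vsftpd/vsftpd.conf instead.
    with open("/etc/vsftpd.conf") as conf:
        print("async_abor_enable:", "YES" if async_abor_enabled(conf.read()) else "NO")
```

If it reports NO, adding the line async_abor_enable=YES to the config and
restarting vsftpd applies the first recommendation.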


Comment 9 David Miller 2002-05-21 22:19:50 UTC
Created attachment 58127 [details]
Fix for MSG_PEEK in parallel with normal recvmsg race in TCP.

Comment 10 David Miller 2003-07-15 01:27:45 UTC
This bug has been fixed for a long time; I just
never got around to closing it.


