From Bugzilla Helper:
User-Agent: Mozilla/4.79 [en] (Windows NT 5.0; U)

Description of problem:
Calling write() on a TCP/IP socket causes the PSH flag not to be set on packets of length equal to the MTU. Many TCP/IP stacks tolerate this, and it only matters when all the packets sent by the write() have length equal to the MTU. My specific problem is that the exim mailer (3.34) cannot deliver emails of length 1448 bytes or 2896 bytes to SMTP running on IBM VM/ESA 2.4, because they fit into exactly one or two packets of length equal to the MTU. Apparently the VM TCP/IP stack will queue locally destined packets up to a total of 4096 bytes unless it sees a PSH flag.

Version-Release number of selected component (if applicable):
2.4.9-31

How reproducible:
Always

Steps to Reproduce:
1. Run tcpdump.
2. Send a message via exim. You may need to adjust the message length so that it fits exactly into a whole number of packets, each of length equal to the MTU. Make sure you're not using TLS.

Actual Results:
The packets containing the message data do not have the PSH flag set.

Expected Results:
At least the last data packet should have the PSH flag set.

Additional info:
See RFC 793. I would argue that the write() interface is equivalent to an implementation of send() without a push flag for the purposes of interpreting the RFC.
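To make the failing message sizes concrete, here is a small sketch of how a payload splits into TCP segments. It assumes 1448 bytes of payload per full segment, a figure inferred from the sizes quoted in the report (for instance, a 1500-byte MTU minus 20 bytes of IP header, 20 bytes of TCP header and 12 bytes of timestamp option) rather than read from any real interface. The function names are illustrative only.

```python
# Sketch: split a payload into TCP segment payload sizes.
# ASSUMPTION: 1448 bytes of payload per full segment, inferred from the
# message sizes in the report; it is not read from any real interface.
ASSUMED_MSS = 1448


def segment_sizes(payload_len, mss=ASSUMED_MSS):
    """Return the payload size of each segment for a payload_len-byte write."""
    if payload_len < 0 or mss <= 0:
        raise ValueError("payload_len must be >= 0 and mss must be > 0")
    full, rest = divmod(payload_len, mss)
    sizes = [mss] * full
    if rest:
        sizes.append(rest)
    return sizes


def only_full_segments(payload_len, mss=ASSUMED_MSS):
    """True when every segment is exactly mss bytes (the failing case)."""
    sizes = segment_sizes(payload_len, mss)
    return bool(sizes) and all(size == mss for size in sizes)


if __name__ == "__main__":
    for length in (1000, 1448, 2000, 2896):
        print(length, segment_sizes(length), only_full_segments(length))
```

Under this assumption, 1448 and 2896 bytes are the sizes that leave no short trailing segment, which matches the two message lengths the report says cannot be delivered.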
This is a bug in the VM tcp/ip stack; please ask IBM for the hotfix for this.
Do you know of a hotfix for this or do you want me to ask them for one? When we consulted IBM, they said this was a Linux bug. Can you help me find evidence to the contrary?
The TCP RFSs state rather clearly that the user should not be made to wait indefinitely for queued received data just because PSH is not set. PSH is advisory.
s/RFS/RFC/
I don't know my way around the RFCs. :-( Could you give me an RFC number and section?
For example, RFC 793, section 2.8:

There is no necessary relationship between push functions and segment boundaries. The data in any particular segment may be the result of a single SEND call, in whole or part, or of multiple SEND calls.

The purpose of push function and the PUSH flag is to push data through from the sending user to the receiving user. It does not provide a record service.

There is a coupling between the push function and the use of buffers of data that cross the TCP/user interface. Each time a PUSH flag is associated with data placed into the receiving user's buffer, the buffer is returned to the user for processing even if the buffer is not filled. If data arrives that fills the user's buffer before a PUSH is seen, the data is passed to the user in buffer size units.

While this states what to do WHEN you get a PUSH, it also states that there doesn't need to be a relation between segment boundaries and PUSHes. Also, NOWHERE does it say you MUST send a PUSH (other than in final packets and urgent packets)....
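The receive-buffer rule in that quote can be sketched as a toy model. This is only an illustration of the quoted text combined with the 4096-byte figure from the original report; it is not the VM stack's code, it ignores any timers a real receiver may use, and the name `deliver` is made up for the example.

```python
def deliver(segments, buffer_size):
    """Model the RFC 793 section 2.8 receive-buffer rule.

    segments: list of (payload_len, psh) tuples in arrival order.
    Returns (delivered, still_queued): the chunk sizes handed to the
    user, and the number of bytes left waiting in the stack.
    """
    if buffer_size <= 0:
        raise ValueError("buffer_size must be > 0")
    delivered = []
    queued = 0
    for payload_len, psh in segments:
        queued += payload_len
        # Data that fills the buffer is passed up in buffer-size units.
        while queued >= buffer_size:
            delivered.append(buffer_size)
            queued -= buffer_size
        # A PUSH returns the buffer even if it is not filled.
        if psh and queued:
            delivered.append(queued)
            queued = 0
    return delivered, queued


if __name__ == "__main__":
    # ASSUMPTION: 4096-byte buffer, taken from the report's description.
    print(deliver([(1448, False)], 4096))
    print(deliver([(1448, False), (1448, True)], 4096))
```

In this model a single 1448-byte segment without PSH is never handed to the user when the buffer is 4096 bytes, which is the stall described in the report.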
RFC 1122, 4.2.2.2:

A TCP MAY implement PUSH flags on SEND calls. If PUSH flags are not implemented, then the sending TCP: (1) must not buffer data indefinitely, and (2) MUST set the PSH bit in the last buffered segment (i.e., when there is no more queued data to be sent).

Does Linux implement PUSH flags on SEND calls? I can't find any userland settings for this.
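As a rough illustration of rule (2), the sketch below contrasts two sender models. `segments_symptom` reproduces only the symptom described in the report (PSH appears solely on segments shorter than a full one); it is a guess at observable behaviour, not the kernel's logic. `segments_rfc1122` sets PSH on the last segment regardless of size. Both names and the 1448-byte segment size are assumptions made for the example.

```python
def _sizes(payload_len, mss):
    """Payload size of each segment for a payload_len-byte write."""
    if payload_len < 0 or mss <= 0:
        raise ValueError("payload_len must be >= 0 and mss must be > 0")
    full, rest = divmod(payload_len, mss)
    return [mss] * full + ([rest] if rest else [])


def segments_symptom(payload_len, mss):
    """Model of the reported symptom: PSH only on short segments."""
    return [(size, size < mss) for size in _sizes(payload_len, mss)]


def segments_rfc1122(payload_len, mss):
    """PSH on the last buffered segment, whatever its size."""
    sizes = _sizes(payload_len, mss)
    last = len(sizes) - 1
    return [(size, index == last) for index, size in enumerate(sizes)]


if __name__ == "__main__":
    # ASSUMPTION: 1448-byte segments, inferred from the reported sizes.
    for length in (2000, 2896):
        print(length, segments_symptom(length, 1448))
        print(length, segments_rfc1122(length, 1448))
```

The two models agree whenever the payload ends in a short segment and differ only when it is an exact multiple of the segment size.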
Dave, can you reconcile these two quotes in context?
It looks like indeed we are required to fix this. I'll cook up a patch.
Since davem acknowledges that there is a problem, I'm reopening the bug.
Created attachment 60886: Patch to set the PUSH flag on the last packet of outgoing TCP messages even when packet size = MSS
We already installed a fix into our kernel sources, so there is no need for you to provide a new one, and this bug should be closed.
This bug is *not* fixed in kernel 2.4.9-34; I had to install a kernel with this patch to fix it. It looks like you fixed do_tcp_sendpages but tcp_sendmsg needs to have the same fix applied.
Created attachment 60965: Patch which is installed and fixed the PSH problem.
I attached the patch which is installed and fixes the problem. As you can clearly see, it modifies tcp_sendmsg, which is where the problem is. It does not change do_tcp_sendpages; that case actually got this bit right. If you look, it is nearly identical to your patch. It is in our tree; I remember sending this out to Arjan several times.
OK, so this will be in the next errata kernel (whenever that comes out...)?