Description of Problem:
In certain circumstances, a write to a pipe may return with EAGAIN instead of blocking.

Version-Release number of selected component (if applicable):
kernel-2.4.18-18.8.0.src.rpm

How Reproducible:
80% of the time.

Steps to Reproduce:
1. From a shell: cvs annotate some-file 2>&1 | (sleep 1; less)

Actual Results:
The 3 lines that cvs annotate writes to stderr, followed by the tail of the data that cvs annotate writes to stdout (i.e. the top of the standard output is missing).

Expected Results:
All cvs output.

Additional Information:
cvs annotate first writes 3 short lines to standard error in 5 write system calls. It then writes the annotated file to standard output 4096 bytes at a time. If the process reading from the pipe isn't ready, the write should block, but in fact the write system call returns with the error EAGAIN. See the attachment for the trace output of cvs and of less. In the cvs trace search for "EAGAIN"; in the less trace search for "read(0,". Note that the first read returns the data that cvs wrote to stderr, but that the second read does not return the same data as the first write to stdout by cvs.
Created attachment 88085 [details] tar file containing strace output of cvs and less commands from bug report
This bug still exists in Fedora Core 1 Test 3 with kernel-2.4.22-1.2115.nptl. I just tried the command on an 825-line file. When scrolled to the bottom, less reported that there were only 217 lines (3 for the header, and so only 214 for the rest of the file).
Still present in Fedora Core 1 with kernel-2.4.22-1.2174.nptl. This is a very annoying bug since I get bitten by it almost every time I try to view annotations from within XEmacs. I'm also not happy that this bug is over a year old and apparently nobody has looked at it. What is the use of submitting bug reports?
This bug is still present in Fedora Core 2 with kernel-smp-2.6.6-1.435. This bug was reported a year-and-a-half ago and nobody has even acknowledged the report yet! It is trivial to test!
I've never been able to duplicate this, for one. Do you use some particularly weird shell, perhaps?
I noticed the problem first in XEmacs (vc-annotate command). No shell involved there. I was able to reproduce it in my own private shell (originally based on the Bourne shell), but I can also reproduce it in bash. I hadn't mentioned it before, but the CVS repositories I use for this are all remote repositories accessed with SSH.
Can't reproduce it with bash and ssh repositories either. Can you get me an strace following forks, including that of the shell itself? That way I can see which task sets the pipe to nonblocking.
Created attachment 101257 [details] strace output of bash showing problematic invocation There is an old attachment with strace output of cvs and less. I'll add a newly created strace output using this command: strace -o @bash -f bash -c 'cvs annotate xec.c 2>&1 | (sleep 2; less)' This invocation also resulted in a truncated result. This was created on FC1.
I'm chasing a similar effect with truncated output from rsync -n over ssh. For what it's worth, in my case the problem is not kernel-related, but is an unfortunate side-effect of the "2>&1 | [something slow, like tee]" idiom when combined with ssh.

It seems that rsync's child ssh sets its stdERR non-blocking, and that stderr has been inherited unchanged from the top-level rsync. (The rsync has supplied pipes for its child's stdin and stdout, but left the stderr alone. The code that does this is commented as being derived from CVS code, which is a possible connection to the original report; see rsync-2.6.2/pipe.c::piped_child())

Because of the "2>&1", the top-level stderr is a dup of the top-level stdout, so both descriptors share one set of file status flags and ssh has inadvertently made rsync's stdOUT non-blocking. Rsync is not expecting that, and does not check the return code from fflush(stdout), so it can silently drop lines from stdout. (See the end of rsync-2.6.3/log.c::rwrite())

For me, removing the "2>&1" from the command restores the full output, at the cost of not capturing stderr. The original reporter might like to try the same experiment. (Versions: OpenSSH_3.7.1p2, rsync version 2.6.2)
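To make the mechanism above concrete, here is a minimal Python sketch (not taken from ssh, rsync, or CVS). It assumes a Unix-like system; the names out_fd, err_fd, and demo are made up for the illustration. It shows that setting O_NONBLOCK on a dup of a descriptor also affects the original, and that a write to the full pipe then fails with EAGAIN rather than blocking.

```python
import fcntl
import os


def is_nonblocking(fd):
    """Return True if O_NONBLOCK is set on fd's open file description."""
    return bool(fcntl.fcntl(fd, fcntl.F_GETFL) & os.O_NONBLOCK)


def set_nonblocking(fd):
    """Set O_NONBLOCK, leaving the other file status flags alone."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)


def demo():
    # out_fd stands in for the writer's stdout (hypothetical name).
    read_end, out_fd = os.pipe()
    # "2>&1": stderr becomes a dup of stdout and shares its status flags.
    err_fd = os.dup(out_fd)

    before = is_nonblocking(out_fd)
    set_nonblocking(err_fd)          # the child only touches "stderr"
    after = is_nonblocking(out_fd)   # ...but "stdout" changes as well

    # Nobody reads from the pipe, so it fills up. A blocking descriptor
    # would hang here; a non-blocking one raises EAGAIN instead.
    got_eagain = False
    try:
        while True:
            os.write(out_fd, b"x" * 4096)
    except BlockingIOError:
        got_eagain = True

    for fd in (read_end, out_fd, err_fd):
        os.close(fd)
    return before, after, got_eagain
```

Calling demo() reports the flag on out_fd before and after the change to err_fd, plus whether the write loop ended in EAGAIN.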
See also http://groups.google.com/groups?th=e4df2fdc1f4f4950 and http://sources.redhat.com/ml/bug-glibc/2002-08/msg00041.html
Dropping the (equivalent of) 2>&1 has been my workaround for ages. David's analysis is consistent with what I have seen, so perhaps this is not a kernel bug after all.
I have no clues (assuming it still isn't fixed with latest updates). Reassigning to 'bash', maybe Tim has some insight.
reflecting change in comment 12
No idea what happened to the comment I already wrote when I reassigned this to cvs, but here's a summary: Read the link: http://sources.redhat.com/ml/bug-glibc/2002-08/msg00041.html Paul Eggert thinks CVS needs changing, and I tend to agree.
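For illustration only, here is one way a writer could protect itself against an output descriptor that has unexpectedly become non-blocking. This is a Python sketch under my own assumptions, not the change proposed in the linked thread and not code from CVS; write_all is a hypothetical helper name. It waits for the descriptor to become writable and retries instead of dropping data on EAGAIN.

```python
import os
import select


def write_all(fd, data):
    """Write all of data to fd, waiting instead of losing bytes on EAGAIN."""
    view = memoryview(data)
    total = 0
    while total < len(view):
        try:
            # os.write may accept only part of the buffer; keep the count.
            total += os.write(fd, view[total:])
        except BlockingIOError:
            # The descriptor is non-blocking and the pipe is full:
            # sleep until the reader has drained some of it, then retry.
            select.select([], [fd], [])
    return total
```

A caller would use write_all(1, buffer) in place of a bare write to standard output. Another option would be to clear O_NONBLOCK on the descriptor before writing, but that changes the flag for every process sharing the same open file description.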
Fedora Core 2 is now maintained by the Fedora Legacy project for security updates only. If this problem is a security issue, please reopen and reassign to the Fedora Legacy product. If it is not a security issue and hasn't been resolved in the current FC3 updates or in the FC4 test release, reopen and change the version to match.
I can't reproduce the bug in FC3+updates anymore, so the report can be closed, as far as I am concerned.