Bug 21607

Summary: ssh fails to display output of remote commands
Product: [Retired] Red Hat Linux
Reporter: Ben LaHaise <bcrl>
Component: openssh
Assignee: Nalin Dahyabhai <nalin>
Status: CLOSED RAWHIDE
QA Contact:
Severity: high
Docs Contact:
Priority: high
Version: 7.0
CC: dr, pekkas
Target Milestone: ---   
Target Release: ---   
Hardware: i386   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2001-02-12 21:17:51 UTC Type: ---

Description Ben LaHaise 2000-12-02 00:59:47 UTC
On a number of machines here running RH7 with updates, ssh sporadically fails
to display the last chunk of output from remote commands.  I'm using a key to
connect to the server.  A sample session is:

[bcrl@today /md0/work-ac4]$ ssh toolbox echo foo
foo
[bcrl@today /md0/work-ac4]$ ssh toolbox echo foo
foo
[bcrl@today /md0/work-ac4]$ ssh toolbox echo foo
[bcrl@today /md0/work-ac4]$ ssh toolbox echo foo
foo
[bcrl@today /md0/work-ac4]$ 

Note how the third command did not output foo.  This seems to be fairly
reproducible.

Comment 1 Pekka Savola 2000-12-02 17:29:06 UTC
This didn't happen with 2.2.0, correct?


Comment 2 Ben LaHaise 2000-12-02 19:38:15 UTC
I'm not sure since 2.2.0 has never been run on these machines.  I don't recall
ever seeing this behaviour with the 2.1.1p4 variant shipped with RH7.

Comment 3 Pekka Savola 2000-12-02 21:21:20 UTC
This is related to #19837 (logout w/ openssh and background job locks up).

This only happens with Protocol 2.

There is a problem with how the hanging background processes problem was fixed.
The fix causes data loss under certain conditions.  This can be reproduced as follows:
---
ssh localhost dd if=/dev/zero bs=10000 count=1 | wc -c
---
This should report 10000, not 0.
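
Because the loss is sporadic, a single run can pass by chance. A minimal sketch of a loop that repeats a command and counts the runs that came up short is below. The helper name count_short_runs and its arguments are illustrative only (not part of OpenSSH), and the commented ssh line assumes key-based login to localhost works.

```shell
# Hypothetical helper: run a command repeatedly and count the runs whose
# stdout is shorter than expected.
# usage: count_short_runs RUNS EXPECTED_BYTES COMMAND [ARGS...]
count_short_runs() {
    runs=$1; expected=$2; shift 2
    short=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # wc -c counts the bytes that actually arrived on stdout;
        # tr strips the padding some wc implementations add.
        got=$("$@" 2>/dev/null | wc -c | tr -d ' ')
        if [ "$got" -lt "$expected" ]; then
            short=$((short + 1))
        fi
        i=$((i + 1))
    done
    echo "$short"
}

# Against the reproducer above (assumption: key-based login to localhost):
# count_short_runs 20 10000 ssh localhost dd if=/dev/zero bs=10000 count=1
```

A result of 0 means every run delivered at least the expected number of bytes; any other value is the number of truncated runs.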

The fix has been reverted in the latest snapshots, but the main issue (which 
surfaces on non-BSD systems, IIRC) has not been resolved.
Help would be appreciated.

Quoting Damien Miller on openssh-unix-dev on 23 Nov:
---- 
The problem is caused by my workaround for the sshd hang upon logout 
when background processes hold std{in,out,err} fds open.

On OpenBSD, when the child of sshd (which has children of its own) exits,
the stdout fd is marked as readable in the 
serverloop.c:wait_until_we_can_do_something select() and a subsequent read 
completes with a return value of 0.

On Linux, nothing is reported on the select() until all grandchildren 
have exited (and thus closed their std* fds), then the child stdout fd
is marked as readable, but a subsequent read returns with a -1 and 
errno=EIO.

The workaround in the portable version was to allow a single pass 
through the select (grep for child_has_selected in serverloop.c) and then
simulate a failed read on the channel (grep for djm in session.c).

The problem is that data may not have fully drained from the child before
the output is forcibly removed. The current strategy of giving the child
a chance to drain is broken: under high load, it may take a long time for
all the data to make it through, so any time limit is arbitrary. 
----
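
The point about arbitrary time limits can be seen locally without sshd. The sketch below is only an analogy: a reader that gives the writer a fixed grace period and then stops reading will truncate the output of any writer slower than that period. The names read_with_deadline and slow_writer are made up for illustration, and GNU coreutils `timeout` is assumed to be installed.

```shell
# Hypothetical helper: give a command's output a fixed grace period, then
# stop reading. Assumes GNU coreutils `timeout` is available.
# usage: read_with_deadline SECONDS COMMAND [ARGS...]
read_with_deadline() {
    secs=$1; shift
    # cat stands in for the reader; timeout kills it once the grace period
    # ends, so anything the command writes afterwards is lost.
    "$@" 2>/dev/null | timeout "$secs" cat
}

# A writer that is slow to produce its last line.
slow_writer() { echo early; sleep 3; echo late; }
```

With a 1 second limit, `read_with_deadline 1 slow_writer` prints only "early"; with a 5 second limit it prints both lines. No fixed limit is safe, since a loaded machine can always make the writer slower.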





Comment 4 Pekka Savola 2000-12-03 22:01:48 UTC
Correction: this _does_ happen with Protocol 1, too.  

There were some other problems, with scp, that only showed up with Protocol 2.

Comment 5 Need Real Name 2000-12-22 10:04:46 UTC
I've had the same problem on Solaris, and the latest (Dec 22) snapshot fixed it.


Comment 6 Pekka Savola 2000-12-22 10:07:06 UTC
Yes, but it re-introduces another problem of hanging background processes at
exit (sleep 10 & exit).
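
The hang can be approximated locally with a plain pipe, as a rough analogy rather than sshd's actual code path: the reader does not see EOF until every process holding the write end has exited, including background jobs left behind by the command. The helper name eof_delay is illustrative only.

```shell
# Hypothetical helper: report how many seconds a pipe reader waits for EOF
# when the writer exits but leaves a background job holding stdout open.
# usage: eof_delay SECONDS
eof_delay() {
    hold=$1
    start=$(date +%s)
    # The inner sh prints foo and exits at once. The backgrounded sleep
    # inherits the write end of the pipe, so cat sees EOF only after sleep
    # has exited as well.
    sh -c "sleep $hold & echo foo" | cat >/dev/null
    end=$(date +%s)
    echo $((end - start))
}
```

Calling `eof_delay 10` should report roughly 10, even though the foreground command finished immediately.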

Comment 7 Need Real Name 2000-12-22 10:08:19 UTC
I've had the same problem on Solaris, and the latest (Dec 22) snapshot fixed it.


Comment 8 Pekka Savola 2001-02-12 21:17:48 UTC
*** Bug 27049 has been marked as a duplicate of this bug. ***

Comment 9 Pekka Savola 2001-02-24 08:18:58 UTC
2.5.1p1 is (will be) in rawhide.