Bug 21607
| Summary: | ssh fails to display output of remote commands | ||
|---|---|---|---|
| Product: | [Retired] Red Hat Linux | Reporter: | Ben LaHaise <bcrl> |
| Component: | openssh | Assignee: | Nalin Dahyabhai <nalin> |
| Status: | CLOSED RAWHIDE | QA Contact: | |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 7.0 | CC: | dr, pekkas |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | i386 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2001-02-12 21:17:51 UTC | Type: | --- |
Description
Ben LaHaise
2000-12-02 00:59:47 UTC
This didn't happen with 2.2.0, correct?
I'm not sure since 2.2.0 has never been run on these machines. I don't recall ever seeing this behaviour with the 2.1.1p4 variant shipped with RH7.
This is related to #19837 (logout w/ openssh and background job locks up).
This only happens with Protocol 2.
There is a problem with how the hanging background processes problem was fixed.
The fix causes data loss under certain conditions. This can be reproduced as follows:
---
ssh localhost dd if=/dev/zero bs=10000 count=1 | wc -c
---
This should report 10000, not 0.
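The check above can be scripted so that it is easy to repeat against different snapshots. This is only a rough sketch: the helper names `count_output_bytes` and `reproduces_bug` are hypothetical, and it assumes a Python 3 interpreter plus password-less ssh access to the target host.

```python
import subprocess


def count_output_bytes(argv):
    """Run argv and return the number of bytes it wrote to stdout."""
    result = subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,  # dd prints its transfer statistics on stderr
        check=True,                 # a failed connection raises instead of counting as 0 bytes
    )
    return len(result.stdout)


def reproduces_bug(host="localhost", size=10000):
    """Return True if the remote command's output arrives short (hypothetical helper)."""
    argv = ["ssh", host, "dd", "if=/dev/zero", f"bs={size}", "count=1"]
    return count_output_bytes(argv) != size
```

Calling `reproduces_bug()` corresponds to the one-liner above: a `True` result means fewer (or more) than 10000 bytes came back.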
The fix has been reverted in the latest snapshots, but the main issue (which
surfaces on non-BSD systems, IIRC) has not been resolved.
Help would be appreciated.
Quoting Damien Miller on openssh-unix-dev on 23 Nov:
----
The problem is caused by my workaround for the sshd hang upon logout
when background processes still have std{in,out,err} fds open.
On OpenBSD, when the child of sshd (which has children of its own) exits,
the stdout fd is marked as readable in the
serverloop.c:wait_until_we_can_do_something select() and a subsequent read
completes with a return value of 0.
On Linux, nothing is reported on the select() until all grandchildren
have exited (and thus closed their std* fds), then the child stdout fd
is marked as readable, but a subsequent read returns with a -1 and
errno=EIO.
The workaround in the portable version was to allow a single pass
through the select (grep for child_has_selected in serverloop.c) and then
simulate a read failure on the channel (grep for djm in session.c).
The problem is that data may not have fully drained from the child before
the output is forcibly removed. The current strategy of giving the child
a chance to drain is broken: under high load, it may take a long time for
all the data to make it through, so any time limit is arbitrary.
----
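To illustrate the two end-of-stream behaviours described in the quoted message, here is a minimal sketch of a reader that keeps reading until the descriptor itself reports end of stream instead of relying on a time limit. It is not the sshd code: `drain_until_eof` is a hypothetical name, the sketch is in Python rather than C, and it assumes a POSIX system where `select()` works on pipe descriptors. It also does nothing about the original hang, since it still blocks for as long as any process holds the write side open.

```python
import errno
import os
import select


def drain_until_eof(fd, chunk=4096):
    """Read from fd until end of stream and return everything that was read.

    End of stream is accepted in both forms mentioned in the quoted message:
    a zero-length read, or a failed read with errno EIO.
    """
    data = bytearray()
    while True:
        select.select([fd], [], [])      # block until fd is readable; no timeout
        try:
            buf = os.read(fd, chunk)
        except OSError as exc:
            if exc.errno != errno.EIO:   # only EIO is treated as end of stream
                raise
            break
        if not buf:                      # zero-length read: normal EOF
            break
        data += buf
    return bytes(data)


if __name__ == "__main__":
    # Simulate a writer that exits before the reader gets around to reading.
    r, w = os.pipe()
    os.write(w, b"\0" * 10000)
    os.close(w)
    print(len(drain_until_eof(r)))
    os.close(r)
```

The design point is that the loop ends only when the descriptor says there is nothing left, so data already queued when the writer exits is not discarded.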
Correction: this _does_ happen with Protocol 1, too. There were some other problems, with scp, that only showed up with Protocol 2.
I've had the same problem on Solaris, and the latest (Dec 22) snapshot fixed it.
Yes, but it re-introduces another problem of hanging background processes at exit (sleep 10 &; exit).
*** Bug 27049 has been marked as a duplicate of this bug. ***
2.5.1p1 is (will be) in rawhide.