Description of Problem:
After upgrading from kernel-2.4.9-21 to kernel-2.4.9-31, ssh returns the wrong exit code under some circumstances (seriously munging our systems management!).

Version-Release number of selected component (if applicable):
2.9p2-12

How Reproducible:
    ssh localhost rpm -qa </dev/null; echo $?
should reveal exit code 0, but on most upgraded systems gives exit code 255. Removing "ssh localhost" gives exit code 0, which is correct. Adding "strace" before either "ssh" or "rpm" appears to show that the ssh/sshd system is changing the 0 to -1. However, "ssh localhost exit 0" works correctly.
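
For checking more than one command or host, the reproduction step can be wrapped in a small helper. This is only a sketch: check_exit is a hypothetical name, not part of openssh or rpm, and it assumes a POSIX sh.

```sh
#!/bin/sh
# check_exit EXPECTED CMD [ARGS...]
# Hypothetical helper: runs CMD with stdin from /dev/null (as in the report),
# discards its output, and compares its exit status with EXPECTED.
check_exit() {
    expected=$1
    shift
    "$@" </dev/null >/dev/null 2>&1
    actual=$?
    if [ "$actual" -eq "$expected" ]; then
        echo "ok: exit $actual"
    else
        echo "MISMATCH: expected $expected, got $actual"
        return 1
    fi
}
```

Usage would be "check_exit 0 ssh localhost rpm -qa"; on a system showing the behavior described above, the helper would report a mismatch with the exit code ssh returned.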
The problem is a race in sshd which was exposed by small changes in the kernel. Here's a simpler test case:

    ssh localhost perl -e "'close(STDIN);close(STDOUT);close(STDERR);sleep(1);exit(0);'"

Because the fd's are closed before the process exits, the cleanup code in session.c is used instead of the cleanup code in serverloop.c. The session.c cleanup code ignores the process exit code, contrary to the spec, which says that sshd waits for process exit AND all fd's closed, then passes the exit code to ssh. The effect of the current code is to wait for process exit OR all fd's closed, which results in the observed race as well as other incorrect behavior.

WORKAROUND

Until this is fixed, the following workaround seems to work in most cases. Instead of:

    ssh localhost foo

use this:

    ssh localhost 'foo; exit $?'
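
The following sketch illustrates both points locally, without touching sshd. It is not sshd's code, and closed_fd_status, wrap_remote_cmd and ssh_rc are hypothetical names chosen for the example. The first function is a shell analogue of the perl test case: the child closes fds 0-2 well before it exits, and command substitution in sh (which reads to EOF and then reaps the child) still reports the child's exit status, i.e. the "exit AND all fd's closed" behavior. The other two functions apply the workaround mechanically.

```sh
#!/bin/sh
# Hypothetical helpers for illustration; none of these names come from openssh.

# Local analogue of the perl test case: the child closes fds 0-2, sleeps,
# then exits with status 3. Command substitution reads the pipe to EOF and
# then waits for the child, so the status survives the early close.
closed_fd_status() {
    rc=0
    out=$(sh -c 'exec 0<&- 1>&- 2>&-; sleep 1; exit 3') || rc=$?
    echo "$rc"
}

# Build the workaround command string: "CMD; exit $?".
# Note: "$*" joins the arguments with spaces, so argument quoting is lost.
wrap_remote_cmd() {
    printf '%s; exit $?' "$*"
}

# Run CMD on HOST through ssh with the workaround applied.
ssh_rc() {
    host=$1
    shift
    ssh "$host" "$(wrap_remote_cmd "$@")" </dev/null
}
```

With these definitions, "ssh_rc localhost rpm -qa" sends the string 'rpm -qa; exit $?' to the remote shell, which is the form given in the workaround.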
The bug does not exist in openssh-3.1p1-2, which was released today to fix a serious security bug.