Description of problem:
Performing a wait on the pid of a co-process fails in the following circumstance:

#! /bin/ksh
ls -l |&
pid=$!
tee -a /tmp/out <&p
wait $pid
print $?

The wait's return value is always 127. It does not matter whether the co-process exits with a successful value (e.g., "ls -l") or an unsuccessful value (e.g., "ls -l does-not-exist"); 127 is always returned.

POSIX documentation on wait(1) (http://pubs.opengroup.org/onlinepubs/009695399/utilities/wait.html) states "If one or more pid operands are specified that represent unknown process IDs, wait shall treat them as if they were known process IDs that exited with exit status 127" and later on (in "Exit Status") indicates that 127 is returned only if "[t]he command identified by the last pid operand specified is unknown."

We are porting our application to Linux and ran across this problem. The sample code works correctly under the Korn shell (ksh88) supplied by AIX, HP-UX, and Solaris. It also works correctly under the pdksh-5.2.14-30.6 distributed with RHEL 4.8.

Version-Release number of selected component (if applicable):
I've compiled a number of different ksh SRPMS. The problem is present in ksh-20100202-1.el5 and persists in ksh-20100202-1.el5_6.6, as well as in ksh-20100621-2.el6 and ksh-20100621-6.el6. However, the problem is not present in ksh-20080202-2.el5 or ksh-20080202-2.el5_3.1.

How reproducible:
Always

Steps to Reproduce:
Run sample code above.

Actual results:
Return value of wait is always 127.

Expected results:
wait returns status of specified command
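For reference, the wait semantics quoted from POSIX can be checked in isolation with an ordinary background job and no co-process. This is only a minimal sketch in portable sh, not part of the original reproducer. The exit status 3 and the PID 999999 are arbitrary values chosen for illustration, and the sketch assumes 999999 is not a child of the running shell.

```shell
#!/bin/sh
# Known child: wait should report the child's own exit status.
( exit 3 ) &
pid=$!
known_status=0
wait "$pid" || known_status=$?

# Unknown PID (assumed NOT to be a child of this shell): POSIX says wait
# treats it as a process that exited with status 127.
unknown_status=0
wait 999999 2>/dev/null || unknown_status=$?

echo "known child: $known_status"
echo "unknown pid: $unknown_status"
```

On an affected ksh93 build, the co-process case in the report behaves like the second wait, even though the PID passed to wait belongs to a real child of the shell.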
I've compiled a number of versions of ksh93 (original ast-ksh releases, rather than Red Hat SRPMS) to identify when this bug was introduced; it appears to have cropped up between ast-ksh.2008-12-12 and ast-ksh.2009-05-01. By examining the diffs and subsequent trial and error, I found that the following is responsible for the failure:

--- 2008-12-12/src/cmd/ksh93/sh/jobs.c 2008-12-10 07:56:00.000000000 -0600
+++ 2009-05-01/src/cmd/ksh93/sh/jobs.c 2009-04-29 17:07:32.000000000 -0500
@@ -1176,7 +1177,8 @@
 	job.pwlist = pw;
 	pw->p_env = sh.curenv;
 	pw->p_pid = pid;
-	pw->p_flag = P_EXITSAVE;
+	if(!sh.outpipe || sh_isoption(SH_PIPEFAIL))
+		pw->p_flag = P_EXITSAVE;
 	pw->p_exitmin = sh.xargexit;
 	pw->p_exit = 0;
 	if(sh_isstate(SH_MONITOR))

I speculate wildly that this code change corresponds to the following in the Changelog:

09-01-28 A bug in which a command substitution could return an exit status of 127 when the pipefail option is enabled has been fixed.

By commenting out the "if" and leaving an unconditional initialization of pw->p_flag, my specific bug is fixed (presumably at the expense of the bug fix which prompted the addition of this code). I have verified that this holds true for ksh-20100202-1.el5.src.rpm and ast-ksh.2011-02-08 as well:

--- 2011-02-08/src/cmd/ksh93/sh/jobs.c--orig 2011-07-28 13:33:32.000000000 -0500
+++ 2011-02-08/src/cmd/ksh93/sh/jobs.c 2011-07-28 13:33:50.000000000 -0500
@@ -1346,7 +1346,7 @@
 	pw->p_shp = shp;
 	pw->p_env = shp->curenv;
 	pw->p_pid = pid;
-	if(!shp->outpipe || (sh_isoption(SH_PIPEFAIL) && job.waitall))
+	/*if(!shp->outpipe || (sh_isoption(SH_PIPEFAIL) && job.waitall))*/
 		pw->p_flag = P_EXITSAVE;
 	pw->p_exitmin = shp->xargexit;
 	pw->p_exit = 0;

I have submitted this information to the ast-developers list.
It's reproducible for me. However, the first broken version I found is 2009-01-20, so the changelog entry responsible is either (a) a different one or (b) listed with the wrong date. Anyway, thanks for your detective work. Let me know if you get an answer off-list.
David Korn replied to me off-list:

--begin quoted text--------------------------------------------------
I wasn't able to reproduce this at first because my version of tee is built-in. Once I change to /usr/bin/tee, I was able to reproduce the problem.

You are correct about where the fix is, but the change is from

	if(!sh.outpipe || sh_isoption(SH_PIPEFAIL))

to

	if(!sh.outpipe || sh_isoption(SH_PIPEFAIL) || sh.cpid==pid)

The fix will be in the next update.
--end quoted text----------------------------------------------------

Note that this specific patch is applicable to older releases of ast-ksh. I have verified that the following patch corrects the issue in 2010-02-02 (the version of ksh93 shipped with RHEL 5.5):

--- src/cmd/ksh93/sh/jobs.c--orig 2011-08-05 12:06:41.000000000 -0500
+++ src/cmd/ksh93/sh/jobs.c 2011-08-05 11:45:06.000000000 -0500
@@ -1186,7 +1186,7 @@
 	job.pwlist = pw;
 	pw->p_env = sh.curenv;
 	pw->p_pid = pid;
-	if(!sh.outpipe || (sh_isoption(SH_PIPEFAIL) && job.waitall))
+	if(!sh.outpipe || (sh_isoption(SH_PIPEFAIL) && job.waitall) || sh.cpid==pid)
 		pw->p_flag = P_EXITSAVE;
 	pw->p_exitmin = sh.xargexit;
 	pw->p_exit = 0;
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2012-0159.html