Bug 726199

Summary: wait fails on pid of co-process
Product: Red Hat Enterprise Linux 5 Reporter: Mike Jetzer <mjetzer.cdc>
Component: ksh Assignee: Michal Hlavinka <mhlavink>
Status: CLOSED ERRATA QA Contact: qe-baseos-tools-bugs
Severity: high Docs Contact:
Priority: unspecified    
Version: 5.5 CC: mfranc, ovasik, prc
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: ksh-20100621-1.el5 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 728900 1243788 1243789 Environment:
Last Closed: 2012-02-21 05:51:01 UTC Type: ---
Bug Depends On:    
Bug Blocks: 728900, 799362, 1243788, 1243789    

Description Mike Jetzer 2011-07-27 19:51:53 UTC
Description of problem:

Performing a wait on the pid of a co-process fails in the following circumstance:
    #! /bin/ksh

    ls -l |&
    pid=$!

    tee -a /tmp/out <&p

    wait $pid
    print $?

The wait's return value is always 127.  It does not matter whether the co-process exits with a successful value (e.g., "ls -l") or an unsuccessful value (e.g., "ls -l does-not-exist"); 127 is always returned.

POSIX documentation on wait(1) (http://pubs.opengroup.org/onlinepubs/009695399/utilities/wait.html) states "If one or more pid operands are specified that represent unknown process IDs, wait shall treat them as if they were known process IDs that exited with exit status 127" and later on (in "Exit Status") indicates that 127 is returned only if "[t]he command identified by the last pid operand specified is unknown."
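
The two POSIX behaviours quoted above can be seen side by side in any POSIX-style shell. This is only a minimal sketch and is not part of the original report: it uses an ordinary background job rather than a co-process, and it assumes that pid 1 is never a child of the script, so it stands in for an "unknown" pid.

```shell
#!/bin/sh
# Known child: wait reports the child's own exit status.
(exit 3) &
pid=$!
wait "$pid"
known_status=$?
echo "known pid:   wait returned $known_status"

# Unknown pid (assumption: pid 1 is never a child of this script):
# wait treats it as a process that exited with status 127.
wait 1 2>/dev/null
unknown_status=$?
echo "unknown pid: wait returned $unknown_status"
```

The failing script in this report falls into the first category (the pid came from $! in the same shell), yet it gets the second result.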

We are porting our application to Linux and ran across this problem.  The sample code works correctly under the Korn shell (ksh88) supplied by AIX, HP-UX, and Solaris.  It also works correctly under the pdksh-5.2.14-30.6 distributed with RHEL 4.8.


Version-Release number of selected component (if applicable):

I've compiled a number of different ksh SRPMs.  The problem is present in ksh-20100202-1.el5 and persists in ksh-20100202-1.el5_6.6, as well as in ksh-20100621-2.el6 and ksh-20100621-6.el6.  However, the problem is not present in ksh-20080202-2.el5 or ksh-20080202-2.el5_3.1.


How reproducible:
Always

Steps to Reproduce:
Run sample code above.
  
Actual results:
Return value of wait is always 127.

Expected results:
wait returns status of specified command
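
Both cases from the description (a co-process that succeeds and one that fails) can be checked in one run. The following is a rough sketch rather than the reporter's script: KSH and run_case are illustrative names, a ksh binary is assumed to be on PATH (or named in KSH), and tee writes to /dev/null instead of /tmp/out. Each case runs in its own ksh invocation so that only one co-process exists at a time.

```shell
#!/bin/sh
# Hypothetical harness: KSH and run_case are illustrative names.
KSH=${KSH:-ksh}

# run_case PATH: inside ksh, start "ls -l PATH" as a co-process, drain its
# output through tee, then print the status that wait reports for its pid.
run_case() {
    "$KSH" -c '
        ls -l "$1" 2>/dev/null |&
        pid=$!
        tee /dev/null <&p >/dev/null
        wait $pid
        print $?
    ' ksh "$1"
}

if command -v "$KSH" >/dev/null 2>&1; then
    ok_status=$(run_case /)
    bad_status=$(run_case /does-not-exist)
    echo "existing path: wait returned $ok_status"
    echo "missing path:  wait returned $bad_status"
else
    ok_status=skipped
    bad_status=skipped
    echo "no ksh found; set KSH to the binary under test"
fi
```

With the expected behaviour, the first line shows 0 and the second shows whatever nonzero status ls itself exits with; on the affected builds described here, both show 127.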

Comment 1 Mike Jetzer 2011-07-28 19:21:01 UTC
I've compiled a number of versions of ksh93 (original ast-ksh releases, rather than Red Hat SRPMS) to identify when this bug was introduced; it appears to have cropped up between ast-ksh.2008-12-12 and ast-ksh.2009-05-01.

By examining the diffs and subsequent trial and error, I found that the following is responsible for the failure:

--- 2008-12-12/src/cmd/ksh93/sh/jobs.c  2008-12-10 07:56:00.000000000 -0600
+++ 2009-05-01/src/cmd/ksh93/sh/jobs.c  2009-04-29 17:07:32.000000000 -0500
@@ -1176,7 +1177,8 @@
        job.pwlist = pw;
        pw->p_env = sh.curenv;
        pw->p_pid = pid;
-       pw->p_flag = P_EXITSAVE;
+       if(!sh.outpipe || sh_isoption(SH_PIPEFAIL))
+               pw->p_flag = P_EXITSAVE;
        pw->p_exitmin = sh.xargexit;
        pw->p_exit = 0;
        if(sh_isstate(SH_MONITOR))

I speculate wildly that this code change corresponds to the following in the Changelog:
    09-01-28  A bug in which a command substitution could return an exit status
              of 127 when the pipefail option is enabled has been fixed.

By commenting out the "if" and leaving an unconditional initialization of pw->p_flag, my specific bug is fixed (presumably at the expense of the bug fix which prompted the addition of this code).  I have verified that this holds true for ksh-20100202-1.el5.src.rpm and ast-ksh.2011-02-08 as well:

--- 2011-02-08/src/cmd/ksh93/sh/jobs.c--orig    2011-07-28 13:33:32.000000000 -0500
+++ 2011-02-08/src/cmd/ksh93/sh/jobs.c  2011-07-28 13:33:50.000000000 -0500
@@ -1346,7 +1346,7 @@
        pw->p_shp = shp;
        pw->p_env = shp->curenv;
        pw->p_pid = pid;
-       if(!shp->outpipe || (sh_isoption(SH_PIPEFAIL) && job.waitall))
+       /*if(!shp->outpipe || (sh_isoption(SH_PIPEFAIL) && job.waitall))*/
                pw->p_flag = P_EXITSAVE;
        pw->p_exitmin = shp->xargexit;
        pw->p_exit = 0;

I have submitted this information to the ast-developers list.

Comment 2 Michal Hlavinka 2011-07-29 08:02:55 UTC
It's reproducible for me. However, the first broken version is 2009-01-20, so that line from the changelog either a) is a different one, or b) has a wrong date. Anyway, thanks for your detective work. Let me know if you get an answer off-list.

Comment 3 Mike Jetzer 2011-08-05 17:10:06 UTC
David Korn replied to me off-list:
--begin quoted text--------------------------------------------------
I wasn't able to reproduce this at first because my version of tee is built-in.
Once I change to /usr/bin/tee, I was able to reproduce the problem.

You are correct about where the fix is, but the change is
from
       if(!sh.outpipe || sh_isoption(SH_PIPEFAIL))
to
       if(!sh.outpipe || sh_isoption(SH_PIPEFAIL) || sh.cpid==pid)

The fix will be in the next update.
--end quoted text----------------------------------------------------


Note that this specific patch is applicable to older releases of ast-ksh.

I have verified that the following patch corrects the issue in 2010-02-02 (the version of ksh93 shipped with RHEL 5.5):

--- src/cmd/ksh93/sh/jobs.c--orig       2011-08-05 12:06:41.000000000 -0500
+++ src/cmd/ksh93/sh/jobs.c     2011-08-05 11:45:06.000000000 -0500
@@ -1186,7 +1186,7 @@
        job.pwlist = pw;
        pw->p_env = sh.curenv;
        pw->p_pid = pid;
-       if(!sh.outpipe || (sh_isoption(SH_PIPEFAIL) && job.waitall))
+       if(!sh.outpipe || (sh_isoption(SH_PIPEFAIL) && job.waitall) || sh.cpid==pid)
                pw->p_flag = P_EXITSAVE;
        pw->p_exitmin = sh.xargexit;
        pw->p_exit = 0;

Comment 6 errata-xmlrpc 2012-02-21 05:51:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2012-0159.html