Bug 726199 - wait fails on pid of co-process
Summary: wait fails on pid of co-process
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: ksh
Version: 5.5
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Michal Hlavinka
QA Contact: qe-baseos-tools-bugs
URL:
Whiteboard:
Depends On:
Blocks: 728900 799362 1243788 1243789
 
Reported: 2011-07-27 19:51 UTC by Mike Jetzer
Modified: 2015-07-16 10:06 UTC (History)

Fixed In Version: ksh-20100621-1.el5
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 728900 1243788 1243789
Environment:
Last Closed: 2012-02-21 05:51:01 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2012:0159 0 normal SHIPPED_LIVE ksh bug fix and enhancement update 2012-02-20 14:53:47 UTC

Description Mike Jetzer 2011-07-27 19:51:53 UTC
Description of problem:

Performing a wait on the pid of a co-process fails in the following circumstance:
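    #! /bin/ksh

    # Start ls as a co-process and record its pid.
    ls -l |&
    pid=$!

    # Drain the co-process output through tee.
    tee -a /tmp/out <&p

    # Expected: the exit status of ls.  Observed: always 127.
    wait $pid
    print $?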

The wait's return value is always 127.  It does not matter whether the co-process exits with a successful value (e.g., "ls -l") or an unsuccessful value (e.g., "ls -l does-not-exist"); 127 is always returned.

POSIX documentation on wait(1) (http://pubs.opengroup.org/onlinepubs/009695399/utilities/wait.html) states "If one or more pid operands are specified that represent unknown process IDs, wait shall treat them as if they were known process IDs that exited with exit status 127" and later on (in "Exit Status") indicates that 127 is returned only if "[t]he command identified by the last pid operand specified is unknown."
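
For comparison, here is a minimal sketch of the wait behavior that the POSIX text describes. It uses ordinary background jobs rather than a co-process so that it runs in any POSIX shell. The variable names are illustrative only, and the choice of pid 1 as an "unknown" process ID is an assumption (it is taken to never be a child of the script).

```shell
#!/bin/sh
# Start two background jobs with known outcomes and record their pids.
true &
ok_pid=$!
( exit 3 ) &
fail_pid=$!

# wait should hand back the exit status of the job named by each pid.
wait "$ok_pid"
ok_status=$?
wait "$fail_pid"
fail_status=$?

# ASSUMPTION: pid 1 is never a child of this script, so the shell
# treats it as an unknown process ID.
wait 1 2>/dev/null
unknown_status=$?

echo "ok=$ok_status fail=$fail_status unknown=$unknown_status"
```

In the failing ksh builds, the co-process pid behaves like the third case even though it is a known child, which is why 127 looks wrong here.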

We are porting our application to Linux and ran across this problem.  The sample code works correctly under the Korn shell (ksh88) supplied by AIX, HP-UX, and Solaris.  It also works correctly under the pdksh-5.2.14-30.6 distributed with RHEL 4.8.


Version-Release number of selected component (if applicable):

I've compiled a number of different ksh SRPMS.  The problem is present in ksh-20100202-1.el5 and persists in ksh-20100202-1.el5_6.6, as well as in ksh-20100621-2.el6 and ksh-20100621-6.el6.  However, the problem is not present in ksh-20080202-2.el5 or ksh-20080202-2.el5_3.1.


How reproducible:
Always

Steps to Reproduce:
Run sample code above.
  
Actual results:
Return value of wait is always 127.

Expected results:
wait returns status of specified command

Comment 1 Mike Jetzer 2011-07-28 19:21:01 UTC
I've compiled a number of versions of ksh93 (original ast-ksh releases, rather than Red Hat SRPMS) to identify when this bug was introduced; it appears to have cropped up between ast-ksh.2008-12-12 and ast-ksh.2009-05-01.

By examining the diffs and subsequent trial and error, I found that the following is responsible for the failure:

--- 2008-12-12/src/cmd/ksh93/sh/jobs.c  2008-12-10 07:56:00.000000000 -0600
+++ 2009-05-01/src/cmd/ksh93/sh/jobs.c  2009-04-29 17:07:32.000000000 -0500
@@ -1176,7 +1177,8 @@
        job.pwlist = pw;
        pw->p_env = sh.curenv;
        pw->p_pid = pid;
-       pw->p_flag = P_EXITSAVE;
+       if(!sh.outpipe || sh_isoption(SH_PIPEFAIL))
+               pw->p_flag = P_EXITSAVE;
        pw->p_exitmin = sh.xargexit;
        pw->p_exit = 0;
        if(sh_isstate(SH_MONITOR))

I speculate wildly that this code change corresponds to the following in the Changelog:
    09-01-28  A bug in which a command substitution could return an exit status
              of 127 when the pipefail option is enabled has been fixed.

By commenting out the "if" and leaving an unconditional initialization of pw->p_flag, my specific bug is fixed (presumably at the expense of the bug fix which prompted the addition of this code).  I have verified that this holds true for ksh-20100202-1.el5.src.rpm and ast-ksh.2011-02-08 as well:

--- 2011-02-08/src/cmd/ksh93/sh/jobs.c--orig    2011-07-28 13:33:32.000000000 -0500
+++ 2011-02-08/src/cmd/ksh93/sh/jobs.c  2011-07-28 13:33:50.000000000 -0500
@@ -1346,7 +1346,7 @@
        pw->p_shp = shp;
        pw->p_env = shp->curenv;
        pw->p_pid = pid;
-       if(!shp->outpipe || (sh_isoption(SH_PIPEFAIL) && job.waitall))
+       /*if(!shp->outpipe || (sh_isoption(SH_PIPEFAIL) && job.waitall))*/
                pw->p_flag = P_EXITSAVE;
        pw->p_exitmin = shp->xargexit;
        pw->p_exit = 0;

I have submitted this information to the ast-developers list.

Comment 2 Michal Hlavinka 2011-07-29 08:02:55 UTC
It's reproducible for me. However, the first broken version is 2009-01-20, so the changelog line you quote either a) refers to a different change or b) has the wrong date. Anyway, thanks for your detective work. Let me know if you get an answer off-list.

Comment 3 Mike Jetzer 2011-08-05 17:10:06 UTC
David Korn replied to me off-list:
--begin quoted text--------------------------------------------------
I wasn't able to reproduce this at first because my version of tee is built-in.
Once I change to /usr/bin/tee, I was able to reproduce the problem.

You are correct about where the fix is, but the change is
from
       if(!sh.outpipe || sh_isoption(SH_PIPEFAIL))
to
       if(!sh.outpipe || sh_isoption(SH_PIPEFAIL) || sh.cpid==pid)

The fix will be in the next update.
--end quoted text----------------------------------------------------


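Note that this specific patch, as quoted, applies to older releases of ast-ksh; in newer releases the existing condition also tests job.waitall, so the new clause has to be appended to that form instead.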
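
The quoted reply notes that a built-in tee hid the problem. When trying to reproduce it, it may help to check how tee resolves in the shell under test and to call the external binary explicitly. This is only a sketch: the path /usr/bin/tee is the one named in the reply and may differ on other systems, the tee_path and copied variables are illustrative, and the wording printed by type varies between shells.

```shell
#!/bin/sh
# Report how the running shell resolves the bare name "tee".
type tee

# Calling the utility by absolute path generally bypasses a built-in
# of the same name.
# ASSUMPTION: the external binary is at /usr/bin/tee; otherwise fall
# back to whatever the PATH search finds.
tee_path=/usr/bin/tee
[ -x "$tee_path" ] || tee_path=$(command -v tee)

# Push one line through the chosen tee to confirm that it works.
copied=$(printf 'hello\n' | "$tee_path" /dev/null)
echo "tee at $tee_path copied: $copied"
```

Substituting the absolute path for tee in the sample script from the description should then exercise the failing case regardless of built-ins.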

I have verified that the following patch corrects the issue in 2010-02-02 (the version of ksh93 shipped with RHEL 5.5):

--- src/cmd/ksh93/sh/jobs.c--orig       2011-08-05 12:06:41.000000000 -0500
+++ src/cmd/ksh93/sh/jobs.c     2011-08-05 11:45:06.000000000 -0500
@@ -1186,7 +1186,7 @@
        job.pwlist = pw;
        pw->p_env = sh.curenv;
        pw->p_pid = pid;
-       if(!sh.outpipe || (sh_isoption(SH_PIPEFAIL) && job.waitall))
+       if(!sh.outpipe || (sh_isoption(SH_PIPEFAIL) && job.waitall) || sh.cpid==pid)
                pw->p_flag = P_EXITSAVE;
        pw->p_exitmin = sh.xargexit;
        pw->p_exit = 0;

Comment 6 errata-xmlrpc 2012-02-21 05:51:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2012-0159.html

