Bug 1369571

Summary: RFC: confusing ksh behavior
Product: Red Hat Enterprise Linux 6 Reporter: Paulo Andrade <pandrade>
Component: ksh Assignee: Siteshwar Vashisht <svashisht>
Status: CLOSED CANTFIX QA Contact: BaseOS QE - Apps <qe-baseos-apps>
Severity: medium Docs Contact:
Priority: medium    
Version: 6.8 CC: zpytela
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: All   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1460950 Environment:
Last Closed: 2017-06-13 08:50:53 UTC Type: Bug
Bug Blocks: 1460950    

Description Paulo Andrade 2016-08-23 19:41:05 UTC
ksh has an apparently undocumented behavior that
from time to time confuses users.

  For example:

$ ksh
$ exit 267
Segmentation fault (core dumped)

$ ksh
$ exit 286
Power failure

  Or, depending on the signal it will just call pause()
and block forever:

$ ksh
$ exit 273
<< sets SIGCHLD to SIG_DFL and calls pause >>
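
  The three exit values above fit one pattern: each is 256 plus a
signal number. A minimal sketch of that arithmetic in plain shell
follows. It is not ksh source code; the names sig_bit and exit_mask are
hypothetical, and the values 256 and 255 are assumptions chosen to
match the examples.

```sh
# Assumed values, picked to match the examples in this report.
sig_bit=256     # hypothetical name: bit that marks "exit by signal"
exit_mask=255   # hypothetical name: mask for the low 8 bits

# Print how an exit value would be interpreted under those assumptions.
decode_exit() {
    if [ $(( $1 & $sig_bit )) -ne 0 ]; then
        echo "signal $(( $1 & $exit_mask ))"
    else
        echo "status $(( $1 & $exit_mask ))"
    fi
}

for n in 11 267 286 273; do
    printf 'exit %s -> %s\n' "$n" "$(decode_exit "$n")"
done
```

  Under those assumptions 267, 286 and 273 map to signals 11, 30 and
17, which on Linux x86 are SIGSEGV, SIGPWR and SIGCHLD.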

  The problem is that users frequently report a crash
that was actually caused by shell code doing the equivalent of:

trap 'exit 267' EXIT

or

trap 'exit 267' 0

  Here is a sample backtrace and the state of the relevant variables,
which should make the condition easy to understand:

(gdb) bt
#0  0x000000347b432907 in sigpending (set=<value optimized out>)
    at ../sysdeps/unix/sysv/linux/ia64/sigpending.c:38
#1  0x000000000041ab99 in sh_done (ptr=0x76e420, sig=11)
    at /usr/src/debug/ksh-20120801/src/cmd/ksh93/sh/fault.c:664
#2  0x0000000000407e7c in sh_main (ac=<value optimized out>, 
    av=0x7ffc234452b8, userinit=<value optimized out>)
    at /usr/src/debug/ksh-20120801/src/cmd/ksh93/sh/main.c:354
#3  0x000000347b41ed5d in __libc_start_main (main=0x406c00 <main>, argc=19, 
    ubp_av=0x7ffc234452b8, init=<value optimized out>, 
    fini=<value optimized out>, rtld_fini=<value optimized out>, 
    stack_end=0x7ffc234452a8) at libc-start.c:251

Note that sh_done() is called with 0 as the sig argument,
although the backtrace shows sig=11:

(gdb) frame 2
#2  0x0000000000407e7c in sh_main (ac=<value optimized out>, 
    av=0x7ffc234452b8, userinit=<value optimized out>)
    at /usr/src/debug/ksh-20120801/src/cmd/ksh93/sh/main.c:354
354		sh_done(shp,0);

The sig variable, which is also an argument, is changed
as a consequence of whatever was in trapcom[0], but that content
is lost by this point:

	if(t=shp->st.trapcom[0])
	{
		shp->st.trapcom[0]=0; /*should free but not long */
		shp->oldexit = savxit;
		sh_trap(t,0);
		savxit = shp->exitval;
	}
(savxit was initialized with the value of shp->exitval)

(gdb) frame 1
#1  0x000000000041ab99 in sh_done (ptr=0x76e420, sig=11)
    at /usr/src/debug/ksh-20120801/src/cmd/ksh93/sh/fault.c:664
664			kill(getpid(),sig);
(gdb) p shp->oldexit
$1 = 0
(gdb) p shp->exitval
$2 = 267

  To avoid such problems, I would suggest (unless it is
documented and expected behavior and I am missing something
obvious) this pseudo patch to src/cmd/ksh93/bltins/cflow.c:

-	if(n<0 || n==256 || n > SH_EXITMASK+shp->gd->sigmax+1)
-			n &= ((unsigned int)n)&SH_EXITMASK;
+	n &= ((unsigned int)n)&SH_EXITMASK;
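
  To see what the pseudo patch changes, here is a rough model of both
variants of the check in plain shell arithmetic. It is only a sketch:
exit_mask=255 and sigmax=64 are assumed values for SH_EXITMASK and
shp->gd->sigmax, and the function names are hypothetical.

```sh
exit_mask=255   # assumed value of SH_EXITMASK
sigmax=64       # assumed value of shp->gd->sigmax

# Current check: mask only when n is negative, exactly 256,
# or above exit_mask + sigmax + 1.
current_n() {
    n=$1
    if [ "$n" -lt 0 ] || [ "$n" -eq 256 ] ||
       [ "$n" -gt $(( $exit_mask + $sigmax + 1 )) ]; then
        n=$(( $n & $exit_mask ))
    fi
    echo "$n"
}

# Proposed check: always mask.
proposed_n() {
    echo $(( $1 & $exit_mask ))
}

for n in 5 256 267 400; do
    printf 'exit %s: current=%s proposed=%s\n' \
        "$n" "$(current_n "$n")" "$(proposed_n "$n")"
done
```

  With these assumed values, 267 passes through the current check
unchanged, while the proposed check reduces it to 11, an ordinary exit
status.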

Comment 2 Paulo Andrade 2016-09-28 17:30:58 UTC
  It is worth noting that this issue does not happen with
ksh-20100621 in RHEL 5. The related diff chunk is:
$ diff -u {ksh-20100621,ksh-20120801}/src/cmd/ksh93/bltins/cflow.c

@@ -64,7 +64,9 @@
 		errormsg(SH_DICT,ERROR_usage(2),"%s",optusage((char*)0));
 	pp->mode = (**argv=='e'?SH_JMPEXIT:SH_JMPFUN);
 	argv += opt_info.index;
-	n = (((arg= *argv)?(int)strtol(arg, (char**)0, 10)&SH_EXITMASK:shp->oldexit));
+	n = (((arg= *argv)?(int)strtol(arg, (char**)0, 10):shp->oldexit));
+	if(n<0 || n==256 || n > SH_EXITMASK+shp->gd->sigmax+1)
+			n &= ((unsigned int)n)&SH_EXITMASK;
 	/* return outside of function, dotscript and profile is exit */
 	if(shp->fn_depth==0 && shp->dot_depth==0 && !sh_isstate(SH_PROFILE))
 		pp->mode = SH_JMPEXIT;

  Unfortunately "blame" on
https://github.com/att/ast/blame/master/src/cmd/ksh93/bltins/cflow.c
does not help, so I do not know for sure why the code
tests for "n==256". But I believe that if such a test
is to be done at all, it should be "(n&256)".

  There is no related diff in src/cmd/ksh93/sh/fault.c:sh_done().
That is, in both versions, if the exit value AND 256 is not zero,
sh_done() will mask the value and "kill itself" with the result.
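
  The effect of that final step can be modelled outside ksh. The
sketch below is a hypothetical stand-in for sh_done(), not its real
code: a child sh masks the value with an assumed mask of 255 and, if
the assumed 256 bit is set, sends itself that signal. It uses 271
(256 + 15, SIGTERM on Linux) rather than 267 to avoid producing a core
file.

```sh
# Hypothetical model: run the decision in a child shell so that only
# the child is killed. 256 and 255 are assumed constants.
model_sh_done() {
    sh -c '
        exitval=$1
        if [ $(( $exitval & 256 )) -ne 0 ]; then
            # Signal bit set: kill ourselves with the masked value.
            kill "-$(( $exitval & 255 ))" "$$"
        fi
        exit $(( $exitval & 255 ))
    ' sh "$1"
}

model_sh_done 5   || echo "exit 5   -> parent sees status $?"
model_sh_done 271 || echo "exit 271 -> parent sees status $?"
```

  For the second call the parent sees a child terminated by a signal
instead of a normal exit, which is what users then report as a crash.
How that is encoded in $? depends on the parent shell.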