Bug 1247268

Summary: ksh-20120801-28.el6.x86_64 introduced by RHEL6u7 segfaults
Product: Red Hat Enterprise Linux 6 Reporter: Jindrich Novy <jindrich.novy>
Component: ksh Assignee: Michal Hlavinka <mhlavink>
Status: CLOSED DUPLICATE QA Contact: BaseOS QE - Apps <qe-baseos-apps>
Severity: high Docs Contact:
Priority: urgent    
Version: 6.7 CC: cww, fkrska, jindrich.novy, kdudka, martin.x.andersen, pandrade, salmy, zpytela
Target Milestone: rc Keywords: Regression
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-08-25 08:49:54 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1172231    
Attachments:
Description Flags
ksh-20120801-trapcom.patch none
ksh-20120801-std_malloc.patch none

Description Jindrich Novy 2015-07-27 16:46:22 UTC
Description of problem:
Korn shell segfaults.

Version-Release number of selected component (if applicable):
20120801-28.el6.x86_64

How reproducible:
Always.

Steps to Reproduce:
1. Run sufficiently complex ksh script.

Actual results:
kernel: script..22217[22217] general protection ip:4dd304 sp:7fff1e979ee0 error:0 in ksh93[400000+15d000]

Expected results:
No segfault.

Additional info:
The faulting address ip:4dd304 should give enough of a hint as to which patch on top of ksh-20120801-21.el6_6.3.x86_64 causes the segfault; ksh-20120801-21.el6_6.3.x86_64 works flawlessly.

Comment 1 Jindrich Novy 2015-07-27 16:51:50 UTC
We use the Korn shell quite extensively, so this regression is particularly annoying. We have currently version-locked ksh to ksh-20120801-21.el6_6.3.x86_64 to avoid crashes.

I'm happy to test if you provide an SRPM.

Comment 4 Michal Hlavinka 2015-07-28 10:58:46 UTC
(In reply to Jindrich Novy from comment #0)
> Steps to Reproduce:
> 1. Run sufficiently complex ksh script.

We need a reproducer for testing and for regression tests to prevent this from happening in future releases. The description above is insufficient. Please provide a reproducer we can use. Thanks

Comment 6 Jindrich Novy 2015-07-30 14:47:33 UTC
OK, the segfault is caused by Patch60, the "trapcom" patch.

Please revert.

It is obvious from stack frame #2:

(gdb) f 2
#2  0x0000000000453caf in sh_subshell (shp=0x76e320, t=0x2b2fe8f150b0, flags=1, comsub=3) at /usr/src/debug/ksh-20120801/src/cmd/ksh93/sh/subshell.c:740
740                                             free(shp->st.trapcom[isig]);
(gdb) l
735                     shp->st.otrap = 0;
736                     if(nsig)
737                     {
738                             for (isig = 0; isig < nsig; ++isig)
739                                     if (shp->st.trapcom[isig] && shp->st.trapcom[isig]!=Empty)
740                                             free(shp->st.trapcom[isig]);
741                             memcpy((char*)&shp->st.trapcom[0],savsig,nsig*sizeof(char*));
742                             free((void*)savsig);
743                     }
744                     shp->options = sp->options;

Comment 7 Paulo Andrade 2015-08-01 03:08:08 UTC
Created attachment 1058208
ksh-20120801-trapcom.patch

A user reported that this patch corrects the problem.
The change relative to the original ksh-20120801-trapcom.patch
is to neither strdup (cosmetic) nor free (crash)
the special Empty constant.

Comment 9 Eric Weaver 2015-08-03 09:54:12 UTC
Hi, we are also hitting the same problem with production batch jobs running ksh93. Any idea when the RPM will be released? We have never tried an SRPM from Red Hat, but if the RPM is not coming out today, we would like to try the SRPM; how does one get it?

Comment 11 Kamil Dudka 2015-08-06 14:36:13 UTC
Jindro, could you please confirm that replacing ksh-20120801-trapcom.patch with attachment #1058208 prevents ksh from crashing in your environment?

Comment 12 Jindrich Novy 2015-08-06 18:27:29 UTC
It still segfaults with the patch applied, this time within the internal allocator:

Program terminated with signal 11, Segmentation fault.
#0  bestsearch (vd=0x76cb00, size=0, wanted=<value optimized out>) at /usr/src/debug/ksh-20120801/src/lib/libast/vmalloc/vmbest.c:292
292                             {       if(size <= (s = SIZE(t)) )
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.166.el6_7.1.x86_64
(gdb) bt
#0  bestsearch (vd=0x76cb00, size=0, wanted=<value optimized out>) at /usr/src/debug/ksh-20120801/src/lib/libast/vmalloc/vmbest.c:292
#1  0x00000000004d98f8 in bestreclaim (vd=0x76cb00, wanted=0x0, c=6) at /usr/src/debug/ksh-20120801/src/lib/libast/vmalloc/vmbest.c:422
#2  0x00000000004da33d in bestalloc (vm=0x76cb80, size=8192, local=1) at /usr/src/debug/ksh-20120801/src/lib/libast/vmalloc/vmbest.c:661
#3  0x00000000004de9b7 in _ast_malloc (size=8192) at /usr/src/debug/ksh-20120801/src/lib/libast/vmalloc/malloc.c:521
#4  0x00002b997ee6065a in ?? ()
#5  0x00002b997e5d3440 in ?? ()
#6  0x000000000076a2c0 in ?? ()
#7  0x00002b997e56ebe8 in ?? ()
#8  0x000000000076a2c0 in ?? ()
#9  0x00002b997e5bc920 in ?? ()
#10 0x0000000000000000 in ?? ()

Comment 13 Jindrich Novy 2015-08-06 18:37:04 UTC
In /var/log/messages:

segfault at 39 ip 00000000004d95e9 sp 00007fffc26d6f40 error 4 in ksh93[400000+15d000]

valgrind says:

==12776== Syscall param mount(type) points to unaddressable byte(s)
==12776==    at 0x55A313A: mount (in /lib64/libc-2.12.so)
==12776==    by 0x480CF7: fs3d (fs3d.c:57)
==12776==    by 0x420E0A: sh_init (init.c:1303)
==12776==    by 0x407BA1: sh_main (main.c:141)
==12776==    by 0x54D8D5C: (below main) (in /lib64/libc-2.12.so)
==12776==  Address 0x0 is not stack'd, malloc'd or (recently) free'd

Comment 14 Jindrich Novy 2015-08-06 18:48:39 UTC
Within frame #0:

(gdb) p t
$1 = (Block_t *) 0x31

This really isn't dereferenceable.

Comment 15 Paulo Andrade 2015-08-06 19:24:15 UTC
Hi Jindrich,

Please get a backtrace with ksh-debuginfo installed.
That would allow us to see where in ksh the problem
happened.

Comment 16 Paulo Andrade 2015-08-06 19:26:06 UTC
Created attachment 1060061
ksh-20120801-std_malloc.patch

For a meaningful valgrind check, it is also
necessary to apply the attached patch and this
change to ksh.spec:

-export CCFLAGS="$RPM_OPT_FLAGS -fno-strict-aliasing $XTRAFLAGS -DSHOPT_AUDIT"
+export CCFLAGS="-O0 -g3 -fno-strict-aliasing $XTRAFLAGS -DSHOPT_AUDIT -D_AST_std_malloc=1"

Comment 17 Jindrich Novy 2015-08-07 10:13:12 UTC
Hi Paulo,

both the mock and the local build fail with:

+ cc -O0 -g3 -fno-strict-aliasing -Wno-unknown-pragmas -Wno-missing-braces -Wno-unused-result -Wno-return-type -Wno-int-to-pointer-cast -Wno-parentheses -Wno-unused -Wno-unused-but-set-variable -Wno-cpp -DSHOPT_AUDIT -D_AST_std_malloc=1 -L. -L/builddir/build/BUILD/ksh-20120801/arch/linux.i386-64/lib -o suid_exec suid_exec.o -last -last
/usr/bin/ld: cannot find -last
collect2: ld returned 1 exit status
mamake [cmd/ksh93]: *** exit code 1 making suid_exec

Note that /builddir/build/BUILD/ksh-20120801/arch/linux.i386-64/lib contains only some static libraries, but no libast.a. Maybe something else needs to be tweaked so that libast.a gets built?

Comment 18 Jindrich Novy 2015-08-07 11:19:44 UTC
OK, ksh does not segfault with the patch from comment #16 and -D_AST_std_malloc=1.

Comment 19 Paulo Andrade 2015-08-07 12:53:06 UTC
Hi Jindrich,

If this version does not trigger the problem, one built without
it should not fail either, unless:

o There is a toolchain bug, as the suggested patch builds with
  -O0 for easier debugging
o There is a bug in the ksh malloc. I can only think of possible
  issues when using the (no longer even documented) alarm interface

Comment 21 Martin Andersen 2015-08-12 12:53:20 UTC
This issue hit us pretty severely this weekend. Any estimate of when the new package with the proposed patch will make it to the official repos?

Comment 22 Kamil Dudka 2015-08-25 08:27:27 UTC
Is this a duplicate of bug #1247383?