Bug 191783

Summary: pdksh core dumps
Product: Red Hat Enterprise Linux 2.1
Reporter: Kurtis Rader <kdrader>
Component: pdksh
Assignee: Karsten Hopp <karsten>
Status: CLOSED WONTFIX
Severity: high
Priority: medium
Version: 2.1
CC: tao
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2007-01-10 11:46:08 UTC
Attachments:
  files and scripts to reproduce the failure (flags: none)
  Replace pdksh cell based allocation with linked list (flags: none)

Description Kurtis Rader 2006-05-15 19:18:22 UTC
Description of problem:

pdksh will die from a SIGSEGV under some circumstances.

Version-Release number of selected component (if applicable):

pdksh 5.2.14-22

How reproducible:

Can be reproduced on demand given the proper initial conditions. However, the
problem is extremely sensitive to those conditions and is therefore difficult
to reproduce in an arbitrary environment. The problem has been observed on
RHEL 2.1 and RHEL 3 (not surprising, since they have the same version of
pdksh). However, I have been unable to reproduce it on RHEL 3 using the exact
same reproduction steps, not even with exec-shield and exec-shield-randomize
disabled.

Steps to Reproduce:
1. Unpack the attachment with cwd == /
2. ./pdksh_segv
  
Actual results:

./pdksh_segv: line 27: 27597 Segmentation fault

Expected results:

Script runs to completion

Additional info:

From a version of pdksh built with "-g":

Core was generated by `/home/root/pdksh.dbg ./exec_java.csh AAA BBB CCC DDD'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /lib/i686/libc.so.6...done.
Loaded symbols for /lib/i686/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
#0  ablockfree (bp=0x5f6f796e, ap=0x8074dc8) at alloc.c:475
475             if (bp->next == bp) /* only block */
(gdb) bt
#0  ablockfree (bp=0x5f6f796e, ap=0x8074dc8) at alloc.c:475
#1  0x08067ac4 in setstr (vq=0x807d5b8, 
    s=0x807d000 "/xxx/lib//app_e_kofu.jar", error_ok=0) at var.c:372
#2  0x08055745 in execute (t=0x807b248, flags=0) at exec.c:332
#3  0x0805f6f7 in shell (s=0x8079d28, toplevel=1) at main.c:616
#4  0x0805f2c8 in main (argc=4, argv=0xbffed244) at main.c:429

Comment 1 Kurtis Rader 2006-05-15 19:18:26 UTC
Created attachment 129109
files and scripts to reproduce the failure

Comment 2 Kurtis Rader 2006-05-17 18:34:33 UTC
It has been found that the pdksh allocation test harness can be used to produce 
a failure that may be related to this problem.

1. Download and extract the pdksh source.
   I used pdksh-5.2.14-21.

2. Build the "TEST_ALLOC" tool from alloc.c:
  % cd pdksh-5.2.14-21
  % ./configure
  % gcc -o test_alloc -DTEST_ALLOC=1 -DDEBUG_ALLOC=1 -O2 alloc.c

3. Run test_alloc and enter the commands below ("INPUT >" marks a line typed
   at the test_alloc prompt; "OUTPUT>" marks its response):
  % ./test_alloc
  INPUT > alloc 800 = i1
  OUTPUT> 0x804b028 = alloc(800)  1,i1
  INPUT > alloc 792 = i2
  OUTPUT> 0x804b358 = alloc(792)  2,i2
  INPUT > alloc 10 = i3
  OUTPUT> 0x804b680 = alloc(10)  3,i3
  INPUT > aprint 0
  OUTPUT> aprint(0, 0)  4
  OUTPUT> aprint: block  0 (p=0x804b008,0x804b008,n=0x804b008): 0x0x804b018 ..
  OUTPUT> 0x0x804b988 (2416)
  OUTPUT> aprint:   0x0x804b018 .. 0x0x804b348 (816) allocated
  OUTPUT> aprint:   0x0x804b348 .. 0x0x804b670 (808) allocated
  OUTPUT> aprint:   0x0x804b670 .. 0x0x804b690 (32) allocated
  OUTPUT> aprint:   0x0x804b690 .. 0x0x804b988 (760) free
  INPUT > afree i1
  OUTPUT> afree(0x804b028)  5,i1
  INPUT > afree i2
  OUTPUT> afree(0x804b358)  6,i2
  INPUT > aprint 0
  OUTPUT> aprint(0, 0)  7
  OUTPUT> aprint: block  0 (p=0x804b008,0x804b008,n=0x804b008): 0x0x804b018 ..
  OUTPUT> 0x0x804b988 (2416)
  OUTPUT> aprint:   0x0x804b018 .. 0x0x804b670 (1624) free
  OUTPUT> aprint:   0x0x804b670 .. 0x0x804b690 (32) allocated
  OUTPUT> aprint:   0x0x804b690 .. 0x0x804b988 (760) free

  INPUT > alloc 1591 = i4
  OUTPUT> acheck: big cell doesn't make up whole block
  OUTPUT> aerror: acheck failed



Comment 3 Chris Lalancette 2006-05-31 14:04:44 UTC
I backported a patch from Debian that replaces the wacky allocation scheme that
pdksh uses with a much simpler and saner one.  It seems to fix the problem in
my testing.  I am going to attach the patch here, and I have put RPMs here:

http://people.redhat.com/clalance/pdksh.html

I only put up the i386 package and the source RPM.

NOTE: this patch has not yet been blessed by the pdksh maintainer.  That means
these packages are for testing only and come with no guarantees.  Also, given
the maintenance status of 2.1, I don't know if the patch will ever make it into
an update.  But the case for it will be more convincing if there is testing
(beyond my own) to back it up.

Comment 4 Chris Lalancette 2006-05-31 14:05:24 UTC
Created attachment 130274
Replace pdksh cell based allocation with linked list

Comment 5 Kurtis Rader 2006-05-31 21:58:43 UTC
(In reply to comment #4)
> Created an attachment (id=130274)
> Replace pdksh cell based allocation with linked list
> 

Resolves problem for me.