Bug 418231 - ppc64: utrace(?): Stack overflows on ptrace testsuite
Status: CLOSED WORKSFORME
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.1
Hardware: ppc64 Linux
Priority: high  Severity: high
Assigned To: Roland McGrath
QA Contact: Red Hat Kernel QE team
 
Reported: 2007-12-10 11:04 EST by Jan Kratochvil
Modified: 2011-03-18 00:48 EDT
Doc Type: Bug Fix
Last Closed: 2011-03-18 00:48:59 EDT


Attachments
ppc64 stack increase 4x (from 16KB to 64KB) (3.16 KB, patch)
2007-12-10 11:04 EST, Jan Kratochvil

Description Jan Kratochvil 2007-12-10 11:04:42 EST
Description of problem:
Unprivileged userland ptrace tests can crash the ppc64 kernel.
Kernel panic dumps point to stack corruption, and a larger stack works around
the problem.
Either the default 16KB stack on ppc64 is too small or utrace is just too
stack-hungry.

Version-Release number of selected component (if applicable):
kernel-2.6.18-58.el5.utrace1.ppc64
(kernel-2.6.18-58.el5.utrace2.ppc64 crashes the same way but the dumps included
here match the utrace1 build)
(kernel-2.6.18-58.el5.ppc64 cannot be tested properly because userland locks up
too early)

How reproducible:
Usually during the first `make check'.

Steps to Reproduce:
1. Download the tests from http://sourceware.org/systemtap/wiki/utrace/tests .
2. Run make check in a loop, specifically:
   i=0;while :;do date --iso=seconds;TESTTIME=$[10 * 60] make check;i=$[$i+1];echo $i;done

Actual results:
kernel-2.6.18-58.el5.utrace1.ppc64
http://porkchop.devel.redhat.com/brewroot/scratch/roland/task_1067117/
Unable to handle kernel paging request for data at address 0x004a8850
Faulting instruction address: 0xc00000000006e6d8
cpu 0x1: Vector: 300 (Data Access) at [c0000000593bba70]
    pc: c00000000006e6d8: .do_exit+0x4cc/0xa14
    lr: c00000000006e6a8: .do_exit+0x49c/0xa14
    sp: c0000000593bbcf0
   msr: 8000000000009032
   dar: 4a8850
 dsisr: 40000000
  current = 0xc00000005ac57310
  paca    = 0xc000000000475000
    pid   = 2216, comm = tee
enter ? for help
1:mon> _

Expected results:
No crash.

Additional info:
This crash in do_exit() indicates corruption of `struct thread_info', which in
turn suggests a stack overflow.  Other crashes usually indicated only general
memory corruption.  The -debug kernel does not print anything useful either.
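
To make the reasoning above concrete: on kernels of this generation `struct
thread_info' sits at the lowest address of the THREAD_SIZE-aligned kernel stack,
so a stack that grows down too far overwrites it first.  The Python sketch below
is purely illustrative (the function names are made up for this note, and it
assumes the default 16KB stack): it computes where thread_info would live for
the "sp:" value in the dump above and how many bytes lay between the stack base
and sp at the moment of the fault.

```python
THREAD_SIZE_DEFAULT = 16 * 1024  # default ppc64 kernel stack size per this report


def stack_base(sp, thread_size=THREAD_SIZE_DEFAULT):
    """Lowest address of the aligned kernel stack containing sp.

    Assumption: thread_info is located at this address.
    """
    return sp & ~(thread_size - 1)


def headroom(sp, thread_size=THREAD_SIZE_DEFAULT):
    """Bytes between the stack base and sp (the stack grows down).

    The figure includes the space occupied by thread_info itself.
    """
    return sp - stack_base(sp, thread_size)


if __name__ == "__main__":
    sp = 0xc0000000593bbcf0  # "sp:" value from the do_exit dump
    print(hex(stack_base(sp)), headroom(sp))
```

This only describes the stack pointer at the time of the fault; it says nothing
about how deep the stack went earlier.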

I wrote a proper stack checker that places an unmapped page below the stack
(using vmap()) a long time ago, but only for x86.  I also have a simplified
version of it for x86_64 but nothing for ppc/ppc64.  The x86_64 patch would be
easy to port, though.

With the stack size increased 2x (to 32KB) it still crashed at the same place,
although only during `make check' run #9 (with the 16KB stack it crashes during
run #1):
2.6.18-58.el5.utrace2ppcstack2x:
Unable to handle kernel paging request for data at address 0x65b21ca8
Faulting instruction address: 0xc0000000000891d8
cpu 0x0: Vector: 300 (Data Access) at [c000000065bef8d0]
    pc: c0000000000891d8: .debug_mutex_add_waiter+0x4c/0x6c
    lr: c00000000034fb0c: .__mutex_lock_interruptible_slowpath+0x108/0x33c
    sp: c000000065befb50
   msr: 8000000000001032
   dar: 65b21ca8
 dsisr: 42000000
  current = 0xc000000065b21550
  paca    = 0xc000000000465000
    pid   = 4597, comm = make
enter ? for help
0:mon> ?

With the attached patch, which increases the stack size 4x (to 64KB), 15 `make
check' runs have passed so far, but this may also be a false positive.  Still,
the testing as a whole indicates the ppc problem is related to stack overflow.
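
For reference, the stack sizes tried above are successive powers of two.  The
Python sketch below assumes the kernel expresses the stack size as a shift
count (THREAD_SIZE = 1 << THREAD_SHIFT); whether the attached patch changes
exactly that constant is an assumption, and the helper name is hypothetical.

```python
def thread_shift(size_kb):
    """Return n such that (1 << n) bytes equals size_kb kilobytes."""
    size = size_kb * 1024
    if size <= 0 or size & (size - 1):
        raise ValueError("stack size must be a power of two")
    return size.bit_length() - 1


if __name__ == "__main__":
    # Observations are the ones stated in this report at the time of writing.
    observed = (
        (16, "crashes during run #1"),
        (32, "crashes during run #9"),
        (64, "15 runs passed so far"),
    )
    for kb, note in observed:
        print(f"{kb}KB -> shift {thread_shift(kb)}: {note}")
```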

Kernel build 2.6.18-58.el5.utrace2ppcstack4x.ppc64 with the attached patch at:
  http://porkchop.devel.redhat.com/brewroot/scratch/jkratoch/task_1072746/
Comment 1 Jan Kratochvil 2007-12-10 11:04:42 EST
Created attachment 282881
ppc64 stack increase 4x (from 16KB to 64KB)
Comment 2 Jan Kratochvil 2007-12-10 17:33:28 EST
Since 2.6.18-58.el5.utrace2ppcstack4x.ppc64 crashed in RHTS Job 11941,
increasing the stack size is probably not a solution, although it does delay the
crash considerably.

x86 stack overflow patch:
http://people.redhat.com/jkratoch/kernel-stackoverflow-x86-2005.patch

x86_64 stack overflow patch (simple):
http://people.redhat.com/jkratoch/kernel-stackoverflow-x86_64.patch

AFAIK this ppc64 crash is not yet tracked anywhere for utrace, so I am keeping
this Bug open.

RHTS Job 11941 - 2.6.18-58.el5.utrace2ppcstack4x:
list_del corruptio

RHTS Job 11921 - 2.6.18-58.el5.utrace2ppcstack2x:
Unable to handle kernel paging request for data at address 0x004a8850
Faulting instruction address: 0xc000000000067784
cpu 0x0: Vector: 300 (Data Access) at [c0000000780df950]
    pc: c000000000067784: .copy_process+0x294/0x158c
    lr: c000

RHTS Job 11871 - 2.6.18-58.el5.utrace1
kernel BUG in check_dead_utrace at kernel/utrace.c:328!
cpu 0x0: Vector: 700 (Program Check) at [c00000000269f7e0]
    pc: c0000000000ae0c4: .check_dead_utrace+0x178/0x22c
    lr: c0000000000aec44: .wake_quiescent+0x94/0x1dc
    sp: c00000000269fa60
   msr: 8000000000029032
  current = 0xc0000000764a5b60
  paca    = 0xc000000000474e00
    pid   = 18121, comm = late-ptrace-may
kernel BUG in check_dead_utrace at kernel/utrace.c:328!
enter ? for help
0:mon>

RHTS Job 11868 - 2.6.18-58.el5.utrace1
Unable to handle kernel paging request for data at address 0x004a8850
Faulting instruction address: 0xc00000000006e6d8
cpu 0x1: Vector: 300 (Data Access) at [c0000000593bba70]
    pc: c00000000006e6d8: .do_exit+0x4cc/0xa14
    lr: c00000000006e6a8: .do_exit+0x49c/0xa14
    sp: c0000000593bbcf0
   msr: 8000000000009032
   dar: 4a8850
 dsisr: 40000000
  current = 0xc00000005ac57310
  paca    = 0xc000000000475000
    pid   = 2216, comm = tee
enter ? for help
1:mon>

RHTS Job 11852 - 2.6.18-58.el5.utrace1
Unable to handle kernel paging request for data at address 0x004a8850
Faulting instruction address: 0xc000000000067784
cpu 0x0: Vector: 300 (Data Access) at [c00000004ca1b950]
    pc: c000000000067784: .copy_process+0x294/0x158c
    lr: c000000000067664: .copy_process+0x174/0x158c
    sp: c00000004ca1bbd0
   msr: 800000000000b032
   dar: 4a8850
 dsisr: 40000000
  current = 0xc000000050870b40
  paca    = 0xc000000000474e00
    pid   = 2126, comm = runtests.sh
enter ? for help
0:mon>

RHTS Job 11791 - 2.6.18-58.el5.utrace1
Unable to handle kernel paging request for data at address 0x004a8850
Faulting instruction address: 0xc000000000067784
cpu 0x0: Vector: 300 (Data Access) at [c00000002b1eb950]
    pc: c000000000067784: .copy_process+0x294/0x158c
    lr: c000000000067664: .copy_process+0x174/0x158c
    sp: c00000002b1ebbd0
   msr: 9000000000009032
   dar: 4a8850
 dsisr: 40000000
  current = 0xc00000000806dce0
  paca    = 0xc000000000474e00
    pid   = 26101, comm = rhts-test-runne
enter ? for help
0:mon>
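
The dumps above share a common shape, so they can be compared mechanically.
The following is a rough Python sketch, not an existing tool: it pulls the
faulting function ("pc:") and the data address ("dar:") out of an xmon dump
pasted as text.  The function name and the regular expressions are illustrative
assumptions based only on the dumps in this report.

```python
import re

# "    pc: c000000000067784: .copy_process+0x294/0x158c"
PC_RE = re.compile(
    r"^\s*pc:\s*([0-9a-f]+):\s*\.?(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+", re.M)
# "   dar: 4a8850"
DAR_RE = re.compile(r"^\s*dar:\s*([0-9a-f]+)", re.M)


def summarize(dump):
    """Return (function, dar) for one xmon dump; either may be None if absent."""
    pc = PC_RE.search(dump)
    dar = DAR_RE.search(dump)
    return (pc.group(2) if pc else None,
            int(dar.group(1), 16) if dar else None)


if __name__ == "__main__":
    sample = """\
cpu 0x0: Vector: 300 (Data Access) at [c00000004ca1b950]
    pc: c000000000067784: .copy_process+0x294/0x158c
    lr: c000000000067664: .copy_process+0x174/0x158c
    sp: c00000004ca1bbd0
   msr: 800000000000b032
   dar: 4a8850
"""
    func, dar = summarize(sample)
    print(func, hex(dar))
```

In the dumps listed here, the data-access faults in do_exit and copy_process all
report the same dar, 4a8850.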
Comment 3 Jan Kratochvil 2011-03-18 00:48:59 EDT
kernel-2.6.18-238.el5.ppc64
Red Hat Enterprise Linux Server release 5.6 (Tikanga)

After 24h it still has not crashed
(ibm-js22-vios-01-lp3.rhts.eng.bos.redhat.com), so it may already have been
fixed.
