Bug 453438 - [ia64] clone2 (pthread_create) crashes with -f
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: strace
Version: 5.2
Hardware: ia64 Linux
Priority: low, Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Roland McGrath
QA Contact: Brian Brock
Depends On: 455874
Reported: 2008-06-30 12:50 EDT by Jan Kratochvil
Modified: 2009-01-20 17:10 EST
Doc Type: Bug Fix
Last Closed: 2009-01-20 17:10:04 EST

Attachments:
Fix. (698 bytes, patch)
2008-06-30 12:50 EDT, Jan Kratochvil
Description Jan Kratochvil 2008-06-30 12:50:54 EDT
Description of problem:
Currently `strace -f' cannot trace multithreaded processes on ia64: the thread
created by pthread_create (clone2) is killed by SIGSEGV.

Version-Release number of selected component (if applicable):
strace-4.5.16-1.el5.1.ia64
kernel-2.6.18-94.el5.ia64

How reproducible:
Always.

Steps to Reproduce:
cat >thread.c <<EOH; gcc -o thread thread.c -pthread; strace -f ./thread
#include <pthread.h>
#include <unistd.h>   /* sleep */
/* Thread body: returns immediately. */
void *start (void *arg) { return arg; }
pthread_t thread1;
int main (void) { pthread_create (&thread1, NULL, start, NULL); sleep (1); return 0; }
EOH

Actual results:
execve("./thread", ["./thread"], [/* 41 vars */]) = 1
...
clone2(Process 8979 attached
child_stack=0x200000000031c000, stack_size=0x9feb80,
flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID,
parent_tidptr=0x2000000000d1b2d0, tls=0x2000000000d1b910,
child_tidptr=0x2000000000d1b2d0) = 8979
...
[pid  8978] nanosleep({1, 0},  <unfinished ...>
[pid  8979] --- SIGSEGV (Segmentation fault) @ 2000000000236d20 (3d0f00) ---
Process 8979 detached
+++ killed by SIGSEGV +++


Expected results:
execve("./thread", ["./thread"], [/* 41 vars */]) = 1
...
clone2(Process 9008 attached
child_stack=0x200000000031c000, stack_size=0x9feb80,
flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID,
parent_tidptr=0x2000000000d1b2d0, tls=0x2000000000d1b910,
child_tidptr=0x2000000000d1b2d0) = 9008
...
[pid  9007] nanosleep({1, 0},  <unfinished ...>
[pid  9008] get_robust_list(0x2000000000d1b2e0, 0x18, 0) = 0
[pid  9008] exit(0)                     = ?
Process 9008 detached
<... nanosleep resumed> {1, 0})         = 0
exit_group(0)                           = ?

Additional info:
Patch posted to upstream <strace-devel@lists.sourceforge.net>:


When `child_stack=0' (as in the glibc fork call), and likewise in the parent
of the `child_stack!=0' sample above, RESTORE_ARG0 still overwrites memory that
does not contain the modified syscall argument; it just happens that nothing
crashes in those cases.  In the case of a new stack (a child created by
pthread_create), RESTORE_ARG0 corrupts the IN0 stacked register and glibc
crashes at glibc/sysdeps/unix/sysv/linux/ia64/clone2.S:
1:      ld8 out1=[in0],8        /* Retrieve code pointer.       */

IMO, according to the ia64 RSE (Register Stack Engine), the caller has no
access to the passed registers after the callee returns, therefore
RESTORE_ARG* should be a nop there.  Still, a review from someone more
proficient in the RSE with regard to kernel syscalls would be useful.

Fix tested on RHEL-5 kernel-2.6.18-94.el5.ia64.  Older kernels (such as
kernel-2.6.18-53.el5.ia64) do not crash because they have a bug that prevents
strace from tracing the children (strace is unable to force CLONE_PTRACE there).


Trace of the former/buggy strace:
wait4(-1, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGTRAP}], __WALL, NULL) = 3932
rt_sigprocmask(SIG_BLOCK, [HUP INT QUIT PIPE TERM], NULL, 8) = 0
ptrace(PTRACE_PEEKUSER, 3932, psr, NULL) = 16
ptrace(PTRACE_PEEKUSER, 3932, r15, NULL) = 1213
ptrace(PTRACE_PEEKUSER, 3932, r10, NULL) = 0
ptrace(PTRACE_PEEKUSER, 3932, r8, NULL) = 1
ptrace(PTRACE_PEEKUSER, 3932, ar.bsp, NULL) = 0x600007ffffe7c1f0
ptrace(PTRACE_PEEKUSER, 3932, cfm, NULL) = 1167
ptrace(PTRACE_PEEKDATA, 3932, 0x600007ffffe7c1c0, NULL) = 0x3d0f00
ptrace(PTRACE_PEEKDATA, 3932, 0x600007ffffe7c1c8, NULL) = 0x200000000031c000
ptrace(PTRACE_PEEKDATA, 3932, 0x600007ffffe7c1d0, NULL) = 0x9feb80
ptrace(PTRACE_PEEKDATA, 3932, 0x600007ffffe7c1d8, NULL) = 0x2000000000d1b2d0
ptrace(PTRACE_PEEKDATA, 3932, 0x600007ffffe7c1e0, NULL) = 0x2000000000d1b2d0
ptrace(PTRACE_PEEKDATA, 3932, 0x600007ffffe7c1e8, NULL) = 0x2000000000d1b910
ptrace(PTRACE_PEEKUSER, 3932, ar.bsp, NULL) = 0x600007ffffe7c1f0
ptrace(PTRACE_PEEKUSER, 3932, cfm, NULL) = 1167
ptrace(PTRACE_POKEDATA, 3932, 0x600007ffffe7c1c0, 0x3d2f00) = 0
write(2, "clone2(", 7)                  = 7
ptrace(PTRACE_SYSCALL, 3932, 0x1, SIG_0) = 0
--- SIGCHLD (Child exited) @ a000000000010621 (1f400000f5c) ---
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
wait4(-1, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGSTOP}], __WALL, NULL) = 3933
rt_sigprocmask(SIG_BLOCK, [HUP INT QUIT PIPE TERM], NULL, 8) = 0
write(2, "Process 3933 attached (waiting for parent)\n", 43) = 43
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
wait4(-1, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGTRAP}], __WALL, NULL) = 3932
rt_sigprocmask(SIG_BLOCK, [HUP INT QUIT PIPE TERM], NULL, 8) = 0
ptrace(PTRACE_PEEKUSER, 3932, psr, NULL) = 16
ptrace(PTRACE_PEEKUSER, 3932, r8, NULL) = 3933
ptrace(PTRACE_PEEKUSER, 3932, r10, NULL) = 0
ptrace(PTRACE_PEEKUSER, 3932, r10, NULL) = 0
ptrace(PTRACE_PEEKUSER, 3932, r8, NULL) = 3933
ptrace(PTRACE_PEEKUSER, 3932, ar.bsp, NULL) = 0x600007ffffe7c1f0
ptrace(PTRACE_PEEKUSER, 3932, cfm, NULL) = 1167
ptrace(PTRACE_POKEDATA, 3932, 0x600007ffffe7c1c0, 0x3d0f00) = 0
ptrace(PTRACE_POKEDATA, 3932, 0x600007ffffe7c1c8, 0x200000000031c000) = 0
### New BSP is set for the new thread: vvv
ptrace(PTRACE_PEEKUSER, 3933, ar.bsp, NULL) = 0x200000000031c078
ptrace(PTRACE_PEEKUSER, 3933, cfm, NULL) = 1167
### These two lines corrupt it: vvv
ptrace(PTRACE_POKEDATA, 3933, 0x200000000031c048, 0x3d0f00) = 0
ptrace(PTRACE_POKEDATA, 3933, 0x200000000031c050, 0x200000000031c000) = 0
### These two lines corrupt it: ^^^
ptrace(PTRACE_SYSCALL, 3933, 0x1, SIG_0) = 0
--- SIGCHLD (Child exited) @ a000000000010621 (1f400000f5d) ---
write(2, "Process 3933 resumed (parent 3932 ready)\n", 41) = 41
Comment 1 Jan Kratochvil 2008-06-30 12:50:54 EDT
Created attachment 310600
Fix.
Comment 2 RHEL Product and Program Management 2008-06-30 13:00:23 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 3 Jan Kratochvil 2008-06-30 13:05:57 EDT
RHEL-4 is not affected by this bug:
kernel-2.6.9-67.EL.ia64
strace-4.5.16-1.el4.2.ia64
although corruption of unknown data still occurs there for the child with a new stack.

Therefore this bug is a regression against RHEL-4.
It is not a regression relative to RHEL-5.1, as `-f' did not work there at all.
Comment 4 Eric Bachalo 2008-07-18 11:15:48 EDT
This problem will be fixed in the RHEL 5.3 rebase of strace to version 4.5.17:

http://bugzilla.redhat.com/show_bug.cgi?id=455874
Comment 6 Roland McGrath 2008-08-28 20:26:50 EDT
built 4.5.18-1.el5
Comment 11 errata-xmlrpc 2009-01-20 17:10:04 EST
An advisory has been issued which should help with the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-0233.html
