Bug 453438

Summary: [ia64] clone2 (pthread_create) crashes with -f
Product: Red Hat Enterprise Linux 5
Component: strace
Version: 5.2
Hardware: ia64
OS: Linux
Status: CLOSED ERRATA
Severity: medium
Priority: low
Reporter: Jan Kratochvil <jan.kratochvil>
Assignee: Roland McGrath <roland>
QA Contact: Brian Brock <bbrock>
CC: mnowak
Target Milestone: rc
Target Release: ---
Doc Type: Bug Fix
Last Closed: 2009-01-20 22:10:04 UTC
Bug Depends On: 455874
Attachments: Fix. (flags: none)

Description Jan Kratochvil 2008-06-30 16:50:54 UTC
Description of problem:
Currently one cannot `strace -f' multithreaded processes.

Version-Release number of selected component (if applicable):
strace-4.5.16-1.el5.1.ia64
kernel-2.6.18-94.el5.ia64

How reproducible:
Always.

Steps to Reproduce:
cat >thread.c <<EOH; gcc -o thread thread.c -pthread; strace -f ./thread
#include <pthread.h>
#include <unistd.h> /* sleep */
void *start (void *arg) { return arg; }
pthread_t thread1;
int main (void) { pthread_create (&thread1, NULL, start, NULL); sleep (1); return 0; }
EOH

Actual results:
execve("./thread", ["./thread"], [/* 41 vars */]) = 1
...
clone2(Process 8979 attached
child_stack=0x200000000031c000, stack_size=0x9feb80,
flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID,
parent_tidptr=0x2000000000d1b2d0, tls=0x2000000000d1b910,
child_tidptr=0x2000000000d1b2d0) = 8979
...
[pid  8978] nanosleep({1, 0},  <unfinished ...>
[pid  8979] --- SIGSEGV (Segmentation fault) @ 2000000000236d20 (3d0f00) ---
Process 8979 detached
+++ killed by SIGSEGV +++


Expected results:
execve("./thread", ["./thread"], [/* 41 vars */]) = 1
...
clone2(Process 9008 attached
child_stack=0x200000000031c000, stack_size=0x9feb80,
flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID,
parent_tidptr=0x2000000000d1b2d0, tls=0x2000000000d1b910,
child_tidptr=0x2000000000d1b2d0) = 9008
...
[pid  9007] nanosleep({1, 0},  <unfinished ...>
[pid  9008] get_robust_list(0x2000000000d1b2e0, 0x18, 0) = 0
[pid  9008] exit(0)                     = ?
Process 9008 detached
<... nanosleep resumed> {1, 0})         = 0
exit_group(0)                           = ?

Additional info:
Patch posted to upstream <strace-devel.net>:


When `child_stack=0' (as with the glibc fork call), and in the parent of the
`child_stack!=0' sample above, RESTORE_ARG0 still rewrites memory that does not
contain the modified syscall argument; it just happens that nothing crashes in
those cases.  With a new stack (the child created by pthread_create),
RESTORE_ARG0 corrupts the IN0 stacked register and glibc crashes at
glibc/sysdeps/unix/sysv/linux/ia64/clone2.S:
1:      ld8 out1=[in0],8        /* Retrieve code pointer.       */

IMO, given how the ia64 RSE (Register Stack Engine) works, the caller has no
access to the passed registers after the callee returns, therefore
RESTORE_ARG* should be a nop there.  A review by someone more proficient with
the RSE and the kernel syscall path would still be useful.

Fix tested on RHEL-5 kernel-2.6.18-94.el5.ia64.  Older kernels (such as
kernel-2.6.18-53.el5.ia64) do not crash because they have a bug that prevents
strace from tracing the children (strace is unable to force CLONE_PTRACE there).


Trace of the former/buggy strace:
wait4(-1, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGTRAP}], __WALL, NULL) = 3932
rt_sigprocmask(SIG_BLOCK, [HUP INT QUIT PIPE TERM], NULL, 8) = 0
ptrace(PTRACE_PEEKUSER, 3932, psr, NULL) = 16
ptrace(PTRACE_PEEKUSER, 3932, r15, NULL) = 1213
ptrace(PTRACE_PEEKUSER, 3932, r10, NULL) = 0
ptrace(PTRACE_PEEKUSER, 3932, r8, NULL) = 1
ptrace(PTRACE_PEEKUSER, 3932, ar.bsp, NULL) = 0x600007ffffe7c1f0
ptrace(PTRACE_PEEKUSER, 3932, cfm, NULL) = 1167
ptrace(PTRACE_PEEKDATA, 3932, 0x600007ffffe7c1c0, NULL) = 0x3d0f00
ptrace(PTRACE_PEEKDATA, 3932, 0x600007ffffe7c1c8, NULL) = 0x200000000031c000
ptrace(PTRACE_PEEKDATA, 3932, 0x600007ffffe7c1d0, NULL) = 0x9feb80
ptrace(PTRACE_PEEKDATA, 3932, 0x600007ffffe7c1d8, NULL) = 0x2000000000d1b2d0
ptrace(PTRACE_PEEKDATA, 3932, 0x600007ffffe7c1e0, NULL) = 0x2000000000d1b2d0
ptrace(PTRACE_PEEKDATA, 3932, 0x600007ffffe7c1e8, NULL) = 0x2000000000d1b910
ptrace(PTRACE_PEEKUSER, 3932, ar.bsp, NULL) = 0x600007ffffe7c1f0
ptrace(PTRACE_PEEKUSER, 3932, cfm, NULL) = 1167
ptrace(PTRACE_POKEDATA, 3932, 0x600007ffffe7c1c0, 0x3d2f00) = 0
write(2, "clone2(", 7)                  = 7
ptrace(PTRACE_SYSCALL, 3932, 0x1, SIG_0) = 0
--- SIGCHLD (Child exited) @ a000000000010621 (1f400000f5c) ---
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
wait4(-1, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGSTOP}], __WALL, NULL) = 3933
rt_sigprocmask(SIG_BLOCK, [HUP INT QUIT PIPE TERM], NULL, 8) = 0
write(2, "Process 3933 attached (waiting for parent)\n", 43) = 43
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
wait4(-1, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGTRAP}], __WALL, NULL) = 3932
rt_sigprocmask(SIG_BLOCK, [HUP INT QUIT PIPE TERM], NULL, 8) = 0
ptrace(PTRACE_PEEKUSER, 3932, psr, NULL) = 16
ptrace(PTRACE_PEEKUSER, 3932, r8, NULL) = 3933
ptrace(PTRACE_PEEKUSER, 3932, r10, NULL) = 0
ptrace(PTRACE_PEEKUSER, 3932, r10, NULL) = 0
ptrace(PTRACE_PEEKUSER, 3932, r8, NULL) = 3933
ptrace(PTRACE_PEEKUSER, 3932, ar.bsp, NULL) = 0x600007ffffe7c1f0
ptrace(PTRACE_PEEKUSER, 3932, cfm, NULL) = 1167
ptrace(PTRACE_POKEDATA, 3932, 0x600007ffffe7c1c0, 0x3d0f00) = 0
ptrace(PTRACE_POKEDATA, 3932, 0x600007ffffe7c1c8, 0x200000000031c000) = 0
### New BSP is set for the new thread: vvv
ptrace(PTRACE_PEEKUSER, 3933, ar.bsp, NULL) = 0x200000000031c078
ptrace(PTRACE_PEEKUSER, 3933, cfm, NULL) = 1167
### These two lines corrupt it: vvv
ptrace(PTRACE_POKEDATA, 3933, 0x200000000031c048, 0x3d0f00) = 0
ptrace(PTRACE_POKEDATA, 3933, 0x200000000031c050, 0x200000000031c000) = 0
### These two lines corrupt it: ^^^
ptrace(PTRACE_SYSCALL, 3933, 0x1, SIG_0) = 0
--- SIGCHLD (Child exited) @ a000000000010621 (1f400000f5d) ---
write(2, "Process 3933 resumed (parent 3932 ready)\n", 41) = 41

Comment 1 Jan Kratochvil 2008-06-30 16:50:54 UTC
Created attachment 310600 [details]
Fix.

Comment 2 RHEL Program Management 2008-06-30 17:00:23 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 3 Jan Kratochvil 2008-06-30 17:05:57 UTC
RHEL-4 is not affected by this bug:
kernel-2.6.9-67.EL.ia64
strace-4.5.16-1.el4.2.ia64
although unknown data is still corrupted there for the child with a new stack.

Therefore this bug is a regression against RHEL-4.
It is not a regression since RHEL-5.1, as `-f' did not work there at all.


Comment 4 Eric Bachalo 2008-07-18 15:15:48 UTC
This problem will be fixed by the RHEL 5.3 rebase of strace to version 4.5.17:

http://bugzilla.redhat.com/show_bug.cgi?id=455874

Comment 6 Roland McGrath 2008-08-29 00:26:50 UTC
built 4.5.18-1.el5

Comment 11 errata-xmlrpc 2009-01-20 22:10:04 UTC
An advisory has been issued which should address the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-0233.html