Bug 455078

Summary: [4.7] strace -f fails to follow vfork() processes on ia64 - hangs instead - possible kernel bug?
Product: Red Hat Enterprise Linux 4
Reporter: Jan Kratochvil <jan.kratochvil>
Component: strace
Assignee: Jeff Law <law>
Status: CLOSED CURRENTRELEASE
QA Contact: Brian Brock <bbrock>
Severity: high
Docs Contact:
Priority: urgent
Version: 4.7
CC: dvlasenk, jan.kratochvil, jwest, mnewsome, mnowak, riek, tao
Target Milestone: rc
Keywords: ZStream
Target Release: ---   
Hardware: ia64   
OS: Linux   
URL: http://sourceforge.net/mailarchive/message.php?msg_name=20080630164049.GA19501%40host0.dyn.jankratochvil.net
Whiteboard: PM_RHEL4_8
Doc Type: Bug Fix
Last Closed: 2012-06-14 20:44:44 UTC
Bug Depends On: 452501    
Bug Blocks: 513180    
Attachments (description, flags):
Simple vfork(2) testcase. (flags: none)
Proposed backported fix from 4.5.17 (flags: none)

Description Jan Kratochvil 2008-07-11 20:34:22 UTC
+++ This bug was initially created as a clone of Bug #452501 +++

`strace -f ./vfork' hangs on ia64.
(`-F' is not needed on GNU/Linux, `-f' is enough to trace vfork(2).)

RHEL4-U7-re20080711.0
kernel-2.6.9-78.EL.ia64
strace-4.5.16-1.el4.2.ia64

-- Additional comment from vmayatsk on 2008-06-26 10:35 EST --
This is because IA64 uses clone() instead of vfork(). I played a bit with the strace
sources and found that the function setbpt() (file util.c) has the CLONE_VFORK flag set
in tcp->inst[0] in the SYS_clone/clone2 case, but not in the SYS_fork/vfork
case. When I manually remove CLONE_VFORK from inst[0] (case
SYS_clone), strace -f on IA64 works just as it does on x86_64. When I manually add
0x4000 (CLONE_VFORK) to arg0 (case SYS_fork), it hangs on x86_64 just as it does on
IA64. I'm not an expert in strace, but it seems to be a bug in the strace utility.
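
For readers unfamiliar with the flag word discussed above, here is a minimal C sketch of how CLONE_VFORK (0x4000) sits in a clone() flags value such as the `CLONE_VM|CLONE_VFORK|SIGCHLD` seen in the trace in this report. It is not strace's code and does not reproduce setbpt(); the SKETCH_* names and the helper functions are hypothetical and only illustrate testing and clearing the bit.

```c
#include <signal.h>
#include <stdbool.h>

/* Values from the Linux clone() ABI; named SKETCH_* to mark them as illustrative. */
#define SKETCH_CSIGNAL     0x000000ffUL  /* low byte: signal sent to the parent on exit */
#define SKETCH_CLONE_VM    0x00000100UL  /* parent and child share memory */
#define SKETCH_CLONE_VFORK 0x00004000UL  /* parent sleeps until child execs or exits */

/* True if the flags request vfork-like semantics. */
bool sketch_clone_is_vfork(unsigned long flags)
{
    return (flags & SKETCH_CLONE_VFORK) != 0;
}

/* Return the flags with the CLONE_VFORK bit cleared (hypothetical helper). */
unsigned long sketch_strip_vfork(unsigned long flags)
{
    return flags & ~SKETCH_CLONE_VFORK;
}

/* Extract the exit signal encoded in the low byte of the flags. */
int sketch_exit_signal(unsigned long flags)
{
    return (int)(flags & SKETCH_CSIGNAL);
}
```

With CLONE_VFORK set, the parent does not run again until the child execs or exits, which is presumably why a tracer that waits on the parent at that point can appear to hang.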

-- Additional comment from jan.kratochvil on 2008-07-08 14:15 EST --
[...]
The ia64 threads fix is unrelated but posted here:
http://sourceforge.net/mailarchive/message.php?msg_name=20080630164049.GA19501%40host0.dyn.jankratochvil.net

###############################################################################

RHEL4-U7-re20080711.0 ia64
kernel-2.6.9-78.EL.ia64
strace-4.5.16-1.el4.2.ia64
$ strace -f ./vfork
...
clone(Process 16684 attached (waiting for parent)
[ hang ]

RHEL4-U7-re20080711.0 ia64
kernel-2.6.9-78.EL.ia64
strace-4.5.16-1.el4.2.ia64 + the patch above
[ This trace was copied from RHEL-5.2 Bug 452501 but it looks the same. ]
$ ./strace -f ./vfork
...
clone(Process 20396 attached
child_stack=0, flags=CLONE_VM|CLONE_VFORK|SIGCHLD) = 20396
[pid 20395] getpid()                    = 20395
...
[pid 20395] write(1, "20396 pid=20395\n", 1620396 pid=20395
) = 16
...
[pid 20395] nanosleep({1, 0},  <unfinished ...>
[pid 20396] write(1, "0 pid=20395\n", 120 pid=20395
) = 12
...
[pid 20396] nanosleep({1, 0},  <unfinished ...>
[pid 20395] <... nanosleep resumed> {1, 0}) = 0
[pid 20395] execve("/bin/true", ["/bin/true"...], [/* 47 vars */] <unfinished ...>
[pid 20396] <... nanosleep resumed> {1, 0}) = 0
[pid 20396] execve("/bin/true", ["/bin/true"...], [/* 47 vars */]) = 1
[pid 20395] <... execve resumed> )      = 1
...
[pid 20395] fstat(3,  <unfinished ...>
[pid 20396] exit_group(0)               = ?
Process 20396 detached
<... fstat resumed> {st_mode=S_IFREG|0644, st_size=58727440, ...}) = 0
--- SIGCHLD (Child exited) @ a000000000010621 (4fac) ---
mmap(NULL, 58727440, PROT_READ, MAP_PRIVATE, 3, 0) = 0x2000000000308000
close(3)                                = 0
close(1)                                = 0
exit_group(0)                           = ?

Comment 1 Jan Kratochvil 2008-07-11 20:34:22 UTC
Created attachment 311618 [details]
Simple vfork(2) testcase.
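
The attachment itself is not reproduced in this report. As a rough idea of what a vfork(2) testcase can look like, here is a minimal C sketch; it is an assumption and not the attached file. The helper name run_vfork_child is hypothetical. It calls vfork(), execs the given program in the child, and returns the child's exit status in the parent.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE         /* expose vfork() under strict -std= modes on glibc */
#endif
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not the attached testcase): vfork(), exec argv[0]
 * in the child, and return the child's exit status in the parent.
 * Returns -1 if vfork() or waitpid() fails or the child died abnormally.
 */
int run_vfork_child(char *const argv[])
{
    int status;
    pid_t pid = vfork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* The vfork() child borrows the parent's memory: only exec or _exit here. */
        execv(argv[0], argv);
        _exit(127);             /* reached only if exec failed */
    }
    /* The parent resumes only after the child has exec'ed or exited. */
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

To try it, call run_vfork_child() from a small main() with an argument vector such as { "/bin/true", NULL } and run the resulting binary under `strace -f`.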

Comment 3 Denys Vlasenko 2008-09-11 14:58:12 UTC
I have no RHEL4 ia64 machine to experiment with at the moment, but on RHEL5:

strace-4.5.16-1.el5.1 - exhibits this bug,

strace-4.5.16-1.el5_2.2 - does not.

Just FYI.

Comment 4 Denys Vlasenko 2008-10-07 14:48:00 UTC
A few more data points:

On RHEL4-U7, the installed version of strace is strace-4.5.16-1.el4.2, and it exhibits the bug.

I just verified that the latest upstream release, strace-4.5.18, builds successfully on RHEL4-U7 and does not exhibit this bug.

Comment 5 RHEL Program Management 2008-10-31 16:48:09 UTC
This request was evaluated by Red Hat Product Management for
inclusion, but this component is not scheduled to be updated in
the current Red Hat Enterprise Linux release. If you would like
this request to be reviewed for the next minor release, ask your
support representative to set the next rhel-x.y flag to "?".

Comment 11 Denys Vlasenko 2009-06-12 10:33:49 UTC
Created attachment 347532 [details]
Proposed backported fix from 4.5.17

Tested to work with the testcase attached by Jan
on ia64 Red Hat Enterprise Linux AS release 4 (Nahant Update 8)

Comment 12 Roland McGrath 2009-06-16 22:42:26 UTC
That backport looks fine to me.