Bug 218410 - non-main task's waitpid exited status lost when tracing
Summary: non-main task's waitpid exited status lost when tracing
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: frysk
Version: 5.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Andrew Cagney
QA Contact: Len DiMaggio
URL:
Whiteboard:
Depends On:
Blocks: 173278
Reported: 2006-12-05 05:46 UTC by Andrew Cagney
Modified: 2007-11-30 22:07 UTC (History)

Fixed In Version: RHEA-2007-0592
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-11-07 18:05:47 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHEA-2007:0592 | 0 | normal | SHIPPED_LIVE | frysk enhancement update | 2007-10-30 22:45:30 UTC
Sourceware | 3486 | 0 | None | None | None | Never

Description Andrew Cagney 2006-12-05 05:46:28 UTC
The non-main task's exited waitpid status gets lost.  In the trace below, the
exiting status (the WIFSTOPPED/EXIT event) is seen, but the final exit status
never arrives.

1998.1999: received signal 7 (Bus error)
1998.1999: exit this thread
5-Dec-06 12:23:35 AM frysk.proc.LinuxHost$PollWaitOnSigChld execute
FINE: {frysk.proc.LinuxHost$PollWaitOnSigChld@1dcc60,sig=Sig_CHLD} execute

5-Dec-06 12:23:35 AM frysk.sys.Wait waitAllNoHang
FINE: frysk.sys.Wait pid 1999 status 0x6057f WIFSTOPPED/EXIT 5 (Trace/breakpoint
trap)

5-Dec-06 12:23:35 AM frysk.sys.Wait waitAllNoHang
FINE: frysk.sys.Wait pid 0 errno 0 (Success)

5-Dec-06 12:23:35 AM frysk.proc.LinuxHost$PollWaitOnSigChld$5 getTask
FINE: {TaskId,1999} exitEvent

5-Dec-06 12:23:35 AM frysk.proc.Host get
FINE: {frysk.proc.LinuxHost@13a0c0,state=running} get TaskId

5-Dec-06 12:23:35 AM frysk.proc.TaskState$Running handleTerminatingEvent
FINE: {frysk.proc.LinuxTask@20be70,pid=1998,tid=1999,state=running}
handleTerminatingEvent

5-Dec-06 12:23:35 AM frysk.proc.LinuxTask sendContinue
FINE: {frysk.proc.LinuxTask@20be70,pid=1998,tid=1999,state=running} sendContinue

5-Dec-06 12:23:35 AM frysk.proc.LinuxHost$PollWaitOnSigChld execute
FINE: {frysk.proc.LinuxHost$PollWaitOnSigChld@1dcc60,sig=Sig_CHLD} execute

5-Dec-06 12:23:35 AM frysk.sys.Wait waitAllNoHang
FINE: frysk.sys.Wait pid 0 errno 0 (Success)

5-Dec-06 12:23:40 AM frysk.event.EventLoop$2$Timeout execute
FINE:
{{frysk.event.EventLoop$2$Timeout@28ded8,timeMillis=1165296220425,periodMillis=0},expiredfalse}
execute

Contrast this with a working trace, where a second waitpid reports WIFEXITED:

11831.11832: received signal 7 (Bus error)
11831.11832: exit this thread
5-Dec-06 12:32:56 AM frysk.proc.LinuxHost$PollWaitOnSigChld execute
FINE: {frysk.proc.LinuxHost$PollWaitOnSigChld@2d5880,sig=Sig_CHLD} execute

5-Dec-06 12:32:56 AM frysk.sys.Wait waitAllNoHang
FINE: frysk.sys.Wait pid 11832 status 0x6057f WIFSTOPPED/EXIT 5
(Trace/breakpoint trap)

5-Dec-06 12:32:56 AM frysk.sys.Wait waitAllNoHang
FINE: frysk.sys.Wait pid 0 errno 0 (Success)

5-Dec-06 12:32:56 AM frysk.proc.LinuxHost$PollWaitOnSigChld$5 getTask
FINE: {TaskId,11832} exitEvent

5-Dec-06 12:32:56 AM frysk.proc.Host get
FINE: {frysk.proc.LinuxHost@2176c0,state=running} get TaskId

5-Dec-06 12:32:56 AM frysk.proc.TaskState$Running handleTerminatingEvent
FINE: {frysk.proc.LinuxTask@21baf0,pid=11831,tid=11832,state=running}
handleTerminatingEvent

5-Dec-06 12:32:56 AM frysk.proc.LinuxTask sendContinue
FINE: {frysk.proc.LinuxTask@21baf0,pid=11831,tid=11832,state=running} sendContinue

5-Dec-06 12:32:56 AM frysk.proc.LinuxHost$PollWaitOnSigChld execute
FINE: {frysk.proc.LinuxHost$PollWaitOnSigChld@2d5880,sig=Sig_CHLD} execute

5-Dec-06 12:32:56 AM frysk.sys.Wait waitAllNoHang
FINE: frysk.sys.Wait pid 11832 status 0x0 WIFEXITED 0 (exit status)

5-Dec-06 12:32:56 AM frysk.sys.Wait waitAllNoHang
FINE: frysk.sys.Wait pid 0 errno 0 (Success)

5-Dec-06 12:32:56 AM frysk.proc.LinuxHost$PollWaitOnSigChld$5 getTask
FINE: {TaskId,11832} terminated
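
For readers decoding the traces above: a minimal sketch of how the two status
values (0x6057f and 0x0) break down. The helper names and the SKETCH_ constant
are hypothetical and are not taken from frysk's Wait.cxx; the value 6 is the
Linux PTRACE_EVENT_EXIT code, which ptrace places in bits 16 and up of the
status when it stops a task that is about to exit.

```c
#include <signal.h>
#include <sys/wait.h>

/* Linux ptrace event code for "task is about to exit".  <sys/ptrace.h>
   normally provides this as PTRACE_EVENT_EXIT; it is repeated here under
   a hypothetical name so the sketch needs no ptrace headers. */
#define SKETCH_PTRACE_EVENT_EXIT 6

/* Hypothetical helper: the ptrace event code sits above the usual
   16 bits of wait status. */
static int sketch_ptrace_event(int status)
{
    return (status >> 16) & 0xffff;
}

/* Hypothetical helper: true for the "WIFSTOPPED/EXIT" status in the
   traces, i.e. a SIGTRAP stop that carries the exit event code. */
static int sketch_is_exit_event(int status)
{
    return WIFSTOPPED(status)
        && WSTOPSIG(status) == SIGTRAP
        && sketch_ptrace_event(status) == SKETCH_PTRACE_EVENT_EXIT;
}

/* Hypothetical helper: true for the final status that the failing
   trace never receives (normal exit, any exit code). */
static int sketch_is_terminated(int status)
{
    return WIFEXITED(status) || WIFSIGNALED(status);
}
```

Under these assumptions, 0x6057f is the exit event (stop signal 5, event 6)
and 0x0 is the terminated status with exit code 0. The failing trace stops
after the first of the two.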

Comment 1 Andrew Cagney 2006-12-05 05:48:11 UTC
Frysk bug: http://sourceware.org/bugzilla/show_bug.cgi?id=3486

Comment 3 Andrew Cagney 2006-12-05 14:41:19 UTC
(In reply to comment #2)
> appears to show only WNOHANG calls.
> that is racy.  after SIGCHLD, some short period may pass before wait succeeds.
> your guarantee is that a blocking wait will block a very short time, not that a
> WNOHANG wait will succeed immediately.

Que?

Comment 5 Andrew Cagney 2007-03-23 21:36:51 UTC
Was POSIX documentation explaining SIGCHLD and its quirks with waitpid ever located?

Is the assumption that SIGCHLD is always posted after the wait status has been
recorded (i.e., SIGIO-like behavior) wrong?

Does:

-> SIGCHLD remain pending when waitpid events are pending; allowing one waitpid
read per signal to work?

-> SIGCHLD get withdrawn when all waitpid events have been consumed; allowing
more efficient draining of waitpid events?

Testing shows that at least the second isn't true and the first, given that the
signal is not counting, likely isn't either.

Comment 6 Andrew Cagney 2007-04-04 19:55:53 UTC
Rewriting frysk's event loop to use a blocking waitpid call will prevent the
occasional hangs seen when monitoring a process.  New code is currently being
tested upstream.

Testing included in frysk's testsuite.
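
As a rough illustration of the approach (not the actual frysk WaitEventLoop or
Wait.cxx code), the sketch below blocks in waitpid for the first event and then
drains any further ready events with WNOHANG. The function name, the SKETCH_
macro and the array-based interface are assumptions made for this example. The
Linux-specific __WALL flag is used when available so that clone tasks are
covered as well; whether frysk passes that flag is not stated in this report.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Assumption: on Linux, __WALL makes waitpid report clone (non-main)
   tasks too.  Fall back to no extra flag elsewhere. */
#ifdef __WALL
#define SKETCH_WAIT_FLAGS __WALL
#else
#define SKETCH_WAIT_FLAGS 0
#endif

/* Hypothetical helper: block until one child or task reports a status,
   then collect any others that are already waiting.  Fills pids[] and
   statuses[] with up to max entries and returns how many were stored.
   Returns 0 if there is nothing to wait for. */
static int sketch_wait_events(pid_t *pids, int *statuses, int max)
{
    int n = 0;
    int flags = SKETCH_WAIT_FLAGS;      /* first call blocks */

    while (n < max) {
        int status;
        pid_t pid = waitpid(-1, &status, flags);

        if (pid < 0) {
            if (errno == EINTR)
                continue;               /* interrupted: retry */
            break;                      /* ECHILD or another error */
        }
        if (pid == 0)
            break;                      /* WNOHANG: nothing else ready */

        pids[n] = pid;
        statuses[n] = status;
        n++;

        /* Later calls only pick up events that are already queued. */
        flags = SKETCH_WAIT_FLAGS | WNOHANG;
    }
    return n;
}
```

Because the first call blocks, the loop does not depend on SIGCHLD arriving
before or after the status is recorded, which is the race discussed in the
earlier comments.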

Comment 7 RHEL Program Management 2007-04-04 20:06:15 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 8 Andrew Cagney 2007-04-09 18:13:00 UTC
Fixes committed upstream.  Note that two tests, testCloneThenKillAttached and
testDeleteAttached, have been enabled in the testsuite and are now expected to pass.

Index: frysk-core/frysk/proc/ChangeLog
2007-04-09  Andrew Cagney  <cagney>

        * TestProcTasksObserver.java (testCloneThenKillAttached)
        (testDeleteAttached): Remove brokenIfUtraceXXX due to 3486.
        * Manager.java (usePoll): Set to false, enable WaitEventLoop.

Index: frysk-imports/frysk/sys/ChangeLog
2007-04-09  Andrew Cagney  <cagney>

        * cni/Wait.cxx (log): Add "logger" parameter, update calls.
        (waitForEvent): Delete.
        (waitAll): Use "log".  Replace loop calling waitForEvent with
        multiple waitpid calls.


Comment 13 errata-xmlrpc 2007-11-07 18:05:47 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2007-0592.html


