Bug 218410 - non-main task's waitpid exited status lost when tracing
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: frysk
Version: 5.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Assigned To: Andrew Cagney
QA Contact: Len DiMaggio
Blocks: 173278
Reported: 2006-12-05 00:46 EST by Andrew Cagney
Modified: 2007-11-30 17:07 EST
CC List: 9 users

Fixed In Version: RHEA-2007-0592
Doc Type: Bug Fix
Last Closed: 2007-11-07 13:05:47 EST

External Trackers
Tracker     ID    Priority  Status  Summary  Last Updated
Sourceware  3486  None      None    None     Never

Description Andrew Cagney 2006-12-05 00:46:28 EST
The waitpid exit status of a non-main task gets lost.  In the trace below, the
exiting status (the WIFSTOPPED/EXIT stop) is seen, but nothing further.

1998.1999: received signal 7 (Bus error)
1998.1999: exit this thread
5-Dec-06 12:23:35 AM frysk.proc.LinuxHost$PollWaitOnSigChld execute
FINE: {frysk.proc.LinuxHost$PollWaitOnSigChld@1dcc60,sig=Sig_CHLD} execute

5-Dec-06 12:23:35 AM frysk.sys.Wait waitAllNoHang
FINE: frysk.sys.Wait pid 1999 status 0x6057f WIFSTOPPED/EXIT 5 (Trace/breakpoint trap)

5-Dec-06 12:23:35 AM frysk.sys.Wait waitAllNoHang
FINE: frysk.sys.Wait pid 0 errno 0 (Success)

5-Dec-06 12:23:35 AM frysk.proc.LinuxHost$PollWaitOnSigChld$5 getTask
FINE: {TaskId,1999} exitEvent

5-Dec-06 12:23:35 AM frysk.proc.Host get
FINE: {frysk.proc.LinuxHost@13a0c0,state=running} get TaskId

5-Dec-06 12:23:35 AM frysk.proc.TaskState$Running handleTerminatingEvent
FINE: {frysk.proc.LinuxTask@20be70,pid=1998,tid=1999,state=running} handleTerminatingEvent

5-Dec-06 12:23:35 AM frysk.proc.LinuxTask sendContinue
FINE: {frysk.proc.LinuxTask@20be70,pid=1998,tid=1999,state=running} sendContinue

5-Dec-06 12:23:35 AM frysk.proc.LinuxHost$PollWaitOnSigChld execute
FINE: {frysk.proc.LinuxHost$PollWaitOnSigChld@1dcc60,sig=Sig_CHLD} execute

5-Dec-06 12:23:35 AM frysk.sys.Wait waitAllNoHang
FINE: frysk.sys.Wait pid 0 errno 0 (Success)

5-Dec-06 12:23:40 AM frysk.event.EventLoop$2$Timeout execute
FINE: {{frysk.event.EventLoop$2$Timeout@28ded8,timeMillis=1165296220425,periodMillis=0},expiredfalse} execute

Contrast this with a working trace, in which the exit-event stop is followed by
a WIFEXITED status for the thread:

11831.11832: received signal 7 (Bus error)
11831.11832: exit this thread
5-Dec-06 12:32:56 AM frysk.proc.LinuxHost$PollWaitOnSigChld execute
FINE: {frysk.proc.LinuxHost$PollWaitOnSigChld@2d5880,sig=Sig_CHLD} execute

5-Dec-06 12:32:56 AM frysk.sys.Wait waitAllNoHang
FINE: frysk.sys.Wait pid 11832 status 0x6057f WIFSTOPPED/EXIT 5 (Trace/breakpoint trap)

5-Dec-06 12:32:56 AM frysk.sys.Wait waitAllNoHang
FINE: frysk.sys.Wait pid 0 errno 0 (Success)

5-Dec-06 12:32:56 AM frysk.proc.LinuxHost$PollWaitOnSigChld$5 getTask
FINE: {TaskId,11832} exitEvent

5-Dec-06 12:32:56 AM frysk.proc.Host get
FINE: {frysk.proc.LinuxHost@2176c0,state=running} get TaskId

5-Dec-06 12:32:56 AM frysk.proc.TaskState$Running handleTerminatingEvent
FINE: {frysk.proc.LinuxTask@21baf0,pid=11831,tid=11832,state=running} handleTerminatingEvent

5-Dec-06 12:32:56 AM frysk.proc.LinuxTask sendContinue
FINE: {frysk.proc.LinuxTask@21baf0,pid=11831,tid=11832,state=running} sendContinue

5-Dec-06 12:32:56 AM frysk.proc.LinuxHost$PollWaitOnSigChld execute
FINE: {frysk.proc.LinuxHost$PollWaitOnSigChld@2d5880,sig=Sig_CHLD} execute

5-Dec-06 12:32:56 AM frysk.sys.Wait waitAllNoHang
FINE: frysk.sys.Wait pid 11832 status 0x0 WIFEXITED 0 (exit status)

5-Dec-06 12:32:56 AM frysk.sys.Wait waitAllNoHang
FINE: frysk.sys.Wait pid 0 errno 0 (Success)

5-Dec-06 12:32:56 AM frysk.proc.LinuxHost$PollWaitOnSigChld$5 getTask
FINE: {TaskId,11832} terminated
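The raw status words in the two traces can be decoded with the standard <sys/wait.h> macros. A minimal sketch, assuming Linux with glibc: 0x6057f is a stop (low byte 0x7f) by signal 5, SIGTRAP (second byte), carrying ptrace event 6 in bits 16 and up, which is Linux's PTRACE_EVENT_EXIT value; 0x0 is a normal exit with status 0. The names classify_status, ptrace_event_of, SKETCH_EVENT_EXIT and the enum are hypothetical, for illustration only.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/wait.h>

/* Linux's PTRACE_EVENT_EXIT value, defined locally under a hypothetical
   name so the sketch does not depend on <sys/ptrace.h>. */
#define SKETCH_EVENT_EXIT 6

/* Hypothetical classification of a raw waitpid status word. */
enum status_kind {
    KIND_EXITED,     /* WIFEXITED: the task is gone for good */
    KIND_SIGNALED,   /* WIFSIGNALED: killed by a signal */
    KIND_EXIT_EVENT, /* ptrace exit-event stop: exiting, not yet gone */
    KIND_STOPPED,    /* any other stop */
    KIND_OTHER       /* e.g. a WIFCONTINUED status */
};

/* For a stopped tracee, bits 16..23 hold the ptrace event number. */
static int ptrace_event_of(int status)
{
    return WIFSTOPPED(status) ? (status >> 16) & 0xff : 0;
}

static enum status_kind classify_status(int status)
{
    if (WIFEXITED(status))
        return KIND_EXITED;
    if (WIFSIGNALED(status))
        return KIND_SIGNALED;
    if (WIFSTOPPED(status))
        return ptrace_event_of(status) == SKETCH_EVENT_EXIT ? KIND_EXIT_EVENT
                                                            : KIND_STOPPED;
    return KIND_OTHER;
}
```

Applied to the traces, the failing run only ever collects a KIND_EXIT_EVENT status (0x6057f) for pid 1999, while the working run also collects the KIND_EXITED status (0x0) for pid 11832.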
Comment 1 Andrew Cagney 2006-12-05 00:48:11 EST
Frysk bug: http://sourceware.org/bugzilla/show_bug.cgi?id=3486
Comment 3 Andrew Cagney 2006-12-05 09:41:19 EST
(In reply to comment #2)
> Appears to show only WNOHANG calls.
> That is racy.  After SIGCHLD, some short period may pass before wait succeeds.
> Your guarantee is that a blocking wait will block a very short time, not that a
> WNOHANG wait will succeed immediately.

Que?
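Read literally, the quoted advice is that a SIGCHLD only promises that a blocking waitpid will return soon, not that a WNOHANG call made right away will find the status. A minimal sketch of code written to that rule, assuming a POSIX system (collect_status and run_and_collect are hypothetical names, not frysk functions): a WNOHANG result of 0 is treated as "not yet" and followed by a blocking call.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Collect the status of `pid`. A WNOHANG result of 0 only means the
   status is not available yet, so fall back to a blocking wait instead
   of concluding that no event is coming. Returns pid, or -1 on error. */
static pid_t collect_status(pid_t pid, int *status)
{
    pid_t r = waitpid(pid, status, WNOHANG);
    while (r == 0 || (r < 0 && errno == EINTR))
        r = waitpid(pid, status, 0); /* flags 0: block until available */
    return r;
}

/* Hypothetical helper for trying it out: fork a child that exits at once
   with `code`, collect it, and return its exit status (or -1). */
static int run_and_collect(int code)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);

    int status;
    if (collect_status(pid, &status) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

run_and_collect(7) forks a child that exits with status 7 and returns 7. Whether the first WNOHANG call already sees that exit depends on scheduling, which is the window the quoted comment describes.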
Comment 5 Andrew Cagney 2007-03-23 17:36:51 EDT
Was POSIX documentation explaining SIGCHLD and its quirks with waitpid ever located?

Is the assumption that SIGCHLD is always posted after the wait status has been
recorded (i.e., SIGIO behavior) wrong?

Does SIGCHLD:

-> remain pending while waitpid events are pending, allowing one waitpid
read per signal to work?

-> get withdrawn once all waitpid events have been consumed, allowing
more efficient draining of waitpid events?

Testing shows that at least the second isn't true, and, since SIGCHLD is not a
counting (queued) signal, the first likely isn't either.
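Because SIGCHLD is a standard, non-queued signal, several child exits while it is blocked leave a single pending instance, so "one waitpid read per signal" cannot work. A minimal sketch for checking this, assuming a POSIX system with waitid and WNOWAIT (count_statuses_per_sigchld and MAX_KIDS are hypothetical names): block SIGCHLD, let n children exit, confirm SIGCHLD is pending, then count the statuses a WNOHANG drain collects.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_KIDS 16

/* Block SIGCHLD, fork n children that exit at once, and wait until all
   have exited. *pending_before: is SIGCHLD pending then (a single bit,
   however many children exited)? *pending_after: is it still pending
   after the drain? Returns the number of statuses drained, or -1. */
static int count_statuses_per_sigchld(int n, int *pending_before,
                                      int *pending_after)
{
    struct sigaction dfl, old_act;
    sigset_t block, old_mask, pending;
    pid_t kids[MAX_KIDS];
    int forked = 0, reaped = 0;

    if (n < 1 || n > MAX_KIDS)
        return -1;

    /* Force SIG_DFL: with SIG_IGN the kernel would reap children itself. */
    dfl.sa_handler = SIG_DFL;
    dfl.sa_flags = 0;
    sigemptyset(&dfl.sa_mask);
    sigaction(SIGCHLD, &dfl, &old_act);

    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    sigprocmask(SIG_BLOCK, &block, &old_mask);

    for (; forked < n; forked++) {
        pid_t pid = fork();
        if (pid < 0)
            break;
        if (pid == 0)
            _exit(0);
        kids[forked] = pid;
    }

    /* WNOWAIT: wait for each exit but leave the child waitable. */
    for (int i = 0; i < forked; i++) {
        siginfo_t info;
        while (waitid(P_PID, kids[i], &info, WEXITED | WNOWAIT) < 0 &&
               errno == EINTR)
            ;
    }

    sigpending(&pending);
    *pending_before = sigismember(&pending, SIGCHLD);

    /* Every exited child has a status, though at most one SIGCHLD is pending. */
    for (int i = 0; i < forked; i++)
        if (waitpid(kids[i], NULL, WNOHANG) == kids[i])
            reaped++;

    sigpending(&pending);
    *pending_after = sigismember(&pending, SIGCHLD);

    /* Unblocking delivers any pending SIGCHLD while SIG_DFL discards it. */
    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    sigaction(SIGCHLD, &old_act, NULL);
    return forked == n ? reaped : -1;
}
```

On Linux, count_statuses_per_sigchld(3, &before, &after) should return 3 with before set to 1: one pending signal stands for three statuses. The after value shows whether draining withdrew the signal; the comment above reports that testing found it is not withdrawn.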
Comment 6 Andrew Cagney 2007-04-04 15:55:53 EDT
Rewriting frysk's event loop to use a blocking waitpid call will prevent the
occasional hangs seen when monitoring a process.  The new code is currently
being tested upstream.

Tests are included in frysk's testsuite.
Comment 7 RHEL Product and Program Management 2007-04-04 16:06:15 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 8 Andrew Cagney 2007-04-09 14:13:00 EDT
Fixes committed upstream.  Note that two tests, testCloneThenKillAttached and
testDeleteAttached, have been enabled in the testsuite and are now expected to pass.

Index: frysk-core/frysk/proc/ChangeLog
2007-04-09  Andrew Cagney  <cagney@redhat.com>

        * TestProcTasksObserver.java (testCloneThenKillAttached)
        (testDeleteAttached): Remove brokenIfUtraceXXX due to 3486.
        * Manager.java (usePoll): Set to false, enable WaitEventLoop.

Index: frysk-imports/frysk/sys/ChangeLog
2007-04-09  Andrew Cagney  <cagney@redhat.com>

        * cni/Wait.cxx (log): Add "logger" parameter, update calls.
        (waitForEvent): Delete.
        (waitAll): Use "log".  Replace loop calling waitForEvent with
        multiple waitpid calls.
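The ChangeLog describes waitAll making multiple waitpid calls in place of the old waitForEvent loop. One possible shape for that, sketched here under the assumption that the caller wants every ready status per wakeup (this is not frysk's Wait.cxx; wait_all, count_status and spawn_exit are hypothetical names), is a single blocking waitpid followed by WNOHANG calls until nothing further is ready.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

typedef void (*status_callback)(pid_t pid, int status, void *ctx);

/* Block until one child status is available, hand it to cb, then keep
   calling waitpid with WNOHANG until no further status is ready.
   Returns the number of statuses delivered, or -1 with errno set
   (ECHILD when there are no children to wait for). */
static int wait_all(status_callback cb, void *ctx)
{
    int status, delivered = 0;
    pid_t pid;

    do {
        pid = waitpid(-1, &status, 0); /* blocking: no reliance on SIGCHLD */
    } while (pid < 0 && errno == EINTR);
    if (pid < 0)
        return -1;

    while (pid > 0) {
        cb(pid, status, ctx);
        delivered++;
        /* 0: nothing ready yet; -1: no children left (or an error). */
        pid = waitpid(-1, &status, WNOHANG);
    }
    return delivered;
}

/* Example callback: counts the statuses it receives in *(int *)ctx. */
static void count_status(pid_t pid, int status, void *ctx)
{
    (void)pid;
    (void)status;
    ++*(int *)ctx;
}

/* Helper for trying it out: fork a child that exits at once with `code`. */
static pid_t spawn_exit(int code)
{
    pid_t pid = fork();
    if (pid == 0)
        _exit(code);
    return pid;
}
```

A status that becomes ready just after the WNOHANG drain stops is picked up by the next blocking call, so, unlike a loop driven only by SIGCHLD and WNOHANG, nothing depends on a signal arriving after the last status.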
Comment 13 errata-xmlrpc 2007-11-07 13:05:47 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2007-0592.html
