Description of problem:
We are using the command
ps -e | grep $PID >/dev/null
to determine if the process $PID is still alive.
We found that sometimes this command exits with a nonzero
status (which means $PID is not found in the output of
ps -e) even though the process $PID is alive.
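As a side note, the check above can be written so that it matches the pid as a whole field rather than as a substring of any line. This is only a sketch: the function name pid_listed is made up for illustration, and it assumes a ps that accepts -o pid= (procps and POSIX ps do). It rules out grep matching a different, longer pid, but it reads the same ps output, so it is not expected to change the behaviour reported here.

```shell
# pid_listed PID: succeed only if PID appears as a whole field in `ps -e` output.
# (hypothetical helper, not part of the original report)
pid_listed() {
    ps -e -o pid= | awk -v want="$1" '$1 == want { found = 1 } END { exit !found }'
}

PID=$$   # illustration only: look up the current shell
pid_listed "$PID" || echo "process not detectable"
```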
Steps to Reproduce:
1. Start a process.
2. The primordial thread of the process writes the pid to a file.
3. Read the file to get the pid.
4. Execute the command
ps -e | grep $PID >/dev/null || echo "process not detectable"
5. Execute the command
to terminate the process (even if it is not alive).
6. The process has a signal handler for SIGTERM that writes
"SIGTERM received\n" to fd 1.
Sometimes, we see both the "process not detectable" and
"SIGTERM received" messages, indicating that ps -e fails
to list a process that is still alive.
This problem occurs very rarely on Linux. It never occurs
on other Unix platforms (Solaris, HP-UX, AIX, and OSF1).
The process we are monitoring is a multithreaded process
that creates 8 threads. I don't know if this piece of
information is relevant.
We are now using the command
kill -0 $PID >/dev/null 2>/dev/null || echo "process not detectable"
to detect whether the process $PID is still alive. This
method has been reliable on Linux and other Unix platforms.
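A minimal sketch of how that check might be wrapped for reuse. The function name is_alive is hypothetical. Note that kill -0 also fails when the caller lacks permission to signal the target, so this assumes the monitor and the monitored process run as the same user.

```shell
# is_alive PID: succeed if a signal could be delivered to PID.
# Signal 0 performs the existence and permission checks without sending anything.
# (hypothetical helper, not part of the original report)
is_alive() {
    kill -0 "$1" 2>/dev/null
}

PID=$$   # illustration only: probe the current shell
if is_alive "$PID"; then
    echo "process $PID is alive"
else
    echo "process not detectable"
fi
```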
procps just shows what the kernel tells it; I suspect that if there is
a bug here, it's in the kernel.
Thanks for the bug report. However, Red Hat no longer maintains this version of
the product. Please upgrade to the latest version and open a new bug if the problem persists.
The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases,
and if you believe this bug is interesting to them, please report the problem in
the bug tracker at: http://bugzilla.fedora.us/