Bug 835838 - Process does not exist, but "/bin/kill -0" and "kill -0" return 0
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: util-linux-ng
Assigned To: Karel Zak
Reported: 2012-06-27 05:13 EDT by David Tonhofer
Modified: 2012-06-30 13:57 EDT
Last Closed: 2012-06-28 06:51:49 EDT
Type: Bug

Attachments:
Bash session (7.47 KB, text/plain), 2012-06-27 05:13 EDT, David Tonhofer

Description David Tonhofer 2012-06-27 05:13:14 EDT
Created attachment 594724: Bash session

Description of problem:

A process with PID 6156 has been forcefully terminated. The process is not in the process list.

However, "kill -0" insists that the process exists and does so as long as a SIGTERM has not been sent.

See bash session attached.

Version-Release number of selected component (if applicable):


How reproducible:

Not reproduced yet.

Expected results:

No invisible process
Comment 2 Jaromír Cápík 2012-06-28 05:59:33 EDT
Hello David.

The kill command is a bit tricky. It exists in two separate upstream sources (procps and util-linux) and it's currently disabled in the procps build to avoid conflicts. As it's shipped with util-linux only, I'm going to change the component to util-linux.

Comment 3 Karel Zak 2012-06-28 06:51:49 EDT

As you can see from your strace output, there is no issue with kill(1).

It seems that 6156 in your example is a thread ID (TID), and a TID can be
passed to the kill(2) syscall.

Threads are not included in the "ls /proc" output (readdir() does not return
them) to avoid performance problems on systems with a huge number of
threads, but you can address a thread directly (e.g. ls /proc/<TID>).

For example, gnome-terminal with four threads:

$ ps -eLf | grep 2554
kzak      2554  1994  2554  0    4 Jun27 ?        00:00:02 gnome-terminal 
kzak      2554  1994  2557  0    4 Jun27 ?        00:00:00 gnome-terminal 
kzak      2554  1994  2558  0    4 Jun27 ?        00:00:00 gnome-terminal 
kzak      2554  1994  2561  0    4 Jun27 ?        00:00:00 gnome-terminal 

The important column is LWP (the thread ID).

$ ls /proc/2554/task/                                                      
2554  2557  2558  2561

Let's play with thread 2557:

$ cat /proc/2554/task/2557/comm 

$ ls /proc | grep 2557   # <<< nothing !

But the task is accessible from the top-level /proc directory if the
full path is specified:

$ cat /proc/2557/comm                                                     

... and now kill:

$ strace -e kill kill -0 2557                                               
kill(2557, SIG_0)                       = 0
+++ exited with 0 +++

Success. Not a bug from my point of view.
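
The distinction shown above can also be checked from a script. The following is a minimal sketch, not part of the original comment: it assumes a Linux /proc where /proc/<ID>/status carries a "Tgid:" line, and it compares that value with the ID itself. The function name classify_id and the PROC_ROOT override are hypothetical, introduced only so the function can be tried against a fake directory tree.

```shell
#!/bin/sh
# Sketch: report whether a numeric ID names a whole process (thread-group
# leader) or a single thread inside another process.
# PROC_ROOT is a hypothetical override; it defaults to the real /proc.
classify_id() {
    id=$1
    status_file="${PROC_ROOT:-/proc}/$id/status"

    # No readable status file: nothing with this ID exists (or we cannot see it).
    if [ ! -r "$status_file" ]; then
        echo "none"
        return 1
    fi

    # The "Tgid:" line holds the ID of the thread group the task belongs to.
    tgid=$(awk '$1 == "Tgid:" { print $2 }' "$status_file")

    if [ "$tgid" = "$id" ]; then
        echo "process"
    else
        echo "thread of $tgid"
    fi
}
```

Called as "classify_id 2557" while the gnome-terminal from the example is running, this would be expected to print "thread of 2554", whereas "classify_id 2554" would print "process".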
Comment 4 David Tonhofer 2012-06-30 13:57:26 EDT
Hi Karel,

> It seems that the 6156 in your example is thread ID and the TID is possible to use for kill(2) syscall.

Forehead slap! I didn't think of this at all. Good grief. Maybe I'm too old school. 

The problem occurred because one of our scripts was of the opinion that the PID an earlier instance had written to a pidfile/lockfile was still valid, and it just refused to continue. It seems that the space of PIDs is becoming somewhat constrained now that it is shared with TIDs. Time to move PID/TID to a sparsely populated 128-bit space. Maybe.
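
One way a script could avoid this trap is sketched below. It is an illustration under assumptions, not the script from this report: instead of relying on "kill -0", it accepts the ID from the pidfile only if /proc/<ID>/status shows that the ID is its own thread group ID. The function name pidfile_is_live and the PROC_ROOT override are hypothetical.

```shell
#!/bin/sh
# Sketch: treat a pidfile as stale unless the ID in it is a thread-group
# leader, i.e. a real process rather than a thread of some other process.
# PROC_ROOT is a hypothetical override; it defaults to the real /proc.
pidfile_is_live() {
    pidfile=$1
    [ -r "$pidfile" ] || return 1

    pid=$(cat "$pidfile")

    # Reject empty or non-numeric content.
    case $pid in
        ''|*[!0-9]*) return 1 ;;
    esac

    status_file="${PROC_ROOT:-/proc}/$pid/status"
    [ -r "$status_file" ] || return 1

    # A thread has a Tgid that differs from its own ID.
    tgid=$(awk '$1 == "Tgid:" { print $2 }' "$status_file")
    [ "$tgid" = "$pid" ]
}
```

A caller might write "pidfile_is_live /path/to/lockfile || rm -f /path/to/lockfile" before starting. This only rules out the thread ID case; an unrelated process that happens to reuse the same PID would still pass the check.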

On the other hand, there is still a problem somewhere.

- The strace shows a call to "kill", but:
- The kill(2) manpage doesn't mention threads at all.
- There is a specially designed tgkill(2) to signal threads.
  Its manpage says: "By contrast, kill(2) can only be used to send a
  signal to a process (i.e., thread group) as a whole, and the signal will be
  delivered to an arbitrary thread within that process."

So either the kill(2) and tgkill(2) manpages are wrong or incomplete, or there is an implementation problem with the kill syscall.

Best regards,

-- David
