Description of problem:

oo-accept-node now checks whether a process is in cgroups. This check has found multiple applications in the following state:

-----
6641  4883     1  0 May15 ?        00:00:00 /usr/bin/git-receive-pack ~/git/dbc.git/
6641  4953  4883  0 May15 ?        00:00:03 /usr/bin/git-receive-pack ~/git/dbc.git/
6641  4954  4883  0 May15 ?        00:00:00 [post-receive] <defunct>
-----

Process 4954 (line 3 above) is the one found to _not_ be in cgroups. As you can see, this process is defunct and will remain in this state.

This is a problem because (1) the process is not in cgroups and (2) this has been seen at least 5-10 times in the last day. It means that something happens during a git receive or a git hook that leaves the post-receive process outside of cgroups, with a parent that is no longer waiting on it.

This needs to be fixed so that oo-accept-node does not find it and the user's post-receive process doesn't go defunct.

Version-Release number of selected component (if applicable):
2.0.27.1

How reproducible:
I'm not sure how to reproduce this, but it is occurring in production.

Steps to Reproduce:
1.
2.
3.

Actual results:
A defunct process that is not in cgroups is left hanging around.

Expected results:
This process should be in cgroups, and we should look into why this is happening for our users.

Additional info:
I think the core issue is finding out what caused post-receive to go catatonic.

Defunct (zombie) processes only exist in the root cgroup; a process that started in another cgroup is moved there when it becomes a zombie.

Take the following C program...

    #include <unistd.h>

    int main (int argc, char **argv) {
      if (fork() == 0) {
        _exit(0);
      }
      for(;;) {
        pause();
      }
    }

Run with cgexec...

    # cgexec -g cpu,cpuacct,memory,freezer,net_cls:/openshift/725422378167871496781824 ./a.out

And you get the following result...

    # ps 18371 18372
      PID TTY      STAT   TIME COMMAND
    18371 pts/0    S+     0:00 ./a.out
    18372 pts/0    Z+     0:00 [a.out] <defunct>

    # cat /proc/18371/cgroup
    1:net_cls,freezer,memory,cpuacct,cpu:/openshift/725422378167871496781824

    # cat /proc/18372/cgroup
    1:net_cls,freezer,memory,cpuacct,cpu:/

    # cgclassify -g net_cls,freezer,memory,cpuacct,cpu:/openshift/725422378167871496781824 18372
    Error changing group of pid 18372: No such process

    # ps 18371 18372
      PID TTY      STAT   TIME COMMAND
    18371 pts/0    S+     0:00 ./a.out
    18372 pts/0    Z+     0:00 [a.out] <defunct>
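For anyone wanting to check this programmatically, here is a minimal sketch of how a /proc/<pid>/cgroup line like the ones above could be classified. The function name cgroup_line_is_root is hypothetical, and the sketch assumes the "hierarchy:controllers:path" line format shown in the output above. It is not what oo-accept-node does; it only illustrates the check.

```c
#include <string.h>

/*
 * Hypothetical helper: classify one line of /proc/<pid>/cgroup.
 * Assumes the "hierarchy:controllers:path" format shown above.
 * Returns 1 if the path is the root cgroup ("/"), 0 if it is any
 * other cgroup, and -1 if the line does not have two ':' separators.
 */
int cgroup_line_is_root(const char *line)
{
    const char *first = strchr(line, ':');
    const char *path;
    size_t len;

    if (first == NULL)
        return -1;

    /* The path starts after the second ':' separator. */
    path = strchr(first + 1, ':');
    if (path == NULL)
        return -1;
    path++;

    /* Ignore the trailing newline when measuring the path. */
    len = strcspn(path, "\n");
    return len == 1 && path[0] == '/';
}
```

With the output above, the line for 18372 would be classified as root and the line for 18371 would not. A checker could combine this with the process state to tell a zombie apart from a live process that really did escape its cgroup.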
The behaviour of git-receive-pack re-parenting to init seems to be an odd edge case where the end-user hits "ctrl-c" during a git push at just the right time. It's been very difficult to reproduce on devenv; we accidentally saw it exactly once.

I think the best solution is to provide a script which detects processes which should never be owned by init (e.g. git-receive-pack owned by a gear) and terminates them. Will that work for ops?
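As a rough sketch of the detection half of such a script, the following parses one /proc/<pid>/stat line and flags a live process with the given name whose parent is init. The names struct proc_info, parse_stat_line and is_detached are hypothetical. Checking that the process belongs to a gear UID, walking /proc, and sending the signal are left out. The kernel truncates the command name in stat to 15 characters, so only the first 15 characters are compared.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical record holding the leading fields of /proc/<pid>/stat. */
struct proc_info {
    int  pid;
    char comm[64];
    char state;
    int  ppid;
};

/*
 * Parse "pid (comm) state ppid ..." from one /proc/<pid>/stat line.
 * comm may itself contain spaces or parentheses, so it is taken as
 * everything between the first '(' and the last ')'.
 * Returns 0 on success, -1 on malformed input.
 */
int parse_stat_line(const char *line, struct proc_info *out)
{
    const char *open = strchr(line, '(');
    const char *close = strrchr(line, ')');
    size_t len;

    if (open == NULL || close == NULL || close < open)
        return -1;
    if (sscanf(line, "%d", &out->pid) != 1)
        return -1;

    len = (size_t)(close - open - 1);
    if (len >= sizeof(out->comm))
        len = sizeof(out->comm) - 1;
    memcpy(out->comm, open + 1, len);
    out->comm[len] = '\0';

    if (sscanf(close + 1, " %c %d", &out->state, &out->ppid) != 2)
        return -1;
    return 0;
}

/*
 * Returns 1 if the process is owned by init (ppid 1), is not already
 * a zombie, and its command name matches the first 15 characters of
 * name; otherwise returns 0.
 */
int is_detached(const struct proc_info *p, const char *name)
{
    return p->ppid == 1 && p->state != 'Z' &&
           strncmp(p->comm, name, 15) == 0;
}
```

Zombies are skipped because they cannot be signalled. Once the detached parent is terminated, a defunct child such as post-receive is re-parented to init, which reaps it.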
Please re-open if ops would like a detached process detector. Thanks! *** This bug has been marked as a duplicate of bug 969528 ***