Bug 963895 - defunct process not in cgroups
Status: CLOSED DUPLICATE of bug 969528
Product: OpenShift Online
Classification: Red Hat
Component: Containers
Version: 2.x
Hardware: x86_64 Linux
Priority: medium Severity: medium
Assigned To: Rob Millner
QA Contact: libra bugs
Depends On:
Blocks:
Reported: 2013-05-16 14:11 EDT by Kenny Woodson
Modified: 2015-05-14 19:18 EDT (History)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-06-07 18:54:28 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Kenny Woodson 2013-05-16 14:11:32 EDT
Description of problem:

oo-accept-node now checks whether a process is in cgroups.  This check has found multiple applications in this state:
-----
6641      4883     1  0 May15 ?        00:00:00 /usr/bin/git-receive-pack ~/git/dbc.git/
6641      4953  4883  0 May15 ?        00:00:03 /usr/bin/git-receive-pack ~/git/dbc.git/
6641      4954  4883  0 May15 ?        00:00:00 [post-receive] <defunct>
-----
Process 4954 (line 3 above) is the one found to _not_ be in cgroups.  As you can see, this process is defunct and will remain in this state.

This is a problem for two reasons: (1) the process is not in cgroups, and (2) this has been seen at least 5-10 times in the last day.  It means that something is happening when a git receive or a git hook is called that spawns the post-receive process outside of cgroups, and the parent is no longer waiting for it.

This needs to be fixed so that oo-accept-node does not find it and the user's post-receive process doesn't go defunct.

Version-Release number of selected component (if applicable):
2.0.27.1


How reproducible:

I'm not sure how to reproduce this but it is occurring in production.

Steps to Reproduce:
Actual results:

A defunct process that is not in cgroups is left hanging around.

Expected results:

This process should be in cgroups and we should look into why this is happening for our users.

Additional info:
Comment 1 Rob Millner 2013-05-24 17:20:18 EDT
I think the core issue is finding out what caused post-receive to go catatonic.

Defunct (zombie) processes only exist in the root cgroup; a process that starts in another cgroup moves to the root one when it becomes defunct.

Take the following C program...
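#include <unistd.h>

int main (int argc, char **argv) {
  /* The child exits immediately.  The parent never calls wait(),
     so the child stays defunct (zombie). */
  if (fork() == 0) {
    _exit(0);
  }
  /* The parent sleeps forever and never reaps the child. */
  for(;;) {
    pause();
  }
}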



Run with cgexec...
# cgexec -g cpu,cpuacct,memory,freezer,net_cls:/openshift/725422378167871496781824 ./a.out

And you get the following result...
# ps 18371 18372
  PID TTY      STAT   TIME COMMAND
18371 pts/0    S+     0:00 ./a.out
18372 pts/0    Z+     0:00 [a.out] <defunct>

# cat /proc/18371/cgroup 
1:net_cls,freezer,memory,cpuacct,cpu:/openshift/725422378167871496781824

# cat /proc/18372/cgroup
1:net_cls,freezer,memory,cpuacct,cpu:/

# cgclassify -g net_cls,freezer,memory,cpuacct,cpu:/openshift/725422378167871496781824 18372
Error changing group of pid 18372: No such process

# ps 18371 18372
  PID TTY      STAT   TIME COMMAND
18371 pts/0    S+     0:00 ./a.out
18372 pts/0    Z+     0:00 [a.out] <defunct>
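
The manual check in the transcript above (read the process state, then read /proc/<pid>/cgroup) could be scripted roughly as follows. This is only a sketch and not the actual oo-accept-node check; the function names cgroup_path, cgroup_placement and describe_pid are made up for illustration, and it assumes a Linux /proc with cgroup lines in the "hierarchy-id:controllers:path" form shown above.

```shell
# Print the cgroup path from one /proc/<pid>/cgroup line.
# Assumed line format: hierarchy-id:controllers:path
cgroup_path() {
    printf '%s\n' "$1" | cut -d: -f3-
}

# Read /proc/<pid>/cgroup lines on stdin and print:
#   root      if every hierarchy has the process at "/"
#   non-root  if at least one hierarchy has it elsewhere
#   unknown   if there was no input at all
cgroup_placement() {
    seen=0
    while IFS= read -r line; do
        seen=1
        if [ "$(cgroup_path "$line")" != "/" ]; then
            echo "non-root"
            return 0
        fi
    done
    if [ "$seen" -eq 1 ]; then
        echo "root"
    else
        echo "unknown"
    fi
}

# Hypothetical helper: report the state letter (Z means defunct) and the
# cgroup placement for one PID, using /proc/<pid>/status and /proc/<pid>/cgroup.
describe_pid() {
    pid=$1
    if [ ! -r "/proc/$pid/status" ]; then
        echo "$pid: no such process"
        return 1
    fi
    state=$(awk '/^State:/ { print $2 }' "/proc/$pid/status")
    placement=$(cgroup_placement < "/proc/$pid/cgroup")
    echo "$pid state=$state cgroup=$placement"
}
```

Calling describe_pid on the parent and on the defunct child from the transcript should separate them the same way the two cat commands do. A process that is both in state Z and in the root cgroup is the case described here, so a checker might want to treat it differently from a live process that escaped its cgroup.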
Comment 2 Rob Millner 2013-05-28 15:20:06 EDT
The behaviour of git-receive-pack re-parenting to init seems to be an odd edge case where the end user hits "ctrl-c" during a git push at just the right time.  It's been very difficult to reproduce on devenv; we accidentally saw it exactly once.

I think the best solution is to provide a script that detects processes which should never be owned by init (e.g. git-receive-pack owned by a gear) and terminates them.

Will that work for ops?
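
For discussion, a minimal sketch of what such a detector could look like. Nothing here is an existing OpenShift tool: the function names are hypothetical, and the rule "UID at or above a threshold means gear user" is an assumption (the 1000 default is a placeholder). It matches on the basename of the first word of the command line rather than on the comm field, because the kernel truncates comm to 15 characters and "git-receive-pack" is 16. It only prints candidate PIDs and does not kill anything.

```shell
# Read "pid ppid uid args..." records on stdin and print the PIDs of
# processes that are parented by init (ppid 1), owned by a UID at or above
# min_uid ($1), and whose executable basename equals name ($2).
find_detached() {
    awk -v min_uid="$1" -v name="$2" '
        {
            # Basename of the first word of the command line.
            n = split($4, parts, "/")
            base = parts[n]
        }
        $2 == 1 && $3 >= min_uid && base == name { print $1 }
    '
}

# Hypothetical driver: list detached git-receive-pack processes on this host.
# The minimum gear UID is an assumption; pass the real value as $1.
list_detached_receive_pack() {
    ps -eo pid=,ppid=,uid=,args= | find_detached "${1:-1000}" git-receive-pack
}
```

Fed the three processes from the description, the filter would select only 4883, the git-receive-pack whose parent is PID 1. Terminating the reported PIDs is left to the caller so the list can be reviewed before anything is killed.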
Comment 3 Rob Millner 2013-06-07 18:54:28 EDT
Please re-open if ops would like a detached process detector.  Thanks!

*** This bug has been marked as a duplicate of bug 969528 ***
