Bug 963895 - defunct process not in cgroups
Summary: defunct process not in cgroups
Keywords:
Status: CLOSED DUPLICATE of bug 969528
Alias: None
Product: OpenShift Online
Classification: Red Hat
Component: Containers
Version: 2.x
Hardware: x86_64
OS: Linux
medium
medium
Target Milestone: ---
: ---
Assignee: Rob Millner
QA Contact: libra bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2013-05-16 18:11 UTC by Kenny Woodson
Modified: 2015-05-14 23:18 UTC
3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-06-07 22:54:28 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Bugzilla | 957883 | 0 | medium | CLOSED | Gear remnants (<gear dir>/git) are being leftover from gear deletes (NEW 4/29/13) | 2021-02-22 00:41:40 UTC

Internal Links: 957883

Description Kenny Woodson 2013-05-16 18:11:32 UTC
Description of problem:

oo-accept-node now checks whether a process is in cgroups. This check has found multiple applications in the following state:
-----
6641      4883     1  0 May15 ?        00:00:00 /usr/bin/git-receive-pack ~/git/dbc.git/
6641      4953  4883  0 May15 ?        00:00:03 /usr/bin/git-receive-pack ~/git/dbc.git/
6641      4954  4883  0 May15 ?        00:00:00 [post-receive] <defunct>
-----
Process 4954 (the third line above) is the one found _not_ to be in cgroups. As you can see, this process is defunct and will remain in this state.
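For reference, the `<defunct>` marker printed by `ps` comes from the process state, which Linux exposes as the third field of `/proc/<pid>/stat` (`Z` means zombie). The following is a minimal sketch of reading it, assuming a standard Linux /proc layout; `stat_state` and `proc_state` are hypothetical helper names, not part of oo-accept-node.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Parse the state letter out of one line of /proc/<pid>/stat.
 * Field 2 is the command name in parentheses and may itself contain
 * spaces or ')', so the state is read from just after the LAST ')'.
 * Returns e.g. 'S' (sleeping), 'R' (running) or 'Z' (zombie/defunct),
 * or '?' if the line is malformed. */
char stat_state(const char *statline)
{
    const char *rparen = strrchr(statline, ')');
    if (rparen == NULL || rparen[1] != ' ' || rparen[2] == '\0')
        return '?';
    return rparen[2];
}

/* Read /proc/<pid>/stat and return its state letter, or '?' if the
 * process does not exist or the file cannot be read. */
char proc_state(pid_t pid)
{
    char path[64];
    char line[1024];
    snprintf(path, sizeof path, "/proc/%d/stat", (int)pid);

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return '?';
    char *ok = fgets(line, sizeof line, f);
    fclose(f);
    return ok != NULL ? stat_state(line) : '?';
}
```

For process 4954 above, `proc_state(4954)` would return `'Z'` for as long as the zombie exists.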

This is a problem because (1) the process is not in cgroups and (2) we have seen this at least 5-10 times in the last day. Something is happening when git-receive-pack or a git hook is invoked that spawns the post-receive process outside of cgroups, and the parent is no longer waiting on (reaping) it.
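A cgroup membership check of this kind can be approximated by reading `/proc/<pid>/cgroup`, whose cgroup v1 lines have the form `hierarchy-ID:controller-list:path`. This is a minimal sketch under stated assumptions, not oo-accept-node's actual code: the helper names are hypothetical, the expected path (for example `/openshift/<gear-uuid>`, as in the `cgexec` example in Comment 1) is passed in, and a process counts as inside the gear only if every listed hierarchy matches.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Return 1 if one line of /proc/<pid>/cgroup ("id:controllers:path")
 * places the process at or below `want`, otherwise 0. */
int cgroup_line_under(const char *line, const char *want)
{
    const char *path = strchr(line, ':');
    if (path == NULL)
        return 0;
    path = strchr(path + 1, ':');     /* skip the controller list */
    if (path == NULL)
        return 0;
    path++;

    size_t n = strlen(want);
    if (strncmp(path, want, n) != 0)
        return 0;
    /* Require a whole path component, so "/openshift/12" does not
     * match "/openshift/123". */
    return path[n] == '\0' || path[n] == '\n' || path[n] == '/';
}

/* Return 1 if every hierarchy listed for `pid` is at or below `want`,
 * 0 if any is not, and -1 if /proc/<pid>/cgroup cannot be read. */
int proc_in_cgroup(pid_t pid, const char *want)
{
    char path[64];
    char line[1024];
    int seen = 0;
    snprintf(path, sizeof path, "/proc/%d/cgroup", (int)pid);

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        seen = 1;
        if (!cgroup_line_under(line, want)) {
            fclose(f);
            return 0;
        }
    }
    fclose(f);
    return seen ? 1 : -1;
}
```

On the kernel described in Comment 1, a zombie such as 4954 reports the root path `/`, so this check returns 0 for it.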

This needs to be fixed so that oo-accept-node no longer flags these processes and the user's post-receive process doesn't go defunct.

Version-Release number of selected component (if applicable):
2.0.27.1


How reproducible:

I'm not sure how to reproduce this but it is occurring in production.

Steps to Reproduce:
1. 
2.
3.
  
Actual results:

A defunct process that is not in cgroups is left hanging around.

Expected results:

This process should be in cgroups, and we should find out why this is happening to our users.

Additional info:

Comment 1 Rob Millner 2013-05-24 21:20:18 UTC
I think the core issue is finding out what caused post-receive to go catatonic.

Defunct (zombie) processes exist only in the root cgroup; a process that started in another cgroup is moved there when it exits.

Take the following C program...
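#include <unistd.h>

/* Fork a child that exits immediately. The parent never calls wait(),
 * so the child stays in the process table as a zombie (<defunct>). */
int main(void) {
  if (fork() == 0) {
    _exit(0);          /* child: exit at once */
  }
  for(;;) {
    pause();           /* parent: sleep forever without reaping */
  }
}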



Run with cgexec...
# cgexec -g cpu,cpuacct,memory,freezer,net_cls:/openshift/725422378167871496781824 ./a.out

And you get the following result...
# ps 18371 18372
  PID TTY      STAT   TIME COMMAND
18371 pts/0    S+     0:00 ./a.out
18372 pts/0    Z+     0:00 [a.out] <defunct>

# cat /proc/18371/cgroup 
1:net_cls,freezer,memory,cpuacct,cpu:/openshift/725422378167871496781824

# cat /proc/18372/cgroup
1:net_cls,freezer,memory,cpuacct,cpu:/

# cgclassify -g net_cls,freezer,memory,cpuacct,cpu:/openshift/725422378167871496781824 18372
Error changing group of pid 18372: No such process

# ps 18371 18372
  PID TTY      STAT   TIME COMMAND
18371 pts/0    S+     0:00 ./a.out
18372 pts/0    Z+     0:00 [a.out] <defunct>
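The zombie persists because `a.out` never calls `wait()` on its child. For contrast, here is a minimal sketch in which the parent reaps the child with `waitpid()`, after which no `<defunct>` entry remains; `spawn_and_reap` is a hypothetical name used only for illustration.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits immediately with `code` (like a.out's child),
 * then reap it with waitpid(). Once waitpid() returns, the child's
 * process-table entry is released, so it does not linger as <defunct>.
 * Returns the child's exit status, or -1 on error. */
int spawn_and_reap(int code)
{
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0)
        _exit(code);                    /* child: exit at once */

    int status;
    pid_t r;
    do {
        r = waitpid(pid, &status, 0);   /* parent: block until the child exits */
    } while (r < 0 && errno == EINTR);
    if (r < 0) {
        perror("waitpid");
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

In a long-running parent, the same effect is usually achieved by reaping from a `SIGCHLD` handler or with a periodic `waitpid(-1, &status, WNOHANG)` loop.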

Comment 2 Rob Millner 2013-05-28 19:20:06 UTC
The behaviour of git-receive-pack re-parenting to init seems to be an odd edge case where the end user hits "ctrl-c" during a git push at just the right time. It's been very difficult to reproduce on devenv; we accidentally saw it exactly once.
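When a parent dies first (for example after ctrl-c), the kernel re-parents its children, which is why `git-receive-pack` PID 4883 in the original report shows a PPID of 1. The following sketch reproduces that re-parenting deterministically with a short-lived middle process; `orphan_new_parent` is a hypothetical name. On newer systems that use a child subreaper (for example systemd user sessions), the new parent may be that subreaper rather than PID 1.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* A short-lived "middle" process forks a grandchild and exits without
 * waiting, as a parent killed by ctrl-c would. The orphaned grandchild
 * polls until its parent PID changes, then reports the new parent PID
 * through a pipe. Stores the middle PID in *old_parent (if non-NULL)
 * and returns the new parent PID, or -1 on error. */
pid_t orphan_new_parent(pid_t *old_parent)
{
    int fd[2];
    if (pipe(fd) < 0)
        return -1;

    pid_t middle = fork();
    if (middle < 0) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }
    if (middle == 0) {
        pid_t me = getpid();
        if (fork() == 0) {
            /* grandchild: wait until the middle process is gone */
            struct timespec ts = {0, 1000000};   /* 1 ms */
            while (getppid() == me)
                nanosleep(&ts, NULL);
            pid_t np = getppid();
            if (write(fd[1], &np, sizeof np) != (ssize_t)sizeof np)
                _exit(1);
            _exit(0);
        }
        _exit(0);   /* middle: exit without reaping, orphaning the grandchild */
    }

    close(fd[1]);                 /* read() then sees EOF if nothing is written */
    waitpid(middle, NULL, 0);     /* reap the middle process ourselves */

    pid_t np = -1;
    if (read(fd[0], &np, sizeof np) != (ssize_t)sizeof np)
        np = -1;
    close(fd[0]);
    if (old_parent != NULL)
        *old_parent = middle;
    return np;
}
```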

I think the best solution is to provide a script that detects processes which should never be owned by init (e.g. a git-receive-pack owned by a gear) and terminates them.
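A detector along these lines could look roughly like the sketch below, which scans `/proc` for processes with a given command name, owned by a UID at or above a threshold, whose parent is PID 1. Everything here is an assumption for illustration: the function names, the UID threshold used to identify gear users, and the choice of `SIGTERM` are hypothetical, and matches are only reported unless `do_kill` is set. The kernel truncates the command name in `/proc/<pid>/stat` to 15 characters, so `git-receive-pack` appears there as `git-receive-pac`.

```c
#include <ctype.h>
#include <dirent.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Extract comm (field 2) and ppid (field 4) from a /proc/<pid>/stat line.
 * Returns 0 on success, -1 if the line is malformed. */
int parse_stat(const char *line, char *comm, size_t commsz, pid_t *ppid)
{
    const char *l = strchr(line, '(');
    const char *r = strrchr(line, ')');
    if (l == NULL || r == NULL || r < l || commsz == 0)
        return -1;
    size_t len = (size_t)(r - l - 1);
    if (len >= commsz)
        len = commsz - 1;
    memcpy(comm, l + 1, len);
    comm[len] = '\0';

    char state;
    int pp;
    if (sscanf(r + 1, " %c %d", &state, &pp) != 2)
        return -1;
    *ppid = (pid_t)pp;
    return 0;
}

/* Hypothetical detector: report (and, if do_kill is set, SIGTERM) every
 * process named `name`, owned by a UID >= min_uid, whose parent is init.
 * Returns the number of matches, or -1 if /proc cannot be opened. */
int find_detached(const char *name, uid_t min_uid, int do_kill)
{
    DIR *d = opendir("/proc");
    if (d == NULL)
        return -1;

    int found = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (!isdigit((unsigned char)e->d_name[0]))
            continue;                       /* not a PID directory */

        char path[300];
        struct stat st;
        snprintf(path, sizeof path, "/proc/%s", e->d_name);
        if (stat(path, &st) < 0 || st.st_uid < min_uid)
            continue;                       /* gone, or below the UID threshold */

        char line[1024], comm[64];
        pid_t ppid;
        snprintf(path, sizeof path, "/proc/%s/stat", e->d_name);
        FILE *f = fopen(path, "r");
        if (f == NULL)
            continue;
        char *ok = fgets(line, sizeof line, f);
        fclose(f);
        if (ok == NULL || parse_stat(line, comm, sizeof comm, &ppid) < 0)
            continue;

        if (ppid == 1 && strcmp(comm, name) == 0) {
            pid_t pid = (pid_t)atoi(e->d_name);
            printf("detached: pid %d uid %u %s\n",
                   (int)pid, (unsigned)st.st_uid, comm);
            if (do_kill)
                kill(pid, SIGTERM);
            found++;
        }
    }
    closedir(d);
    return found;
}
```

For example, `find_detached("git-receive-pac", 1000, 0)` would list candidates without killing them; 1000 is a placeholder, not OpenShift's actual gear UID range.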

Will that work for ops?

Comment 3 Rob Millner 2013-06-07 22:54:28 UTC
Please re-open if ops would like a detached process detector.  Thanks!

*** This bug has been marked as a duplicate of bug 969528 ***

