Bug 221351 - Kill orphan processes
Status: CLOSED CURRENTRELEASE
Product: Fedora Hosted Projects
Classification: Retired
Component: mock
Version: unspecified
Hardware: All Linux
Priority: high
Severity: high
Assigned To: Clark Williams
Keywords: Patch
Reported: 2007-01-03 19:04 EST by Jan Kratochvil
Modified: 2013-01-09 23:09 EST
Fixed In Version: mock-0.9.7-1
Doc Type: Bug Fix
Last Closed: 2008-02-28 18:21:18 EST

Attachments
Fix implementing "mock-helper orphanskill <chrootdir>". (6.21 KB, patch)
2007-01-03 19:04 EST, Jan Kratochvil
Trivial .src.rpm for reproducing the bug. (1.57 KB, application/octet-stream)
2007-01-03 19:06 EST, Jan Kratochvil
Fix implementing "mock-helper orphanskill <chrootdir>". (update) (6.13 KB, patch)
2007-01-03 19:18 EST, Jan Kratochvil

Description Jan Kratochvil 2007-01-03 19:04:42 EST
Description of problem:
Currently an rpm build run by mock can get stuck if stale processes remain
running after the build. mock(1) reads the build output until end of file, and
these orphans keep the write end of that pipe open.

Version-Release number of selected component (if applicable):
mock-0.6.8-4.i386

How reproducible:
Always.

Steps to Reproduce:
1. mock --debug -r fedora-6-i386-core --no-clean rebuild /tmp/gecko-libs-1.8.1.1-0.src.rpm
  
Actual results:
build
DEBUG: Executing /usr/sbin/mock-helper chroot /var/lib/mock/fedora-6-i386-core/root /sbin/runuser - root -c "cd /;/sbin/runuser -c 'rpmbuild --rebuild  --target i386 --nodeps /builddir/build/SRPMS/gecko-libs-1.8.1.1-0.fc6.src.rpm' mockbuild"
[stuck]
With ps(1) showing:
 3044 pts/7    S+     0:00              \_ /usr/bin/python -tt /usr/bin/mock --debug -r fedora-6-i386-core --no-clean rebuild /home/jkratoch/src/rpm/SRPMS/gecko-libs-1.8.1.1-0.src.rpm
 3415 pts/7    Z+     0:00                  \_ [sh] <defunct>
 3516 ?        S      0:00 sleep 24h
and ls(1):
l-wx------ 1 jkratoch jkratoch 64 Jan  4 01:03 /proc/3516/fd/1 -> pipe:[3181267]
l-wx------ 1 jkratoch jkratoch 64 Jan  4 01:03 /proc/3516/fd/2 -> pipe:[3181267]
lr-x------ 1 jkratoch jkratoch 64 Jan  4 01:03 /proc/3044/fd/5 -> pipe:[3181267]
and "strace -s200 -f -q -p 3044":
read(5, 

Expected results:
Finished/closed build.

Additional info:
With the patch the output contains the informational debug line(s):
+ cd /builddir/build/BUILD
+ exit 0
mock-helper: warning: Killed -9 orphan PID 14127: sleep 24h
ending
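
The hang described above can be reproduced outside mock. The following is a minimal Python sketch, not mock's actual code: the helper name read_until_eof is hypothetical, and the sleep is shortened from 24h to 2 seconds so the effect is visible without waiting. It shows that a read to end of file on a pipe returns only once every process holding the write end has exited, even if the direct child is already gone.

```python
import subprocess
import time


def read_until_eof(shell_cmd):
    """Run shell_cmd via /bin/sh, read its stdout to EOF.

    Returns (output_bytes, seconds_waited). Hypothetical helper for
    illustration only; mock's own process handling is different.
    """
    start = time.monotonic()
    proc = subprocess.Popen(["/bin/sh", "-c", shell_cmd],
                            stdout=subprocess.PIPE)
    # read() returns only after *every* holder of the pipe's write end
    # has closed it, including background children of the shell.
    output = proc.stdout.read()
    proc.wait()
    return output, time.monotonic() - start


if __name__ == "__main__":
    # The shell exits at once, but the background sleep inherited stdout.
    _, waited = read_until_eof("sleep 2 & echo build done")
    print("inherited pipe:  read returned after %.1fs" % waited)

    # With the orphan's stdout redirected, the pipe is released at once.
    _, waited = read_until_eof("sleep 2 >/dev/null & echo build done")
    print("redirected pipe: read returned after %.1fs" % waited)
```

In the first call the reader is held for as long as the background process lives, which with "sleep 24h" looks like a stuck build.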
Comment 1 Jan Kratochvil 2007-01-03 19:04:42 EST
Created attachment 144756
Fix implementing "mock-helper orphanskill <chrootdir>".
Comment 2 Jan Kratochvil 2007-01-03 19:06:24 EST
Created attachment 144757
Trivial .src.rpm for reproducing the bug.

The relevant content is the .spec part:
%build
sleep 24h &
Comment 4 Jan Kratochvil 2007-01-03 19:18:45 EST
Created attachment 144758
Fix implementing "mock-helper orphanskill <chrootdir>". (update)

(Whitespace changes only.)
Comment 8 Andrew Cagney 2007-07-10 09:28:56 EDT
Jan,
As a workaround, is the test disabled?
Comment 9 Jan Kratochvil 2007-07-10 09:36:14 EDT
Andrew,

It affects various testcases; I am not sure how many:
29126 ?        T      0:10 /builddir/build/BUILD/gdb-6.3/build-ppc64-redhat-linux-gnu/gdb/testsuite/gdb.threads/watchthreads2
 6968 ?        T      0:00 /builddir/build/BUILD/gdb-6.3/build-ppc-redhat-linux-gnu/gdb/testsuite/gdb.threads/bt-clone-stop

And in many cases I cannot even figure out which testcases caused it:
28906 ?        T      0:00 /builddir/build/BUILD/gdb-6.3/build-x86_64-redhat-linux-gnu/gdb/testsuite/../../gdb/gdb -nw -nx
28907 ?        Z      0:00  \_ [gdb] <defunct>
29194 ?        T      0:00 /builddir/build/BUILD/gdb-6.3/build-x86_64-redhat-linux-gnu/gdb/testsuite/../../gdb/gdb -nw -nx
29195 ?        Z      0:00  \_ [gdb] <defunct>

That `bt-clone-stop' testcase is mine and the testcase source is perfectly valid,
so there must be a bug in the testsuite framework.
Still, the testcases just spawn various asynchronous processes by:
    set testpid [eval exec $binfile &]

And the TCL/expect testsuite has no record of which processes were forked asynchronously.
Comment 10 Clark Williams 2007-07-10 12:36:33 EDT
Jan, 

Extending mock-helper is something we've been resisting fairly strenuously,
since it's a setuid root program and has the potential to be a cracker's attack
vector. I have mixed feelings about adding it, since I see its utility in the
GDB testsuite case, but I'm not sure that it's a generally useful command. I'll
take it up with my co-maintainers and see what they think.

Just so I understand the intent of the patch, the orphanskill command to
mock-helper processes all the task entries in /proc, finds any task with a
"root" link that matches the current chroot, and sends a kill(pid, SIGKILL) to
that task. Is this correct?

Clark
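
The behaviour summarized in the comment above (walk /proc, compare each task's "root" link with the chroot, send SIGKILL) can be sketched as follows. This is an illustrative Python sketch, not the attached patch to mock-helper: the function names find_chroot_pids and kill_orphans and the proc_dir parameter are assumptions made for the example. Reading the "root" link of another user's process generally needs root privileges, so unprivileged runs simply skip those entries.

```python
import os
import signal


def find_chroot_pids(chroot_dir, proc_dir="/proc"):
    """Return sorted PIDs whose <proc_dir>/<pid>/root link points at chroot_dir."""
    target = os.path.realpath(chroot_dir)
    pids = []
    for entry in os.listdir(proc_dir):
        if not entry.isdigit():
            continue  # not a process directory (e.g. "self", "meminfo")
        try:
            root = os.readlink(os.path.join(proc_dir, entry, "root"))
        except OSError:
            continue  # process already exited, or no permission to look
        if os.path.realpath(root) == target:
            pids.append(int(entry))
    return sorted(pids)


def kill_orphans(chroot_dir, proc_dir="/proc"):
    """Send SIGKILL to every process rooted in chroot_dir; return PIDs signalled."""
    killed = []
    for pid in find_chroot_pids(chroot_dir, proc_dir):
        if pid == os.getpid():
            continue  # never signal ourselves
        try:
            os.kill(pid, signal.SIGKILL)
        except ProcessLookupError:
            continue  # exited between the scan and the kill
        killed.append(pid)
    return killed
```

Matching on the root link rather than on parent PID or session is what makes the scan independent of reparenting, since a process cannot leave its chroot by calling setsid() or by losing its parent.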
Comment 11 Jan Kratochvil 2007-07-10 13:00:13 EDT
Clark,

you are right regarding the `orphanskill' command functionality.

I agree it is a testsuite bug; no build should leave stale processes behind.
In the GDB case the testsuite is simply too big to fix in general: one would
have to review all 400 testcases / 4MB of sources there. Because of the effort
involved, Red Hat and upstream decided not to review and fix the testsuite.

Still, I believe it is a clearly detectable build failure: the direct child
process has died and a stale process still exists.  A separate decision is
whether such processes should be silently killed or whether the build should be
aborted as failed.

It is not easy to write such an `orphanskill' command outside of mock, as the
spawned processes may and do change everything, making them undetectable without
root privileges (they call setsid(), they create new ptys, and dying parents
reparent their children to init(8)).  Killing an unrelated user's process would
be a pity.


Fortunately it is NO LONGER A BLOCKER FOR ME, as today I wrote a workaround. It
cannot kill all the orphan processes, but it kills those causing the mock hang
(the ones holding the mock fd open).
  http://cvs.jankratochvil.net/viewcvs/nethome/src/orphanripper.c?rev=HEAD
It helps builds outside of mock, but it may still not kill all the stale
processes (I have not checked whether it does).
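
For illustration, the idea of finding the processes that hold a given pipe open can be sketched by scanning the fd links under /proc, the same information the ls(1) output in the description shows. This is a Python sketch under stated assumptions, not the orphanripper.c linked above: the function name pipe_holders and the proc_dir parameter are hypothetical, and the caller is expected to exclude the reader itself before killing anything.

```python
import os


def pipe_holders(pipe_target, proc_dir="/proc"):
    """Map PID -> sorted fd numbers whose fd link text equals pipe_target.

    pipe_target is the link text as shown by ls -l, e.g. "pipe:[3181267]".
    """
    holders = {}
    for entry in os.listdir(proc_dir):
        if not entry.isdigit():
            continue  # not a process directory
        fd_dir = os.path.join(proc_dir, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process gone, or its fd table is not readable by us
        for fd in sorted(fds, key=int):
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # fd was closed while we were scanning
            if target == pipe_target:
                holders.setdefault(int(entry), []).append(int(fd))
    return holders
```

With the values from the description, a scan for "pipe:[3181267]" would report PID 3516 (fds 1 and 2, the orphaned sleep) and PID 3044 (fd 5, the mock reader); only the former is a candidate for killing.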
Comment 12 Clark Williams 2007-07-10 13:05:41 EDT
I'm glad you have a workaround and I apologize for taking so long to address it.
I've sent a message to the fedora-buildsys-list asking if anyone else could make
use of the orphanskill functionality. 

We'll have a big argument and then decide if it's general purpose enough for
mock or not. Film at 11. :)

Clark
Comment 13 Clark Williams 2008-02-28 18:21:18 EST
orphanskill logic was added in the Great Mock Rewrite done by Michael. Closing
as fixed.
