Bug 221351

Summary: Kill orphan processes
Product: [Retired] Fedora Hosted Projects
Reporter: Jan Kratochvil <jan.kratochvil>
Component: mock
Assignee: Clark Williams <williams>
Status: CLOSED CURRENTRELEASE
QA Contact:
Severity: high
Docs Contact:
Priority: high
Version: unspecified
CC: dcantrell, mikeb
Target Milestone: ---
Keywords: Patch
Target Release: ---
Hardware: All
OS: Linux
Fixed In Version: mock-0.9.7-1
Doc Type: Bug Fix
Last Closed: 2008-02-28 23:21:18 UTC
Attachments:
Description Flags
Fix implementing "mock-helper orphanskill <chrootdir>".
none
Trivia .src.rpm for the bug reproducibility.
none
Fix implementing "mock-helper orphanskill <chrootdir>". (update)
none

Description Jan Kratochvil 2007-01-04 00:04:42 UTC
Description of problem:
Currently an rpm build under mock can hang if stale processes remain running
after the build finishes. mock(1) reads the build output until EOF, and these
orphans keep their fds on that pipe open, so EOF never arrives.
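
A minimal sketch of the mechanism, not mock's actual code: the helper name `read_until_eof` is hypothetical, and `/bin/sh` stands in for the build command. It shows that a read to EOF on a child's output pipe waits for every process that inherited the write end, not only for the direct child.

```python
import subprocess
import time


def read_until_eof(shell_command):
    """Run shell_command and read its output until EOF.

    Returns (output_bytes, elapsed_seconds). Hypothetical helper for
    illustration only; it is not part of mock.
    """
    start = time.monotonic()
    proc = subprocess.Popen(
        ["/bin/sh", "-c", shell_command],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    )
    # read() returns only once every holder of the pipe's write end has
    # closed it, so a background process that inherited stdout delays EOF
    # even though the shell itself has already exited.
    output = proc.stdout.read()
    proc.wait()
    return output, time.monotonic() - start


if __name__ == "__main__":
    # The background sleep inherits stdout/stderr, so the read waits for it.
    print(read_until_eof("sleep 1 & echo build done"))
    # With its output redirected, the orphan no longer holds the pipe open.
    print(read_until_eof("sleep 1 >/dev/null 2>&1 & echo build done"))
```

With `sleep 24h &` in a %build section, the same read would block for a day, which matches the `read(5, ` seen under strace below.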

Version-Release number of selected component (if applicable):
mock-0.6.8-4.i386

How reproducible:
Always.

Steps to Reproduce:
1. mock --debug -r fedora-6-i386-core --no-clean rebuild /tmp/gecko-libs-1.8.1.1-0.src.rpm
  
Actual results:
build
DEBUG: Executing /usr/sbin/mock-helper chroot /var/lib/mock/fedora-6-i386-core/root /sbin/runuser - root -c "cd /;/sbin/runuser -c 'rpmbuild --rebuild  --target i386 --nodeps /builddir/build/SRPMS/gecko-libs-1.8.1.1-0.fc6.src.rpm' mockbuild"
[stuck]
With ps(1) showing:
 3044 pts/7    S+     0:00              \_ /usr/bin/python -tt /usr/bin/mock --debug -r fedora-6-i386-core --no-clean rebuild /home/jkratoch/src/rpm/SRPMS/gecko-libs-1.8.1.1-0.src.rpm
 3415 pts/7    Z+     0:00                  \_ [sh] <defunct>
 3516 ?        S      0:00 sleep 24h
and ls(1):
l-wx------ 1 jkratoch jkratoch 64 Jan  4 01:03 /proc/3516/fd/1 -> pipe:[3181267]
l-wx------ 1 jkratoch jkratoch 64 Jan  4 01:03 /proc/3516/fd/2 -> pipe:[3181267]
lr-x------ 1 jkratoch jkratoch 64 Jan  4 01:03 /proc/3044/fd/5 -> pipe:[3181267]
and "strace -s200 -f -q -p 3044":
read(5, 

Expected results:
Finished/closed build.

Additional info:
With the patch the output contains the informational debug line(s):
+ cd /builddir/build/BUILD
+ exit 0
mock-helper: warning: Killed -9 orphan PID 14127: sleep 24h
ending

Comment 1 Jan Kratochvil 2007-01-04 00:04:42 UTC
Created attachment 144756 [details]
Fix implementing "mock-helper orphanskill <chrootdir>".

Comment 2 Jan Kratochvil 2007-01-04 00:06:24 UTC
Created attachment 144757 [details]
Trivia .src.rpm for the bug reproducibility.

The relevant content is the .spec part:
%build
sleep 24h &

Comment 4 Jan Kratochvil 2007-01-04 00:18:45 UTC
Created attachment 144758 [details]
Fix implementing "mock-helper orphanskill <chrootdir>". (update)

(whitespace changes only)

Comment 8 Andrew Cagney 2007-07-10 13:28:56 UTC
Jan,
As a workaround, is the test disabled?


Comment 9 Jan Kratochvil 2007-07-10 13:36:14 UTC
Andrew,

It affects various testcases; I am not sure how many:
29126 ?        T      0:10 /builddir/build/BUILD/gdb-6.3/build-ppc64-redhat-linux-gnu/gdb/testsuite/gdb.threads/watchthreads2
 6968 ?        T      0:00 /builddir/build/BUILD/gdb-6.3/build-ppc-redhat-linux-gnu/gdb/testsuite/gdb.threads/bt-clone-stop

And in many cases I cannot even figure out which testcases caused it:
28906 ?        T      0:00 /builddir/build/BUILD/gdb-6.3/build-x86_64-redhat-linux-gnu/gdb/testsuite/../../gdb/gdb -nw -nx
28907 ?        Z      0:00  \_ [gdb] <defunct>
29194 ?        T      0:00 /builddir/build/BUILD/gdb-6.3/build-x86_64-redhat-linux-gnu/gdb/testsuite/../../gdb/gdb -nw -nx
29195 ?        Z      0:00  \_ [gdb] <defunct>

That `bt-clone-stop' testcase is mine and the testcase source is perfectly valid,
so there must be a bug in the testsuite framework.
Still, the testcases just spawn various asynchronous processes with:
    set testpid [eval exec $binfile &]

And TCL/expect/the testsuite keeps no record of which processes were forked
asynchronously.


Comment 10 Clark Williams 2007-07-10 16:36:33 UTC
Jan, 

Extending mock-helper is something we've been resisting fairly strenuously,
since it's a setuid root program and has the potential to be a cracker's attack
vector. I have mixed feelings about adding it, since I see its utility in the
GDB testsuite case, but I'm not sure that it's a generally useful command. I'll
take it up with my co-maintainers and see what they think.

Just so I understand the intent of the patch, the orphanskill command to
mock-helper processes all the task entries in /proc, finds any task with a
"root" link that matches the current chroot, and sends a kill(pid, SIGKILL) to
that task. Is this correct?
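
As an illustration of the logic just described, here is a rough Python sketch. It is not the attached patch (which extends mock-helper) and not the code that later landed in mock; the function names, the `proc_dir` parameter, and the injectable `kill` callback are all assumptions made so the logic can be exercised without root.

```python
import os
import signal


def find_chroot_pids(chroot_dir, proc_dir="/proc"):
    """Return sorted PIDs whose <proc_dir>/<pid>/root link equals chroot_dir."""
    target = os.path.realpath(chroot_dir)
    if target == "/":
        # Matching the real root would select nearly every process.
        raise ValueError("refusing to match processes rooted at /")
    pids = []
    for entry in os.listdir(proc_dir):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "cpuinfo"
        try:
            root = os.readlink(os.path.join(proc_dir, entry, "root"))
        except OSError:
            # The process exited meanwhile, or its root link is unreadable.
            continue
        if root == target:
            pids.append(int(entry))
    return sorted(pids)


def orphanskill(chroot_dir, proc_dir="/proc", kill=os.kill):
    """Send SIGKILL to every process rooted in chroot_dir; return killed PIDs."""
    killed = []
    for pid in find_chroot_pids(chroot_dir, proc_dir):
        if pid == os.getpid():
            continue  # never kill ourselves
        try:
            kill(pid, signal.SIGKILL)
        except ProcessLookupError:
            continue  # already gone
        killed.append(pid)
    return killed
```

Reading another user's `/proc/<pid>/root` link generally needs root privileges, which is why the patch puts this into the setuid helper rather than into mock itself.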

Clark

Comment 11 Jan Kratochvil 2007-07-10 17:00:13 UTC
Clark,

you are right regarding the `orphanskill' command functionality.

I agree it is a testsuite bug; no build should leave stale processes behind.
In the GDB case the testsuite is just too big to fix in general: one would have
to review all the 400 testcases / 4MB of sources there. Because of the cost of
that effort, Red Hat and upstream decided not to review and fix the testsuite.

Still, I believe it is a clearly detectable build failure when the direct
child process has died and a stale process still exists.  A separate decision
is whether such processes should be silently killed or the build aborted as failed.

It is not easy to write such an `orphanskill' command outside of mock, as the
spawned processes can and do change everything that would identify them, making
them undetectable without root privileges (they call setsid(), they create new
ptys, and dying parents get their children reparented to init(8)).  Killing an
unrelated user's process would be a pity.


Fortunately it is NO LONGER A BLOCKER FOR ME, as today I wrote a workaround. It
cannot kill all the orphan processes, but it kills those causing the mock hang
(the ones holding a mock fd open).
  http://cvs.jankratochvil.net/viewcvs/nethome/src/orphanripper.c?rev=HEAD
It helps builds outside of mock, but it still may not (I am unsure right now
whether it does) kill all the stale processes.
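
For readers who want the idea without opening the link, one possible way to find "those holding a mock fd open" is sketched below. This is not a port of orphanripper.c, whose actual approach may differ; the function names and the `proc_dir` and `exclude` parameters are hypothetical. It relies only on the Linux /proc layout visible in the ls(1) output of the description, where fd links to a pipe read `pipe:[<inode>]`.

```python
import os


def pipe_inode(fd):
    """Return the inode number of an open pipe file descriptor."""
    return os.fstat(fd).st_ino


def pids_holding_pipe(inode, proc_dir="/proc", exclude=()):
    """Return sorted PIDs that have any fd linked to pipe:[inode].

    PIDs listed in exclude (for example the reader itself) are skipped.
    """
    wanted = "pipe:[%d]" % inode
    holders = set()
    for entry in os.listdir(proc_dir):
        if not entry.isdigit() or int(entry) in exclude:
            continue
        fd_dir = os.path.join(proc_dir, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or its fd table is not readable
        for fd in fds:
            try:
                link = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # fd was closed while we were scanning
            if link == wanted:
                holders.add(int(entry))
                break
    return sorted(holders)
```

A caller would pass `pipe_inode()` of the read end it is blocked on and exclude its own PID; anything still listed after the direct child has exited is an orphan keeping the pipe open.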


Comment 12 Clark Williams 2007-07-10 17:05:41 UTC
I'm glad you have a workaround and I apologize for taking so long to address it.
I've sent a message to the fedora-buildsys-list asking if anyone else could make
use of the orphanskill functionality. 

We'll have a big argument and then decide if it's general purpose enough for
mock or not. Film at 11. :)

Clark

Comment 13 Clark Williams 2008-02-28 23:21:18 UTC
orphanskill logic was added in the Great Mock Rewrite done by Michael. Closing
as fixed.