Bug 1705123

Summary: jenkins-slave produce process defunct [ Jenkins "SLAVE" ]
Product: OpenShift Container Platform Reporter: Gabe Montero <gmontero>
Component: ImageStreams Assignee: Gabe Montero <gmontero>
Status: CLOSED ERRATA QA Contact: XiuJuan Wang <xiuwang>
Severity: high Docs Contact:
Priority: unspecified    
Version: 4.1.z CC: aos-bugs, gmontero, jokerman, maupadhy, mmccomas, vbobade, wzheng, xiuwang
Target Milestone: ---   
Target Release: 4.2.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: Long-running jenkins agent/slave pods can experience the defunct process phenomenon that we previously observed with the jenkins master.
Consequence: A lot of defunct processes show up in process listings until the pod is terminated.
Fix: Employ `dumb-init`, as with the openshift/jenkins master image, to clean up the defunct processes that occur during jenkins job processing.
Result: Process listings within agent/slave pods, and on the hosts where those pods reside, no longer include the defunct processes.
Story Points: ---
Clone Of: 1700314 Environment:
Last Closed: 2019-10-16 06:28:21 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On: 1700314, 1707447, 1707448, 1718379    
Bug Blocks:    

Comment 1 Gabe Montero 2019-05-01 14:41:21 UTC
PR https://github.com/openshift/jenkins/pull/845 is up

Current thinking among the devex team is that this is a post-4.1.0 GA item.

Comment 2 Gabe Montero 2019-05-14 17:19:55 UTC
PR has merged ... in for 4.2 inclusion

will use this bug for that and will clone for 4.1.z inclusion

Comment 3 Gabe Montero 2019-05-14 20:11:57 UTC
also need https://github.com/openshift/ocp-build-data/pull/126 for the osbs/brew side to be able to install dumb-init

Comment 4 XiuJuan Wang 2019-05-23 08:11:32 UTC
Can't reproduce this with 4.2.0-0.ci-2019-05-23-003410 payload.

Steps:
1. Create a jenkins server and maven and nodejs pipeline buildconfigs.
2. Log in to the jenkins console and set the maven/nodejs pod idle time to 30 mins.
3. Trigger maven and nodejs pipeline builds.
4. Rsh into each slave pod when the idle time is almost out.
Result: the dumb-init process has cleaned up defunct processes; no defunct processes exist.

$ oc get pods 
NAME                                  READY   STATUS      RESTARTS   AGE
maven-d1dvn                           1/1     Running     0          22m
nodejs-l6p30                          1/1     Running     0          22m

$ oc rsh nodejs-l6p30 
sh-4.2$ ps -ef 
UID         PID   PPID  C STIME TTY          TIME CMD
default       1      0  0 07:46 ?        00:00:00 /usr/bin/dumb-init -- /usr/local/bin/run-jnlp-client 89e8841a50290a213f74d378a5f2939031e6b330ef27d24e052439ffa294ad44 nodejs-l6p30
default       6      1  1 07:46 ?        00:00:25 java -XX:+UseParallelGC -XX:MinHeapFreeRatio=5 -XX:MaxHeapFreeRatio=10 -XX:GCTimeRatio=4 -XX:AdaptiveSizePolicyWeight=90 -cp /home/jenkins/remoting.jar hudson.re
default     319      0  0 08:09 pts/0    00:00:00 /bin/sh
default     327    319  0 08:09 pts/0    00:00:00 ps -ef
sh-4.2$ exit
$ oc rsh maven-d1dvn 
sh-4.2$ ps -ef
UID         PID   PPID  C STIME TTY          TIME CMD
default       1      0  0 07:46 ?        00:00:00 /usr/bin/dumb-init -- /usr/local/bin/run-jnlp-client 26293759c3dac125c3fc913b3d9381a5f5b6a3972f4e1bf96b6ef3cf706e2b89 maven-d1dvn
default       6      1  2 07:46 ?        00:00:30 java -XX:+UseParallelGC -XX:MinHeapFreeRatio=5 -XX:MaxHeapFreeRatio=10 -XX:GCTimeRatio=4 -XX:AdaptiveSizePolicyWeight=90 -cp /home/jenkins/remoting.jar hudson.re
default     461      0  0 08:10 pts/0    00:00:00 /bin/sh
default     469    461  0 08:10 pts/0    00:00:00 ps -ef
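
The listings above were checked by eye. As a rough sketch only (not part of the original verification), the same check could be narrowed to defunct entries by filtering on the STAT column. This assumes a procps-style `ps` is available in the agent image, where a STAT value beginning with `Z` marks a defunct process; `filter_defunct` is a hypothetical helper name.

```shell
# filter_defunct: hypothetical helper, not from the openshift/jenkins images.
# Reads "PID PPID STAT COMMAND" rows on stdin, skips the header row,
# and prints only rows whose STAT column starts with "Z" (defunct/zombie).
filter_defunct() {
    awk 'NR > 1 && $3 ~ /^Z/'
}

# Usage inside the pod (assumes procps ps): list every process with its state.
ps -eo pid,ppid,stat,comm | filter_defunct
```

No output from the last command would indicate that no defunct processes are present, which is consistent with the `ps -ef` listings above.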

Comment 5 errata-xmlrpc 2019-10-16 06:28:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922