Bug 1705123

Summary:	jenkins-slave pods produce defunct processes [ Jenkins "SLAVE" ]
Product:	OpenShift Container Platform	Reporter:	Gabe Montero <gmontero>
Component:	ImageStreams	Assignee:	Gabe Montero <gmontero>
Status:	CLOSED ERRATA	QA Contact:	XiuJuan Wang <xiuwang>
Severity:	high	Docs Contact:
Priority:	unspecified
Version:	4.1.z	CC:	aos-bugs, gmontero, jokerman, maupadhy, mmccomas, vbobade, wzheng, xiuwang
Target Milestone:	---
Target Release:	4.2.0
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:	Cause: Long-running Jenkins agent/slave pods can experience the same defunct (zombie) process phenomenon previously observed with the Jenkins master. Consequence: Many defunct processes show up in process listings until the pod is terminated. Fix: Use `dumb-init`, as the openshift/jenkins master image already does, to reap the defunct processes created during Jenkins job processing. Result: Process listings within agent/slave pods, and on the hosts where those pods reside, no longer include defunct processes.	Story Points:	---
Clone Of:	1700314	Environment:
Last Closed:	2019-10-16 06:28:21 UTC	Type:	---
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
Bug Depends On:	1700314, 1707447, 1707448, 1718379
Bug Blocks:
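The Fix in the Doc Text rests on one mechanism. When a process spawned during a job exits after its parent has already gone away, Linux reparents it to PID 1 of the container's PID namespace. It then stays in the process table as <defunct> until PID 1 calls wait() on it. A launcher that never waits on children it did not start leaves those zombies behind. The Python sketch below is a simplified, hypothetical stand-in for that PID 1 role. It is not dumb-init's actual code: dumb-init is written in C and does more, such as forwarding signals to the whole process group. The function name run_as_init is invented for illustration.

```python
import os
import signal


def run_as_init(argv):
    """Run argv as the main child, reap exited children, return its exit code.

    Hypothetical, simplified stand-in for an init such as dumb-init. Orphaned
    grandchildren are only reparented to (and reaped by) this process when it
    really runs as PID 1 in the container.
    """
    main_pid = os.fork()
    if main_pid == 0:
        # Child: replace ourselves with the real entrypoint (e.g. the agent launcher).
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if exec failed

    def forward(signum, _frame):
        # Pass termination signals on to the main child instead of swallowing them.
        try:
            os.kill(main_pid, signum)
        except ProcessLookupError:
            pass

    previous = {sig: signal.signal(sig, forward)
                for sig in (signal.SIGTERM, signal.SIGINT)}
    try:
        while True:
            try:
                # waitpid(-1) collects *any* exited child, including reparented
                # orphans, so none of them linger as <defunct> entries.
                pid, status = os.waitpid(-1, 0)
            except ChildProcessError:
                return 0  # no children left at all
            if pid == main_pid:
                if os.WIFEXITED(status):
                    return os.WEXITSTATUS(status)
                # Shell convention: 128 + signal number for a signal-killed child.
                return 128 + os.WTERMSIG(status)
    finally:
        # Restore the caller's handlers (None means "not set from Python").
        for sig, handler in previous.items():
            signal.signal(sig, handler if handler is not None else signal.SIG_DFL)
```

For example, run_as_init(["sh", "-c", "exit 3"]) returns 3, and a command that cannot be executed yields 127. Orphan reaping only applies when the wrapper actually is PID 1, for instance as the image entrypoint. That is the role /usr/bin/dumb-init plays in the ps listings in comment 4. Started from an ordinary shell, orphans are reparented elsewhere instead.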

Comment 1 Gabe Montero 2019-05-01 14:41:21 UTC

PR https://github.com/openshift/jenkins/pull/845 is up

The current thinking among the devex team is that this is a post-4.1.0 GA item.

Comment 2 Gabe Montero 2019-05-14 17:19:55 UTC

The PR has merged, so the fix is in for 4.2.

I will use this bug to track the 4.2 fix and clone it for 4.1.z inclusion.

Comment 3 Gabe Montero 2019-05-14 20:11:57 UTC

We also need https://github.com/openshift/ocp-build-data/pull/126 so that the OSBS/Brew builds can install dumb-init.

Comment 4 XiuJuan Wang 2019-05-23 08:11:32 UTC

Cannot reproduce this with the 4.2.0-0.ci-2019-05-23-003410 payload.

Steps:
1. Create a Jenkins server and maven/nodejs pipeline buildconfigs.
2. Log in to the Jenkins console and set the maven/nodejs agent pod idle time to 30 minutes.
3. Trigger the maven and nodejs pipeline builds.
4. oc rsh into the slave pods when the idle time is almost up.
Result: the dumb-init process has cleaned up the defunct processes; none remain.

$ oc get pods 
NAME                                  READY   STATUS      RESTARTS   AGE
maven-d1dvn                           1/1     Running     0          22m
nodejs-l6p30                          1/1     Running     0          22m

$ oc rsh nodejs-l6p30 
sh-4.2$ ps -ef 
UID         PID   PPID  C STIME TTY          TIME CMD
default       1      0  0 07:46 ?        00:00:00 /usr/bin/dumb-init -- /usr/local/bin/run-jnlp-client 89e8841a50290a213f74d378a5f2939031e6b330ef27d24e052439ffa294ad44 nodejs-l6p30
default       6      1  1 07:46 ?        00:00:25 java -XX:+UseParallelGC -XX:MinHeapFreeRatio=5 -XX:MaxHeapFreeRatio=10 -XX:GCTimeRatio=4 -XX:AdaptiveSizePolicyWeight=90 -cp /home/jenkins/remoting.jar hudson.re
default     319      0  0 08:09 pts/0    00:00:00 /bin/sh
default     327    319  0 08:09 pts/0    00:00:00 ps -ef
sh-4.2$ exit
$ oc rsh maven-d1dvn 
sh-4.2$ ps -ef
UID         PID   PPID  C STIME TTY          TIME CMD
default       1      0  0 07:46 ?        00:00:00 /usr/bin/dumb-init -- /usr/local/bin/run-jnlp-client 26293759c3dac125c3fc913b3d9381a5f5b6a3972f4e1bf96b6ef3cf706e2b89 maven-d1dvn
default       6      1  2 07:46 ?        00:00:30 java -XX:+UseParallelGC -XX:MinHeapFreeRatio=5 -XX:MaxHeapFreeRatio=10 -XX:GCTimeRatio=4 -XX:AdaptiveSizePolicyWeight=90 -cp /home/jenkins/remoting.jar hudson.re
default     461      0  0 08:10 pts/0    00:00:00 /bin/sh
default     469    461  0 08:10 pts/0    00:00:00 ps -ef
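The manual check above confirms two things: PID 1 is dumb-init, and there are no <defunct> entries. It can be scripted. Below is a minimal sketch that assumes procps-style ps -ef output, in which CMD is the eighth column and zombies appear with a trailing <defunct>. check_ps_ef and fetch_ps_ef are hypothetical helper names. fetch_ps_ef also assumes an oc session that is already logged in to the right project.

```python
import subprocess


def fetch_ps_ef(pod):
    """Hypothetical helper: run `ps -ef` inside a pod via `oc exec`.

    Assumes an active oc login with the pod's project selected.
    """
    result = subprocess.run(
        ["oc", "exec", pod, "--", "ps", "-ef"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def check_ps_ef(output):
    """Return (pid1_cmd, defunct_lines) parsed from `ps -ef` text."""
    pid1_cmd = None
    defunct = []
    for line in output.strip().splitlines()[1:]:  # skip the UID PID PPID ... header
        # Columns: UID PID PPID C STIME TTY TIME CMD. CMD may contain spaces,
        # so split at most 7 times to keep it whole.
        fields = line.split(None, 7)
        if len(fields) < 8:
            continue
        cmd = fields[7].rstrip()
        if fields[1] == "1":
            pid1_cmd = cmd
        if cmd.endswith("<defunct>"):
            defunct.append(line)
    return pid1_cmd, defunct
```

Applied to the nodejs-l6p30 listing above, check_ps_ef returns a PID 1 command starting with /usr/bin/dumb-init and an empty defunct list.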

Comment 5 errata-xmlrpc 2019-10-16 06:28:21 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922