Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1707448

Summary:	jenkins-slave produce process defunct [ Jenkins "SLAVE" ]
Product:	OpenShift Container Platform	Reporter:	Gabe Montero <gmontero>
Component:	ImageStreams	Assignee:	Gabe Montero <gmontero>
Status:	CLOSED ERRATA	QA Contact:	XiuJuan Wang <xiuwang>
Severity:	high	Docs Contact:
Priority:	unspecified
Version:	3.11.0	CC:	aos-bugs, fgrosjea, gmontero, jokerman, maupadhy, mmccomas, vbobade, wzheng, xiuwang
Target Milestone:	---
Target Release:	3.11.z
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:	Cause: Long-running Jenkins agent/slave pods can experience the defunct process phenomenon that we previously observed with the Jenkins master. Consequence: A lot of defunct processes show up in process listings until the pod is terminated. Fix: Employ `dumb-init`, as with the openshift/jenkins master image, to clean up the defunct processes that occur during Jenkins job processing. Result: Process listings within agent/slave pods, and on the hosts where those pods reside, no longer include the defunct processes.	Story Points:	---
Clone Of:	1700314	Environment:
Last Closed:	2019-06-26 09:08:09 UTC	Type:	---
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:	1700314, 1718379
Bug Blocks:	1705123, 1707447

Comment 1 Gabe Montero 2019-05-07 15:03:33 UTC

PR https://github.com/openshift/jenkins/pull/844 is up for 3.11

Comment 2 Gabe Montero 2019-05-07 16:43:43 UTC

OK the PR has merged.

This needs to be tested once the maven/nodejs images show up on brew with this commit.

To test, you can run one of our sample pipelines that leverage node('maven') and node('nodejs')

First, though, go into the Jenkins console and change the idle timeout for those pods to a non-zero value, such as 30 minutes.

Run the pipeline build, and those slave pods should stay up for up to that 30 minute setting.

rsh into the slave pod and run ps -ef

Eventually dumb-init should clean up any defunct processes, and you should see the dumb-init / run-jnlp-client processes.
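
As a rough aid for that check, the count of defunct entries can be scripted rather than read off the listing by eye. This is only a sketch: the function name count_defunct is made up for illustration, and it assumes the pod's ps comes from procps, which marks a zombie by appending "<defunct>" to the CMD column.

```shell
# Sketch: count zombie entries in `ps -ef` output supplied on stdin.
# Assumption: ps is the procps version, which tags zombies with "<defunct>".
count_defunct() {
    # grep -c prints the number of matching lines; it exits 1 when the
    # count is 0, so "|| true" keeps the function's status at success.
    grep -c '<defunct>' || true
}

# Hypothetical usage against a running agent pod (pod name is a placeholder):
#   oc rsh <agent-pod> ps -ef | count_defunct
```

A count that drops back to 0 while the pod is still idling would suggest dumb-init is reaping the processes as intended.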

Comment 4 Gabe Montero 2019-05-13 13:44:27 UTC

Hey XiuJuan

That pull does not impact the openshift3/jenkins-2-rhel7 image

It impacts slave-base, which in turn is used by the nodejs and maven agent/slave images.

That said, yeah, I pulled the latest 3.11 build and it does not look like the latest changes
have been pulled into brew and built.

I believe I did see some emails from ART (Luke Meyer I think) about pausing some of the 3.x
activity until 4.1 ships.

My guess is we just have to be patient.

Comment 5 Gabe Montero 2019-05-13 13:45:40 UTC

OH - forgot ... I have been getting ART/OSBS messages about failed builds for the agent images as well.

Most likely that has some bearing.  It is on my list today to sort out what is up with that.

Comment 6 Gabe Montero 2019-05-13 14:52:24 UTC

Turns out we will need https://github.com/openshift/ocp-build-data/pull/120 to merge before we can start getting slave builds at osbs/brew with dumb-init

Comment 7 XiuJuan Wang 2019-05-14 06:20:14 UTC

Gabe,
Thanks.
When the maven/nodejs agent/slave images come out, I will test again.

Comment 8 XiuJuan Wang 2019-05-23 09:01:00 UTC

Can't reproduce this with:
openshift3/jenkins-agent-maven-35-rhel7:v3.11 (v3.11.104-3)
openshift3/jenkins-agent-nodejs-8-rhel7:v3.11 (v3.11.104-3)
openshift3/jenkins-slave-maven-rhel7:v3.11  (v3.11.104-3)
openshift3/jenkins-slave-nodejs-rhel7:v3.11 (v3.11.104-3)
openshift3/jenkins-2-rhel7:v3.11 (v3.11.104-4)

Steps:
1. Create a Jenkins server and the maven/nodejs pipeline buildconfigs.
2. Log in to the Jenkins console and set the maven/nodejs pod idle timeout to 30 minutes.
3. Trigger the maven and nodejs pipeline builds.
4. rsh into the slave pod when the idle time is almost up.
Result: the dumb-init process has cleaned up the defunct processes; none remain.

# oc get pods 
NAME                                  READY     STATUS      RESTARTS   AGE
maven-cd8jf                           1/1       Running     0          28m
nodejs-s4ld0                          1/1       Running     0          28m

# oc rsh maven-cd8jf 
sh-4.2$ ps -ef
UID         PID   PPID  C STIME TTY          TIME CMD
default       1      0  0 07:44 ?        00:00:00 /usr/bin/dumb-init -- /usr/local/bin/run-jnlp-client b61603a67cc20951d043ada93bac71a4e05e824feadeb45cedd21eec3fb428e4 maven-cd8jf
default       6      1  1 07:44 ?        00:00:27 java -XX:+UseParallelGC -XX:MinHeapFreeRatio=5 -XX:MaxHeapFreeRatio=10 -XX:GCTimeRatio=4 -XX:AdaptiveSizePolicyWeight=90 -cp /home/jenkins/remoting.jar hudson.re
default    1388      0  0 08:11 ?        00:00:00 /bin/sh
default    1396   1388  0 08:11 ?        00:00:00 ps -ef
sh-4.2$ exit
exit
# oc rsh nodejs-s4ld0 
sh-4.2$ ps -ef
UID         PID   PPID  C STIME TTY          TIME CMD
default       1      0  0 07:43 ?        00:00:00 /usr/bin/dumb-init -- /usr/local/bin/run-jnlp-client 3f07ef821c50969340bf7be8fa4909574ae1d8a296b846a54da71260462910f4 nodejs-s4ld0
default       6      1  1 07:43 ?        00:00:28 java -XX:+UseParallelGC -XX:MinHeapFreeRatio=5 -XX:MaxHeapFreeRatio=10 -XX:GCTimeRatio=4 -XX:AdaptiveSizePolicyWeight=90 -cp /home/jenkins/remoting.jar hudson.re
default    1541      0  0 08:12 ?        00:00:00 /bin/sh
default    1549   1541  0 08:12 ?        00:00:00 ps -ef
sh-4.2$
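
The other half of this verification, that PID 1 in the pod really is dumb-init, could also be scripted. Again only a sketch: pid1_is_dumb_init is an illustrative name, and the column positions assume the default `ps -ef` layout shown in the listings above.

```shell
# Sketch: succeed only if PID 1 in `ps -ef` output (stdin) is dumb-init.
# Assumes the default column order: UID PID PPID C STIME TTY TIME CMD.
pid1_is_dumb_init() {
    # $2 is the PID column and $8 is the first word of CMD; the regex
    # accepts both a bare "dumb-init" and a full path ending in it.
    awk '$2 == 1 && $8 ~ /(^|\/)dumb-init$/ { found = 1 }
         END { exit (found ? 0 : 1) }'
}

# Hypothetical usage (pod name is a placeholder):
#   oc rsh <agent-pod> ps -ef | pid1_is_dumb_init && echo "dumb-init is PID 1"
```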

Comment 10 errata-xmlrpc 2019-06-26 09:08:09 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:1605