Bug 1707448 - jenkins-slave produce process defunct [ Jenkins "SLAVE" ]
Summary: jenkins-slave produce process defunct [ Jenkins "SLAVE" ]
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: ImageStreams
Version: 3.11.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.11.z
Assignee: Gabe Montero
QA Contact: XiuJuan Wang
URL:
Whiteboard:
Depends On: 1700314 1718379
Blocks: 1705123 1707447
Reported: 2019-05-07 14:37 UTC by Gabe Montero
Modified: 2019-06-26 09:08 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Long-running Jenkins agent/slave pods can accumulate defunct processes, the same phenomenon previously observed with the Jenkins master.
Consequence: Many defunct processes show up in process listings until the pod is terminated.
Fix: Employ `dumb-init`, as in the openshift/jenkins master image, to clean up the defunct processes that occur during Jenkins job processing.
Result: Process listings within agent/slave pods, and on the hosts where those pods reside, no longer include the defunct processes.
Clone Of: 1700314
Environment:
Last Closed: 2019-06-26 09:08:09 UTC
Target Upstream Version:
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2019:1605 | 0 | None | None | None | 2019-06-26 09:08:23 UTC

Comment 1 Gabe Montero 2019-05-07 15:03:33 UTC
PR https://github.com/openshift/jenkins/pull/844 is up for 3.11

Comment 2 Gabe Montero 2019-05-07 16:43:43 UTC
OK the PR has merged.

This needs to be tested once the maven/nodejs images show up on brew with this commit.

To test, you can run one of our sample pipelines that leverage node('maven') and node('nodejs').

But first go into the Jenkins console and change the idle timeout for those pods to a non-zero value, such as 30 minutes.

Run the pipeline build; the slave pods should then stay up for up to that 30-minute setting.

rsh into the slave pod and run ps -ef.

Eventually dumb-init should clean up any defunct processes, and you should see the dumb-init / run-jnlp-client processes.
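
For context, "cleaning up" a defunct process means that its parent (or PID 1, for orphaned children) calls wait() on it once it has exited. The following is a minimal Python sketch of that reaping loop under that assumption. It is not dumb-init's actual code (dumb-init is a small C program that also forwards signals), and the function name `reap_exited_children` is made up for illustration.

```python
import os


def reap_exited_children():
    """Collect every child that has already exited, without blocking.

    Returns the list of reaped PIDs. An exited child stays <defunct>
    in `ps` output until its parent calls wait() on it like this.
    """
    reaped = []
    while True:
        try:
            # -1 means "any child"; WNOHANG returns immediately
            # instead of blocking when no child has exited yet.
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            # This process has no children at all.
            break
        if pid == 0:
            # Children exist, but none of them has exited yet.
            break
        reaped.append(pid)
    return reaped
```

An init-style process would typically run a loop like this whenever it receives SIGCHLD, which is why the defunct entries disappear from ps -ef once such a process is PID 1 in the pod.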

Comment 4 Gabe Montero 2019-05-13 13:44:27 UTC
Hey XiuJuan

That pull does not impact the openshift3/jenkins-2-rhel7 image

It impacts slave-base, which in turn is used by the nodejs and maven agent/slave images.

That said, yeah, I pulled the latest 3.11 build and it does not look like the latest changes
have been pulled into brew and built.

I believe I did see some emails from ART (Luke Meyer I think) about pausing some of the 3.x
activity until 4.1 ships.

My guess is we just have to be patient.

Comment 5 Gabe Montero 2019-05-13 13:45:40 UTC
OH - forgot ... I have been getting ART/OSBS messages about failed builds for the agent images as well.

Most likely that has some bearing.  It is on my list today to sort out what is up with that.

Comment 6 Gabe Montero 2019-05-13 14:52:24 UTC
Turns out we will need https://github.com/openshift/ocp-build-data/pull/120 to merge before we can start getting slave builds at osbs/brew with dumb-init

Comment 7 XiuJuan Wang 2019-05-14 06:20:14 UTC
Gabe,
Thanks.
When the maven/nodejs agent/slave images come out, I will test again.

Comment 8 XiuJuan Wang 2019-05-23 09:01:00 UTC
Can't reproduce this with 
openshift3/jenkins-agent-maven-35-rhel7:v3.11 (v3.11.104-3)
openshift3/jenkins-agent-nodejs-8-rhel7:v3.11 (v3.11.104-3)
openshift3/jenkins-slave-maven-rhel7:v3.11  (v3.11.104-3)
openshift3/jenkins-slave-nodejs-rhel7:v3.11 (v3.11.104-3)
openshift3/jenkins-2-rhel7:v3.11 (v3.11.104-4)

Steps:
1. Create a Jenkins server and the maven/nodejs pipeline buildconfigs.
2. Log in to the Jenkins console and set the maven/nodejs pod idle timeout to 30 minutes.
3. Trigger the maven and nodejs pipeline builds.
4. Rsh into each slave pod when the idle timeout has almost expired.
Result: the dumb-init process has cleaned up the defunct processes; none remain.

# oc get pods 
NAME                                  READY     STATUS      RESTARTS   AGE
maven-cd8jf                           1/1       Running     0          28m
nodejs-s4ld0                          1/1       Running     0          28m

# oc rsh maven-cd8jf 
sh-4.2$ ps -ef
UID         PID   PPID  C STIME TTY          TIME CMD
default       1      0  0 07:44 ?        00:00:00 /usr/bin/dumb-init -- /usr/local/bin/run-jnlp-client b61603a67cc20951d043ada93bac71a4e05e824feadeb45cedd21eec3fb428e4 maven-cd8jf
default       6      1  1 07:44 ?        00:00:27 java -XX:+UseParallelGC -XX:MinHeapFreeRatio=5 -XX:MaxHeapFreeRatio=10 -XX:GCTimeRatio=4 -XX:AdaptiveSizePolicyWeight=90 -cp /home/jenkins/remoting.jar hudson.re
default    1388      0  0 08:11 ?        00:00:00 /bin/sh
default    1396   1388  0 08:11 ?        00:00:00 ps -ef
sh-4.2$ exit
exit
# oc rsh nodejs-s4ld0 
sh-4.2$ ps -ef
UID         PID   PPID  C STIME TTY          TIME CMD
default       1      0  0 07:43 ?        00:00:00 /usr/bin/dumb-init -- /usr/local/bin/run-jnlp-client 3f07ef821c50969340bf7be8fa4909574ae1d8a296b846a54da71260462910f4 nodejs-s4ld0
default       6      1  1 07:43 ?        00:00:28 java -XX:+UseParallelGC -XX:MinHeapFreeRatio=5 -XX:MaxHeapFreeRatio=10 -XX:GCTimeRatio=4 -XX:AdaptiveSizePolicyWeight=90 -cp /home/jenkins/remoting.jar hudson.re
default    1541      0  0 08:12 ?        00:00:00 /bin/sh
default    1549   1541  0 08:12 ?        00:00:00 ps -ef
sh-4.2$
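
The manual check above (reading ps -ef output for defunct entries) could be scripted roughly as follows. This is only a sketch: the function name `find_defunct` is hypothetical, the sample listing is abbreviated from the output above, and its last row (PID 42) is an invented example of what a defunct entry looks like, not something observed in this test.

```python
def find_defunct(ps_output):
    """Return (pid, ppid, cmd) for each <defunct> entry in `ps -ef` text."""
    defunct = []
    for line in ps_output.splitlines():
        # Columns: UID PID PPID C STIME TTY TIME CMD (CMD may contain spaces).
        fields = line.split(None, 7)
        if len(fields) < 8 or not fields[1].isdigit():
            # Skip blank lines, short lines, and the header row.
            continue
        if "<defunct>" in fields[7]:
            defunct.append((int(fields[1]), int(fields[2]), fields[7]))
    return defunct


# Abbreviated sample; the PID 42 row is invented for illustration.
sample = """\
UID         PID   PPID  C STIME TTY          TIME CMD
default       1      0  0 07:44 ?        00:00:00 /usr/bin/dumb-init -- /usr/local/bin/run-jnlp-client
default       6      1  1 07:44 ?        00:00:27 java -cp /home/jenkins/remoting.jar
default      42      6  0 07:50 ?        00:00:00 [sh] <defunct>
"""

for pid, ppid, cmd in find_defunct(sample):
    print(f"defunct: pid={pid} ppid={ppid} cmd={cmd}")
```

An empty result for a pod that has been idle for most of its timeout corresponds to the passing outcome reported in this comment.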

Comment 10 errata-xmlrpc 2019-06-26 09:08:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:1605

