Bug 807260

Summary: Jenkins hangs forever when the slave app DNS cannot be resolved.
Product: OKD
Reporter: Johnny Liu <jialiu>
Component: Containers
Assignee: Bill DeCoste <wdecoste>
Status: CLOSED CURRENTRELEASE
QA Contact: libra bugs <libra-bugs>
Severity: medium
Priority: medium
Version: 1.x
CC: rmillner
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2012-04-13 18:30:11 UTC

Description Johnny Liu 2012-03-27 11:44:19 UTC
Description of problem:
Jenkins hangs forever when the slave app DNS cannot be resolved.

jenkins log:
<--snip-->
Mar 27, 2012 7:40:13 AM hudson.plugins.openshift.OpenShiftSlave connect
INFO: Checking to see if slave DNS is resolvable...
Mar 27, 2012 7:40:14 AM hudson.plugins.openshift.OpenShiftSlave connect
INFO: Slave DNS not propagated yet, retrying...
Mar 27, 2012 7:40:19 AM hudson.plugins.openshift.OpenShiftSlave connect
INFO: Checking to see if slave DNS is resolvable...
Mar 27, 2012 7:40:19 AM hudson.plugins.openshift.OpenShiftSlave connect
INFO: Slave DNS not propagated yet, retrying...
<--snip-->

Version-Release number of selected component (if applicable):
jenkins-plugin-openshift-0.5.4-1.el6_2.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Create an app with the Jenkins client embedded.
2. Log into the instance and deliberately set PUBLIC_HOSTNAME to an invalid value to reproduce this issue.
# vi /etc/stickshift/stickshift-node.conf
PUBLIC_HOSTNAME=aa.bbbbius.com
# /usr/libexec/mcollective/update_yaml.rb > /etc/mcollective/facts.yaml
3. Make a change and run git push to trigger a Jenkins build.
  
Actual results:
The Jenkins build job hangs forever.

Expected results:
When a failure keeps recurring, the Jenkins build should fail instead of wasting the user's time, and it should tell the user to check the Jenkins log to debug the issue.


Additional info:

Comment 1 Johnny Liu 2012-03-27 11:46:09 UTC
This issue was already raised in Bug 802686, but the patch for that bug did not cover this case and was only a partial fix. So I filed this new bug to track it.

Comment 2 Bill DeCoste 2012-03-27 19:54:58 UTC
Added a timeout to the node/slave. The default is 60s. The build is terminated at the timeout if DNS does not resolve.
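
For readers who want to see the shape of such a fix, here is a minimal sketch of a DNS wait loop bounded by a deadline. It is not the plugin's actual code: the class name DnsWaitSketch, the methods resolves and waitForDns, and the injectable resolver parameter are all hypothetical. The 60s default comes from the comment above, the exception message is copied from the Jenkins log, and the 5s retry interval is only an assumption based on the log timestamps.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.function.Predicate;

// Hypothetical sketch; not the code from jenkins-plugin-openshift.
class DnsWaitSketch {
    static final long DEFAULT_TIMEOUT_MILLIS = 60_000L; // 60s default, per the comment above
    static final long DEFAULT_RETRY_MILLIS = 5_000L;    // assumption: retry roughly every 5s

    /** Returns true if the host name currently resolves. */
    static boolean resolves(String host) {
        try {
            InetAddress.getByName(host);
            return true;
        } catch (UnknownHostException e) {
            return false;
        }
    }

    /**
     * Polls the resolver until the host resolves or the timeout elapses.
     * The resolver is a parameter so the loop can be exercised without real DNS.
     */
    static void waitForDns(String host, Predicate<String> resolver,
                           long timeoutMillis, long retryMillis) throws IOException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (!resolver.test(host)) {
            if (System.nanoTime() - deadline >= 0) {
                // Fail the provisioning attempt instead of retrying forever.
                throw new IOException("Slave DNS not propagated. Timing out.");
            }
            try {
                Thread.sleep(retryMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                throw new IOException("Interrupted while waiting for slave DNS", e);
            }
        }
    }
}
```

A caller would use it roughly as DnsWaitSketch.waitForDns(host, DnsWaitSketch::resolves, DnsWaitSketch.DEFAULT_TIMEOUT_MILLIS, DnsWaitSketch.DEFAULT_RETRY_MILLIS), letting the IOException propagate so that provisioning fails visibly.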

Comment 3 Johnny Liu 2012-03-29 05:36:57 UTC
Verified this bug with devenv_1679: PASS.

Jenkins log:
<--snip-->
Mar 29, 2012 1:34:00 AM hudson.plugins.openshift.OpenShiftSlave connect
INFO: Checking to see if slave DNS is resolvable...
Mar 29, 2012 1:34:01 AM hudson.plugins.openshift.OpenShiftSlave connect
INFO: Slave DNS not propagated yet, retrying...
Mar 29, 2012 1:33:04 AM hudson.plugins.openshift.OpenShiftSlave connect
WARNING: Slave DNS not propagated. Timing out.
Mar 29, 2012 1:33:04 AM hudson.plugins.openshift.OpenShiftCloud$2 call
WARNING: Unable to provision node java.io.IOException: Slave DNS not propagated. Timing out.
Mar 29, 2012 1:33:05 AM hudson.slaves.NodeProvisioner update
WARNING: Provisioned slave phptest-build failed to launch
java.io.IOException: Slave DNS not propagated. Timing out.
	at hudson.plugins.openshift.OpenShiftSlave.connect(OpenShiftSlave.java:198)
	at hudson.plugins.openshift.OpenShiftSlave.provision(OpenShiftSlave.java:210)
	at hudson.plugins.openshift.OpenShiftCloud$2.call(OpenShiftCloud.java:459)
	at hudson.plugins.openshift.OpenShiftCloud$2.call(OpenShiftCloud.java:451)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
	at java.util.concurrent.FutureTask.run(FutureTask.java:166)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
	at java.lang.Thread.run(Thread.java:679)
<--snip-->