Bug 811509 - Need more timeout to resolve node/slave DNS
Summary: Need more timeout to resolve node/slave DNS
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OKD
Classification: Red Hat
Component: Containers
Version: 1.x
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
: ---
Assignee: Bill DeCoste
QA Contact: libra bugs
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2012-04-11 09:53 UTC by Johnny Liu
Modified: 2012-04-27 20:46 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-04-27 20:46:47 UTC
Target Upstream Version:
Embargoed:



Description Johnny Liu 2012-04-11 09:53:54 UTC
Description of problem:
According to bug 807260, the default timeout for resolving slave DNS is 60s.
This may not be enough.
Recently, I have often encountered Jenkins build failures because the timeout is too short.

$ git push
remote: You can track your build at https://jenkins-jialiu.dev.rhcloud.com/job/phptest-build
remote: 
remote: Waiting for build to schedule................................................................
remote: **BUILD FAILED/CANCELLED**
remote: Please see the Jenkins log for more details via rhc-tail-files
remote: !!!!!!!!
remote: Deployment Halted!
remote: If the build failed before the deploy step, your previous
remote: build is still running.  Otherwise, your application may be
remote: partially deployed or inaccessible.
remote: Fix the build and try again.
remote: !!!!!!!!
To ssh://0f9708805c19406082c4d126cb227f26.rhcloud.com/~/git/phptest.git/
   8f71a00..6b127b3  master -> master

However, the Jenkins log shows that the build succeeded.
<--jenkins log-->
Apr 11, 2012 5:36:39 AM hudson.plugins.openshift.OpenShiftSlave stopApp
INFO: Slave stopping application...
Apr 11, 2012 5:36:41 AM hudson.plugins.openshift.OpenShiftSlave connect
INFO: Connecting to slave phptestbldr...
Apr 11, 2012 5:36:41 AM hudson.plugins.openshift.OpenShiftSlave stopApp
INFO: Slave stopping application...
Apr 11, 2012 5:36:42 AM hudson.plugins.openshift.OpenShiftSlave connect
INFO: Established UUID = b4cdcaa6dfe64929bbc651be669ac2b7
Apr 11, 2012 5:36:44 AM hudson.plugins.openshift.OpenShiftSlave connect
INFO: Connecting to slave phptestbldr...
Apr 11, 2012 5:36:44 AM hudson.plugins.openshift.OpenShiftSlave connect
INFO: Established UUID = b4cdcaa6dfe64929bbc651be669ac2b7
Apr 11, 2012 5:36:47 AM hudson.plugins.openshift.OpenShiftSlave connect
INFO: Checking to see if slave DNS is resolvable...
Apr 11, 2012 5:36:47 AM hudson.plugins.openshift.OpenShiftSlave connect
INFO: Slave DNS not propagated yet, retrying...
Apr 11, 2012 5:36:49 AM hudson.plugins.openshift.OpenShiftSlave connect
INFO: Checking to see if slave DNS is resolvable...
Apr 11, 2012 5:36:50 AM hudson.plugins.openshift.OpenShiftSlave connect
INFO: Slave DNS not propagated yet, retrying...
Apr 11, 2012 5:36:52 AM hudson.plugins.openshift.OpenShiftSlave connect
INFO: Checking to see if slave DNS is resolvable...
Apr 11, 2012 5:36:53 AM hudson.plugins.openshift.OpenShiftSlave connect
INFO: Slave DNS not propagated yet, retrying...
Apr 11, 2012 5:36:55 AM hudson.plugins.openshift.OpenShiftSlave connect
INFO: Checking to see if slave DNS is resolvable...
Apr 11, 2012 5:36:55 AM hudson.plugins.openshift.OpenShiftSlave connect
INFO: Slave DNS not propagated yet, retrying...
Apr 11, 2012 5:36:58 AM hudson.plugins.openshift.OpenShiftSlave connect
INFO: Checking to see if slave DNS is resolvable...
Apr 11, 2012 5:36:58 AM hudson.plugins.openshift.OpenShiftSlave connect
INFO: Slave DNS not propagated yet, retrying...
Apr 11, 2012 5:37:00 AM hudson.plugins.openshift.OpenShiftSlave connect
INFO: Checking to see if slave DNS is resolvable...
Apr 11, 2012 5:37:00 AM hudson.plugins.openshift.OpenShiftSlave connect
INFO: Slave DNS not propagated yet, retrying...
Apr 11, 2012 5:37:03 AM hudson.plugins.openshift.OpenShiftSlave connect
INFO: Checking to see if slave DNS is resolvable...
Apr 11, 2012 5:37:04 AM hudson.plugins.openshift.OpenShiftSlave connect
INFO: Slave DNS not propagated yet, retrying...
Apr 11, 2012 5:37:05 AM hudson.plugins.openshift.OpenShiftSlave connect
INFO: Checking to see if slave DNS is resolvable...
Apr 11, 2012 5:37:06 AM hudson.plugins.openshift.OpenShiftSlave connect
INFO: Slave DNS not propagated yet, retrying...
Apr 11, 2012 5:37:09 AM hudson.plugins.openshift.OpenShiftSlave connect
INFO: Checking to see if slave DNS is resolvable...
Apr 11, 2012 5:37:09 AM hudson.plugins.openshift.OpenShiftSlave connect
INFO: Slave DNS not propagated yet, retrying...
Apr 11, 2012 5:37:11 AM hudson.plugins.openshift.OpenShiftSlave connect
INFO: Checking to see if slave DNS is resolvable...
Apr 11, 2012 5:37:11 AM hudson.plugins.openshift.OpenShiftSlave connect
INFO: Slave DNS not propagated yet, retrying...
Apr 11, 2012 5:37:14 AM hudson.plugins.openshift.OpenShiftSlave connect
INFO: Checking to see if slave DNS is resolvable...
Apr 11, 2012 5:37:15 AM hudson.plugins.openshift.OpenShiftSlave connect
INFO: Slave DNS not propagated yet, retrying...
Apr 11, 2012 5:37:16 AM hudson.plugins.openshift.OpenShiftSlave connect
INFO: Checking to see if slave DNS is resolvable...
Apr 11, 2012 5:37:17 AM hudson.plugins.openshift.OpenShiftSlave connect
INFO: Slave DNS not propagated yet, retrying...
Apr 11, 2012 5:37:20 AM hudson.plugins.openshift.OpenShiftSlave connect
INFO: Checking to see if slave DNS is resolvable...
Apr 11, 2012 5:37:20 AM hudson.plugins.openshift.OpenShiftSlave connect
INFO: Slave DNS resolved - phptestbldr-jialiu.dev.rhcloud.com/10.62.5.140
Apr 11, 2012 5:37:20 AM hudson.plugins.openshift.OpenShiftComputer <init>
INFO: Creating Computer
Apr 11, 2012 5:37:20 AM hudson.plugins.openshift.OpenShiftComputerLauncher launch
INFO: Launching slave...
Apr 11, 2012 5:37:20 AM hudson.plugins.openshift.OpenShiftComputerLauncher launch
INFO: Checking availability of computer hudson.plugins.openshift.OpenShiftSlave@fb809362
Apr 11, 2012 5:37:20 AM hudson.plugins.openshift.OpenShiftComputerLauncher launch
INFO: Checking SSH access to application phptestbldr-jialiu.dev.rhcloud.com
Apr 11, 2012 5:37:21 AM hudson.slaves.NodeProvisioner update
INFO: phptest-build provisioning successfully completed. We have now 1 computer(s)
Apr 11, 2012 5:37:22 AM hudson.plugins.openshift.OpenShiftComputerLauncher launch
INFO: Connected via SSH.
Apr 11, 2012 5:37:22 AM hudson.plugins.openshift.OpenShiftSlave connect
INFO: Checking to see if slave DNS is resolvable...
Apr 11, 2012 5:37:23 AM hudson.plugins.openshift.OpenShiftSlave connect
INFO: Slave DNS resolved - phptestbldr-jialiu.dev.rhcloud.com/10.62.5.140
Apr 11, 2012 5:37:23 AM hudson.slaves.NodeProvisioner update
INFO: phptest-build provisioning successfully completed. We have now 1 computer(s)
Apr 11, 2012 5:37:28 AM hudson.plugins.openshift.OpenShiftComputerLauncher launch
INFO: Slave connected.
Apr 11, 2012 5:37:52 AM hudson.model.Run run
INFO: phptest-build #1 main build action completed: SUCCESS
<--jenkins log-->

As a result, the application cannot be accessed, even though the Jenkins build actually completed successfully.

Version-Release number of selected component (if applicable):
devenv_1715

How reproducible:
Often

Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:

Comment 1 Johnny Liu 2012-04-11 09:55:49 UTC
Personally, I think 360s would be better.

Comment 2 Bill DeCoste 2012-04-11 14:04:34 UTC
Increased to 5 mins (300000ms)
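
The behaviour being tuned here, polling until the slave hostname resolves or a deadline passes, can be sketched as follows. This is a minimal illustration in Python, not the plugin's actual code (the plugin is Java). The function name `wait_for_dns`, its parameters, and the 2-second retry interval are assumptions made for the example; only the 300000 ms timeout and the timeout message come from this bug.

```python
import socket
import time


def wait_for_dns(hostname, timeout_ms=300000, retry_interval_s=2.0,
                 resolve=socket.gethostbyname,
                 clock=time.monotonic, sleep=time.sleep):
    """Poll until hostname resolves or timeout_ms elapses.

    Hypothetical helper: returns the resolved address, or raises
    TimeoutError once the deadline has passed.
    """
    deadline = clock() + timeout_ms / 1000.0
    while True:
        try:
            # One attempt: "Checking to see if slave DNS is resolvable..."
            return resolve(hostname)
        except OSError:
            # socket.gaierror is a subclass of OSError:
            # "Slave DNS not propagated yet, retrying..."
            pass
        if clock() >= deadline:
            raise TimeoutError("Slave DNS not propagated. Timing out.")
        sleep(retry_interval_s)
```

The resolver, clock, and sleep function are injectable so the loop can be exercised without real DNS lookups or real waiting.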

Comment 3 Johnny Liu 2012-04-12 11:42:19 UTC
Re-tested this bug with jenkins-plugin-openshift-0.5.11-2.el6_2.x86_64 on devenv-stage_166; the timeout still appears to be 60s.

Check the timestamps in the Jenkins log:
<--snip-->
INFO: Connecting to slave phptestbldr...
Apr 12, 2012 7:36:04 AM hudson.plugins.openshift.OpenShiftSlave connect
INFO: Established UUID = ac764dee3cd74acea181aa4aa314f149
Apr 12, 2012 7:36:04 AM hudson.plugins.openshift.OpenShiftSlave connect
INFO: Established UUID = ac764dee3cd74acea181aa4aa314f149
Apr 12, 2012 7:36:09 AM hudson.plugins.openshift.OpenShiftSlave connect
INFO: Checking to see if slave DNS is resolvable...
Apr 12, 2012 7:36:09 AM hudson.plugins.openshift.OpenShiftSlave connect
INFO: Checking to see if slave DNS is resolvable...
<--snip-->
Apr 12, 2012 7:37:11 AM hudson.plugins.openshift.OpenShiftSlave connect
WARNING: Slave DNS not propagated. Timing out.
Apr 12, 2012 7:37:11 AM hudson.plugins.openshift.OpenShiftCloud$2 call
WARNING: Unable to provision node java.io.IOException: Slave DNS not propagated. Timing out.
Apr 12, 2012 7:37:11 AM hudson.plugins.openshift.OpenShiftCloud cancelBuild
<--snip-->

Comment 4 Bill DeCoste 2012-04-12 12:43:24 UTC
Missed changing the jenkins_job_template.xml in li. Should be good to go in the next build.

Comment 5 Johnny Liu 2012-04-16 07:33:55 UTC
Re-tested this bug with cartridge-jenkins-client-1.4-0.25.2-1.el6_2.noarch on devenv-1732; it still reproduces.

Check the timestamps in the Jenkins log:
<--snip-->
Apr 16, 2012 3:30:46 AM hudson.plugins.openshift.OpenShiftSlave connect
INFO: Established UUID = c117b18230624c878527375d92eebbbd
Apr 16, 2012 3:30:51 AM hudson.plugins.openshift.OpenShiftSlave connect
INFO: Checking to see if slave DNS for phptestbldr-jialiu.dev.rhcloud.com is resolvable ...
Apr 16, 2012 3:30:51 AM hudson.plugins.openshift.OpenShiftSlave connect
INFO: Slave DNS not propagated yet, retrying...
<--snip-->
Apr 16, 2012 3:31:51 AM hudson.plugins.openshift.OpenShiftSlave connect
INFO: Slave DNS not propagated yet, retrying...
Apr 16, 2012 3:31:56 AM hudson.plugins.openshift.OpenShiftSlave connect
WARNING: Slave DNS not propagated. Timing out.
Apr 16, 2012 3:31:56 AM hudson.plugins.openshift.OpenShiftCloud$2 call
WARNING: Unable to provision node java.io.IOException: Slave DNS not propagated. Timing out.
Apr 16, 2012 3:31:56 AM hudson.plugins.openshift.OpenShiftCloud cancelBuild
INFO: Cancelling build
Apr 16, 2012 3:31:56 AM hudson.plugins.openshift.OpenShiftCloud cancelBuild
WARNING: Build for label phptest-build has been cancelled
Apr 16, 2012 3:31:56 AM hudson.slaves.NodeProvisioner update
WARNING: Provisioned slave phptest-build failed to launch
java.io.IOException: Slave DNS not propagated. Timing out
<--snip-->

Comment 6 Johnny Liu 2012-04-16 07:35:47 UTC
(In reply to comment #5)
> Re-test this bug with cartridge-jenkins-client-1.4-0.25.2-1.el6_2.noarch on
> devenv-1732, still reproduce.
> 
Correction: testing was executed against devenv_1723.

Comment 7 Bill DeCoste 2012-04-16 14:17:51 UTC
One more time.

Comment 8 Johnny Liu 2012-04-18 06:49:40 UTC
The bug is fixed in jenkins_job_template.xml for every cartridge, but no updated cartridge packages are available yet. Once newer cartridges are released, I will verify this bug.
For example, cartridge-php-5.3-0.91.2-1.el6_2.noarch is what is currently installed on the latest instance.

Comment 9 Johnny Liu 2012-04-23 05:31:27 UTC
Verified this bug on devenv_1735: PASS.

Check the timestamps in the Jenkins log:
<--snip-->
Apr 23, 2012 1:24:27 AM hudson.plugins.openshift.OpenShiftSlave connect
INFO: Established UUID = 1592ec2d410c4e9bb425db6c828ac252
Apr 23, 2012 1:24:33 AM hudson.plugins.openshift.OpenShiftSlave connect
INFO: Checking to see if slave DNS for wsgitestbldr-jialiu.dev.rhcloud.com is resolvable ...
Apr 23, 2012 1:24:37 AM hudson.plugins.openshift.OpenShiftSlave connect
INFO: Slave DNS not propagated yet, retrying...
Apr 23, 2012 1:24:42 AM hudson.plugins.openshift.OpenShiftSlave connect
<--snip-->
Apr 23, 2012 1:29:31 AM hudson.plugins.openshift.OpenShiftSlave connect
INFO: Slave DNS not propagated yet, retrying...
Apr 23, 2012 1:29:36 AM hudson.plugins.openshift.OpenShiftSlave connect
WARNING: Slave DNS not propagated. Timing out.
Apr 23, 2012 1:29:36 AM hudson.plugins.openshift.OpenShiftCloud$2 call
WARNING: Unable to provision node java.io.IOException: Slave DNS not propagated. Timing out.
Apr 23, 2012 1:29:36 AM hudson.plugins.openshift.OpenShiftCloud cancelBuild
INFO: Cancelling build
<--snip-->
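
The timestamp check used for verification can be reproduced with a short script. This is only a sketch: the helper name `elapsed_seconds` is made up for the example, and the format string assumes an English locale for month names and AM/PM. The timestamps are the first DNS check and the timeout warning quoted in comment 3 and in the log above.

```python
from datetime import datetime

# Jenkins log timestamp layout, e.g. "Apr 23, 2012 1:24:33 AM"
LOG_TIME_FORMAT = "%b %d, %Y %I:%M:%S %p"


def elapsed_seconds(start, end):
    """Return whole seconds between two Jenkins log timestamps."""
    t0 = datetime.strptime(start, LOG_TIME_FORMAT)
    t1 = datetime.strptime(end, LOG_TIME_FORMAT)
    return int((t1 - t0).total_seconds())


# Comment 3 (before the template fix): first DNS check -> "Timing out"
before_fix = elapsed_seconds("Apr 12, 2012 7:36:09 AM", "Apr 12, 2012 7:37:11 AM")
# Comment 9 (devenv_1735): first DNS check -> "Timing out"
after_fix = elapsed_seconds("Apr 23, 2012 1:24:33 AM", "Apr 23, 2012 1:29:36 AM")

print(before_fix, after_fix)  # 62 303
```

About 62 seconds matches the old 60s limit, and about 303 seconds matches the new 300000 ms setting plus one retry interval.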

