Bug 984882 - Client output reports failure immediately when Jenkins slave creation fails the first time, regardless of the 5 retries on the node
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Online
Classification: Red Hat
Component: Containers
Version: 2.x
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Jhon Honce
QA Contact: libra bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2013-07-16 09:37 UTC by Meng Bo
Modified: 2015-05-14 23:24 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-01-24 03:22:40 UTC
Target Upstream Version:
Embargoed:



Description Meng Bo 2013-07-16 09:37:30 UTC
Description of problem:
Set the builder timeout to a small value so that the slave DNS propagation check times out. The client then reports the build as failed immediately, even though the node actually goes on to retry.

The client should report failure only if all retries fail.


Version-Release number of selected component (if applicable):
stage(devenv-stage_406)

How reproducible:
always

Steps to Reproduce:
1. Create an app with Jenkins embedded
2. In the Jenkins console, set the builder timeout to a small value to trigger the timeout
3. Git push
4. Check the Jenkins log

Actual results:
The client reports failure, but the build has actually started.

Expected results:
The client should report failure only if all retries fail.
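
For illustration only, the expected behavior can be sketched in Java as below. This is not the jenkins-plugin-openshift code; the names RetryThenReport, Provisioner, Outcome and provisionWithRetries are hypothetical, and the limit of 5 attempts is taken from the retry count mentioned in this report. The point is that a single failure result is produced only after the loop has run out of attempts.

```java
import java.io.IOException;

// Hypothetical sketch, not the actual jenkins-plugin-openshift code.
final class RetryThenReport {

    /** One provisioning attempt; assumed to throw IOException on failure. */
    interface Provisioner {
        void provision() throws IOException;
    }

    /** Final result as the client would see it. */
    static final class Outcome {
        final boolean success;
        final int attempts;

        Outcome(boolean success, int attempts) {
            this.success = success;
            this.attempts = attempts;
        }
    }

    /** Retries up to maxAttempts times; failure is returned only when all attempts failed. */
    static Outcome provisionWithRetries(Provisioner provisioner, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                provisioner.provision();
                return new Outcome(true, attempt);
            } catch (IOException e) {
                int remaining = maxAttempts - attempt;
                if (remaining > 0) {
                    // Intermediate failures are only logged, never reported as the final result.
                    System.out.println("WARNING: Caught " + e + ". Will retry "
                            + remaining + " more times before canceling build.");
                }
            }
        }
        return new Outcome(false, maxAttempts);
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated provisioner: the first attempt fails, the second succeeds.
        Outcome outcome = provisionWithRetries(() -> {
            if (++calls[0] == 1) {
                throw new IOException("Slave DNS not propagated. Timing out.");
            }
        }, 5);
        System.out.println(outcome.success
                ? "Build scheduled after " + outcome.attempts + " attempt(s)"
                : "**BUILD FAILED/CANCELLED**");
    }
}
```

In this sketch the first failed attempt only produces a warning, so a caller waiting on the Outcome would not see a failure while retries remain.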

Additional info:
Client result:
remote: Executing Jenkins build.
remote: 
remote: You can track your build at https://jk1-bmengstg.stg.rhcloud.com/job/jbeap1-build
remote: 
remote: Waiting for build to schedule......................................................
remote: **BUILD FAILED/CANCELLED**
remote: Please see the Jenkins log for more details via 'rhc tail'
remote: !!!!!!!!
remote: Deployment Halted!
remote: If the build failed before the deploy step, your previous
remote: build is still running.  Otherwise, your application may be
remote: partially deployed or inaccessible.
remote: Fix the build and try again.
remote: !!!!!!!!
remote: An error occurred executing 'gear postreceive' (exit code: 1)
remote: Error message: Failed to execute: 'control post-receive' for /var/lib/openshift/51e50ce72587c8366f00011f/jenkins-client
remote: 
remote: For more details about the problem, try running the command again with the '--trace' option.
To ssh://51e50ce72587c8366f00011f.rhcloud.com/~/git/jbeap1.git/
   bd59a51..f8ed65d  master -> master


Jenkins log:
Jul 16, 2013 5:15:27 AM hudson.plugins.openshift.OpenShiftSlave connect
INFO: Connecting to slave jbeap1bldr...
Jul 16, 2013 5:15:27 AM com.openshift.internal.client.RestService request
INFO: Requesting GET on https://stg.openshift.redhat.com/broker/rest/domains/bmengstg/applications/jbeap1bldr/gear_groups
Jul 16, 2013 5:15:27 AM hudson.plugins.openshift.OpenShiftSlave connect
INFO: Established UUID = 51e50ef4dbd93c448a0002ee
Jul 16, 2013 5:15:32 AM hudson.plugins.openshift.OpenShiftSlave connect
WARNING: Slave DNS not propagated. Timing out.
Jul 16, 2013 5:15:32 AM hudson.plugins.openshift.OpenShiftCloud provision
WARNING: Caught java.io.IOException: Slave DNS not propagated. Timing out.. Will retry 4 more times before canceling build.
java.io.IOException: Slave DNS not propagated. Timing out.
	at hudson.plugins.openshift.OpenShiftSlave.connect(OpenShiftSlave.java:205)
	at hudson.plugins.openshift.OpenShiftSlave.provision(OpenShiftSlave.java:228)
	at hudson.plugins.openshift.OpenShiftCloud.provisionSlave(OpenShiftCloud.java:470)
	at hudson.plugins.openshift.OpenShiftCloud.provision(OpenShiftCloud.java:401)
	at hudson.slaves.NodeProvisioner.update(NodeProvisioner.java:264)
	at hudson.slaves.NodeProvisioner.access$000(NodeProvisioner.java:51)
	at hudson.slaves.NodeProvisioner$NodeProvisionerInvoker.doRun(NodeProvisioner.java:347)
	at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:54)
	at java.util.TimerThread.mainLoop(Timer.java:555)
	at java.util.TimerThread.run(Timer.java:505)
Jul 16, 2013 5:15:37 AM hudson.plugins.openshift.OpenShiftCloud getSlaves
INFO: Didn't find existing slave for: jbeap1bldr
Jul 16, 2013 5:15:37 AM hudson.plugins.openshift.OpenShiftSlave <init>
INFO: Creating slave with 1mins time-to-live
Jul 16, 2013 5:15:37 AM hudson.plugins.openshift.OpenShiftComputer <init>
INFO: Creating Computer
Jul 16, 2013 5:15:37 AM hudson.plugins.openshift.OpenShiftComputerLauncher launch
INFO: Launching slave...
Jul 16, 2013 5:15:37 AM hudson.plugins.openshift.OpenShiftSlave connect
INFO: Connecting to slave jbeap1bldr...
Jul 16, 2013 5:15:37 AM hudson.plugins.openshift.OpenShiftSlave connect
INFO: Established UUID = 51e50ef4dbd93c448a0002ee
Jul 16, 2013 5:15:37 AM hudson.plugins.openshift.OpenShiftSlave connect
INFO: Checking to see if slave DNS for jbeap1bldr-bmengstg.stg.rhcloud.com is resolvable ...
Jul 16, 2013 5:15:37 AM hudson.plugins.openshift.OpenShiftSlave connect
INFO: Slave DNS resolved - jbeap1bldr-bmengstg.stg.rhcloud.com/10.80.102.67
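
The DNS check seen in the log above (poll until the slave hostname resolves, otherwise give up with "Slave DNS not propagated. Timing out.") can be approximated with the following minimal sketch. It is an assumption about the general shape of such a check, not the plugin's implementation; DnsWait, Resolver and waitForDns are hypothetical names, and the timeout and poll interval are arbitrary.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a DNS propagation wait with a deadline.
final class DnsWait {

    /** Lookup hook so the resolver can be replaced in tests (assumed, not plugin API). */
    interface Resolver {
        InetAddress resolve(String host) throws UnknownHostException;
    }

    /** Polls the resolver until the host resolves or timeoutMillis has elapsed. */
    static InetAddress waitForDns(String host, long timeoutMillis, long pollMillis,
                                  Resolver resolver) throws IOException, InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            try {
                return resolver.resolve(host);
            } catch (UnknownHostException e) {
                // Not resolvable yet: give up once the deadline has passed, otherwise poll again.
                if (System.nanoTime() - deadline >= 0) {
                    throw new IOException("Slave DNS not propagated. Timing out.");
                }
                Thread.sleep(pollMillis);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // "localhost" stands in for the slave hostname in this example.
        InetAddress address = waitForDns("localhost", 5000, 500, InetAddress::getByName);
        System.out.println("Slave DNS resolved - " + address);
    }
}
```

With a very small timeout, a check of this shape fails on the first attempt whenever the name has not propagated yet, which is the condition the reproduction steps rely on.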

Comment 1 Jhon Honce 2013-10-25 18:48:00 UTC
The following command will cause the host to drop 30% of the packets to $DNS_PROVIDER. This can be useful for testing.

# iptables -A OUTPUT -m statistic --mode random --probability 0.3 -d $DNS_PROVIDER -j DROP

https://brewweb.devel.redhat.com/buildinfo?buildID=302084

Comment 2 Meng Bo 2013-10-28 08:13:39 UTC
Checked on devenv_3953, with package jenkins-plugin-openshift-0.6.23-0.el6oso.x86_64

1. Create php1 with the Jenkins client added
2. Pre-create php1bldr to force the slave connection to fail
3. Check the Jenkins log while the build is being scheduled

It retries 5 times and cancels the build after all retries have failed.

Move bug to verified.

