Description of problem: It appears the application/gear creation when a build slave is needed is doomed to fail, as the application does not have the needed bootstrap slave.jar based on the ENV variables of the application. (more info below) Attempting to connect slave... Transferring slave.jar file... ERROR: Unexpected error in launching a slave. This is probably a bug in Jenkins java.io.IOException: Download of slave.jar failed. Return code = 4 at hudson.plugins.openshift.OpenShiftComputerLauncher.launch(OpenShiftComputerLauncher.java:97) at hudson.slaves.SlaveComputer$1.call(SlaveComputer.java:222) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:679) Version-Release number of selected component (if applicable): Latest openshift enterprise 30 day trial How reproducible: Always Steps to Reproduce... The below is my log of debugging which I was sending to a colleague, it shows the issue, what we tried to fix/hack/understand more about the problem. Hopefully it will provide all info needed. the file exists at; https://jenkins1-mj.poc.matt-j.co.uk/jnlpJars/slave.jar so that env variable for the slave SHOULD be the hostname of the 'master' jenkins instace it's attached to? I'd think. Build slave variables as follows [MJCITestbldr-mj.poc.matt-j.co.uk ~]\> export declare -x HISTFILE="/var/lib/openshift/945379cddf004aaf91d6993437a9bcb4/app-root/data/.bash_history" declare -x HOME="/var/lib/openshift/945379cddf004aaf91d6993437a9bcb4" declare -x JENKINS_PASSWORD="mVrHmRGXTqkb" declare -x JENKINS_URL="https://jenkins1-mj.poc.matt-j.co.uk/" declare -x JENKINS_USERNAME="system_builder" declare -x LANG="en_GB.UTF-8" declare -x LOGNAME="945379cddf004aaf91d6993437a9bcb4" declare -x MAIL="/var/mail/945379cddf004aaf91d6993437a9bcb4" declare -x OLDPWD declare -x OPENSHIFT_APP_DNS="MJCITestbldr-mj.poc.matt-j.co.uk" declare -x OPENSHIFT_APP_NAME="MJCITestbldr" declare -x OPENSHIFT_APP_UUID="945379cddf004aaf91d6993437a9bcb4" declare -x OPENSHIFT_DATA_DIR="/var/lib/openshift/945379cddf004aaf91d6993437a9bcb4/app-root/data/" declare -x OPENSHIFT_GEAR_DNS="MJCITestbldr-mj.poc.matt-j.co.uk" declare -x OPENSHIFT_GEAR_NAME="MJCITestbldr" declare -x OPENSHIFT_GEAR_UUID="945379cddf004aaf91d6993437a9bcb4" declare -x OPENSHIFT_HOMEDIR="/var/lib/openshift/945379cddf004aaf91d6993437a9bcb4/" declare -x OPENSHIFT_INTERNAL_IP="127.0.253.1" declare -x OPENSHIFT_INTERNAL_PORT="8080" declare -x OPENSHIFT_PHP_IP="127.0.253.1" declare -x OPENSHIFT_PHP_LOG_DIR="/var/lib/openshift/945379cddf004aaf91d6993437a9bcb4/php-5.3/logs/" declare -x OPENSHIFT_PHP_PORT="8080" declare -x OPENSHIFT_REPO_DIR="/var/lib/openshift/945379cddf004aaf91d6993437a9bcb4/app-root/runtime/repo/" declare -x OPENSHIFT_TMP_DIR="/tmp/" declare -x PATH="/usr/libexec/openshift/cartridges/php-5.3/info/bin/:/usr/libexec/openshift/cartridges/abstract-httpd/info/bin/:/usr/libexec/openshift/cartridges/abstract/info/bin/:/sbin:/usr/sbin:/bin:/usr/bin" declare -x PS1="[MJCITestbldr-mj.poc.matt-j.co.uk \\W]\\> " declare -x PWD="/var/lib/openshift/945379cddf004aaf91d6993437a9bcb4" declare -x SELINUX_LEVEL_REQUESTED="" declare -x SELINUX_ROLE_REQUESTED="" declare -x SELINUX_USE_CURRENT_RANGE="" declare -x SHELL="/bin/bash" declare -x SHLVL="1" declare -x SSH_CLIENT="128.107.239.233 16345 22" declare -x SSH_CONNECTION="128.107.239.233 16345 172.31.3.71 22" declare -x SSH_TTY="/dev/pts/0" declare -x TERM="xterm" declare -x TMOUT="300" declare -x USER="945379cddf004aaf91d6993437a9bcb4" Which means the slave is trying to pull from https://mjcitestbldr-mj.poc.matt-j.co.uk/jnlpJars/slave.jar instead not from https://jenkins1-mj.poc.matt-j.co.uk/jnlpJars/slave.jar Which DOESNT appear to have the file; Service Temporarily Unavailable The server is temporarily unable to service your request due to maintenance downtime or capacity problems. Please try again later. Maybe it SHOULD work... but the builder app appears stopped [MJCITestbldr-mj.poc.matt-j.co.uk ~]\> ctl_all restart httpd (no pid file) not running Done Application is explicitly stopped! Use 'rhc app start -a MJCITestbldr' to start back up. Done but if it cant start without getting the file from itself then it's never going to start???? But the builder application goes away dynamically it seems after a while. Clicking 'build now' in Jenkins forks a new instance with the same '<appname>bldr' instance name... Your client tools are now configured. matjohn2-mac:~ matt$ rhc app start -a MJCITestbldr Password: ********* Application 'MJCITestbldr' not found So i kicked off a 'build now' in jenkins which made the MJCITestbldr application come back again, 503 service unavailable because it didnt start properly (because it couldnt get the jar?) so i then manually started from rh tools, just like before it auto dissapeared matjohn2-mac:~ matt$ rhc app start -a MJCITestbldr Password: ********* RESULT: MJCITestbldr started ....and now the app is up (from a HTTP perspective) https://mjcitestbldr-mj.poc.matt-j.co.uk/jnlpJars/slave.jar Not Found The requested URL /jnlpJars/slave.jar was not found on this server. From the Log: Apache/2.2.15 (Red Hat) Server at mjcitestbldr-mj.poc.matt-j.co.uk Port 80 But the file doesnt exist [Wed Jun 26 15:00:24 2013] [notice] Digest: done [Wed Jun 26 15:00:24 2013] [notice] Apache/2.2.15 (Unix) configured -- resuming normal operations [Wed Jun 26 15:01:52 2013] [notice] SELinux policy enabled; httpd running as context system_u:system_r:openshift_t:s0:c0,c506 [Wed Jun 26 15:01:52 2013] [notice] mod_bw : Memory Allocated 32 bytes (each conf takes 32 bytes) [Wed Jun 26 15:01:52 2013] [notice] mod_bw : Version 0.8 - Initialized [1 Confs] [Wed Jun 26 15:01:52 2013] [notice] Digest: generating secret for digest authentication ... [Wed Jun 26 15:01:52 2013] [notice] Digest: done [Wed Jun 26 15:01:53 2013] [notice] Apache/2.2.15 (Unix) configured -- resuming normal operations [Wed Jun 26 15:02:04 2013] [error] [client 127.0.253.1] File does not exist: /var/lib/openshift/689bf7ed2b854a789b187ae98505fe35/app-root/runtime/repo/php/jnlpJars [Wed Jun 26 15:02:07 2013] [error] [client 127.0.253.1] File does not exist: /var/lib/openshift/689bf7ed2b854a789b187ae98505fe35/app-root/runtime/repo/php/jnlpJars [Wed Jun 26 15:03:59 2013] [error] [client 127.0.253.1] File does not exist: /var/lib/openshift/689bf7ed2b854a789b187ae98505fe35/app-root/runtime/repo/php/jnlpJars [Wed Jun 26 15:04:00 2013] [error] [client 127.0.253.1] File does not exist: /var/lib/openshift/689bf7ed2b854a789b187ae98505fe35/app-root/runtime/repo/php/jnlpJars So the slave.jar should be on the MJCITestbldr app, but isnt. Lets test by putting it there; Tried manually SCPing the jar to the slave jenkins node to see if that does actually work the gear dissapeared again so tried to rebuild on jenkins again, hence the ID's for the SCP below have changed; matjohn2-mac:~ matt$ mkdir jnlpJars matjohn2-mac:~ matt$ mv Downloads/slave.jar jnlpJars/. matjohn2-mac:~ matt$ scp -r jnlpJars 86e98aadd6d44930b2e972b2e36c89cd.matt-j.co.uk:/var/lib/openshift/86e98aadd6d44930b2e972b2e36c89cd/app-root/runtime/repo/php/. slave.jar 100% 278KB 277.5KB/s 00:00 start the app again matjohn2-mac:~ matt$ rhc app start -a MJCITestbldr Password: ********* RESULT: MJCITestbldr started https://mjcitestbldr-mj.poc.matt-j.co.uk/jnlpJars/slave.jar This request now works... so lets try building in Jenkins again WTF... Still fails Attempting to connect slave... Transferring slave.jar file... ERROR: Unexpected error in launching a slave. This is probably a bug in Jenkins java.io.IOException: Download of slave.jar failed. Return code = 4 at hudson.plugins.openshift.OpenShiftComputerLauncher.launch(OpenShiftComputerLauncher.java:97) at hudson.slaves.SlaveComputer$1.call(SlaveComputer.java:222) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:679) but the file IS on the box. Finding the code for this plugin in GITHUB to check what it should be doing; https://github.com/openshift/jenkins-cloud-plugin/blob/master/src/main/java/hudson/plugins/openshift/OpenShiftComputerLauncher.java source code looks like it should be resolving, can you double check my understanding of this section; // A command for the initial slave setup StringBuilder execCommand = new StringBuilder(); execCommand.append("mkdir -p $OPENSHIFT_DATA_DIR/jenkins") .append(" && cd $OPENSHIFT_DATA_DIR/jenkins") .append(" && rm -f slave.jar").append(" && wget -q --no-check-certificate https://") .append(getGearDNS(hostName)) .append("/jnlpJars/slave.jar"); looks like it WILL result in https://mjcitestbldr-mj.poc.matt-j.co.uk/jnlpJars/slave.jar which works from a browser now i've SCP'ed it across. Definitley isnt downloading, as nothing appears on node in; ls /var/lib/openshift/86e98aadd6d44930b2e972b2e36c89cd/app-root/data/jenkins/ I wonder if it's DNS Resolution of the jenkins app (not quick enough etc?) hacking the ENV to an IP address doesn't work as it the node needs the DNS name to know which app to route the request too (node has many gears); OPENSHIFT_GEAR_DNS=54.229.72.85 So thats a no go, try adding to hosts to rule out DNS; [root@node ~]# wget --no-check-certificate https://mjcitestbldr-mj.poc.matt-j.co.uk/jnlpJars/slave.jar --2013-06-26 15:44:42-- https://mjcitestbldr-mj.poc.matt-j.co.uk/jnlpJars/slave.jar Resolving mjcitestbldr-mj.poc.matt-j.co.uk... 54.229.72.85 Connecting to mjcitestbldr-mj.poc.matt-j.co.uk|54.229.72.85|:443... connected. WARNING: cannot verify mjcitestbldr-mj.poc.matt-j.co.uk’s certificate, issued by “/C=--/ST=SomeState/L=SomeCity/O=SomeOrganization/OU=SomeOrganizationalUnit/CN=node.poc.matt-j.co.uk/emailAddress=root.matt-j.co.uk”: Self-signed certificate encountered. WARNING: certificate common name “node.poc.matt-j.co.uk” doesn’t match requested host name “mjcitestbldr-mj.poc.matt-j.co.uk”. HTTP request sent, awaiting response... 200 OK Length: 284160 (278K) [application/x-java-archive] Saving to: “slave.jar” 100%[==============================================================================================================================================================================>] 284,160 485K/s in 0.6s 2013-06-26 15:44:42 (485 KB/s) - “slave.jar” saved [284160/284160] So DNS now 100% sure resolves for WGET on the node running our gears, lets see if we have success; Attempting to connect slave... Transferring slave.jar file... ERROR: Unexpected error in launching a slave. This is probably a bug in Jenkins java.io.IOException: Download of slave.jar failed. Return code = 4 at hudson.plugins.openshift.OpenShiftComputerLauncher.launch(OpenShiftComputerLauncher.java:97) at hudson.slaves.SlaveComputer$1.call(SlaveComputer.java:222) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:679) OK Now i'm confused, i'll log a RH bug for now, as everything that is needed is there! that i can see and it Still isnt succeeding. Status Delete Slave Configure Build History Load Statistics Script Console Log System Information Build Executor Status # Status Attempting to connect slave... Transferring slave.jar file... ERROR: Unexpected error in launching a slave. This is probably a bug in Jenkins java.io.IOException: Download of slave.jar failed. Return code = 4 at hudson.plugins.openshift.OpenShiftComputerLauncher.launch(OpenShiftComputerLauncher.java:97) at hudson.slaves.SlaveComputer$1.call(SlaveComputer.java:222) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:679) Actual results: Jenkins fails to start worker Expected results: Build
The following is a work around and not a solution but knowing that it works may help you debug. In order to work around this at the moment we have rebuilt the plugin. The changes made have been to OpenShiftComputerLauncher.java https://github.com/openshift/jenkins-cloud-plugin/blob/master/src/main/java/hudson/plugins/openshift/OpenShiftComputerLauncher.java We've switched out the generate URL for slave.jar to a hard-coded URL we know to be good. .append(" && rm -f slave.jar").append(" && wget -q --no-check-certificate https://") .append(getGearDNS(hostName)) .append("/jnlpJars/slave.jar"); Ends up as .append(" && rm -f slave.jar").append(" && wget -q --no-check-certificate ") .append("https://dl.dropboxusercontent.com/u/214102/jars/slave.jar"); This is allowing us to get slaves up and running. Jenkins is now building the apps correctly. Thanks.
This is likely resolved with OSE 1.2 (coming soon). Not sure when we'll get a chance to look at this.
I'm wondering if this is related to Bug #984608. If you are willing to test we could give you the updated jenkins-plugin-openshift that we're shipping in 1.2.2.
This is almost certainly related to Bug #984608 that has been fixed in 1.2. I'm marking this as CURRENTRELEASE. If the reporter is willing to debug further I can be available to debug their environment.