Bug 811667
Summary: | jenkins builder app not cleaned up on node creation failure | | |
---|---|---|---|
Product: | OKD | Reporter: | Bill DeCoste <wdecoste> |
Component: | Containers | Assignee: | Dan Mace <dmace> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | libra bugs <libra-bugs> |
Severity: | medium | Docs Contact: | |
Priority: | medium | | |
Version: | 2.x | CC: | abhgupta, bmeng, dmace, dmcphers, jhonce, jhou, rmillner, szhang, wsun, xjia |
Target Milestone: | --- | Keywords: | Triaged |
Target Release: | --- | | |
Hardware: | Unspecified | | |
OS: | Unspecified | | |
Whiteboard: | | | |
Fixed In Version: | devenv_2826+ | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2013-03-15 14:13:22 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Description
Bill DeCoste
2012-04-11 16:47:25 UTC
Do you have a method for reproducing the failure?

When node creation fails (e.g. from a DNS timeout), the jenkins node is not created but the OpenShift builder is. Since the 15 min lifecycle is controlled by the node, the builder lasts forever, taking up a gear. The workaround is to rebuild and hope DNS resolves, or to manually delete the builder. The fix is to delete the OpenShift builder application via the java client from the jenkins plugin in the event of a DNS timeout.

Steps to recreate:
1) Create a jenkins app and a jboss app w/enabled jenkins
2) Go into the configuration of the jboss job and change the timeout from 300000 to something small like 3000
3) Kick off a build. The builder will be created, DNS will time out, and no node will be created

Expected behavior: On DNS timeout the OpenShift builder is deleted and doesn't hang around taking up a gear indefinitely

Assigning this back to Bill based on my conversation with him to fix in the jenkins plugin.

Please re-test with latest devenvs. There have been many improvements to the Jenkins plugin's failure handling which might resolve this.

Tested on devenv_2816
openshift-origin-cartridge-jenkins-client-1.4-1.4.2-1.git.82.1888878.el6.noarch
jenkins-plugin-openshift-0.6.14-0.el6_3.x86_64
openshift-origin-cartridge-jenkins-1.4-1.5.2-1.git.79.1888878.el6.noarch

The previous problem is gone, but when building a jbossas application without changing any configuration, the build fails.
The errors logged in the jenkins console indicate a load error for open4:

Started by user Jenkins System Builder
Building remotely on as1bldr in workspace jbossas-7/ci/jenkins/workspace/as1-build
Checkout:as1-build / jbossas-7/ci/jenkins/workspace/as1-build - hudson.remoting.Channel@6d8c1:as1bldr
Using strategy: Default
Checkout:as1-build / jbossas-7/ci/jenkins/workspace/as1-build - hudson.remoting.LocalChannel@1e2b3c
Cloning the remote Git repository
Cloning repository origin
Fetching upstream changes from ssh://511f424f7e88cea7c2000129.rhcloud.com/~/git/as1.git/
Seen branch in repository origin/HEAD
Seen branch in repository origin/master
Commencing build of Revision ec968694256a345985edaaf8abd225555661f5a8 (origin/HEAD, origin/master)
Checking out Revision ec968694256a345985edaaf8abd225555661f5a8 (origin/HEAD, origin/master)
No change to record in branch origin/HEAD
No change to record in branch origin/master
[as1-build] $ /bin/sh -xe /tmp/hudson454816972224339925.sh
+ source /usr/libexec/openshift/cartridges/abstract/info/lib/jenkins_util
+ jenkins_rsync '511f424f7e88cea7c2000129.rhcloud.com:~/.m2/' /var/lib/openshift/511f42fe7e88cea7c200016f/.m2/
+ rsync --delete-after -az -e /usr/libexec/openshift/cartridges/jenkins-1.4/info/bin/git_ssh_wrapper.sh '511f424f7e88cea7c2000129.rhcloud.com:~/.m2/' /var/lib/openshift/511f42fe7e88cea7c200016f/.m2/
+ . ci_build.sh
++ set +x
Running .openshift/action_hooks/pre_build
/usr/bin/oo-cgroup-read:7:in `require': no such file to load -- open4 (LoadError)
from /usr/bin/oo-cgroup-read:7
Build step 'Execute shell' marked build as failure
Archiving artifacts
Finished: FAILURE

[root@ip-10-195-191-40 ~]# oo-cgroup-read
/usr/bin/oo-cgroup-read:7:in `require': no such file to load -- open4 (LoadError)
from /usr/bin/oo-cgroup-read:7
You have new mail in /var/spool/mail/root
[root@ip-10-195-191-40 ~]# gem list open4

*** LOCAL GEMS ***

open4 (1.3.0)

Because "oo-cgroup-read" is used when taking a snapshot of the app, the snapshot function cannot work at all, so I am changing the Severity to "Medium".

Considering the importance of oo-cgroup-read, I have filed bug 912215 to keep track. Please set this bug to ON_QA and I will verify this bug when the open4 issue is fixed, thanks!

Bug 912215 is resolved in devenv_2826; updating this one to ON_QA.

Bug 912215 still has an SELinux issue; waiting for its fix to verify this bug.

Moving bug to Verified, since bug 912215 has been fixed.

Checked on devenv_2894:
1. Create jboss w/ jenkins enabled.
2. Modify the builder timeout to a small value.
3. Push build.
4. Check the jenkins log.

The build fails because of the low timeout. Jenkins log as below:

Mar 04, 2013 10:13:56 PM hudson.plugins.openshift.OpenShiftCloud cancelItem
WARNING: Build app1-build app1bldr has been canceled
Mar 04, 2013 10:13:56 PM hudson.plugins.openshift.OpenShiftCloud cancelItem
INFO: Cancelling Item
Mar 04, 2013 10:13:56 PM hudson.plugins.openshift.OpenShiftCloud provision
INFO: Provisioned 0 new nodes
Mar 04, 2013 10:13:46 PM hudson.plugins.openshift.OpenShiftSlave _terminate
INFO: Terminating OpenShift application...
Mar 04, 2013 10:13:46 PM hudson.plugins.openshift.OpenShiftSlave _terminate
INFO: Terminating slave app1bldr (uuid: 51356197a11aca9e1a000091)
Mar 04, 2013 10:13:46 PM hudson.plugins.openshift.OpenShiftCloud provisionSlave
INFO: Slave exists without corresponding builder. Deleting slave
Mar 04, 2013 10:13:46 PM hudson.plugins.openshift.OpenShiftCloud builderExists
INFO: Found an existing builder. Not provisioning...
Mar 04, 2013 10:13:46 PM hudson.plugins.openshift.OpenShiftCloud builderExists
INFO: Capacity remaining - checking for existing type...
Mar 04, 2013 10:13:46 PM hudson.plugins.openshift.OpenShiftCloud getSlave
INFO: slaveExists app1bldr app1bldr
Mar 04, 2013 10:13:46 PM hudson.plugins.openshift.OpenShiftCloud getSlaves
INFO: Found existing slave for: app1bldr
Mar 04, 2013 10:13:45 PM hudson.plugins.openshift.OpenShiftCloud getOpenShiftConnection
INFO: Initiating Java Client Service - Configured for OpenShift Server https://localhost
Mar 04, 2013 10:13:45 PM hudson.plugins.openshift.OpenShiftCloud provision
INFO: Provisioning new node for workload = 2 and label = app1-build
Mar 04, 2013 10:13:45 PM hudson.slaves.NodeProvisioner update
INFO: app1-build provisioning successfully completed. We have now 1 computer(s)

After the build failed, I checked /var/lib/openshift/; there are no builder files left.
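
For readers unfamiliar with the plugin, the behaviour requested in this report (delete the builder application when node creation times out) can be sketched roughly as below. This is only an illustration in Python, not the plugin's actual Java code: `FakeClient`, `NodeTimeout`, `provision_builder` and all method names are hypothetical stand-ins, and the real java client API may look quite different.

```python
class NodeTimeout(Exception):
    """Hypothetical error: the builder's DNS name did not resolve in time."""


class FakeClient:
    """Hypothetical stand-in for the OpenShift java client used by the plugin."""

    def __init__(self, dns_resolves):
        self.dns_resolves = dns_resolves
        self.apps = set()  # names of builder applications that currently exist

    def create_app(self, name):
        self.apps.add(name)

    def delete_app(self, name):
        self.apps.discard(name)

    def wait_for_dns(self, name, timeout_ms):
        # Simulates the DNS wait; fails immediately when DNS never resolves.
        if not self.dns_resolves:
            raise NodeTimeout(f"{name} did not resolve within {timeout_ms} ms")


def provision_builder(client, name, timeout_ms=300000):
    """Create a builder app; delete it again if the node never comes up."""
    client.create_app(name)
    try:
        client.wait_for_dns(name, timeout_ms)
    except NodeTimeout:
        # The node owns the 15 min lifecycle, so without a node nothing
        # would ever reap the builder. Delete it here to free the gear.
        client.delete_app(name)
        raise
    return name


if __name__ == "__main__":
    client = FakeClient(dns_resolves=False)
    try:
        provision_builder(client, "app1bldr", timeout_ms=3000)
    except NodeTimeout as err:
        print("build failed:", err)
    print("builders left:", sorted(client.apps))
```

The point of the sketch is that cleanup sits on the failure path of provisioning itself instead of relying on the node's idle timer, which never starts when the node is not created.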