+++ This bug was initially created as a clone of Bug #809621 +++

Description of problem:
If a working configserver becomes inactive, any attempt to launch a new application deployment results in a stuck deployment. Stuck deployments cannot be stopped or deleted. To prevent this, Conductor should check that the configserver is operational before launching.

Version-Release number of selected component (if applicable):
* aeolus-conductor-0.8.7-1.el6.src.rpm
* aeolus-configure-2.5.2-1.el6.src.rpm
* imagefactory-1.0.0rc11-1.el6.src.rpm
* oz-0.8.0-5.el6.src.rpm
* rubygem-aeolus-cli-0.3.1-1.el6.src.rpm
* rubygem-aeolus-image-0.3.0-12.el6.src.rpm

How reproducible:
2 out of 2 attempts

Steps to Reproduce:
1. Install and configure an Aeolus Conductor-capable environment
2. Deploy and configure a working configserver
3. Update the cloud provider account information with valid configserver information
4. Make the configserver unreachable (block all traffic with iptables, or shut it down)
5. Attempt to launch an application that relies on the configserver

Actual results:
The UI provides the following notifications:
> Warnings
> Failed to launch following component blueprints:
> Errors
> system: No route to host - connect(2)

At this point, Conductor shows a deployment in the 'new' state. It never leaves that state, and the application cannot be deleted.

Expected results:
I'd expect to either:
1) not be allowed to deploy when the configserver is out of reach, or
2) be able to delete failed deployments that resulted from a missing configserver.

Additional info:
* See attached debug tarball

--- Additional comment from whayutin on 2012-04-03 16:42:50 EDT ---

Possibly related to https://bugzilla.redhat.com/show_bug.cgi?id=796528

--- Additional comment from jprovazn on 2012-05-23 10:41:30 EDT ---

I believe the patch for https://bugzilla.redhat.com/show_bug.cgi?id=796528 fixes this too.
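The pre-launch check proposed in the description could take the form of a quick reachability probe against the configserver before the deployment is created. A minimal sketch in Ruby is below; the method name, the `/version` endpoint, and the timeout value are illustrative assumptions, not Conductor's actual API.

```ruby
require 'net/http'
require 'uri'

# Hypothetical pre-launch guard: returns true only if the configserver
# answers a lightweight HTTP HEAD request within a short timeout.
# Endpoint path and timeout are assumptions for illustration.
def config_server_reachable?(base_url, timeout: 2)
  uri = URI.join(base_url, '/version')
  http = Net::HTTP.new(uri.host, uri.port)
  http.open_timeout = timeout
  http.read_timeout = timeout
  http.use_ssl = (uri.scheme == 'https')
  response = http.head(uri.path)
  response.is_a?(Net::HTTPSuccess)
rescue StandardError
  # An unreachable host surfaces here, e.g. Errno::EHOSTUNREACH
  # ("No route to host - connect(2)") or Net::OpenTimeout.
  false
end
```

Launch code could then refuse to create the deployment (or mark it create_failed immediately) when this returns false, rather than leaving it stuck in the 'new' state.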
--- Additional comment from matt.wagner on 2012-05-23 15:12:58 EDT ---

Confirmed -- the patch for #796528 does resolve this issue. With an unreachable config server, instances go directly to the create_failed state. It's on master, but not backported anywhere yet. I'm setting this to "modified" to match that bug.

--- Additional comment from matt.wagner on 2012-05-25 13:59:55 EDT ---

The relevant commits on https://bugzilla.redhat.com/show_bug.cgi?id=796528 are:
* 7a8502b846a819c27fa141621220ca0bbaeac23c
* 56016671e651cf17bb0bc5c29b49c5aa55e94536
* 3dd5f304b8458528d15ddf462e0db7622b56dd09
* 86987cd9194c344272c0cfff313edcdf66df80c0

Though it sounds like QE isn't pleased with 796528 yet.
This bug is believed to have been resolved by the patch that was applied as part of bz #826130
Created attachment 591034 [details] down_cs

[root@qeblade38 ~]# rpm -qa | grep "aeolus"
aeolus-conductor-doc-0.8.27-1.el6_3.noarch
aeolus-conductor-daemons-0.8.27-1.el6_3.noarch
aeolus-conductor-0.8.27-1.el6_3.noarch
rubygem-aeolus-image-0.3.0-12.el6.noarch
aeolus-configure-2.5.7-1.el6_3.noarch
aeolus-all-0.8.27-1.el6_3.noarch
rubygem-aeolus-cli-0.3.3-1.el6_3.noarch

Conductor returns a create_failed state and an error message when the configserver is shut down. Verified.