Bug 795794 - Instance came to running state in conductor using a de-registered AMI from EC2
Summary: Instance came to running state in conductor using a de-registered AMI from EC2
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: CloudForms Cloud Engine
Classification: Retired
Component: aeolus-conductor
Version: 1.0.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Assignee: Matt Wagner
QA Contact: wes hayutin
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2012-02-21 14:42 UTC by Rehana
Modified: 2014-08-17 22:27 UTC (History)

Fixed In Version: v0.8.0-35
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Embargoed:


Attachments
EC2 AMI (112.90 KB, image/png), 2012-02-21 14:42 UTC, Rehana
Vanished (206.45 KB, image/png), 2012-02-21 14:43 UTC, Rehana
Running (219.12 KB, image/png), 2012-02-21 14:43 UTC, Rehana
Rails log (8.74 KB, text/plain), 2012-02-21 14:44 UTC, Rehana
DeltaCloud Log (9.36 KB, text/plain), 2012-02-21 14:44 UTC, Rehana
Error message (213.93 KB, image/png), 2012-02-23 10:31 UTC, Rehana

Description Rehana 2012-02-21 14:42:38 UTC
Created attachment 564700 [details]
EC2 AMI

Description of problem:


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Build and push an image to EC2 (AMI ID: ami-78974511).
2. De-register ami-78974511 in EC2 (see attached EC2 AMI.png).
3. From Conductor, create an application blueprint using this image.
4. Create an application using that deployable.

  
Actual results:
The UI displayed the following error message:
    
Failed to launch following component blueprints:
Errors
404 : Resource not found

The VM then went to the "Vanished" state (see attached Vanished.png) and later came to the running state (see attached Running.png).


Expected results:
When the deployable is created, Conductor should check for an invalid or missing AMI. Failing that, the VM should not come to the running state if the AMI does not exist.


Additional info:

rpm -qa | grep aeolus
aeolus-conductor-0.8.0-28.el6.noarch
aeolus-conductor-daemons-0.8.0-28.el6.noarch
aeolus-conductor-doc-0.8.0-28.el6.noarch
rubygem-aeolus-image-0.3.0-7.el6.noarch
aeolus-all-0.8.0-28.el6.noarch
rubygem-aeolus-cli-0.3.0-8.el6.noarch
aeolus-configure-2.5.0-14.el6.noarch


Attached the Rails log and the Deltacloud log.

Comment 1 Rehana 2012-02-21 14:43:07 UTC
Created attachment 564701 [details]
Vanished

Comment 2 Rehana 2012-02-21 14:43:38 UTC
Created attachment 564702 [details]
Running

Comment 3 Rehana 2012-02-21 14:44:09 UTC
Created attachment 564703 [details]
Rails log

Comment 4 Rehana 2012-02-21 14:44:43 UTC
Created attachment 564704 [details]
DeltaCloud Log

Comment 5 Matt Wagner 2012-02-21 16:16:45 UTC
I have confirmed that it's not possible to _import_ a deregistered AMI, but I imagine that nothing good will happen if you import an AMI and then deregister it. Working on that handling now.

Comment 6 Matt Wagner 2012-02-21 18:24:55 UTC
I think there are two separate bugs here.

One point of clarification, because it becomes very relevant, is that the *instance* does not go to running state. It goes into create_failed, and then, on the next dbomatic poll, goes to "vanished" because it was never actually created. On the next iteration, it is deleted. The *deployment* erroneously ends up in running, and that should be tracked separately. (I have done so in #795891.)

I would argue that what we do here is basically correct. If you import an AMI, build a deployable out of it, deregister the AMI, and attempt to launch the deployable, you get an exception that the AMI could not be found. I think that is proper, but we just need to actually catch the exception and display a sensible error that doesn't include a dump of the entire request.

The second bug is that, if an instance goes into state :vanished or is deleted because it is vanished, we never actually touch the deployment. The deployment foolishly ends up in "running" state even though it has no instances. That's an, erm, mess. (Who wrote that code?) I will track that one separately. I have assigned that https://bugzilla.redhat.com/show_bug.cgi?id=795891 (#795891), also linked to a couple paragraphs above.

Ergo, the summary for what I am fixing in _this_ bug is this: If you attempt to launch a deployable built from an imported image which has since been deleted on the provider, you should see a _sensible_ error message.
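
The state handling described in this comment can be sketched roughly as follows. This is an illustration in Python, not Conductor's actual (Ruby) code: the function names poll_instance and deployment_status are hypothetical, and reporting "stopped" for a deployment with no running instances is an assumption.

```python
# Hypothetical sketch of the behaviour described above; not Conductor code.

def poll_instance(state, found_on_provider):
    """One poll step: an instance the provider does not report
    (for example one left in create_failed) is marked vanished."""
    if not found_on_provider:
        return "vanished"
    return state


def deployment_status(instance_states):
    """Derive the deployment status from its instances instead of tracking
    it separately, so a deployment with no instances is never 'running'.
    Reporting 'stopped' in that case is an assumption of this sketch."""
    return "running" if "running" in instance_states else "stopped"


if __name__ == "__main__":
    state = poll_instance("create_failed", found_on_provider=False)
    print(state, deployment_status([state]))
```

Deriving the deployment status from the instance list means it cannot drift out of sync when an instance vanishes, which is the failure mode tracked in #795891.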

Comment 7 Matt Wagner 2012-02-21 18:26:53 UTC
Sorry, I modified my comment as I was writing and didn't proofread for flow. "I would argue that what we do here is basically correct" is a radical topic shift referring to displaying an exception if the image you are launching is missing, NOT our practice of orphaning deployments.

Comment 8 Matt Wagner 2012-02-21 21:10:19 UTC
I sent out a series of patches to the list: http://lists.fedorahosted.org/pipermail/aeolus-devel/2012-February/009068.html

I ended up combining this with 795891 as well in the patchset. They're distinct but related issues.

Comment 9 Matt Wagner 2012-02-22 18:00:11 UTC
Pushed to master a trio of patches to address these issues:

commit d387511bb708856e4a1ac61f9d2fefec51903bfe
Author: Matt Wagner <matt.wagner>
Date:   Tue Feb 21 15:40:23 2012 -0500

    BZ #795794 - Don't delete :vanished instances
    
    If an instance goes into state 'vanished', we should leave it there,
    not delete it. The deletion was meant as an interim solution until
    we could handle things more gracefully, but it ended up causing more
    problems than it avoided.
    
    Related to https://bugzilla.redhat.com/show_bug.cgi?id=795794

commit 3154f1eee7f7c9f0393b6edd697ae3a9953df495
Author: Matt Wagner <matt.wagner>
Date:   Tue Feb 21 15:24:00 2012 -0500

    BZ #795891 - Deployment should not show "running" with no instances
    
    Resolves https://bugzilla.redhat.com/show_bug.cgi?id=795891

commit a07dc025cc7bf0be1e6453862d54999c4d541121
Author: Matt Wagner <matt.wagner>
Date:   Tue Feb 21 14:48:06 2012 -0500

    BZ #795794 - Display a saner error message if launch fails.
    
    The messages after the first line appear to be more of a dump of the
    request than helpful error text, so only relay the first line to the
    user -- this is the one that describes the problem.
    
    Resolves https://bugzilla.redhat.com/show_bug.cgi?id=795794
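
As a rough illustration of the last commit's approach (relay only the first line of the backend error to the user), here is a minimal Python sketch. It is not the patch itself, which lives in Conductor's Ruby code; the function name first_line and the sample request-dump lines are made up for the example.

```python
# Hypothetical sketch; the real change is in Conductor's Ruby code.

def first_line(message):
    """Return only the first line of a multi-line error message.
    The later lines are assumed to be a request dump, not useful text."""
    lines = message.strip().splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    # The first line is the error from this bug; the rest is invented filler.
    raw = "404 : Resource not found\n<request dump line 1>\n<request dump line 2>"
    print(first_line(raw))
```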

Comment 10 Rehana 2012-02-23 10:30:16 UTC
Verified the following scenarios:
1) When a de-registered AMI was used to create an instance, the instance status changed to create_failed and then to vanished.

2) Verified that the application showed the status "stopped".

3) Observed that the error message could be improved further.

The UI currently shows this error message (see attached Error message.png):

" Failed to launch following component blueprints:
 500 : Unhandled exception or status code (InvalidAMIID.NotFound: The image id 'ami-52ac7e3b' does not exist"

Expected:

The phrase "Unhandled exception or status code" could be removed.

Tested on:
[root@ibm-ls21-04 test]# rpm -qa | grep aeolus
aeolus-conductor-daemons-0.8.0-35.el6.noarch
aeolus-conductor-0.8.0-35.el6.noarch
aeolus-configure-2.5.0-15.el6.noarch
aeolus-conductor-doc-0.8.0-35.el6.noarch
rubygem-aeolus-cli-0.3.0-10.el6.noarch
aeolus-all-0.8.0-35.el6.noarch
rubygem-aeolus-image-0.3.0-9.el6.noarch
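
One possible way to produce the expected message is to strip the "Unhandled exception or status code" wrapper and keep only the provider's detail. The Python sketch below is an illustration only: the function name strip_wrapper and the assumed message shape ("<code> : Unhandled exception or status code (<detail>") are inferred from the single message quoted in this comment, not from Conductor or Deltacloud code.

```python
import re

# Assumed shape, based only on the message quoted above:
#   "<3-digit code> : Unhandled exception or status code (<detail>"
# The closing parenthesis is optional because the quoted message lacks one.
_WRAPPER = re.compile(
    r"^\s*\d{3}\s*:\s*Unhandled exception or status code\s*\((?P<detail>.*?)\)?\s*$"
)


def strip_wrapper(message):
    """Return the provider detail without the generic wrapper text.
    Messages that do not match the assumed shape are returned trimmed."""
    match = _WRAPPER.match(message)
    return match.group("detail") if match else message.strip()


if __name__ == "__main__":
    raw = ("500 : Unhandled exception or status code "
           "(InvalidAMIID.NotFound: The image id 'ami-52ac7e3b' does not exist")
    print(strip_wrapper(raw))
```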

Comment 11 Rehana 2012-02-23 10:31:01 UTC
Created attachment 565252 [details]
Error message

Comment 12 wes hayutin 2012-02-24 04:09:31 UTC
Good enough for v1; moving to verified.


Rehana, if you have any questions regarding this bug, please let me know.
Nice bug, btw :)

