Bug 786844 - failing to report deleted instances
Summary: failing to report deleted instances
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: CloudForms Cloud Engine
Classification: Retired
Component: aeolus-conductor
Version: 1.0.0
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: medium
Target Milestone: beta6
Assignee: Matt Wagner
QA Contact: wes hayutin
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2012-02-02 14:45 UTC by Dave Johnson
Modified: 2014-08-17 22:27 UTC (History)

Fixed In Version: v0.8.0-35
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-05-15 22:26:06 UTC
Embargoed:



Links
System ID: Red Hat Product Errata RHEA-2012:0583
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: new packages: aeolus-conductor
Last Updated: 2012-05-15 22:31:59 UTC

Description Dave Johnson 2012-02-02 14:45:47 UTC
Description of problem:
==================================

Fired up an application (deployment) of a single image on both rhevm and vsphere.  I noticed that, after letting it come online, I could stop the VM on the vendor console and conductor would pick up the state change and update the conductor UI accordingly.

However, if the VM is running and I stop and delete it, the conductor UI keeps showing the deployment as running, which is inaccurate.  We should probably raise an error or otherwise let the conductor user know that the VM can no longer be found.


Version-Release number of selected component (if applicable):
=================================================================

aeolus-all-0.8.0-17.el6.noarch
aeolus-conductor-0.8.0-17.el6.noarch
aeolus-conductor-daemons-0.8.0-17.el6.noarch
aeolus-conductor-doc-0.8.0-17.el6.noarch
aeolus-configure-2.5.0-11.el6.noarch
deltacloud-core-0.5.0-4.rc1.el6.noarch
deltacloud-core-ec2-0.5.0-4.rc1.el6.noarch
deltacloud-core-rhevm-0.5.0-4.rc1.el6.noarch
deltacloud-core-vsphere-0.5.0-4.rc1.el6.noarch
rubygem-aeolus-cli-0.3.0-7.el6.noarch
rubygem-aeolus-image-0.3.0-7.el6.noarch
rubygem-deltacloud-client-0.5.0-1.rc2.el6.noarch

Comment 1 wes hayutin 2012-02-02 15:39:46 UTC
Dave, does it update if you refresh the page? https://bugzilla.redhat.com/show_bug.cgi?id=786589

Comment 2 Dave Johnson 2012-02-08 16:30:39 UTC
No, if you stopped/deleted the instance on the vendor console, conductor continued to report the instance as running while incrementing the running timer.

Comment 3 Matt Wagner 2012-02-08 22:35:46 UTC
It's likely been like this from the beginning of time. What happens is that dbomatic polls the provider periodically, retrieves the list of instances, iterates over them, and updates the status of each.

If, between runs of that, an instance goes from running (or any state, really) to being removed, it just won't come back in the list of instances, which means that we will never process it.

The apparent fix here is to, at the end of each poll, delete (?) all the instances that we know about in our database but that did not come back in the instance list. However, that feels like an intuitively dangerous idea.
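
The detection step described above can be sketched as a set difference between what the database knows and what the provider returned. This is only an illustration in Python, not dbomatic's code; the function name and the instance IDs are hypothetical.

```python
def find_missing(known_ids, provider_ids):
    """Return the IDs tracked locally that the provider no longer lists."""
    return set(known_ids) - set(provider_ids)


# Hypothetical data: IDs stored in the database vs. IDs from the latest poll.
known = {"i-1", "i-2", "i-3"}
reported = ["i-1", "i-3"]

missing = find_missing(known, reported)
print(sorted(missing))
```

What to do with the missing IDs (delete, flag, or wait for another poll) is a separate policy decision, which is the risky part.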

Comment 4 Matt Wagner 2012-02-08 22:56:41 UTC
What has me particularly worried is the case in which an instance disappears on a cloud provider due to something other than the user explicitly stopping and removing it -- i.e., some sort of catastrophic failure. It would be pretty baffling if the instances suddenly vanished from the Conductor database.

Another option would be to add a 'vanished' instance state, or something to that effect. But it's not clear to me what we should do when that occurs. It wouldn't be too much work to set state == 'vanished' in this case (I think), but I'm not sure how much it really buys us without a good plan to handle this in the UI, too.
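
A rough Python sketch of the 'vanished' idea, assuming a simple in-memory model: instances the provider no longer lists are flagged instead of deleted, so the record survives for the UI to display. The Instance class, the reconcile function, and the state strings are hypothetical and are not Conductor's actual model.

```python
from dataclasses import dataclass

STATE_VANISHED = "vanished"  # hypothetical state string


@dataclass
class Instance:
    external_id: str
    state: str


def reconcile(instances, provider_states):
    """Update each instance from the poll result; flag the ones not returned."""
    for inst in instances:
        if inst.external_id in provider_states:
            inst.state = provider_states[inst.external_id]
        else:
            # Keep the record, but make its disappearance visible.
            inst.state = STATE_VANISHED
    return instances


instances = [Instance("i-1", "running"), Instance("i-2", "running")]
reconcile(instances, {"i-1": "stopped"})
```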

Comment 6 wes hayutin 2012-02-10 22:03:04 UTC
woot.. works in

[root@qeblade32 yum.repos.d]# rpm -qa | grep aeolus
aeolus-conductor-daemons-0.8.0-25.el6.noarch
aeolus-conductor-doc-0.8.0-25.el6.noarch
aeolus-configure-2.5.0-12.el6.noarch
rubygem-aeolus-image-0.3.0-7.el6.noarch
aeolus-conductor-0.8.0-25.el6.noarch
rubygem-aeolus-cli-0.3.0-8.el6.noarch
aeolus-all-0.8.0-25.el6.noarch
[root@qeblade32 yum.repos.d]#

Comment 7 wes hayutin 2012-02-10 22:04:42 UTC
Crud, how did that happen? Wrong bug; moving back to POST.

Comment 8 Matt Wagner 2012-02-13 20:09:21 UTC
Resent. Thread is here: http://lists.fedorahosted.org/pipermail/aeolus-devel/2012-February/008810.html

Comment 9 Matt Wagner 2012-02-14 18:16:15 UTC
Pushed to 0.9-maint branch on GitHub, though it sounds like I narrowly missed the window to get it into the latest tag:

commit 7a022c013c7d1bbb1783a873045fcb3c097c7186
Author: Matt Wagner <matt.wagner>
Date:   Mon Feb 13 14:06:48 2012 -0500

    BZ 786844 - Fix logic flaw in destroy_on_provider
    
    (foo != "x" or foo != "y") will always return true. While fixing,
    I also cleaned up the syntax to make it less prone to such errors,
    and skipped deletion if we were in STATE_VANISHED.

commit edeb5b7d94f77eb4eec911f0935f441715994dfa
Author: Matt Wagner <matt.wagner>
Date:   Fri Feb 10 11:43:00 2012 -0500

    BZ 786844 - Reporting of deleted instances
    
    If an instance goes missing on the backend provider, we no longer ignore it.
    Instead, if it's missing in two consecutive checks, we should remove it.
    
    This is an interim solution; there is a discussion about a long-term fix here:
    http://lists.fedorahosted.org/pipermail/aeolus-devel/2012-February/008735.html
    
    Resolves https://bugzilla.redhat.com/show_bug.cgi?id=786844
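
The two commits above can be illustrated with a short Python sketch. It is not the Conductor code: the function names, the "stopped" state, and the miss counter are assumptions made for illustration; only STATE_VANISHED and the "two consecutive checks" rule come from the commit messages.

```python
STATE_VANISHED = "vanished"  # name taken from the commit message
STATE_STOPPED = "stopped"    # hypothetical second state
MISS_THRESHOLD = 2           # "missing in two consecutive checks"


def flawed_should_destroy(state):
    # Always True: no single value equals both constants at once.
    return state != STATE_STOPPED or state != STATE_VANISHED


def should_destroy(state):
    # Membership test says what was meant and is harder to get wrong.
    return state not in (STATE_STOPPED, STATE_VANISHED)


def record_poll(miss_counts, known_ids, provider_ids):
    """Update per-instance miss counts; return IDs that hit the threshold."""
    seen = set(provider_ids)
    to_remove = []
    for instance_id in known_ids:
        if instance_id in seen:
            # Seen again, so any earlier miss was transient.
            miss_counts.pop(instance_id, None)
        else:
            miss_counts[instance_id] = miss_counts.get(instance_id, 0) + 1
            if miss_counts[instance_id] >= MISS_THRESHOLD:
                to_remove.append(instance_id)
    return to_remove
```

Requiring consecutive misses guards against a single incomplete poll result; a reappearing instance resets its count.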

Comment 10 Steve Linabery 2012-02-15 19:08:50 UTC
7a022c0 and edeb5b7 are in aeolus-conductor-0.8.0-28

Comment 11 Dave Johnson 2012-02-21 23:23:40 UTC
Hmm, I think some more work is needed here.  Reopening for discussion; I have seen this with a vsphere instance.

When a machine vanishes, the monitor tab scorecard still shows a "green" status, which I believe is wrong.  It should be yellow, if not red.

Also, after clicking on the scorecard link, there are no instances under it, even when selecting list view -> all instances.  My opinion is that we need to display the instance with its 'vanished' state.

Comment 13 Matt Wagner 2012-02-22 16:20:35 UTC
dajo -- I completely agree. http://lists.fedorahosted.org/pipermail/aeolus-devel/2012-February/009068.html addresses both of these issues, meant to resolve #795794 and #795891. I'm going to move this over to POST. thrcka is reviewing the patches now.

Comment 14 Matt Wagner 2012-02-22 18:00:54 UTC
Dave -- pushed 3 patches for these issues. See https://bugzilla.redhat.com/show_bug.cgi?id=795794 for the commit hashes.

Comment 15 Dave Johnson 2012-02-24 03:14:47 UTC
Good to go in aeolus-conductor-0.8.0-35.el6.noarch

Comment 16 errata-xmlrpc 2012-05-15 22:26:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2012-0583.html

