Bug 791195 - better rhevm instance deletion
Summary: better rhevm instance deletion
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: CloudForms Cloud Engine
Classification: Retired
Component: aeolus-conductor
Version: 1.0.0
Hardware: Unspecified
OS: Unspecified
unspecified
urgent
Target Milestone: rc
Assignee: Matt Wagner
QA Contact: Dave Johnson
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2012-02-16 13:02 UTC by Jan Provaznik
Modified: 2014-08-17 22:27 UTC (History)
8 users (show)

Fixed In Version: aeolus-conductor-0.8.0-33
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-08-30 17:15:35 UTC


Attachments (Terms of Use)

Description Jan Provaznik 2012-02-16 13:02:44 UTC
When a rhevm instance is stopped, we try to destroy the instance on provider side - this is done by app/models/instance.rb:destroy_on_provider method.

There is a 'retry' loop in this method which tries to repeatedly destroy the instance if an exception occurs, the exception is raised quite often because app/util/taskomatic.rb:self.destroy_instance method tries to update a task which may not exist.

1. app/util/taskomatic.rb:self.destroy_instance method should try to update a task in ensure block only if it exists
2. if step 1) is done, the rescue+retry block in destroy_on_provider method shouldn't be needed at all

3. actually this method shouldn't be there at all, destroy action should be hooked on instance's after_update callback.

I'm not sure if there is are reproduction steps for QA by which they can make sure this is fixed. Also I wonder if the instance destroy on provider side currently works for RHEVM?

Comment 1 Matt Wagner 2012-02-17 21:35:38 UTC
I tinkered with refactoring, but the actual debugging portion here is a bit more disastrous.

You indeed cannot destroy an instance without stopping it first.

When you stop an instance, the state goes to "PENDING". After it finally stops, it's in "STOPPED".

Oddly, when you then send "destroy!", the state goes back to "PENDING" before disappearing.

Incidentally, if you attempt to fetch an image that has been destroyed, we raise an exception -- just something to be aware of since I wasn't expecting that.

>> c.instance('571b5ae2-6152-4c1e-b5c0-7a8af43a3bb5')
DeltaCloud::API::BackendError: DeltaCloud::API::BackendError
	from /usr/lib/ruby/gems/1.8/gems/deltacloud-client-0.4.0/lib/deltacloud.rb:403:in `handle_backend_error'
	from /usr/lib/ruby/gems/1.8/gems/deltacloud-client-0.4.0/lib/deltacloud.rb:371:in `request'
	from /usr/lib/ruby/gems/1.8/gems/rest-client-1.6.1/lib/restclient/request.rb:218:in `call'
	from /usr/lib/ruby/gems/1.8/gems/rest-client-1.6.1/lib/restclient/request.rb:218:in `process_result'
	from /usr/lib/ruby/gems/1.8/gems/rest-client-1.6.1/lib/restclient/request.rb:169:in `transmit'
	from /usr/lib/ruby/1.8/net/http.rb:543:in `start'
	from /usr/lib/ruby/gems/1.8/gems/rest-client-1.6.1/lib/restclient/request.rb:166:in `transmit'
	from /usr/lib/ruby/gems/1.8/gems/rest-client-1.6.1/lib/restclient/request.rb:60:in `execute'
	from /usr/lib/ruby/gems/1.8/gems/rest-client-1.6.1/lib/restclient/request.rb:31:in `execute'
	from /usr/lib/ruby/gems/1.8/gems/rest-client-1.6.1/lib/restclient/resource.rb:54:in `get'
	from /usr/lib/ruby/gems/1.8/gems/deltacloud-client-0.4.0/lib/deltacloud.rb:370:in `send'
	from /usr/lib/ruby/gems/1.8/gems/deltacloud-client-0.4.0/lib/deltacloud.rb:370:in `request'
	from /usr/lib/ruby/gems/1.8/gems/deltacloud-client-0.4.0/lib/deltacloud.rb:142:in `instance'
	from (irb):66

Comment 2 Matt Wagner 2012-02-17 21:40:08 UTC
The above shouldn't be a problem based on this before_destroy filter:

  def destroyable?
    (state == STATE_CREATE_FAILED) or (state == STATE_STOPPED and not restartable?) or (state == STATE_VANISHED)
  end

If an image is in STATE_CREATE_FAILED or STATE_VANISHED, destroy_on_provider is a no-op, so we should only be issuing a destroy! call to Deltacloud if the instance is already stopped -- at least, in our database.

Comment 3 Matt Wagner 2012-02-17 22:09:14 UTC
Potentially related:

Deltacloud JIRA - Incorrect state when stopping an instance:
https://issues.apache.org/jira/browse/DTACLOUD-148

aeolus-devel thread that I wanted to post here but BZ was down:
http://lists.fedorahosted.org/pipermail/aeolus-devel/2012-February/008925.html

Comment 4 Matt Wagner 2012-02-17 22:10:38 UTC
First-draft style/RFC patch on list: http://lists.fedorahosted.org/pipermail/aeolus-devel/2012-February/008972.html

That is solely cleanup.

Comment 6 Matt Wagner 2012-02-21 17:56:53 UTC
Pushed to master:

commit c6aaf494ed78277569a314a24b033ab05fd9d1d7
Author: Matt Wagner <matt.wagner>
Date:   Fri Feb 17 16:58:18 2012 -0500

    BZ 791195 - Clean up destroy_on_provider code
    
    This cleans up some crazy code that obsessively retries to destroy
    instances on the provider after they are removed in Conductor.
    
    Resolves https://bugzilla.redhat.com/show_bug.cgi?id=791195



Cherry-picked onto 0.9-maint as well:

commit 34c0ba2280d0773f9c523f0bad16ccffa453b693
Author: Matt Wagner <matt.wagner>
Date:   Fri Feb 17 16:58:18 2012 -0500

    BZ 791195 - Clean up destroy_on_provider code
    
    This cleans up some crazy code that obsessively retries to destroy
    instances on the provider after they are removed in Conductor.
    
    Resolves https://bugzilla.redhat.com/show_bug.cgi?id=791195

Comment 7 Steve Linabery 2012-02-22 03:52:29 UTC
34c0ba2 in latest build

Comment 8 Dave Johnson 2012-03-05 14:55:55 UTC
good 2 go, right before this landed I noticed deleting 30+ rhevm instances produced a bunch of stack traces and I no longer see those now, it looks good 2 go in aeolus-conductor-0.8.0-39.el6.noarch


Note You need to log in before you can comment on or make changes to this bug.