Bug 791195

Summary: better rhevm instance deletion
Product: [Retired] CloudForms Cloud Engine Reporter: Jan Provaznik <jprovazn>
Component: aeolus-conductorAssignee: Matt Wagner <matt.wagner>
Status: CLOSED CURRENTRELEASE QA Contact: Dave Johnson <dajohnso>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: 1.0.0CC: akarol, athomas, deltacloud-maint, dgao, matt.wagner, slinaber, ssachdev, whayutin
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: aeolus-conductor-0.8.0-33 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-08-30 17:15:35 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:

Description Jan Provaznik 2012-02-16 13:02:44 UTC
When a rhevm instance is stopped, we try to destroy the instance on provider side - this is done by app/models/instance.rb:destroy_on_provider method.

There is a 'retry' loop in this method which tries to repeatedly destroy the instance if an exception occurs, the exception is raised quite often because app/util/taskomatic.rb:self.destroy_instance method tries to update a task which may not exist.

1. app/util/taskomatic.rb:self.destroy_instance method should try to update a task in ensure block only if it exists
2. if step 1) is done, the rescue+retry block in destroy_on_provider method shouldn't be needed at all

3. actually this method shouldn't be there at all, destroy action should be hooked on instance's after_update callback.

I'm not sure if there is are reproduction steps for QA by which they can make sure this is fixed. Also I wonder if the instance destroy on provider side currently works for RHEVM?

Comment 1 Matt Wagner 2012-02-17 21:35:38 UTC
I tinkered with refactoring, but the actual debugging portion here is a bit more disastrous.

You indeed cannot destroy an instance without stopping it first.

When you stop an instance, the state goes to "PENDING". After it finally stops, it's in "STOPPED".

Oddly, when you then send "destroy!", the state goes back to "PENDING" before disappearing.

Incidentally, if you attempt to fetch an image that has been destroyed, we raise an exception -- just something to be aware of since I wasn't expecting that.

>> c.instance('571b5ae2-6152-4c1e-b5c0-7a8af43a3bb5')
DeltaCloud::API::BackendError: DeltaCloud::API::BackendError
	from /usr/lib/ruby/gems/1.8/gems/deltacloud-client-0.4.0/lib/deltacloud.rb:403:in `handle_backend_error'
	from /usr/lib/ruby/gems/1.8/gems/deltacloud-client-0.4.0/lib/deltacloud.rb:371:in `request'
	from /usr/lib/ruby/gems/1.8/gems/rest-client-1.6.1/lib/restclient/request.rb:218:in `call'
	from /usr/lib/ruby/gems/1.8/gems/rest-client-1.6.1/lib/restclient/request.rb:218:in `process_result'
	from /usr/lib/ruby/gems/1.8/gems/rest-client-1.6.1/lib/restclient/request.rb:169:in `transmit'
	from /usr/lib/ruby/1.8/net/http.rb:543:in `start'
	from /usr/lib/ruby/gems/1.8/gems/rest-client-1.6.1/lib/restclient/request.rb:166:in `transmit'
	from /usr/lib/ruby/gems/1.8/gems/rest-client-1.6.1/lib/restclient/request.rb:60:in `execute'
	from /usr/lib/ruby/gems/1.8/gems/rest-client-1.6.1/lib/restclient/request.rb:31:in `execute'
	from /usr/lib/ruby/gems/1.8/gems/rest-client-1.6.1/lib/restclient/resource.rb:54:in `get'
	from /usr/lib/ruby/gems/1.8/gems/deltacloud-client-0.4.0/lib/deltacloud.rb:370:in `send'
	from /usr/lib/ruby/gems/1.8/gems/deltacloud-client-0.4.0/lib/deltacloud.rb:370:in `request'
	from /usr/lib/ruby/gems/1.8/gems/deltacloud-client-0.4.0/lib/deltacloud.rb:142:in `instance'
	from (irb):66

Comment 2 Matt Wagner 2012-02-17 21:40:08 UTC
The above shouldn't be a problem based on this before_destroy filter:

  def destroyable?
    (state == STATE_CREATE_FAILED) or (state == STATE_STOPPED and not restartable?) or (state == STATE_VANISHED)
  end

If an image is in STATE_CREATE_FAILED or STATE_VANISHED, destroy_on_provider is a no-op, so we should only be issuing a destroy! call to Deltacloud if the instance is already stopped -- at least, in our database.

Comment 3 Matt Wagner 2012-02-17 22:09:14 UTC
Potentially related:

Deltacloud JIRA - Incorrect state when stopping an instance:
https://issues.apache.org/jira/browse/DTACLOUD-148

aeolus-devel thread that I wanted to post here but BZ was down:
http://lists.fedorahosted.org/pipermail/aeolus-devel/2012-February/008925.html

Comment 4 Matt Wagner 2012-02-17 22:10:38 UTC
First-draft style/RFC patch on list: http://lists.fedorahosted.org/pipermail/aeolus-devel/2012-February/008972.html

That is solely cleanup.

Comment 6 Matt Wagner 2012-02-21 17:56:53 UTC
Pushed to master:

commit c6aaf494ed78277569a314a24b033ab05fd9d1d7
Author: Matt Wagner <matt.wagner>
Date:   Fri Feb 17 16:58:18 2012 -0500

    BZ 791195 - Clean up destroy_on_provider code
    
    This cleans up some crazy code that obsessively retries to destroy
    instances on the provider after they are removed in Conductor.
    
    Resolves https://bugzilla.redhat.com/show_bug.cgi?id=791195



Cherry-picked onto 0.9-maint as well:

commit 34c0ba2280d0773f9c523f0bad16ccffa453b693
Author: Matt Wagner <matt.wagner>
Date:   Fri Feb 17 16:58:18 2012 -0500

    BZ 791195 - Clean up destroy_on_provider code
    
    This cleans up some crazy code that obsessively retries to destroy
    instances on the provider after they are removed in Conductor.
    
    Resolves https://bugzilla.redhat.com/show_bug.cgi?id=791195

Comment 7 Steve Linabery 2012-02-22 03:52:29 UTC
34c0ba2 in latest build

Comment 8 Dave Johnson 2012-03-05 14:55:55 UTC
good 2 go, right before this landed I noticed deleting 30+ rhevm instances produced a bunch of stack traces and I no longer see those now, it looks good 2 go in aeolus-conductor-0.8.0-39.el6.noarch