Hide Forgot
When a rhevm instance is stopped, we try to destroy the instance on provider side - this is done by app/models/instance.rb:destroy_on_provider method. There is a 'retry' loop in this method which tries to repeatedly destroy the instance if an exception occurs, the exception is raised quite often because app/util/taskomatic.rb:self.destroy_instance method tries to update a task which may not exist. 1. app/util/taskomatic.rb:self.destroy_instance method should try to update a task in ensure block only if it exists 2. if step 1) is done, the rescue+retry block in destroy_on_provider method shouldn't be needed at all 3. actually this method shouldn't be there at all, destroy action should be hooked on instance's after_update callback. I'm not sure if there is are reproduction steps for QA by which they can make sure this is fixed. Also I wonder if the instance destroy on provider side currently works for RHEVM?
I tinkered with refactoring, but the actual debugging portion here is a bit more disastrous. You indeed cannot destroy an instance without stopping it first. When you stop an instance, the state goes to "PENDING". After it finally stops, it's in "STOPPED". Oddly, when you then send "destroy!", the state goes back to "PENDING" before disappearing. Incidentally, if you attempt to fetch an image that has been destroyed, we raise an exception -- just something to be aware of since I wasn't expecting that. >> c.instance('571b5ae2-6152-4c1e-b5c0-7a8af43a3bb5') DeltaCloud::API::BackendError: DeltaCloud::API::BackendError from /usr/lib/ruby/gems/1.8/gems/deltacloud-client-0.4.0/lib/deltacloud.rb:403:in `handle_backend_error' from /usr/lib/ruby/gems/1.8/gems/deltacloud-client-0.4.0/lib/deltacloud.rb:371:in `request' from /usr/lib/ruby/gems/1.8/gems/rest-client-1.6.1/lib/restclient/request.rb:218:in `call' from /usr/lib/ruby/gems/1.8/gems/rest-client-1.6.1/lib/restclient/request.rb:218:in `process_result' from /usr/lib/ruby/gems/1.8/gems/rest-client-1.6.1/lib/restclient/request.rb:169:in `transmit' from /usr/lib/ruby/1.8/net/http.rb:543:in `start' from /usr/lib/ruby/gems/1.8/gems/rest-client-1.6.1/lib/restclient/request.rb:166:in `transmit' from /usr/lib/ruby/gems/1.8/gems/rest-client-1.6.1/lib/restclient/request.rb:60:in `execute' from /usr/lib/ruby/gems/1.8/gems/rest-client-1.6.1/lib/restclient/request.rb:31:in `execute' from /usr/lib/ruby/gems/1.8/gems/rest-client-1.6.1/lib/restclient/resource.rb:54:in `get' from /usr/lib/ruby/gems/1.8/gems/deltacloud-client-0.4.0/lib/deltacloud.rb:370:in `send' from /usr/lib/ruby/gems/1.8/gems/deltacloud-client-0.4.0/lib/deltacloud.rb:370:in `request' from /usr/lib/ruby/gems/1.8/gems/deltacloud-client-0.4.0/lib/deltacloud.rb:142:in `instance' from (irb):66
The above shouldn't be a problem based on this before_destroy filter: def destroyable? (state == STATE_CREATE_FAILED) or (state == STATE_STOPPED and not restartable?) or (state == STATE_VANISHED) end If an image is in STATE_CREATE_FAILED or STATE_VANISHED, destroy_on_provider is a no-op, so we should only be issuing a destroy! call to Deltacloud if the instance is already stopped -- at least, in our database.
Potentially related: Deltacloud JIRA - Incorrect state when stopping an instance: https://issues.apache.org/jira/browse/DTACLOUD-148 aeolus-devel thread that I wanted to post here but BZ was down: http://lists.fedorahosted.org/pipermail/aeolus-devel/2012-February/008925.html
First-draft style/RFC patch on list: http://lists.fedorahosted.org/pipermail/aeolus-devel/2012-February/008972.html That is solely cleanup.
It's official: http://lists.fedorahosted.org/pipermail/aeolus-devel/2012-February/009002.html
Pushed to master: commit c6aaf494ed78277569a314a24b033ab05fd9d1d7 Author: Matt Wagner <matt.wagner> Date: Fri Feb 17 16:58:18 2012 -0500 BZ 791195 - Clean up destroy_on_provider code This cleans up some crazy code that obsessively retries to destroy instances on the provider after they are removed in Conductor. Resolves https://bugzilla.redhat.com/show_bug.cgi?id=791195 Cherry-picked onto 0.9-maint as well: commit 34c0ba2280d0773f9c523f0bad16ccffa453b693 Author: Matt Wagner <matt.wagner> Date: Fri Feb 17 16:58:18 2012 -0500 BZ 791195 - Clean up destroy_on_provider code This cleans up some crazy code that obsessively retries to destroy instances on the provider after they are removed in Conductor. Resolves https://bugzilla.redhat.com/show_bug.cgi?id=791195
34c0ba2 in latest build
good 2 go, right before this landed I noticed deleting 30+ rhevm instances produced a bunch of stack traces and I no longer see those now, it looks good 2 go in aeolus-conductor-0.8.0-39.el6.noarch