Description of problem:
We lost certain gears on the gear servers: the user apps were deleted from the file system under /var/lib/stickshift. We are trying to clean up these gears from the database, but the cleanup fails with errors.

Version-Release number of selected component (if applicable):

How reproducible:
100% of what I've tried.

Steps to Reproduce:
1. On the gear node, delete only the user's gear directory, for example: /var/lib/stickshift/<uid>
2. On the broker node, run: rhc-admin-ctl-app -b -a <app_name> -l <login> -c destroy
3. Or run: rhc-admin-ctl-app -b -a <app_name> -l <login> -c force-destroy

Actual results:
Results from destroy (#2)
==================cli-results============================================
/usr/lib/ruby/gems/1.8/gems/stickshift-controller-0.15.11/lib/stickshift-controller/app/models/application.rb:360:in `destroy': Could not destroy all gears of application. (StickShift::NodeException)
    from /usr/lib/ruby/gems/1.8/gems/stickshift-controller-0.15.11/lib/stickshift-controller/app/models/application.rb:319:in `cleanup_and_delete'
    from /usr/bin/rhc-admin-ctl-app:143
==================/cli-results============================================
==================mcollective results===============================
I, [2012-08-31T16:23:50.810673 #5276] INFO -- : stickshift.rb:315:in `cartridge_do_action' cartridge_do_action call / request = #<MCollective::RPC::Request:0x7f2db9685c98 @action="cartridge_do", @agent="stickshift", @caller="uid=0", @data={:cartridge=>"stickshift-node", :args=>{"--with-app-name"=>"disk", "--with-namespace"=>"ay", "--with-container-name"=>"disk", "--with-app-uuid"=>"9434a590cdc843faa807bd5595245fe0", "--with-container-uuid"=>"9434a590cdc843faa807bd5595245fe0"}, :action=>"app-destroy", :process_results=>true}, @sender="mcollect.cloud.redhat.com", @time=1346444630, @uniqid="077ef76ffce695aa1bba1798eb79bf89">
I, [2012-08-31T16:23:50.811328 #5276] INFO -- : stickshift.rb:316:in `cartridge_do_action' cartridge_do_action validation = stickshift-node app-destroy --with-app-namedisk--with-namespaceay--with-container-namedisk--with-app-uuid9434a590cdc843faa807bd5595245fe0--with-container-uuid9434a590cdc843faa807bd5595245fe0
I, [2012-08-31T16:23:50.811776 #5276] INFO -- : stickshift.rb:54:in `ss_app_destroy' COMMAND: ss-app-destroy
I, [2012-08-31T16:23:50.813781 #5276] INFO -- : stickshift.rb:67:in `ss_app_destroy' No such file or directory - /var/lib/stickshift/9434a590cdc843faa807bd5595245fe0/
I, [2012-08-31T16:23:50.814151 #5276] INFO -- : stickshift.rb:338:in `cartridge_do_action' cartridge_do_action ERROR (-1) ------ No such file or directory - /var/lib/stickshift/9434a590cdc843faa807bd5595245fe0/ ------)
==================/mcollective results===============================

Results from force-destroy (#3)
==================cli-results============================================
WARNING: Check gear 91d03ad9c492442a8024a2b01b23b420 on node 'ex-std-node85.prod.rhcloud.com', because destroy did not succeed cleanly. The gear may exist on node, but not in database.
WARNING: Please check and fix the user's consumed_gear count vs the actual gears consumed, as they may be out of sync.
Success
==================/cli-results============================================

Expected results:
The app should be removed from the ex node regardless of whether the gear's home directory exists.

Additional info:
This case needs to be handled by app-destroy on the node. It should either ignore the missing directory completely and return success (after cleaning up any user or proxy data that is still there) or, if for some reason you want app-destroy not to ignore this case completely, return a special error code that tells the caller what happened and lets the caller decide whether to ignore it.
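The two options above can be sketched as an idempotent destroy routine. This is a minimal Ruby sketch, assuming a gear lives in a directory named after its UUID under a base directory such as /var/lib/stickshift. `destroy_gear`, `MISSING_HOME_DIR`, `report_missing` and the cleanup hooks are hypothetical names, not the stickshift-node API; the hooks stand in for whatever user and proxy cleanup the node actually performs.

```ruby
require 'fileutils'

# Hypothetical status returned when the caller asked to be told that the
# gear's home directory was already gone (the "special error code" option).
MISSING_HOME_DIR = :missing_home_dir

# Hypothetical sketch of an idempotent gear destroy, not the real
# stickshift-node implementation.
#   base_dir       - directory holding gear homes, e.g. /var/lib/stickshift
#   uuid           - gear UUID; the home directory is base_dir/uuid
#   cleanup_hooks  - callables standing in for user/proxy cleanup
#   report_missing - when true, return MISSING_HOME_DIR instead of :ok if
#                    the home directory did not exist
def destroy_gear(base_dir, uuid, cleanup_hooks = [], report_missing: false)
  home = File.join(base_dir, uuid)
  home_existed = File.directory?(home)

  # Remove the home directory only if it is still there; a missing
  # directory is not treated as a failure.
  FileUtils.rm_rf(home) if home_existed

  # Run the remaining cleanup regardless of whether the home directory
  # was present, so leftover user or proxy data is still removed.
  cleanup_hooks.each { |hook| hook.call(uuid) }

  return MISSING_HOME_DIR if !home_existed && report_missing
  :ok
end
```

With `report_missing` left false, the routine follows the first option (and the position taken in the next comment): the destroy always completes and reports success. Setting it to true models the second option, where the caller receives a distinct status and decides whether to ignore it.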
I disagree with the second half of comment 1. I don't think destroying the gear should be optional. If we (or a user) have asked to destroy a gear, then the gear should be destroyed. Leaving anything behind will set off our broken-gear alerts and force us to clean up the gears manually, which is exactly what we're trying to get away from. If we or the user have asked to destroy the gear, then we know the risks and we want the gear gone.
jhon to discuss with Rob
https://github.com/openshift/crankcase/pull/468
(In reply to comment #4)
> https://github.com/openshift/crankcase/pull/468
It's merged in devenv_2148; moving it to ON_QA for verification.
Verified on devenv_2148
[root@ip-10-123-18-148 stickshift]# ls
06115b1edd75441e97758e50adefad1e 06115b1edd-joydev1 6a9385c83dc5418aabda86f818c61493 dd464d34005c4ea89b3ac6072b1e6b45 jboss1-joydev1 last_access.log qruby18-qgong1 quota1-joydev1
[root@ip-10-123-18-148 stickshift]# rhc-admin-ctl-app -b -l qgong -c destroy -a qruby18
Successfully destroyed application: qruby18
[root@ip-10-123-18-148 stickshift]# ls
06115b1edd75441e97758e50adefad1e 06115b1edd-joydev1 6a9385c83dc5418aabda86f818c61493 dd464d34005c4ea89b3ac6072b1e6b45 jboss1-joydev1 last_access.log quota1-joydev1
Verified on devenv_2159
Steps:
1. Create an app and remove the gear dir from the node
[root@ip-10-4-39-173 stickshift]# ls
615c39382ab14aed843791d97d889933 last_access.log php1-2159t1
[root@ip-10-4-39-173 stickshift]# mv 615c39382ab14aed843791d97d889933 /tmp/
[root@ip-10-4-39-173 stickshift]# ls
last_access.log php1-2159t1
2. Destroy the app
[root@ip-10-4-39-173 stickshift]# rhc-admin-ctl-app -b -a php1 -l jhou -c destroy
Successfully destroyed application: php1
[root@ip-10-4-39-173 stickshift]# ls
last_access.log