Description of problem:
When cleaning up applications flagged in the output of oo-admin-chk, the current oo-admin-ctl-app tool does not call oo-app-destroy on the nodes.

Example from oo-admin-chk:

"Application 'project' with Id '5122d1fd4382ec76060002e4' has no components for group instance with Id '5122d1fd4382ec76060002f2'"

Here is the run of oo-admin-ctl-app:

oo-admin-ctl-app -l liveC98dc6fd5580 -a project --gear_uuid 5122d1fd4382ec76060002e4 -c destroy
Successfully destroyed application: project

This did not actually call destroy on the gear on ex-c9-node55, where the gear lives. The data before deletion looked like this:

Login: liveC98dc6fd5580
App Name: project
App UUID: 5122d1fd4382ec76060002e4
Creation Time: 2013-02-18 08:14:10 PM
URL: http://project-liveC98dc6fd5580.rhcloud.com
Group Instance[0]:
  Components:
    Gear[0]
      Server Identity: ex-c9-node55.prod.rhcloud.com
      Gear UUID: 5122d1fd4382ec76060002e4
      Gear UID: 4697

Version-Release number of selected component (if applicable):
Current release.

How reproducible:
I'm not sure whether this bug still exists.

Steps to Reproduce:
1. Create an application.
2. Remove the components of the application.
3. Call oo-admin-ctl-app -c destroy

Actual results:
The application is removed from mongo but is not touched on the node itself.

Expected results:
oo-admin-ctl-app should perform a complete deletion of the application: the mongo data should be removed and oo-app-destroy should be called on the node itself.

Additional info:
The mongo output shows the server identity, gear UID, and gear UUID for this application, which is evidence that the gear may still exist on the node, so oo-app-destroy should be called to clean up after ourselves. Currently ~600 gears are affected; for each one we will have to run oo-admin-ctl-app and then a clean script to remove the gear.
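As a sketch of the kind of clean script mentioned above (not the actual tooling): it would loop over the affected entries, run the broker-side destroy, and then reach the node directly. The gears.txt layout, the ssh step, and the oo-app-destroy flags shown here are assumptions; verify the flags with oo-app-destroy --help on the node before running anything like this.

#!/bin/bash
# Hypothetical bulk cleanup for the ~600 orphaned gears. Assumes a file
# gears.txt with one "login app_name gear_uuid node" entry per line,
# harvested from oo-admin-chk output and the mongo records.
while read -r login app uuid node; do
    # Broker side: remove the application record from mongo. Newer builds
    # of the destroy command prompt for confirmation, hence the piped "y".
    echo y | oo-admin-ctl-app -l "$login" -a "$app" --gear_uuid "$uuid" -c destroy
    # Node side: oo-admin-ctl-app never reaches the node (this bug), so the
    # gear must be destroyed there as well. These flags are an assumption;
    # check oo-app-destroy --help on the node first.
    ssh root@"$node" "oo-app-destroy --with-container-uuid $uuid --with-app-uuid $uuid"
done < gears.txt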
Lowering priority, since this is a bad-data cleanup job rather than a code bug to fix. Once these cases are resolved, if new ones appear, we will need to investigate and debug the code.
We have made changes ensuring that for any failed application operation (one that could potentially leave the application without any components), the pending op group is not deleted if the rollback also fails. This forces the failed operation to be completed successfully before any other operation (such as delete) can be executed on the application. See pull request: https://github.com/openshift/origin-server/pull/2564
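To illustrate the new behavior from an admin's seat, a hedged mongo check (the openshift_broker_dev database name and the pending_op_groups field name are assumptions based on the broker schema of this era, not confirmed against this fix):

# Hypothetical broker-side check; db and field names are assumptions.
mongo openshift_broker_dev --eval '
  printjson(db.applications.findOne(
    {name: "project"},
    {name: 1, pending_op_groups: 1}))'
# A non-empty pending_op_groups array here means the failed operation is
# still queued and must complete before a destroy will be accepted.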
Verified on devenv_3258.

Steps:
1. Create a gear:
   rhc app create php1 php-5.3
2. On the broker, find this app and delete its component_instances field from mongo:
   db.applications.update({name:"php1"},{$unset:{component_instances:1}})
3. Destroy the app by gear UUID:
   oo-admin-ctl-app -c destroy -l jhou+2 -a php1 --gear_uuid 519c89d5465b3917f7000001

Result:
[root@ip-10-137-23-156 openshift]# oo-admin-ctl-app -c destroy -l jhou+2 -a php1 --gear_uuid 519c89d5465b3917f7000001
!!!! WARNING !!!! WARNING !!!! WARNING !!!!
You are about to destroy the php1 application.

This is NOT reversible, all remote data for this application will be removed.
Do you want to destroy this application (y/n): y

Successfully destroyed application: php1

The app is removed from both mongo and the node.
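For completeness, one way to confirm both halves of the cleanup (the hostname and database name below are examples, not from this run; /var/lib/openshift is the stock v2 gear base directory):

# Broker: the application document should be gone...
mongo openshift_broker_dev --eval 'printjson(db.applications.findOne({name: "php1"}))'   # expect null
# ...and oo-admin-chk should no longer report orphaned group instances.
oo-admin-chk
# Node: the gear home should have been removed by the destroy.
ssh root@node.example.com 'ls -d /var/lib/openshift/519c89d5465b3917f7000001'
# expect: No such file or directory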