Bug 853559

Summary: Unable to remove gear when user dir doesn't exist
Product: OKD Reporter: Matt Woodson <mwoodson>
Component: Containers Assignee: Jhon Honce <jhonce>
Status: CLOSED CURRENTRELEASE QA Contact: libra bugs <libra-bugs>
Severity: medium Docs Contact:
Priority: medium    
Version: 2.x CC: jhonce, jhou, mpatel, qgong, twiest, xtian
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-09-17 21:29:23 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Matt Woodson 2012-08-31 20:51:26 UTC
Description of problem:

We lost certain gears on the gear servers: the user apps were deleted from the file system under /var/lib/stickshift. We are trying to clean up these gears from the database, but doing so produces errors.

Version-Release number of selected component (if applicable):


How reproducible:
 
100% of what I've tried.

Steps to Reproduce:
1.  On the gear node, delete only the user's gear directory, for example: /var/lib/stickshift/<uid>
2.  On the broker node, run the command:

rhc-admin-ctl-app -b -a  <app_name> -l  <login> -c destroy

3.  Or run the command:

rhc-admin-ctl-app -b -a  <app_name> -l  <login> -c force-destroy
  
Actual results:

Results from destroy (#2)
==================cli-results============================================
/usr/lib/ruby/gems/1.8/gems/stickshift-controller-0.15.11/lib/stickshift-controller/app/models/application.rb:360:in `destroy': Could not destroy all gears of application. (StickShift::NodeException)
	from /usr/lib/ruby/gems/1.8/gems/stickshift-controller-0.15.11/lib/stickshift-controller/app/models/application.rb:319:in `cleanup_and_delete'
	from /usr/bin/rhc-admin-ctl-app:143
==================/cli-results============================================

==================mcollective results===============================
I, [2012-08-31T16:23:50.810673 #5276]  INFO -- : stickshift.rb:315:in `cartridge_do_action' cartridge_do_action call / request = #<MCollective::RPC::Request:0x7f2db9685c98
 @action="cartridge_do",
 @agent="stickshift",
 @caller="uid=0",
 @data=
  {:cartridge=>"stickshift-node",
   :args=>
    {"--with-app-name"=>"disk",
     "--with-namespace"=>"ay",
     "--with-container-name"=>"disk",
     "--with-app-uuid"=>"9434a590cdc843faa807bd5595245fe0",
     "--with-container-uuid"=>"9434a590cdc843faa807bd5595245fe0"},
   :action=>"app-destroy",
   :process_results=>true},
 @sender="mcollect.cloud.redhat.com",
 @time=1346444630,
 @uniqid="077ef76ffce695aa1bba1798eb79bf89">

I, [2012-08-31T16:23:50.811328 #5276]  INFO -- : stickshift.rb:316:in `cartridge_do_action' cartridge_do_action validation = stickshift-node app-destroy --with-app-namedisk--with-namespaceay--with-container-namedisk--with-app-uuid9434a590cdc843faa807bd5595245fe0--with-container-uuid9434a590cdc843faa807bd5595245fe0
I, [2012-08-31T16:23:50.811776 #5276]  INFO -- : stickshift.rb:54:in `ss_app_destroy' COMMAND: ss-app-destroy
I, [2012-08-31T16:23:50.813781 #5276]  INFO -- : stickshift.rb:67:in `ss_app_destroy' No such file or directory - /var/lib/stickshift/9434a590cdc843faa807bd5595245fe0/
I, [2012-08-31T16:23:50.814151 #5276]  INFO -- : stickshift.rb:338:in `cartridge_do_action' cartridge_do_action ERROR (-1)
------
No such file or directory - /var/lib/stickshift/9434a590cdc843faa807bd5595245fe0/
------)
==================/mcollective results===============================


Results from force-destroy (#3)
==================cli-results============================================
WARNING: Check gear 91d03ad9c492442a8024a2b01b23b420 on node 'ex-std-node85.prod.rhcloud.com', because destroy did not succeed cleanly. The gear may exist on node, but not in database.
WARNING: Please check and fix the user's consumed_gear count vs the actual gears consumed, as they may be out of sync.
Success
==================/cli-results============================================



Expected results:

The app should be removed from the ex_node whether or not the gear's home dir exists.

Additional info:

Comment 1 Dan McPherson 2012-08-31 20:59:39 UTC
This case needs to be handled by app-destroy on the node.  It should either ignore the case completely and return success (after cleaning up any user or proxy data that is still there) or, if for some reason you do not want app-destroy to ignore this case completely, return a special error code that tells the caller what happened and lets them make the choice to ignore it.  I would be ok with either.
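
The two options described in this comment can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual StickShift node code or the change in the eventual fix: the helper name destroy_gear_dir, the ignore_missing flag, and both status constants are hypothetical, and the sketch uses current Ruby syntax rather than the Ruby 1.8 shown in the stack trace. It only removes the gear directory; cleaning up user or proxy data is left out.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical status codes for illustration; the real node scripts
# may use different values.
EXIT_OK = 0
EXIT_GEAR_DIR_MISSING = 2

# Remove a gear's home directory under base_dir (hypothetical helper).
#
# Returns EXIT_OK when the directory was removed, or when it was already
# absent and ignore_missing is true (option 1: treat it as success).
# Returns EXIT_GEAR_DIR_MISSING when it was absent and the caller asked
# to be told (option 2: special code, caller decides).
def destroy_gear_dir(base_dir, uuid, ignore_missing: true)
  gear_dir = File.join(base_dir, uuid)

  unless File.directory?(gear_dir)
    return ignore_missing ? EXIT_OK : EXIT_GEAR_DIR_MISSING
  end

  # rm_rf removes the directory tree recursively.
  FileUtils.rm_rf(gear_dir)
  EXIT_OK
end
```

With ignore_missing left at its default, calling the helper twice for the same gear returns EXIT_OK both times, so a destroy request from the broker would not fail just because the directory is already gone.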

Comment 2 Thomas Wiest 2012-08-31 23:29:53 UTC
I disagree with the second half of comment 1. I don't think destroying the gear should be optional.

If we've asked to destroy a gear (or a user has), then the gear should be destroyed. Leaving anything around will cause our broken gears alerting to go off, and cause us to have to go manually clean up the gears, which is exactly what we're trying to get away from.

If we or the user have asked to destroy the gear, then we know the risks and we want the gear gone.

Comment 3 John Poelstra 2012-09-07 18:40:18 UTC
Jhon to discuss with Rob

Comment 4 Jhon Honce 2012-09-10 20:26:12 UTC
https://github.com/openshift/crankcase/pull/468

Comment 5 Xiaoli Tian 2012-09-11 09:24:37 UTC
(In reply to comment #4)
> https://github.com/openshift/crankcase/pull/468
It's merged in devenv_2148, move it to ON_QA to verify.

Comment 6 Rony Gong 🔥 2012-09-11 10:08:24 UTC
Verified on devenv_2148
[root@ip-10-123-18-148 stickshift]# ls
06115b1edd75441e97758e50adefad1e  06115b1edd-joydev1  6a9385c83dc5418aabda86f818c61493  dd464d34005c4ea89b3ac6072b1e6b45  jboss1-joydev1  last_access.log  qruby18-qgong1  quota1-joydev1
[root@ip-10-123-18-148 stickshift]# rhc-admin-ctl-app -b -l qgong -c destroy -a qruby18
Successfully destroyed application: qruby18
[root@ip-10-123-18-148 stickshift]# ls
06115b1edd75441e97758e50adefad1e  06115b1edd-joydev1  6a9385c83dc5418aabda86f818c61493  dd464d34005c4ea89b3ac6072b1e6b45  jboss1-joydev1  last_access.log  quota1-joydev1

Comment 7 Jianwei Hou 2012-09-12 10:19:01 UTC
Verified on devenv_2159

Steps:
1. Create an app and remove the gear dir from the node
[root@ip-10-4-39-173 stickshift]# ls
615c39382ab14aed843791d97d889933  last_access.log  php1-2159t1
[root@ip-10-4-39-173 stickshift]# mv 615c39382ab14aed843791d97d889933 /tmp/
[root@ip-10-4-39-173 stickshift]# ls
last_access.log  php1-2159t1
2. Destroy app 
[root@ip-10-4-39-173 stickshift]# rhc-admin-ctl-app -b -a php1 -l jhou -c destroy
Successfully destroyed application: php1
[root@ip-10-4-39-173 stickshift]# ls
last_access.log