Bug 1001223

Summary: calling app-destroy on already deleted gear throws error
Product: OpenShift Online Reporter: Rajat Chopra <rchopra>
Component: Containers Assignee: Dan Mace <dmace>
Status: CLOSED CURRENTRELEASE QA Contact: libra bugs <libra-bugs>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 2.x CC: bmeng, xtian, yadu
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-09-19 16:48:16 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Rajat Chopra 2013-08-26 19:26:55 UTC
Description of problem:

A gear destroy call may fail from the broker's point of view for several reasons (MCollective timeout, ActiveMQ stoppage, network failure, etc.).
In such circumstances, a repeat gear destroy call always returns an error from the node, because the node no longer has any trace of the gear (the previous call did in fact destroy it).

This means that such applications always get stuck in mongo although the gear has already been destroyed cleanly. The node should respond with a 'success' if an app-destroy call is made for a gear that does not exist.
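
A minimal sketch of the requested behavior, assuming a toy in-memory node. FakeNode and its app_destroy method are hypothetical names used only for illustration and are not the actual OpenShift node API. The point is that a destroy call for a missing gear returns exit code 0 with a warning, so the call is safe to repeat:

```ruby
require 'set'

# Hypothetical stand-in for a node's gear registry; not the real OpenShift code.
class FakeNode
  def initialize(gear_uuids)
    @gears = Set.new(gear_uuids)
  end

  # Returns [exitcode, message]. A missing gear is treated as already
  # destroyed, which makes the call idempotent and safe to retry.
  def app_destroy(uuid)
    unless @gears.include?(uuid)
      return [0, "WARNING: gear #{uuid} not found; treating as already destroyed"]
    end

    @gears.delete(uuid)
    [0, "gear #{uuid} destroyed"]
  end
end

node   = FakeNode.new(['abc123'])
first  = node.app_destroy('abc123') # gear exists and is removed
second = node.app_destroy('abc123') # gear is gone, still exit code 0
puts first.inspect
puts second.inspect
```

With this shape, a broker that never saw the first reply can simply retry, and the warning text still records that the gear was not found.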


Version-Release number of selected component (if applicable):


How reproducible:

Difficult to reproduce on demand, but can be simulated.

Steps to Reproduce:
1. Temporarily force a non-zero exit code for the gear-destroy call on the broker. This simulates a network error in which the broker never receives the node's response after the app-destroy call has been made.
2. Try to delete the app again, using oo-admin-ctl-app -c destroy or otherwise.
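
The two steps above can be modeled roughly as follows. This is a toy simulation, not broker or node code: ToyNode, ToyBroker, the strict flag, and drop_response are all hypothetical names introduced here. With strict set to true the node errors on a missing gear (the reported behavior); with strict set to false it reports success (the expected behavior).

```ruby
# Toy model only; every class, method, and flag here is hypothetical.
class ToyNode
  def initialize
    @gears = { 'gear1' => true }
  end

  def has_gear?(uuid)
    @gears.key?(uuid)
  end

  # strict: true  -> missing gear is an error (reported behavior)
  # strict: false -> missing gear counts as success (expected behavior)
  def destroy(uuid, strict:)
    return (strict ? 1 : 0) unless @gears.key?(uuid)

    @gears.delete(uuid)
    0
  end
end

class ToyBroker
  attr_reader :apps

  def initialize(node, strict:)
    @node = node
    @strict = strict
    @apps = { 'myapp' => 'gear1' } # stands in for the broker's datastore record
  end

  # drop_response simulates step 1: the node does the work, but the
  # broker sees a non-zero exit code and keeps its record of the app.
  def destroy_app(name, drop_response: false)
    uuid = @apps.fetch(name)
    exitcode = @node.destroy(uuid, strict: @strict)
    exitcode = 1 if drop_response
    return false unless exitcode.zero?

    @apps.delete(name)
    true
  end
end

def run_scenario(strict:)
  node = ToyNode.new
  broker = ToyBroker.new(node, strict: strict)
  first  = broker.destroy_app('myapp', drop_response: true) # step 1
  second = broker.destroy_app('myapp')                      # step 2
  { first: first, second: second,
    stuck: broker.apps.key?('myapp'), on_node: node.has_gear?('gear1') }
end

buggy = run_scenario(strict: true)
fixed = run_scenario(strict: false)
puts "strict node:  #{buggy.inspect}"
puts "lenient node: #{fixed.inspect}"
```

In the strict run the gear is gone from the node but the app record stays on the broker side; in the lenient run the retry clears the record.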



Actual results:
Subsequent app/gear-destroy calls fail because the gear has already been deleted from the node.

Expected results:
Node should report success on app-destroy calls for gears that do not exist.

Additional info:

Comment 1 Rajat Chopra 2013-09-09 18:17:10 UTC
It's debatable whether the 'node' should return a zero exitcode in 'not found' situations. For our case here, though, it's probably OK to have a warning message with a zero exitcode.

Comment 2 Dan Mace 2013-09-09 18:41:43 UTC
The node already behaves as desired; turns out there was a regression corrected by: https://github.com/openshift/origin-server/commit/1adf4900493e378f2df567762cb11bc1e9f1fd4e

Should be fixed in master. If it crops up again, be sure to attach logs.

Thanks!

Comment 3 Meng Bo 2013-09-10 07:33:26 UTC
Checked on devenv_3762:

1. Create an app.
2. Add "return 1" to the middle of "def oo-app-destroy" in mcollective/agent/openshift.rb.
3. Restart mcollective and the broker.
4. Delete the app from the rhc client.
A "Node execution failure" error is reported here.
5. Check the app on the node.
It should be deleted.
6. Check the app on the broker.
It should still exist.
7. Remove the change in openshift.rb.
8. Restart mcollective and the broker.
9. Try to destroy the app again via oo-admin-ctl-app.
The app is deleted successfully.

Move bug to verified.