Bug 1012181

Summary: Dead passenger process leaves downloaded cartridge installation in broken state
Product: OpenShift Online
Component: Pod
Version: 2.x
Status: CLOSED DUPLICATE
Severity: medium
Priority: unspecified
Reporter: Rajat Chopra <rchopra>
Assignee: Abhishek Gupta <abhgupta>
QA Contact: libra bugs <libra-bugs>
CC: dmcphers
Hardware: Unspecified
OS: Unspecified
Type: Bug
Doc Type: Bug Fix
Last Closed: 2013-10-18 21:41:54 UTC

Description Rajat Chopra 2013-09-25 22:45:59 UTC
Description of problem:
If the Passenger process/thread dies partway through installing a downloadable cartridge, the oo-admin-clear-pending-ops script appears to find that the application's Mongo entry does not have the cartridge stored in it.

Version-Release number of selected component (if applicable):


How reproducible:
Does not happen under normal conditions, but it should be reproducible by inserting a 'pkill' system command into the broker code at the point where a new gear is being sought for the downloadable cartridge.

Steps to Reproduce:
1. Create a nodejs/php app on an Origin development environment.
2. Patch the Origin code to kill the broker while a new gear is being searched for (possibly in find_capacity).
3. Add a downloadable cartridge to the existing app.
4. Confirm that the broker has died, then bring it back up. Run oo-admin-clear-pending-ops to clear the blocked pending_op queue for the application.
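The fault injection in step 2 can be modeled as a crash point placed inside gear placement. The sketch below is a hypothetical Python model, not the broker's code: `add_downloadable_cartridge`, `crash_point`, `SimulatedBrokerDeath`, and the in-memory `app` dict are illustrative names, and it assumes the operation is queued as a pending op before any work is done. The simulated crash raises an exception instead of running pkill so the leftover state can be inspected.

```python
from typing import Optional


class SimulatedBrokerDeath(Exception):
    """Stands in for the broker process being killed (e.g. by pkill)."""


def crash_point(name: str, crash_at: Optional[str]) -> None:
    # A real reproduction would run pkill/kill here; raising instead keeps
    # the harness alive so the persisted app state can be inspected.
    if name == crash_at:
        raise SimulatedBrokerDeath(name)


def add_downloadable_cartridge(app: dict, cart: str, crash_at: Optional[str] = None) -> None:
    # Assumption: the op is recorded in the app's pending_op queue first.
    op = {"op": "add_cart", "cart": cart, "state": "init"}
    app["pending_ops"].append(op)
    # Hypothetical gear-placement step (find_capacity); the fault fires here.
    crash_point("find_capacity", crash_at)
    # Only after placement succeeds is the cartridge persisted on the app.
    app["cartridges"].append(cart)
    op["state"] = "completed"
```

Calling it with `crash_at="find_capacity"` raises and leaves the op stuck in `"init"` with no cartridge persisted, which is the kind of half-finished state the steps above try to provoke.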

Actual results:
add-cartridge gets stuck, and no further commands on the app work (remove cart, add cart, stop, restart, etc.).
oo-admin-clear-pending-ops is not able to clean up the app.
The app's Mongo entry does not store the downloaded cartridge.

Expected results:
Even if the broker dies partway through, oo-admin-clear-pending-ops should be able to resume from where the previous process left off.
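A minimal sketch of the expected recovery behavior, under the assumption that each pending op records enough data (here, just the cartridge name) to be replayed. `clear_pending_ops` and `resume_add_cart` are hypothetical names modeling what oo-admin-clear-pending-ops is expected to do; this is not its actual implementation.

```python
from typing import Callable, Dict

App = Dict[str, list]
Handler = Callable[[App, dict], None]


def resume_add_cart(app: App, op: dict) -> None:
    # Idempotent: an interrupted attempt may or may not have persisted the cart.
    if op["cart"] not in app["cartridges"]:
        app["cartridges"].append(op["cart"])


def clear_pending_ops(app: App, handlers: Dict[str, Handler]) -> None:
    """Hypothetical recovery pass: finish every op that is not completed."""
    for op in app["pending_ops"]:
        if op["state"] != "completed":
            # Resume using only what the op itself recorded.
            handlers[op["op"]](app, op)
            op["state"] = "completed"
    # Drop finished ops so later commands on the app are no longer blocked.
    app["pending_ops"] = [op for op in app["pending_ops"] if op["state"] != "completed"]
```

Because the handler is idempotent, running the recovery pass twice leaves the app unchanged. In this model, recovery depends on the pending op carrying the cartridge name; an op that lacked it could not be replayed.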

Additional info:

Comment 1 Rajat Chopra 2013-10-09 00:57:30 UTC
The above case was deduced from a live application that broke when a broker thread died. The downloaded cart was somehow missing from the Mongo dump.

I have tried this several times but have not been able to reproduce the issue; that is, killing the broker process does not by itself cause the downloaded cart to go missing.

Two possibilities:
1. This can happen, but I have not tried enough code paths.
2. The 'missing downloaded cart' and 'broker thread dying' were two separate incidents that the investigation seems to have clubbed together. If they are unrelated, then this bug is a no-op.

Comment 2 Abhishek Gupta 2013-10-18 21:41:54 UTC
The underlying issue here was the same as bug 997008. The manifestation of the issue described in this bug has not been reproduced. 

Marking this as a duplicate of bug 997008.

*** This bug has been marked as a duplicate of bug 997008 ***