Bug 1039787

Summary: downloaded carts get removed on rollback failure
Product: OpenShift Online
Reporter: Rajat Chopra <rchopra>
Component: Pod
Assignee: Rajat Chopra <rchopra>
Status: CLOSED CURRENTRELEASE
QA Contact: libra bugs <libra-bugs>
Severity: high
Priority: high
Version: 2.x
CC: lxia, rchopra
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2014-01-30 00:52:58 UTC
Type: Bug

Description Rajat Chopra 2013-12-10 01:49:04 UTC
Description of problem:
An application can get into a 'stuck' state if the node stalls in the middle of installing a 'downloadable' cartridge.
Essentially, if the rollback of an operation also fails, the downloaded cartridge gets removed from the application's db even though the component itself still exists.
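To make the failure mode concrete, here is a toy Python model of the ordering problem. It is a sketch under stated assumptions, not the broker code changed in the pull requests: the classes and the 'install' step are hypothetical, the action names oo_update_cluster and oo_app_destroy are borrowed from the reproduction steps, and oo_app_destroy is simplified to "remove the component". The buggy variant deletes the downloaded cartridge's db record before the node-side rollback has succeeded; the fixed variant deletes it only afterwards.

```python
"""Toy model of the rollback bookkeeping described in this bug.

Hypothetical names: Node, Application, add_cartridge_buggy and
add_cartridge_fixed are illustrations, not OpenShift broker code.
"""


class NodeError(Exception):
    """A simulated node action returned a non-zero exit code."""


class Node:
    def __init__(self, failing_actions=()):
        # Actions forced to fail, like the modified openshift.rb in the repro.
        self.failing_actions = set(failing_actions)
        self.installed = set()  # components actually present on the node

    def run(self, action, cart):
        if action in self.failing_actions:
            raise NodeError(f"{action} returned a non-zero exit code")
        if action == "install":  # hypothetical install step
            self.installed.add(cart)
        elif action == "oo_app_destroy":  # simplified: just removes the component
            self.installed.discard(cart)
        # Any other action (e.g. oo_update_cluster) is a no-op when it succeeds.


class Application:
    def __init__(self):
        # Broker-side db record of downloaded cartridges: name -> manifest.
        self.downloaded_carts = {}


def add_cartridge_buggy(app, node, cart, manifest):
    app.downloaded_carts[cart] = manifest
    try:
        node.run("install", cart)
        node.run("oo_update_cluster", cart)
    except NodeError:
        # Bug: the record is dropped before we know whether rollback worked.
        del app.downloaded_carts[cart]
        try:
            node.run("oo_app_destroy", cart)
        except NodeError:
            pass
        raise


def add_cartridge_fixed(app, node, cart, manifest):
    app.downloaded_carts[cart] = manifest
    try:
        node.run("install", cart)
        node.run("oo_update_cluster", cart)
    except NodeError:
        # Roll back on the node first; if this raises, the record survives.
        node.run("oo_app_destroy", cart)
        # Rollback succeeded, so it is now safe to forget the cartridge.
        del app.downloaded_carts[cart]
        raise


def simulate(add_fn, failing):
    """Return (component still on node, record still in db) after one attempt."""
    node = Node(failing_actions=failing)
    app = Application()
    try:
        add_fn(app, node, "tungsten", {"Name": "tungsten"})
    except NodeError:
        pass
    return "tungsten" in node.installed, "tungsten" in app.downloaded_carts


both = {"oo_update_cluster", "oo_app_destroy"}
print(simulate(add_cartridge_buggy, both))  # (True, False): orphaned component
print(simulate(add_cartridge_fixed, both))  # (True, True): record kept for retry
```

With both actions failing, the buggy variant leaves the component on the node with no db record (the stuck state), while the fixed variant keeps the record so the rollback can be retried once the node recovers.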

Version-Release number of selected component (if applicable):


How reproducible:
Rarely, but it can be simulated.

Steps to Reproduce:
1. In a dev environment, modify the node mcollective action file (openshift.rb) so that the oo_update_cluster and oo_app_destroy actions return a non-zero exit code.
2. With these changes in place and mcollective restarted, try to add a downloadable cartridge to an existing app, for example http://cartreflect-claytondev.rhcloud.com/reflect?github=narmitag/openshift-origin-cartridge-tungsten

Actual results:
The app reports an error and stays stuck. No further operations are possible because the last operation cannot be rolled back (the downloaded cartridge it was operating on has been removed from the db).

Expected results:
The app should report the error but should remain operable afterwards.
After the fix, the app should become operable again once the mcollective actions report correctly (i.e. they no longer return non-zero exit codes, indicating that the node is back to normal).
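A hypothetical sketch of this expected behavior: failures are injected as in step 1, the failed operation stays pending with its db record intact, and once the node reports normally again a retried rollback succeeds and the app becomes operable. All names (Node, Application, pending_ops, operable(), the 'install' step) are illustrative assumptions rather than OpenShift APIs, and the rule "a pending operation blocks everything else" is a simplification of what the report describes.

```python
"""Hypothetical sketch: a failed rollback stays pending until the node recovers.

Node, Application, pending_ops and operable() are illustrative names,
not OpenShift APIs. Action names are borrowed from the reproduction steps.
"""


class NodeError(Exception):
    """A simulated node action returned a non-zero exit code."""


class Node:
    def __init__(self):
        self.failing_actions = set()  # fault injection, as in step 1 of the repro
        self.installed = set()

    def run(self, action, cart):
        if action in self.failing_actions:
            raise NodeError(action)
        if action == "install":  # hypothetical install step
            self.installed.add(cart)
        elif action == "oo_app_destroy":  # simplified: just removes the component
            self.installed.discard(cart)


class Application:
    def __init__(self):
        self.downloaded_carts = {}  # db record: cart name -> manifest
        self.pending_ops = []  # carts whose add neither completed nor rolled back

    def operable(self):
        # Simplification: nothing else may run while an operation is pending.
        return not self.pending_ops

    def rollback(self, node, cart):
        # The stored record says what to undo; losing it was the bug.
        if cart not in self.downloaded_carts:
            raise KeyError(f"no record of {cart}; cannot roll back")
        node.run("oo_app_destroy", cart)
        del self.downloaded_carts[cart]
        self.pending_ops.remove(cart)

    def add_cartridge(self, node, cart, manifest):
        self.downloaded_carts[cart] = manifest
        self.pending_ops.append(cart)
        try:
            node.run("install", cart)
            node.run("oo_update_cluster", cart)
        except NodeError:
            try:
                self.rollback(node, cart)
            except NodeError:
                pass  # keep record and pending entry for a later retry
            raise
        self.pending_ops.remove(cart)  # success: nothing left to roll back


node = Node()
app = Application()
node.failing_actions = {"oo_update_cluster", "oo_app_destroy"}
try:
    app.add_cartridge(node, "tungsten", {"Name": "tungsten"})
except NodeError:
    pass
print(app.operable())  # False: the failed operation is still pending

node.failing_actions.clear()  # the node is back to normal
app.rollback(node, "tungsten")  # retry works because the record survived
print(app.operable(), "tungsten" in node.installed)  # True False
```

The key design choice is that the db record is only removed as the last step of a successful rollback, so a failed rollback can always be retried later.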



Additional info:

Comment 1 Rajat Chopra 2013-12-10 01:59:38 UTC
Fixed with stage pull request - https://github.com/openshift/origin-server/pull/4308

Master pull request - https://github.com/openshift/origin-server/pull/4307

Comment 2 Liang Xia 2013-12-11 08:20:21 UTC
Following the steps in comment #0, I got the result below on devenv-stage_609:

# rhc app create app1 php-5.3 --from-code http://cartreflect-claytondev.rhcloud.com/reflect?github=narmitag/openshift-origin-cartridge-tungsten
Application Options
-------------------
Domain:      lxia
Cartridges:  php-5.3
Source Code: http://cartreflect-claytondev.rhcloud.com/reflect?github=narmitag/openshift-origin-cartridge-tungsten
Gear Size:   default
Scaling:     no

Creating application 'app1' ... 
Unable to complete the requested operation due to: An invalid exit code (131) was returned from the server
ip-10-28-67-145.  This indicates an unexpected problem during the execution of your request..
Reference ID: 8df732a71ba49c39b0d8f42589991472

There is also no record for this app in mongo.


Hi Rajat Chopra, 
Could you please check whether this is expected? Thanks in advance.
Liang

Comment 3 Rajat Chopra 2013-12-11 17:31:50 UTC
The given downloadable cartridge is an embedded cartridge, not a framework one, so it cannot be used to create an app.
So, instead of running 'rhc app create --from-code', just create a regular app, say php-5.3, and then add the tungsten cartridge to it using 'rhc cartridge add'.

Make sure the app is created as a scalable one.

Comment 4 Liang Xia 2013-12-12 05:12:37 UTC
Verified on devenv_4125.

# rhc cartridge add http://cartreflect-claytondev.rhcloud.com/reflect?github=narmitag/openshift-origin-cartridge-tungsten -a phps
The cartridge 'http://cartreflect-claytondev.rhcloud.com/reflect?github=narmitag/openshift-origin-cartridge-tungsten' will be downloaded
and installed
Adding http://cartreflect-claytondev.rhcloud.com/reflect?github=narmitag/openshift-origin-cartridge-tungsten to application 'phps' ... 
Unable to complete the requested operation due to: An invalid exit code (100) was returned from the server domU-12-31-39-04-35-BC.  This
indicates an unexpected problem during the execution of your request..
Reference ID: 033af50843c0dbc930ed43523d74bb21

# rhc app show phps
phps @ http://phps-lxia.dev.rhcloud.com/ (uuid: 52a93d608cdc1fe434000029)
-------------------------------------------------------------------------
  Domain:     lxia
  Created:    Dec 11 11:36 PM
  Gears:      1 (defaults to small)
  Git URL:    ssh://52a93d608cdc1fe434000029.rhcloud.com/~/git/phps.git/
  SSH:        52a93d608cdc1fe434000029.rhcloud.com
  Deployment: auto (on git push)

  php-5.3 (PHP 5.3)
  -----------------
    Scaling: x1 (minimum: 1, maximum: available) on small gears

  haproxy-1.4 (Web Load Balancer)
  -------------------------------
    Gears: Located with php-5.3

# rhc app restart phps
RESULT:
phps restarted