Bug 1011810 - "Scale up" can end successfully when non-zero exit code is returned during the process
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: OpenShift Online
Classification: Red Hat
Component: Pod
Version: 2.x
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Rajat Chopra
QA Contact: libra bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2013-09-25 08:00 UTC by Qiushui Zhang
Modified: 2015-05-15 00:21 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-01-20 18:16:01 UTC
Target Upstream Version:
Embargoed:


Attachments
Platform.log from instance (111.61 KB, application/zip)
2013-09-25 08:00 UTC, Qiushui Zhang

Description Qiushui Zhang 2013-09-25 08:00:04 UTC
Description of problem:
When "post_install" returns a non-zero exit code, scale-up still completes successfully.

When "deploy" returns a non-zero exit code, scale-up completes with no error reported, but the gear remains in "deploying" status.

When "install" returns a non-zero exit code, scale-up fails with "Unable to complete the requested operation due to: Node execution failure (invalid exit code from node).."

Version-Release number of selected component (if applicable):
devenv_3824

How reproducible:
always

Steps to Reproduce:
1. Create a scalable app
rhc app create ews1s jbossews-1.0
2. On the instance, add "exit 1" to the end of "/usr/libexec/openshift/cartridges/jbossews/bin/post_install", then restart the "ruby193-mcollective" service.
3. Try to scale up the app
rhc cartridge scale -c jbossews-1.0 -a ews1s --min 2
4. Revert the above change and repeat the steps with the "deploy" and "install" scripts.

Actual results:
Scale-up completes successfully when "post_install" or "deploy" returns a non-zero exit code.
Only a non-zero exit code from "install" causes scale-up to fail.
Please refer to the attached platform.log.

Expected results:
The scale-up process should report the error, abort, and roll back correctly.


Additional info:
App creation fails as expected when the exit code is changed to non-zero.

Comment 1 Qiushui Zhang 2013-09-25 08:00:36 UTC
Created attachment 802625 [details]
Platform.log from instance

Comment 2 Dan Mace 2013-10-01 15:08:52 UTC
The broker's responsible for interpreting the non-zero exit codes from the node for each lifecycle action and making the decision about whether to stop/rollback.
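
As a rough illustration of the behavior described here (not the broker's actual code), the following Python sketch runs lifecycle actions in order and stops and rolls back at the first non-zero exit code. The names `run_scale_up`, `steps`, and `rollback` are hypothetical and do not correspond to any OpenShift API.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical types for illustration only: a step is a (name, action) pair,
# where the action returns the exit code reported by the node.
Step = Tuple[str, Callable[[], int]]


def run_scale_up(
    steps: List[Step],
    rollback: Callable[[List[str]], None],
) -> Tuple[bool, Optional[str]]:
    """Run steps in order; on the first non-zero exit code, roll back and stop.

    Returns (True, None) on success, or (False, failed_step_name) on failure.
    """
    completed: List[str] = []
    for name, action in steps:
        exit_code = action()
        if exit_code != 0:
            # Undo only the steps that finished before the failure.
            rollback(completed)
            return False, name
        completed.append(name)
    return True, None
```

With steps `install`, `post_install`, and `deploy`, a non-zero exit code from `post_install` would report `post_install` as the failed step and roll back only `install`, which is the outcome the reporter expected.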

Also, this may be appropriate to close as upstream on the scheduler work. I'll leave that up to the broker team.

Comment 3 Abhishek Gupta 2013-10-02 19:40:58 UTC
The deploy/post-install steps are invoked through the post-configure hook call from the broker. However, during scale-up of a web_framework cartridge, the post-configure hook is not called. Instead, the post-install/deploy actions are invoked while executing the connection hooks, and the broker ignores their results/failures.

We need to figure out a way to distinguish between acceptable failures in connection hooks (essentially warnings/messages) and failures that should cause a rollback. This logic/separation may need to be handled in the connection hook implementation itself. 
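
One possible shape for that separation, sketched in Python under an assumed convention: a reserved range of exit codes marks a warning, and any other non-zero code is fatal. The range `WARNING_EXIT_CODES` and the function names are hypothetical; nothing in this bug defines such a convention.

```python
from enum import Enum
from typing import Iterable


class Outcome(Enum):
    OK = "ok"
    WARNING = "warning"
    FATAL = "fatal"


# Assumption for illustration: hooks signal "warning only" with codes 100-119.
WARNING_EXIT_CODES = frozenset(range(100, 120))


def classify_hook_exit(exit_code: int) -> Outcome:
    """Map a connection hook exit code to an outcome under the assumed convention."""
    if exit_code == 0:
        return Outcome.OK
    if exit_code in WARNING_EXIT_CODES:
        return Outcome.WARNING
    return Outcome.FATAL


def should_rollback(exit_codes: Iterable[int]) -> bool:
    """Roll back if any hook reported a fatal failure; warnings alone do not."""
    return any(classify_hook_exit(code) is Outcome.FATAL for code in exit_codes)
```

Under this convention a hook that ends with "exit 1" would trigger a rollback, while a hook that exits with a code in the reserved range would only surface a message.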

Marking this bug as UpcomingRelease for now; I will create a Trello card for it subsequently.

Comment 5 Rajat Chopra 2014-01-20 18:16:01 UTC
To be fixed upstream with the user story:
https://trello.com/c/o3KaRNRk/188-execution-of-connections-should-handle-errors

