Bug 1011810 - "Scale up" can end successfully when non-zero exit code is returned during the process
Product: OpenShift Online
Classification: Red Hat
Component: Pod
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
: ---
: ---
Assigned To: Rajat Chopra
QA Contact: libra bugs
Depends On:
Reported: 2013-09-25 04:00 EDT by Qiushui Zhang
Modified: 2015-05-14 20:21 EDT
CC List: 2 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2014-01-20 13:16:01 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments
Platform.log from instance (111.61 KB, application/zip)
2013-09-25 04:00 EDT, Qiushui Zhang

Description Qiushui Zhang 2013-09-25 04:00:04 EDT
Description of problem:
When the "post_install" script exits with a non-zero code, scale-up still completes successfully.

When the "deploy" script exits with a non-zero code, scale-up finishes without reporting an error, but the new gear remains in the "deploying" state indefinitely.

When the "install" script exits with a non-zero code, scale-up fails with: "Unable to complete the requested operation due to: Node execution failure (invalid exit code from node).."

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Create a scalable app
rhc app create ews1s jbossews-1.0
2. On the node instance, append "exit 1" to the end of "/usr/libexec/openshift/cartridges/jbossews/bin/post_install", then restart the "ruby193-mcollective" service.
3. Try to scale up the app
rhc cartridge scale -c jbossews-1.0 -a ews1s --min 2
4. Revert the change and repeat the steps with the "deploy" and "install" scripts.
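Steps 2 and 4 can be scripted so that the injected failure is easy to withdraw between runs. This is a minimal sketch that assumes bash on the node and write access to the cartridge's bin directory. The function names `inject_failure` and `restore_hook` and the `.orig` backup suffix are hypothetical and are not part of OpenShift.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for steps 2 and 4: make a cartridge hook fail, then undo it.
# The hook path is passed as the first argument, e.g. .../jbossews/bin/post_install.

inject_failure() {
  local hook="$1"
  # Keep a pristine copy (permissions preserved) so the change can be withdrawn.
  cp -p -- "$hook" "$hook.orig" || return 1
  # Append a forced non-zero exit at the end of the script.
  printf '\nexit 1\n' >> "$hook"
}

restore_hook() {
  local hook="$1"
  # Put the original script back; this also removes the backup file.
  mv -f -- "$hook.orig" "$hook"
}
```

For example, call `inject_failure` on the post_install path from step 2, restart ruby193-mcollective, run the scale command from step 3, and then call `restore_hook` on the same path before you move on to the "deploy" and "install" scripts.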

Actual results:
Scale-up completes without error when "post_install" or "deploy" exits with a non-zero code.
Only a non-zero exit code from "install" causes scale-up to fail.
Please refer to the attached platform.log.

Expected results:
The scale-up process should report the error, stop, and roll back correctly.
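The expected behavior can be illustrated with a small sketch. It is not the broker's actual logic. It assumes that each lifecycle step is a shell command and that a matching hypothetical `undo_<step>` function reverses it. The steps run in order. The first non-zero exit code stops the run, the steps that have already completed are rolled back newest first, and the failing exit code is returned to the caller.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: run lifecycle steps in order, stop at the first failure,
# and roll back completed steps in reverse order via "undo_<step>" functions.

run_with_rollback() {
  local -a done_steps=()
  local step rc i
  for step in "$@"; do
    rc=0
    "$step" || rc=$?            # capture the exit code without tripping set -e
    if [ "$rc" -ne 0 ]; then
      echo "error: $step exited with $rc, rolling back" >&2
      i=${#done_steps[@]}
      while [ "$i" -gt 0 ]; do  # newest completed step first
        i=$((i - 1))
        "undo_${done_steps[$i]}"
      done
      return "$rc"              # report the failure instead of succeeding
    fi
    done_steps+=("$step")
  done
  return 0
}
```

Under this sketch, a failing "post_install" step would surface as an error rather than as a successful scale-up.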

Additional info:
App creation fails as expected when the script's exit code is changed to non-zero.
Comment 1 Qiushui Zhang 2013-09-25 04:00:36 EDT
Created attachment 802625
Platform.log from instance
Comment 2 Dan Mace 2013-10-01 11:08:52 EDT
The broker is responsible for interpreting the non-zero exit codes that the node returns for each lifecycle action and for deciding whether to stop and roll back.

Also, it may be appropriate to close this as UPSTREAM and track it with the scheduler work. I'll leave that decision to the broker team.
Comment 3 Abhishek Gupta 2013-10-02 15:40:58 EDT
The deploy and post-install steps are invoked through the post-configure hook, which the broker calls. However, the post-configure hook is not called during scale-up of a web_framework cartridge. Instead, the post-install and deploy actions run when the connection hooks execute, and the broker ignores their results and failures.

We need a way to distinguish acceptable failures in connection hooks (essentially warnings or informational messages) from failures that should trigger a rollback. This separation may need to be handled in the connection hook implementation itself.
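One way to express this separation inside the hook itself is an exit-code convention. The sketch below assumes a hypothetical convention that is not an existing OpenShift interface:

* 0 means success.
* A reserved code (`WARN_RC`, set to 100 here purely for illustration) means an acceptable failure that is reported as a warning.
* Any other non-zero code means the caller should roll back.

```shell
#!/usr/bin/env bash
# Hypothetical exit-code convention for connection hooks (illustration only):
#   0        -> success
#   WARN_RC  -> acceptable failure: report as a warning and continue
#   other    -> fatal: the caller should stop and roll back
WARN_RC=${WARN_RC:-100}

# Print "ok", "warning" or "fatal" for a given hook exit code.
classify_hook_result() {
  local rc="$1"
  if [ "$rc" -eq 0 ]; then
    echo ok
  elif [ "$rc" -eq "$WARN_RC" ]; then
    echo warning
  else
    echo fatal
  fi
}

# Run one connection hook command; return non-zero only for fatal failures.
run_connection_hook() {
  local rc=0
  "$@" || rc=$?
  case "$(classify_hook_result "$rc")" in
    ok)      return 0 ;;
    warning) echo "warning: $1 exited with $rc, continuing" >&2; return 0 ;;
    *)       echo "error: $1 exited with $rc, rollback required" >&2; return "$rc" ;;
  esac
}
```

With this split, the broker could continue to ignore warning-level results from connection hooks while still rolling back on fatal ones, such as a failed "post_install" or "deploy".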

Marking this bug as UpcomingRelease for now. I will create a Trello card for it afterwards.
Comment 5 Rajat Chopra 2014-01-20 13:16:01 EST
To be fixed upstream with the user story - 
