Red Hat Bugzilla – Bug 1011810
"Scale up" can end successfully when non-zero exit code is returned during the process
Last modified: 2015-05-14 20:21:03 EDT
Description of problem:
Scale up when the "post_install" exit code is non-zero: the scale-up process ends successfully.
Scale up when the "deploy" exit code is non-zero: the scale-up process ends with no error reported, but the gear stays in "deploying" status.
Scale up when the "install" exit code is non-zero: the scale-up process fails with "Unable to complete the requested operation due to: Node execution failure (invalid exit code from node)."
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Create a scalable app
rhc app create ews1s jbossews-1.0
2. On the instance, add "exit 1" to the end of "/usr/libexec/openshift/cartridges/jbossews/bin/post_install". Restart the "ruby193-mcollective" service.
3. Try to scale up the app
rhc cartridge scale -c jbossews-1.0 -a ews1s --min 2
4. Revert the above change and repeat the same steps with the "deploy" and "install" scripts.
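Steps 2 and 4 can be scripted roughly as follows. This is a minimal sketch, assuming it is run as root on the instance. The helper names `inject_failure` and `revert_failure` and the `.orig` backup suffix are hypothetical; only the hook path and the service name come from the steps above.

```shell
#!/bin/bash
# Hypothetical helpers for steps 2 and 4; they are not part of OpenShift.
CART_BIN=/usr/libexec/openshift/cartridges/jbossews/bin

# Append "exit 1" to a hook script, keeping a backup copy so the
# change can be reverted later.
inject_failure() {
    local hook="$1"
    cp -p "$hook" "$hook.orig" && printf '\nexit 1\n' >> "$hook"
}

# Restore the hook script from the backup made by inject_failure.
revert_failure() {
    local hook="$1"
    mv -f "$hook.orig" "$hook"
}

# Example (as root on the instance):
#   inject_failure "$CART_BIN/post_install"
#   service ruby193-mcollective restart
#   ... run the scale-up from the client, then:
#   revert_failure "$CART_BIN/post_install"
#   service ruby193-mcollective restart
```

The backup-and-restore approach keeps the original hook byte-for-byte intact, which makes it easy to repeat the test with "deploy" and "install" in turn.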
The scale-up process ends successfully when "post_install" or "deploy" returns a non-zero exit code.
Only a non-zero exit code from "install" causes the scale-up to fail.
Please refer to the attached platform.log.
The scale-up process should report the error, quit, and roll back correctly.
The app create process fails as expected when the exit code is changed to non-zero.
Created attachment 802625
Platform.log from instance
The broker's responsible for interpreting the non-zero exit codes from the node for each lifecycle action and making the decision about whether to stop/rollback.
Also, this may be appropriate to close as upstream on the scheduler work. I'll leave that up to the broker team.
The deploy/post-install steps are invoked through the post-configure hook call from the broker. However, during scale-up of a web_framework cartridge, the post-configure hook is not called. Instead, the post-install/deploy actions are invoked by executing the connection hooks, and the broker ignores their results/failures.
We need to figure out a way to distinguish between acceptable failures in connection hooks (essentially warnings/messages) and failures that should cause a rollback. This logic/separation may need to be handled in the connection hook implementation itself.
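To illustrate the kind of separation described above, here is a minimal sketch in which each hook is tagged as either fatal or warn-only. It is not how the broker or the connection hooks are implemented: `hook_policy`, `run_hooks`, and the policy table itself are hypothetical, and the real rollback is represented only by a non-zero return value.

```shell
#!/bin/bash
# Hypothetical policy: decide whether a failing hook should abort the
# scale-up ("fatal") or only produce a warning ("warn").
hook_policy() {
    case "$1" in
        install|post_install|deploy) echo fatal ;;
        *)                           echo warn ;;
    esac
}

# Run hooks in order. "$1" is the directory that contains them; the
# remaining arguments are hook names. Returns 0 if no fatal hook failed,
# otherwise the exit code of the first fatal hook that failed.
run_hooks() {
    local dir="$1"; shift
    local hook rc
    for hook in "$@"; do
        rc=0
        "$dir/$hook" || rc=$?
        if [ "$rc" -ne 0 ]; then
            if [ "$(hook_policy "$hook")" = fatal ]; then
                # The caller is expected to stop and roll back here.
                echo "ERROR: $hook exited with $rc; aborting" >&2
                return "$rc"
            fi
            echo "WARNING: $hook exited with $rc; continuing" >&2
        fi
    done
    return 0
}
```

Keying the policy on the hook name is only one option; a hook could instead signal severity itself, for example through a reserved range of exit codes.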
Marking this bug as UpcomingRelease for now; I will create a Trello card for it subsequently.
To be fixed upstream with the user story -