Description of problem:
Scale-up handles non-zero exit codes from cartridge scripts inconsistently.
When "post_install" exits non-zero, the scale-up completes successfully.
When "deploy" exits non-zero, the scale-up completes with no error reported, but the gear stays in "deploying" status.
When "install" exits non-zero, the scale-up fails with "Unable to complete the requested operation due to: Node execution failure (invalid exit code from node).."

Version-Release number of selected component (if applicable):
devenv_3824

How reproducible:
always

Steps to Reproduce:
1. Create a scalable app
   rhc app create ews1s jbossews-1.0
2. On the instance, add "exit 1" to the end of "/usr/libexec/openshift/cartridges/jbossews/bin/post_install". Restart the "ruby193-mcollective" service.
3. Try to scale up the app
   rhc cartridge scale -c jbossews-1.0 -a ews1s --min 2
4. Revert the above change and repeat the same steps with the "deploy" and "install" scripts.

Actual results:
The scale-up completes successfully when "post_install" or "deploy" exits non-zero. Only a non-zero exit code from "install" causes the scale-up to fail. Please refer to the attached platform.log.

Expected results:
The scale-up process reports the error, quits, and rolls back correctly.

Additional info:
The app create process fails as expected when the exit code is changed to non-zero.
Created attachment 802625 [details] Platform.log from instance
The broker's responsible for interpreting the non-zero exit codes from the node for each lifecycle action and making the decision about whether to stop/rollback. Also, this may be appropriate to close as upstream on the scheduler work. I'll leave that up to the broker team.
The deploy/post-install steps are normally invoked through the post-configure hook call from the broker. However, during scale-up of a web_framework cartridge, the post-configure hook is not called. The post-install/deploy actions are instead invoked while executing the connection hooks, and the broker ignores their results and failures. We need to figure out a way to distinguish between acceptable failures in connection hooks (essentially warnings/messages) and failures that should cause a rollback. This logic/separation may need to be handled in the connection hook implementation itself. Marking this bug as UpcomingRelease for now; I will create a Trello card for it subsequently.
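To illustrate the separation described above, here is a minimal Python sketch of one way hook results could be split into warnings and rollback-triggering failures. It is not broker code: `HookResult`, `Severity`, `NON_FATAL_HOOKS`, `classify`, `run_scale_up` and the hook name "hypothetical-info-hook" are all made up for illustration, and the policy of treating every unlisted hook as fatal is an assumption, not a decision recorded in this bug.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Severity(Enum):
    WARNING = "warning"  # report to the user, continue the scale-up
    FATAL = "fatal"      # stop and roll back the scale-up


@dataclass
class HookResult:
    hook: str            # e.g. "post_install" or "deploy"
    exit_code: int
    output: str = ""


# Assumption: hooks whose failure is only informational.
# The name below is hypothetical; the bug does not define such a list.
NON_FATAL_HOOKS = {"hypothetical-info-hook"}


def classify(result: HookResult) -> Optional[Severity]:
    """Return None on success, otherwise the severity of the failure."""
    if result.exit_code == 0:
        return None
    if result.hook in NON_FATAL_HOOKS:
        return Severity.WARNING
    return Severity.FATAL


def run_scale_up(results: List[HookResult]) -> dict:
    """Walk hook results in order and stop at the first fatal failure."""
    warnings = []
    for result in results:
        severity = classify(result)
        if severity is Severity.FATAL:
            return {"status": "rolled_back",
                    "failed_hook": result.hook,
                    "warnings": warnings}
        if severity is Severity.WARNING:
            warnings.append(result.hook)
    return {"status": "scaled", "failed_hook": None, "warnings": warnings}
```

Under this sketch, a non-zero exit from "post_install" or "deploy" would end the scale-up with a rollback instead of being silently ignored, while a failure in a hook listed in `NON_FATAL_HOOKS` would only be collected as a warning.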
To be fixed upstream with the user story - https://trello.com/c/o3KaRNRk/188-execution-of-connections-should-handle-errors