1011810 – "Scale up" can end successfully when non-zero exit code is returned during the process

Keywords:
Status:	CLOSED UPSTREAM
Alias:	None
Product:	OpenShift Online
Classification:	Red Hat
Component:	Pod
Sub Component:
Version:	2.x
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	Rajat Chopra
QA Contact:	libra bugs
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2013-09-25 08:00 UTC by Qiushui Zhang
Modified:	2015-05-15 00:21 UTC
CC List:	2 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2014-01-20 18:16:01 UTC
Target Upstream Version:
Embargoed:

Attachments
Platform.log from instance (111.61 KB, application/zip) 2013-09-25 08:00 UTC, Qiushui Zhang

Description Qiushui Zhang 2013-09-25 08:00:04 UTC

Description of problem:
When "post_install" exits with a non-zero code during scale-up, the scale-up still completes successfully.

When "deploy" exits with a non-zero code during scale-up, the scale-up finishes with no error reported, but the gear remains in "deploying" status.

When "install" exits with a non-zero code during scale-up, the scale-up fails with the message "Unable to complete the requested operation due to: Node execution failure (invalid exit code from node).."

Version-Release number of selected component (if applicable):
devenv_3824

How reproducible:
always

Steps to Reproduce:
1. Create a scalable app
rhc app create ews1s jbossews-1.0 -s
2. On the instance, append "exit 1" to the end of "/usr/libexec/openshift/cartridges/jbossews/bin/post_install", then restart the "ruby193-mcollective" service.
3. Try to scale up the app
rhc cartridge scale -c jbossews-1.0 -a ews1s --min 2
4. Revert the change above and repeat the same steps with the "deploy" and "install" scripts.

Actual results:
The scale-up completes successfully when "post_install" or "deploy" exits with a non-zero code.
Only a non-zero exit code from "install" causes the scale-up to fail.
Please refer to the attached platform.log.

Expected results:
The scale-up process reports the error, stops, and rolls back correctly.
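
As a rough illustration of the expected behaviour described above, the sketch below runs lifecycle steps in order and stops at the first non-zero exit code, rolling back before reporting the error. It is not the broker's actual code: `scale_up`, `ScaleUpError`, and the `rollback` callback are hypothetical names, and each step is simulated by a function that returns an exit code.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in: a step is a (name, action) pair, where the action
# returns a process-style exit code (0 means success).
Step = Tuple[str, Callable[[], int]]


class ScaleUpError(Exception):
    """Raised when a lifecycle step exits with a non-zero code."""


def scale_up(steps: List[Step], rollback: Callable[[], None]) -> List[str]:
    """Run steps in order; on the first non-zero exit code, roll back and raise."""
    completed: List[str] = []
    for name, action in steps:
        code = action()
        if code != 0:
            rollback()  # undo the partially created gear before reporting
            raise ScaleUpError(f"{name} exited with code {code}")
        completed.append(name)
    return completed


if __name__ == "__main__":
    # Simulate the reported case: "post_install" ends with "exit 1".
    steps = [
        ("install", lambda: 0),
        ("post_install", lambda: 1),
        ("deploy", lambda: 0),
    ]
    try:
        scale_up(steps, rollback=lambda: print("rolling back new gear"))
    except ScaleUpError as err:
        print(f"scale-up failed: {err}")
```

With this structure, "deploy" is never reached once "post_install" fails, so the gear cannot be left in "deploying" status.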


Additional info:
The app creation process fails as expected when the exit code is changed to non-zero.

Comment 1 Qiushui Zhang 2013-09-25 08:00:36 UTC

Created attachment 802625
Platform.log from instance

Comment 2 Dan Mace 2013-10-01 15:08:52 UTC

The broker is responsible for interpreting the non-zero exit codes returned by the node for each lifecycle action and for deciding whether to stop or roll back.

Also, this may be appropriate to close as upstream on the scheduler work. I'll leave that up to the broker team.

Comment 3 Abhishek Gupta 2013-10-02 19:40:58 UTC

The deploy/post-install steps are invoked through the post-configure hook call from the broker. However, during scale-up of a web_framework cartridge, the post-configure hook is not called. Instead, the post-install/deploy actions are run as part of executing the connection hooks, and the broker ignores their results and failures.

We need to figure out a way to distinguish between acceptable failures in connection hooks (essentially warnings/messages) and failures that should cause a rollback. This logic/separation may need to be handled in the connection hook implementation itself. 
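
One possible shape for that separation is sketched below. It assumes a convention that does not exist in OpenShift today: a hook would use a reserved exit code (2 here, chosen arbitrarily) to signal a warning, and any other non-zero code would be treated as fatal. `HookResult`, `classify`, `should_rollback`, and `WARNING_EXIT_CODES` are all hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable


class Severity(Enum):
    OK = "ok"
    WARNING = "warning"
    FATAL = "fatal"


# Assumed convention, not an existing OpenShift one: exit codes in this set
# mean "completed with a warning"; every other non-zero code is fatal.
WARNING_EXIT_CODES = frozenset({2})


@dataclass
class HookResult:
    hook: str          # name of the hook or action that ran
    exit_code: int     # exit code returned by the node
    output: str = ""   # message to surface to the user, if any


def classify(result: HookResult) -> Severity:
    """Map a hook's exit code to a severity under the assumed convention."""
    if result.exit_code == 0:
        return Severity.OK
    if result.exit_code in WARNING_EXIT_CODES:
        return Severity.WARNING
    return Severity.FATAL


def should_rollback(results: Iterable[HookResult]) -> bool:
    """Roll back only if at least one hook failed fatally."""
    return any(classify(r) is Severity.FATAL for r in results)


if __name__ == "__main__":
    results = [
        HookResult("deploy", 2, "warning only"),
        HookResult("post_install", 1, "script ended with exit 1"),
    ]
    print(should_rollback(results))  # True
```

Keeping the classification in one function means the broker would only need to ask whether any result is fatal, while the meaning of each exit code stays with the hook convention.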

Marking this bug as UpcomingRelease for now; I will create a Trello card for it subsequently.

Comment 5 Rajat Chopra 2014-01-20 18:16:01 UTC

To be fixed upstream with the user story:
https://trello.com/c/o3KaRNRk/188-execution-of-connections-should-handle-errors
