Description of problem:
Original problem described here: https://issues.jboss.org/browse/OSFUSE-427

In short, the issue concerns the exit codes returned by the JVM when it receives signals such as SIGINT or SIGTERM. The JVM's default is to exit with code 128 + signal number; e.g. for SIGTERM you'd see an exit code of 143 (128 + 15). This causes OpenShift to display a warning message on pod scale-down stating that the container did not stop cleanly (when in fact it did). Most of the xPaaS images exhibit this problem, with the exception of the EAP image, which launches the JVM differently from most other apps (i.e. it launches the process in the background).

How reproducible:
Always, for any JVM process launched in the foreground or via 'exec'.

Steps to Reproduce:
1. Clone the Java project: https://github.com/fabric8-quickstarts/spring-boot-camel
2. Deploy the project to OpenShift: mvn fabric8:deploy
3. Scale down the spring-boot-camel pod in the console

Actual results:
A warning message appears: "The container spring-boot did not stop cleanly when terminated (exit code 143)."

Expected results:
The message is a potential source of confusion because the container did stop correctly. Ideally there should be no warning in these cases.
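The 128 + signal convention is not Java-specific; it can be observed with any foreground process. A quick sketch, using 'sleep' as a stand-in for the JVM:

```shell
# A process killed by SIGTERM exits with 128 + 15 = 143, which is what
# the shell (and the kubelet) observe as the container's exit code.
sleep 30 &
PID=$!
kill -TERM "$PID"
wait "$PID"
CODE=$?
echo "exit code: $CODE"   # 128 + SIGTERM(15) = 143
```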
We had a similar problem with our ruby image before and we were able to resolve that one. I don't think we want to stop warning people when their process fails to exit cleanly. We should look into whether there is anything we can do in the Java images to provide a more appropriate exit code.
Agreed we don't want to stop showing warnings - just suppress them in certain scenarios. Modifying images is tricky because the OpenShift advice is that processes should be started with 'exec' (so that everything runs as PID 1). You therefore replace the shell environment and lose the ability to trap signals and return a more appropriate exit code. The only workaround I see would be to do things the EAP way: start Java in the background, trap signals, and forward them on to the JVM process. But this goes against the recommended OpenShift advice, and it adds a bunch of additional complexity that any image wanting to run Java would need to implement - so it's less than ideal.
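For illustration, the background-launch workaround might look roughly like the sketch below. This is not the actual EAP launch script; 'sleep 60' stands in for the real JVM command, and the subshell simulates the platform SIGTERM-ing the script on scale-down.

```shell
#!/bin/sh
# Launch the "JVM" in the background so this script keeps PID 1 and can
# still trap signals. ('sleep 60' is a stand-in for e.g. java -jar app.jar.)
sleep 60 &
JVM_PID=$!

# Forward SIGTERM/SIGINT to the background JVM so it can shut down.
trap 'kill -TERM "$JVM_PID" 2>/dev/null' TERM INT

# Simulate the pod being scaled down: SIGTERM this script after 1 second.
( sleep 1; kill -TERM $$ ) &

# The first wait may be interrupted by the trap firing; wait a second
# time to collect the JVM's real exit status.
wait "$JVM_PID"
wait "$JVM_PID"
CODE=$?

# 143 = 128 + SIGTERM(15): translate a clean response to SIGTERM into 0.
[ "$CODE" -eq 143 ] && CODE=0
echo "exit code reported to the platform: $CODE"
```

The trade-off is exactly the one noted above: the JVM is no longer PID 1, and every Java image would need to carry this wrapper logic.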
We can modify our images, but this sounds like a more general concern that anyone writing their own Java image is going to run into... we can't solve that short of telling people, as a best practice, how to deal with it. I wonder if there is a k8s way to indicate the "expected/success" exit code as part of the pod/container definition? That seems like the best way to solve this problem generically.
This should be an RFE.
Every application may have different needs for its graceful shutdown logic. In this case, Java does not return a zero exit code by default when terminated by a signal; it has to be set explicitly with a System.exit(0) call. There is a project called springboot-graceful-shutdown [1] that can be added to a Spring or Spring Boot application to help with OpenShift rolling deployments, or for projects that need Spring-based graceful shutdown. [1] https://github.com/SchweizerischeBundesbahnen/springboot-graceful-shutdown
Hello Ryan, I have a reply from the customer: "The provided answer is not helping us at all and is not connected to the problem stated in the ticket in any way. Our problem is not with graceful shutdown. Our problem is (in short) that OpenShift (since 3.6) displays a warning if our application ends with an exit code other than 0 - even if exit code 143 is okay for us."
Any exit code other than 0 is considered to be an error. Other comments in this thread are about ungraceful Java shutdowns, and I was addressing that issue. The customer could capture the Java exit code in a shell script and rewrite it to 0, which seems like the most portable solution.
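A minimal sketch of that rewrite. The `sh -c 'kill -TERM $$'` line is only a stand-in that simulates the real `java` invocation being killed with SIGTERM:

```shell
#!/bin/sh
# Run the JVM in the foreground, then rewrite 143 (128 + SIGTERM) to 0
# so the platform sees a clean stop. Any other non-zero code passes
# through unchanged and still triggers the warning.
sh -c 'kill -TERM $$'   # stand-in for: java -jar app.jar
CODE=$?
if [ "$CODE" -eq 143 ]; then
    CODE=0
fi
echo "rewritten exit code: $CODE"
```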
This situation should be addressed inside the container, not by OCP. There should be some wrapper around the JVM that sanitizes its return code, since non-zero return codes have historically indicated an error. The only thing that could be done in OCP would be to add a field on the container spec like "successCode: 143", but I do not see that flying upstream, since it is easily worked around from within the container and would involve a lot of change in the kubelet and CLI tools, all of which assume that a non-zero return code in the container status indicates an error.