Bug 1395663
| Summary: | Scaling down pods with a running Java process results in a warning message | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | James Netherton <jnethert> |
| Component: | RFE | Assignee: | Ryan Phillips <rphillips> |
| Status: | CLOSED WONTFIX | QA Contact: | Xiaoli Tian <xtian> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 3.3.0 | CC: | aileenc, aos-bugs, bparees, clichybi, decarr, jnethert, jokerman, mmccomas, pkanthal, rphillips, sjenning, sreber, suchaudh, vwalek |
| Target Milestone: | --- | Keywords: | Reopened |
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2018-09-19 15:14:08 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
James Netherton
2016-11-16 11:45:42 UTC
We had a similar problem with our Ruby image before, and we were able to resolve that one. I don't think we want to stop warning people when their process fails to exit cleanly. We should look into whether there is anything we can do in the Java images to provide a more appropriate exit code.

Agreed, we don't want to stop showing warnings - just suppress them in certain scenarios. Modifying images is tricky because the OpenShift advice seems to be that processes are started with 'exec' (so that everything runs as PID 1). Therefore, you effectively replace the environment and lose the capability to trap exit codes and return something more appropriate. The only workaround I see would be to do things the EAP way: start Java in the background, trap signals, and then forward them on to the JVM process. But this seems to go against the recommended OpenShift advice. It also adds a bunch of additional complexity that any image wanting to use Java would need to implement, so it's less than ideal.

We can modify our images, but this sounds like a more general concern that anyone writing their own Java image is going to run into. We can't solve that short of telling people, as a best practice, how to deal with it. I wonder if there is a k8s way to indicate what the "expected/success" error code is as part of the pod/container definition? That seems like the best way to generically solve this problem.

This should be an RFE. Every application may have different needs on graceful shutdown logic. In this case, Java does not set a default error code upon exit (it needs to be explicitly set with a System.exit(0) call). There is a project called springboot-graceful-shutdown[1] that can be injected into a Spring or SpringBoot application to help with OpenShift rolling deployments or projects that need a Spring-based graceful shutdown.
[1] https://github.com/SchweizerischeBundesbahnen/springboot-graceful-shutdown

Hello Ryan, I have a reply from the customer: The provided answer is not helping us at all and is not connected to the problem stated in the ticket in any way. Our problem is not with graceful shutdown. Our problem is (in short) that OpenShift (since 3.6) displays a warning if our application ends with an exit code other than 0, even if exit code 143 is okay for us.

Any error code other than 0 is considered to be an error. Other comments in this thread are about ungraceful Java shutdowns, and I was addressing that issue. The customer could capture the Java exit code in a shell script and rewrite it to be 0, which seems like the most portable solution.

This situation should be addressed inside the container, not by OCP. There should be some wrapper around the JVM that sanitizes its return code, since non-zero return codes have historically indicated an error. The only thing that could be done in OCP for this is to add a field on the container spec like "successCode: 143", but I do not see that flying upstream, since it is easily worked around from within the container and would involve a lot of change in the kubelet and CLI tools, since they all assume that a non-zero return code in the container status indicates an error.
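The wrapper approach suggested in the comments (run the JVM in the background, forward SIGTERM to it, and rewrite its exit code) could look roughly like the sketch below. This is only an illustration under stated assumptions: the function names `normalize_exit_code` and `run_and_normalize` are hypothetical, and treating 143 (128 + 15, that is, termination by SIGTERM) as success is a policy choice that each image would have to make for itself.

```shell
#!/bin/sh
# Hypothetical entrypoint helpers; names and policy are illustrative assumptions,
# not part of any OpenShift or JVM tooling.

# Map a child's exit status to the status the container should report.
# 143 = 128 + 15 (SIGTERM) is treated as a clean stop; everything else passes through.
normalize_exit_code() {
  if [ "$1" -eq 143 ]; then
    echo 0
  else
    echo "$1"
  fi
}

# Run a command in the background, forward SIGTERM to it, wait for it to exit,
# and return its normalized exit status.
run_and_normalize() {
  "$@" &
  local child=$!
  local code=0

  # Forward SIGTERM from the container runtime to the child process.
  trap "kill -TERM $child 2>/dev/null" TERM

  # 'wait' returns early when a trapped signal arrives, so keep waiting
  # until the child has really exited and been reaped.
  wait "$child" || code=$?
  while kill -0 "$child" 2>/dev/null; do
    code=0
    wait "$child" || code=$?
  done

  trap - TERM
  return "$(normalize_exit_code "$code")"
}
```

An entrypoint script built on this would end by calling `run_and_normalize` with the image's own Java command line and then exiting with the returned status. Note the trade-off raised in the thread: because the shell stays in front of the JVM instead of using `exec`, the wrapper has to handle signal forwarding itself, and mapping 143 to 0 also hides cases where a SIGTERM was not expected.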