Bug 1703877
| Summary: | [stability] MCD pod is periodically exiting with error during some e2e runs | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Clayton Coleman <ccoleman> |
| Component: | Machine Config Operator | Assignee: | Antonio Murdaca <amurdaca> |
| Status: | CLOSED ERRATA | QA Contact: | Micah Abbott <miabbott> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 4.1.0 | CC: | mnguyen, sponnaga |
| Target Milestone: | --- | Keywords: | TestBlocker |
| Target Release: | 4.2.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-10-16 06:28:21 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Clayton Coleman
2019-04-29 01:20:45 UTC
Apr 27 05:44:30.213 E ns/openshift-machine-config-operator pod/machine-config-daemon-9jww2 node/ip-10-0-154-190.ec2.internal container=machine-config-daemon container exited with code 143:

Someone (kubelet likely) is killing (SIGTERM) us.

The 143 exit code is because the MCD is getting killed after someone asked to SIGTERM it. In the MCD we have a handler for SIGTERM only during our sync; the rest of the code doesn't really care about SIGTERM, so we don't catch it and we exit with 143 instead of 0 (which is what we would return if we had a handler). The PR to fix this by adding a handler for SIGTERM and exiting nicely is here: https://github.com/openshift/machine-config-operator/pull/697

Alright, all daemonsets w/o a SIGTERM handler are exposing this behavior of being terminated (full conversation here: https://coreos.slack.com/archives/CEKNRGF25/p1556821026430400). The MCO in that job isn't erroring out either. As outlined in the conversation, this may be just noise (but we do have a PR anyway). I'm moving the target to 4.2.

No reports of 'container exited with code 143' in the last 14 days of test runs. Closing as verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:2922