Bug 1810652

Summary: Node should not delete pods until all container status is available
Product: OpenShift Container Platform
Component: Node
Version: 4.4
Target Release: 4.5.0
Status: CLOSED ERRATA
Severity: high
Priority: unspecified
Hardware: Unspecified
OS: Unspecified
Reporter: Clayton Coleman <ccoleman>
Assignee: Clayton Coleman <ccoleman>
QA Contact: Sunil Choudhary <schoudha>
CC: aos-bugs, jokerman, sdodson, tnozicka, wking
Type: Bug
Last Closed: 2020-07-13 17:18:35 UTC
Bug Blocks: 1810722, 1926546

Description Clayton Coleman 2020-03-05 16:39:17 UTC
The kubelet does not properly terminate pods that are RestartNever. Upstream it reports success (even if the pod actually failed), and in OpenShift since 4.1 we provide a synthetic status (a fake 137 exit code). Now that we have fixed the issue upstream, we should backport it to 4.4 at least, and possibly 4.3.

The upstream e2e reproduces the issue by:

1. Creating a RestartNever pod that should always exit with status code 1
2. Waiting 0-4s
3. Deleting the pod
4. Observing the status written by the kubelet - no container should report exit code 0
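
The check in step 4 can be sketched roughly as follows. This is only an illustration, not the upstream e2e test: it assumes the pod has already been fetched as a JSON-style dict using the standard Pod status field names (`containerStatuses`, `state.terminated.exitCode`), and the function name and the sample pod are hypothetical.

```python
def containers_reporting_success(pod):
    """Return the names of containers whose terminated state shows exit code 0.

    `pod` is assumed to be a Pod object decoded from JSON (a plain dict).
    For the reproducer above, the expected result is an empty list, because
    the container is built to always exit with status code 1.
    """
    status = pod.get("status") or {}
    statuses = (status.get("initContainerStatuses") or []) + (
        status.get("containerStatuses") or []
    )
    offenders = []
    for cs in statuses:
        terminated = (cs.get("state") or {}).get("terminated")
        if terminated is not None and terminated.get("exitCode") == 0:
            offenders.append(cs.get("name"))
    return offenders


# Hypothetical sample: a RestartNever pod whose only container exited with 1.
good_pod = {
    "spec": {"restartPolicy": "Never"},
    "status": {
        "containerStatuses": [
            {"name": "fail", "state": {"terminated": {"exitCode": 1, "reason": "Error"}}}
        ]
    },
}

# Hypothetical sample of the buggy behavior: the same container reported as success.
bad_pod = {
    "spec": {"restartPolicy": "Never"},
    "status": {
        "containerStatuses": [
            {"name": "fail", "state": {"terminated": {"exitCode": 0, "reason": "Completed"}}}
        ]
    },
}

print(containers_reporting_success(good_pod))
print(containers_reporting_success(bad_pod))
```

A container that is still waiting or running has no `terminated` state and is ignored here; a stricter check might also require that every container eventually reports a terminated state.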

To test this in Origin, the e2e test is sufficient, and we can verify in upgrade jobs (which terminate lots of pods) that no pod in an openshift-* namespace exits with code 137 and reason ContainerStatusUnknown.
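
As a rough sketch of the upgrade-job verification, the following scans a pod list for the synthetic status. It assumes the input is a PodList decoded from JSON (for example, the output of `oc get pods --all-namespaces -o json` saved during the job); the function name and the sample data are hypothetical, and only the current `state` of each container is inspected, not `lastState`.

```python
def find_synthetic_statuses(pod_list, namespace_prefix="openshift-"):
    """Return (namespace, pod, container) tuples for containers that
    terminated with exit code 137 and reason ContainerStatusUnknown."""
    hits = []
    for pod in pod_list.get("items") or []:
        meta = pod.get("metadata") or {}
        namespace = meta.get("namespace", "")
        if not namespace.startswith(namespace_prefix):
            continue
        status = pod.get("status") or {}
        statuses = (status.get("initContainerStatuses") or []) + (
            status.get("containerStatuses") or []
        )
        for cs in statuses:
            terminated = (cs.get("state") or {}).get("terminated") or {}
            if (
                terminated.get("exitCode") == 137
                and terminated.get("reason") == "ContainerStatusUnknown"
            ):
                hits.append((namespace, meta.get("name"), cs.get("name")))
    return hits


# Hypothetical sample data; the pod and container names are made up.
sample = {
    "items": [
        {
            "metadata": {"namespace": "openshift-example", "name": "pod-a"},
            "status": {
                "containerStatuses": [
                    {
                        "name": "main",
                        "state": {
                            "terminated": {
                                "exitCode": 137,
                                "reason": "ContainerStatusUnknown",
                            }
                        },
                    }
                ]
            },
        },
        {
            "metadata": {"namespace": "default", "name": "pod-b"},
            "status": {
                "containerStatuses": [
                    {
                        "name": "main",
                        "state": {
                            "terminated": {
                                "exitCode": 137,
                                "reason": "ContainerStatusUnknown",
                            }
                        },
                    }
                ]
            },
        },
    ]
}

for namespace, pod, container in find_synthetic_statuses(sample):
    print(f"{namespace}/{pod} container={container}")
```

An empty result for a completed upgrade job would be consistent with the fix being in place. Pods outside the openshift-* namespaces are skipped on purpose, matching the scope described above.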

Comment 3 Ryan Phillips 2020-03-26 15:30:28 UTC
*** Bug 1780386 has been marked as a duplicate of this bug. ***

Comment 5 Ryan Phillips 2020-04-06 21:07:39 UTC
*** Bug 1726934 has been marked as a duplicate of this bug. ***

Comment 7 errata-xmlrpc 2020-07-13 17:18:35 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409