Bug 1810722

Summary: Node should not delete pods until all container status is available
Product: OpenShift Container Platform Reporter: Clayton Coleman <ccoleman>
Component: Node Assignee: Clayton Coleman <ccoleman>
Node sub component: Kubelet QA Contact: Sunil Choudhary <schoudha>
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: unspecified CC: aos-bugs, jokerman, juzhao, lsm5, rphillips, schoudha, sdodson, wking, zyu
Version: 4.4   
Target Milestone: ---   
Target Release: 4.4.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: RestartNever pods would not report statuses correctly. Consequence: Fix: Bugfix upstream. Result:
Story Points: ---
Clone Of: 1810652
: 1821341 (view as bug list) Environment:
Last Closed: 2020-05-04 11:44:56 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1810652, 1926546    
Bug Blocks: 1821341    

Description Clayton Coleman 2020-03-05 19:16:30 UTC
+++ This bug was initially created as a clone of Bug #1810652 +++

The kubelet does not properly terminate pods that are RestartNever: upstream it reports success (even if the pod actually failed), and in OpenShift since 4.1 we provide a synthetic status (a fake 137 exit code). Now that we have fixed the issue upstream, we should backport the fix to 4.4 at least, and possibly 4.3.

The upstream e2e reproduces the issue by:

1. Creating a RestartNever pod that should always exit with status code 1
2. Waiting 0-4s
3. Deleting the pod
4. Observing the status written by the kubelet - no container should report exit code 0
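
The check in step 4 can be sketched as follows. This is only an illustration, not the upstream e2e test: it assumes the pod has been fetched as JSON (for example with `oc get pod -o json`) and loaded into a Python dict, and the helper name `containers_reporting_success` and the sample data are hypothetical.

```python
def containers_reporting_success(pod):
    """Return the names of containers whose terminated state reports exit code 0.

    `pod` is a dict shaped like the JSON form of a Kubernetes Pod object.
    For the reproducer above, the returned list should always be empty,
    because the container is expected to exit with status code 1.
    """
    offenders = []
    statuses = pod.get("status", {}).get("containerStatuses") or []
    for status in statuses:
        terminated = (status.get("state") or {}).get("terminated")
        if terminated is not None and terminated.get("exitCode") == 0:
            offenders.append(status.get("name"))
    return offenders


# Hypothetical sample data: "good" reports the real failure,
# "bad" reports a success that never happened.
sample_pod = {
    "metadata": {"name": "example", "namespace": "default"},
    "spec": {"restartPolicy": "Never"},
    "status": {
        "containerStatuses": [
            {"name": "good", "state": {"terminated": {"exitCode": 1, "reason": "Error"}}},
            {"name": "bad", "state": {"terminated": {"exitCode": 0, "reason": "Completed"}}},
        ]
    },
}

print(containers_reporting_success(sample_pod))
```

Any non-empty result would mean the kubelet wrote a success status for a container that was supposed to fail.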

To test this in Origin, the e2e test is sufficient, and we can verify in upgrade jobs (which terminate lots of pods) that no pod in an openshift-* namespace exits with code 137 and reason ContainerStatusUnknown.
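
One way the upgrade-job verification could be scripted is sketched below. It assumes a pod list has already been dumped as JSON (for example from `oc get pods --all-namespaces -o json`) and that its `items` array has been loaded into Python; the helper name `synthetic_unknown_exits` and the sample data are hypothetical, and checking both `state` and `lastState` is a choice made for this sketch.

```python
def synthetic_unknown_exits(pods):
    """Find containers in openshift-* namespaces that carry the synthetic status.

    `pods` is a list of dicts shaped like the JSON form of Kubernetes Pod objects.
    Returns (namespace, pod name, container name) tuples for every container
    whose terminated state has exit code 137 and reason ContainerStatusUnknown.
    """
    hits = []
    for pod in pods:
        meta = pod.get("metadata", {})
        namespace = meta.get("namespace", "")
        if not namespace.startswith("openshift-"):
            continue
        for status in pod.get("status", {}).get("containerStatuses") or []:
            # Look at both the current and the previous container state.
            for key in ("state", "lastState"):
                terminated = (status.get(key) or {}).get("terminated")
                if (
                    terminated is not None
                    and terminated.get("exitCode") == 137
                    and terminated.get("reason") == "ContainerStatusUnknown"
                ):
                    hits.append((namespace, meta.get("name"), status.get("name")))
                    break  # report each container at most once
    return hits


# Hypothetical sample data: only the first pod should be reported.
sample_pods = [
    {
        "metadata": {"namespace": "openshift-example", "name": "pod-a"},
        "status": {"containerStatuses": [
            {"name": "c1", "state": {"terminated": {"exitCode": 137, "reason": "ContainerStatusUnknown"}}},
        ]},
    },
    {
        "metadata": {"namespace": "default", "name": "pod-b"},
        "status": {"containerStatuses": [
            {"name": "c1", "state": {"terminated": {"exitCode": 137, "reason": "ContainerStatusUnknown"}}},
        ]},
    },
    {
        "metadata": {"namespace": "openshift-example", "name": "pod-c"},
        "status": {"containerStatuses": [
            {"name": "c1", "state": {"terminated": {"exitCode": 1, "reason": "Error"}}},
        ]},
    },
]

print(synthetic_unknown_exits(sample_pods))
```

An empty result across an upgrade job's pods would be consistent with the fix being in place.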

Comment 4 Scott Dodson 2020-04-07 18:48:55 UTC
*** Bug 1734524 has been marked as a duplicate of this bug. ***

Comment 5 Scott Dodson 2020-04-07 18:49:11 UTC
*** Bug 1821576 has been marked as a duplicate of this bug. ***

Comment 7 errata-xmlrpc 2020-05-04 11:44:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581

Comment 8 W. Trevor King 2021-03-31 04:13:36 UTC
Scott had added UpgradeBlocker to this bug way back, but I don't think we ever ended up blocking update recommendations on this series, and the fix has been out for almost a year, and 4.4 is now end-of-life.  Removing the keyword to get it out of our suspect queue [1].

[1]: https://github.com/openshift/enhancements/pull/475