
Bug 1810722

Summary:	Node should not delete pods until all container status is available
Product:	OpenShift Container Platform	Reporter:	Clayton Coleman <ccoleman>
Component:	Node	Assignee:	Clayton Coleman <ccoleman>
Node sub component:	Kubelet	QA Contact:	Sunil Choudhary <schoudha>
Status:	CLOSED ERRATA	Docs Contact:
Severity:	high
Priority:	unspecified	CC:	aos-bugs, jokerman, juzhao, lsm5, rphillips, schoudha, sdodson, wking, zyu
Version:	4.4
Target Milestone:	---
Target Release:	4.4.0
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:	Cause: RestartNever pods would not report statuses correctly. Consequence: Fix: Bugfix upstream. Result:	Story Points:	---
Clone Of:	1810652
Clones:	1821341		Environment:
Last Closed:	2020-05-04 11:44:56 UTC	Type:	---
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:	1810652, 1926546
Bug Blocks:	1821341

Description Clayton Coleman 2020-03-05 19:16:30 UTC

+++ This bug was initially created as a clone of Bug #1810652 +++

The kubelet does not properly terminate pods that are RestartNever: upstream it reports success (even if the pod actually failed), and in OpenShift since 4.1 we provide a synthetic status (a fake 137 exit code). Now that we have fixed the issue upstream, we should backport the fix to at least 4.4, and possibly 4.3.

The upstream e2e reproduces the issue by:

1. Creating a RestartNever pod that should always exit with status code 1
2. Waiting 0-4s
3. Deleting the pod
4. Observing the status written by the kubelet: no container should report exit code 0
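The check in step 4 can be sketched as follows. This is a minimal Python sketch, not the upstream e2e test: it assumes the deleted pod's final status has already been fetched as a dict in the Kubernetes Pod JSON shape (for example, parsed from `oc get pod <name> -o json`), and the function name `containers_reporting_success` is hypothetical. Creating the pod, the random 0-4s wait, and the deletion are not shown.

```python
# Minimal sketch of step 4 only: inspect a deleted RestartNever pod's status.
# Assumes `pod` is a dict in the Kubernetes Pod JSON shape; the function name
# is hypothetical and not taken from the upstream e2e test.

def containers_reporting_success(pod):
    """Return names of containers whose terminated state reports exit code 0.

    The test pod's containers should always exit with status 1, so any
    name returned here means the kubelet wrote an incorrect status.
    """
    bad = []
    for cs in pod.get("status", {}).get("containerStatuses", []):
        terminated = cs.get("state", {}).get("terminated")
        if terminated is not None and terminated.get("exitCode") == 0:
            bad.append(cs["name"])
    return bad


if __name__ == "__main__":
    # Hypothetical statuses: "ok" exited 1 as expected, "buggy" falsely reports 0.
    example_pod = {
        "status": {
            "containerStatuses": [
                {"name": "ok", "state": {"terminated": {"exitCode": 1, "reason": "Error"}}},
                {"name": "buggy", "state": {"terminated": {"exitCode": 0, "reason": "Completed"}}},
            ]
        }
    }
    print(containers_reporting_success(example_pod))  # ['buggy']
```

An empty result for a pod that was deleted mid-run is what the fixed kubelet should produce; containers that never reached a terminated state are ignored by this sketch.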

To test this in Origin, the e2e test is sufficient, and we can verify in upgrade jobs (which terminate lots of pods) that no pod in an openshift-* namespace exits with code 137 and reason ContainerStatusUnknown.
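The upgrade-job check might look roughly like this. It is a hypothetical Python sketch, not part of Origin's tooling: it assumes the input is a pod List in Kubernetes JSON shape, such as `oc get pods --all-namespaces -o json` produces, and `synthetic_status_pods` and `main` are illustrative names. It treats openshift-* as a namespace prefix match and checks both `state` and `lastState` for the synthetic status.

```python
# Hypothetical sketch: find openshift-* pods whose containers carry the
# synthetic status (exit code 137, reason ContainerStatusUnknown).
import json
import sys


def synthetic_status_pods(pod_list):
    """Yield (namespace, pod, container) for each openshift-* container with the synthetic status."""
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        namespace = meta.get("namespace", "")
        if not namespace.startswith("openshift-"):
            continue
        for cs in pod.get("status", {}).get("containerStatuses", []):
            # Check the current and the previous terminated state; report each container once.
            for key in ("state", "lastState"):
                terminated = cs.get(key, {}).get("terminated")
                if (terminated
                        and terminated.get("exitCode") == 137
                        and terminated.get("reason") == "ContainerStatusUnknown"):
                    yield namespace, meta.get("name", ""), cs.get("name", "")
                    break


def main(stream=sys.stdin):
    """Print every hit read from a JSON pod List; return 1 if any were found, else 0."""
    hits = list(synthetic_status_pods(json.load(stream)))
    for ns, pod, container in hits:
        print(f"{ns}/{pod}: container {container} exited 137 (ContainerStatusUnknown)")
    return 1 if hits else 0
```

Calling `sys.exit(main())` from a script that reads `oc get pods --all-namespaces -o json` on stdin would make the job fail whenever such a pod is found, which matches the intent of the upgrade-job verification described above.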

Comment 4 Scott Dodson 2020-04-07 18:48:55 UTC

*** Bug 1734524 has been marked as a duplicate of this bug. ***

Comment 5 Scott Dodson 2020-04-07 18:49:11 UTC

*** Bug 1821576 has been marked as a duplicate of this bug. ***

Comment 7 errata-xmlrpc 2020-05-04 11:44:56 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581

Comment 8 W. Trevor King 2021-03-31 04:13:36 UTC

Scott had added UpgradeBlocker to this bug way back, but I don't think we ever ended up blocking update recommendations on this series, and the fix has been out for almost a year, and 4.4 is now end-of-life.  Removing the keyword to get it out of our suspect queue [1].

[1]: https://github.com/openshift/enhancements/pull/475