Bug 1843877 - daemonset, deployment, and replicaset status can permafail
Summary: daemonset, deployment, and replicaset status can permafail
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-controller-manager
Version: 4.3.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.3.z
Assignee: Maciej Szulik
QA Contact: zhou ying
URL:
Whiteboard:
Depends On: 1843876
Blocks:
 
Reported: 2020-06-04 11:16 UTC by Maciej Szulik
Modified: 2020-08-05 10:54 UTC
CC: 5 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: In certain cases a NotFound error was swallowed by the controller logic. Consequence: The missing NotFound event caused the controller to be unaware of missing pods. Fix: Properly react to NotFound events, which indicate that the pod was already removed by a different actor. Result: Controllers (deployment, daemonset, replicaset, and others) now properly react to the pod NotFound event.
Clone Of: 1843876
Environment:
Last Closed: 2020-08-05 10:54:06 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github openshift origin pull 25063 0 None closed [release-4.3] Bug 1843877: UPSTREAM: 91008: Do not swallow NotFound error for DeletePod in dsc.manage 2020-07-31 17:55:33 UTC
Red Hat Product Errata RHBA-2020:3180 0 None None None 2020-08-05 10:54:34 UTC

Description Maciej Szulik 2020-06-04 11:16:26 UTC
+++ This bug was initially created as a clone of Bug #1843876 +++

+++ This bug was initially created as a clone of Bug #1843462 +++

+++ This bug was initially created as a clone of Bug #1843187 +++

When pod expectations are not met, status for workloads can wedge. When status for workloads wedges, operators wait indefinitely. When operators wait indefinitely, status is wrong. When status is wrong, upgrades can fail.

Cherry-picking https://github.com/kubernetes/kubernetes/pull/91008 seems like a fix.
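
For illustration only, here is a minimal Go sketch of the pattern that upstream PR 91008 addresses: when a controller's pod deletion returns NotFound, the pod was already removed by another actor, so the deletion must still be recorded in the controller's expectations instead of the error being silently dropped. This is not the actual upstream patch; the deletionExpectations interface and deletePod helper are hypothetical, and the Delete call uses the context-aware client-go signature from recent releases rather than the 4.3-era one.

// A minimal sketch of the "do not swallow NotFound" pattern fixed upstream.
package example

import (
	"context"
	"fmt"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// deletionExpectations is a hypothetical stand-in for the expectation
// tracker used by kube-controller-manager controllers.
type deletionExpectations interface {
	DeletionObserved(controllerKey string)
}

// deletePod is a hypothetical helper showing the corrected error handling.
func deletePod(ctx context.Context, client kubernetes.Interface, expectations deletionExpectations,
	controllerKey, namespace, name string) error {
	err := client.CoreV1().Pods(namespace).Delete(ctx, name, metav1.DeleteOptions{})
	if err == nil {
		return nil
	}
	// Before the fix, a NotFound error here could be dropped without
	// lowering expectations, leaving the workload's status wedged.
	expectations.DeletionObserved(controllerKey)
	if apierrors.IsNotFound(err) {
		// The pod is already gone (removed by the kubelet, a user, or
		// another controller); treat this as a successful deletion.
		return nil
	}
	return fmt.Errorf("unable to delete pod %s/%s: %w", namespace, name, err)
}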

--- Additional comment from Maciej Szulik on 2020-06-03 12:54:58 CEST ---

Comment 1 Maciej Szulik 2020-06-18 09:56:39 UTC
This is waiting in the merge queue.

Comment 2 Maciej Szulik 2020-07-09 10:56:59 UTC
This is waiting in the merge queue.

Comment 5 zhou ying 2020-07-13 07:43:34 UTC
Confirmed with payload 4.3.0-0.nightly-2020-07-12-052232; this issue has been fixed:
Deleted one pod in the first terminal and, at the same time, scaled down the deployment; no new pod was created.

[zhouying@dhcp-140-138 ~]$ oc get po 
NAME                        READY   STATUS    RESTARTS   AGE
mydeploy-6cb778bf69-6jngn   1/1     Running   0          5m7s
mydeploy-6cb778bf69-ftl6v   1/1     Running   0          5m7s
mydeploy-6cb778bf69-sgkwc   1/1     Running   0          15s
[zhouying@dhcp-140-138 ~]$ oc delete po/mydeploy-6cb778bf69-sgkwc
pod "mydeploy-6cb778bf69-sgkwc" deleted
[zhouying@dhcp-140-138 ~]$ oc get po 
NAME                        READY   STATUS    RESTARTS   AGE
mydeploy-6cb778bf69-6jngn   1/1     Running   0          5m30s
mydeploy-6cb778bf69-ftl6v   1/1     Running   0          5m30s


[zhouying@dhcp-140-138 ~]$ oc scale deploy/mydeploy --replicas=2
deployment.extensions/mydeploy scaled
[zhouying@dhcp-140-138 ~]$ oc get po 
NAME                        READY   STATUS        RESTARTS   AGE
mydeploy-6cb778bf69-6jngn   1/1     Running       0          5m20s
mydeploy-6cb778bf69-ftl6v   1/1     Running       0          5m20s
mydeploy-6cb778bf69-sgkwc   0/1     Terminating   0          28s

Comment 7 errata-xmlrpc 2020-08-05 10:54:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.3.31 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3180

