Bug 1790989

Summary: Cluster-managed daemonsets and deployments reporting not all pods are ready when all pods appear to be running
Product: OpenShift Container Platform
Reporter: Luke Stanton <lstanton>
Component: Node
Assignee: Ryan Phillips <rphillips>
Status: CLOSED WONTFIX
QA Contact: Sunil Choudhary <schoudha>
Severity: urgent
Priority: urgent
Version: 4.2.0
CC: acomabon, adeshpan, agarcial, aos-bugs, arghosh, dahernan, danw, e30532, eparis, fabian.ahbeck, ggore, jcrumple, jinjli, jmalde, jokerman, knewcome, mvardhan, oarribas, openshift-bugs-escalate, pbertera, rgregory, rkant, rphillips, scuppett, tnozicka, wking
Target Milestone: ---
Keywords: Reopened
Target Release: 4.6.0
Flags: acomabon: needinfo?
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2020-07-08 17:20:03 UTC
Type: Bug

Description Luke Stanton 2020-01-14 16:18:07 UTC
Description of problem:

When checking the status of cluster-managed daemonsets and deployments, some of them report that not all pods are available, even though all of the pods appear to be running without issue. Some of these "out of sync" daemonsets/deployments appear to cause their associated operators to go into a degraded state.

This issue came up without any changes or known activity in the cluster.
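
For reference, the mismatch can typically be observed by comparing a daemonset's or deployment's status columns against the pods themselves. The namespace and resource below are illustrative, not taken from this report:

```
oc get ds -n openshift-sdn            # DESIRED/READY/AVAILABLE counts disagree
oc get pods -n openshift-sdn -o wide  # yet every backing pod shows Running and Ready
```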


How reproducible:

Uncertain


Actual results: 

Some cluster operators report as degraded due to the out-of-sync deployments.
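
The degraded operators can be checked with the command below; which operators are affected will vary from cluster to cluster:

```
oc get clusteroperators   # operators backed by the "out of sync" workloads show DEGRADED=True
```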


Expected results:

Daemonset and deployment status would accurately reflect the state of their pods.

Comment 9 Ryan Phillips 2020-02-04 19:35:28 UTC
This looks like upstream issue [1], which is still active.


1. https://github.com/kubernetes/kubernetes/issues/53023

Comment 10 Ryan Phillips 2020-02-04 19:36:46 UTC
Rolling the daemonset seems to mitigate the issue for now.

```
oc rollout restart ds <daemonset-name>
```
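
To find candidate daemonsets to restart, one option (a sketch assuming the default `oc get ds` column layout, where DESIRED is column 3 and READY is column 5) is:

```
# list daemonsets whose desired and ready pod counts disagree
oc get ds --all-namespaces --no-headers | awk '$3 != $5'
```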

Comment 13 W. Trevor King 2020-02-24 21:33:08 UTC
Bug 1804717 might help with this.  Or it will at least maximize the benefit of a fix to Kube's Deployment controller.

Comment 28 Dan Winship 2020-05-14 21:42:24 UTC
Bug 1804717 works around the problem for a single DaemonSet, but the problem still exists for every other DaemonSet. If we are not going to fix it in kubelet, then we need to get rid of every DaemonSet in OCP...

Comment 39 Ryan Phillips 2020-07-08 17:20:03 UTC
There are patches in later releases that fix this issue reported against 4.2. If this issue is found again in a later release, please open a new bug.