Bug 1790989

Summary: Cluster-managed daemonsets and deployments reporting not all pods are ready when all pods appear to be running
Product: OpenShift Container Platform
Reporter: Luke Stanton <lstanton>
Component: Node
Assignee: Ryan Phillips <rphillips>
Status: CLOSED WONTFIX
QA Contact: Sunil Choudhary <schoudha>
Severity: urgent
Priority: urgent
Version: 4.2.0
CC: acomabon, adeshpan, agarcial, aos-bugs, arghosh, dahernan, danw, e30532, eparis, fabian.ahbeck, ggore, jcrumple, jinjli, jmalde, jokerman, knewcome, mvardhan, oarribas, openshift-bugs-escalate, pbertera, rgregory, rkant, rphillips, scuppett, tnozicka, wking
Target Milestone: ---
Keywords: Reopened
Target Release: 4.6.0
Flags: acomabon: needinfo?
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2020-07-08 17:20:03 UTC
Type: Bug

Description Luke Stanton 2020-01-14 16:18:07 UTC
Description of problem:

When checking the status of cluster-managed daemonsets and deployments, some of them report that not all pods are available, even though all of the pods appear to be running without issue. Some of these "out of sync" daemonsets/deployments appear to cause their associated operators to go into a degraded state.

This issue came up without any changes or known activity in the cluster.
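
For reference, the mismatch can typically be observed by comparing a daemonset's or deployment's status columns against the pods themselves. The namespace and resource below are illustrative, not taken from this report:

```
oc get ds -n openshift-sdn            # DESIRED/READY/AVAILABLE counts disagree
oc get pods -n openshift-sdn -o wide  # yet every backing pod shows Running and Ready
```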


How reproducible:

Uncertain


Actual results: 

Some cluster operators report as degraded due to the out-of-sync deployments.
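
The degraded operators can be checked with the command below; which operators are affected will vary from cluster to cluster:

```
oc get clusteroperators   # operators backed by the "out of sync" workloads show DEGRADED=True
```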


Expected results:

Daemonset and deployment status would accurately reflect the state of their pods.

Comment 9 Ryan Phillips 2020-02-04 19:35:28 UTC
This looks like upstream issue [1], which is still active.


1. https://github.com/kubernetes/kubernetes/issues/53023

Comment 10 Ryan Phillips 2020-02-04 19:36:46 UTC
Rolling the daemonset seems to mitigate the issue for now.

```
oc rollout restart ds <daemonset-name>
```
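
To find candidate daemonsets to restart, one option (a sketch assuming the default `oc get ds` column layout, where DESIRED is column 3 and READY is column 5) is:

```
# list daemonsets whose desired and ready pod counts disagree
oc get ds --all-namespaces --no-headers | awk '$3 != $5'
```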

Comment 13 W. Trevor King 2020-02-24 21:33:08 UTC
Bug 1804717 might help with this.  Or it will at least maximize the benefit of a fix to Kube's Deployment controller.

Comment 28 Dan Winship 2020-05-14 21:42:24 UTC
Bug 1804717 works around the problem for a single DaemonSet, but the problem still exists for every other DaemonSet. If we are not going to fix it in kubelet, then we need to get rid of every DaemonSet in OCP...

Comment 39 Ryan Phillips 2020-07-08 17:20:03 UTC
There are patches in later releases that fix this issue reported against 4.2. If this issue is found again in a later release, please open a new bug.