Bug 1843462 - daemonset, deployment, and replicaset status can permafail
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-controller-manager
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.5.0
Assignee: Maciej Szulik
QA Contact: zhou ying
URL:
Whiteboard:
Depends On: 1843187
Blocks: 1843876
Reported: 2020-06-03 10:55 UTC by Maciej Szulik
Modified: 2021-04-05 17:45 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: In certain cases, a NotFound error was swallowed by the controller logic. Consequence: The missing NotFound event left the controller unaware of missing pods. Fix: Properly react to NotFound events, which indicate that the pod was already removed by a different actor. Result: Controllers (deployment, daemonset, replicaset, and others) properly react to a pod NotFound event.
Clone Of: 1843187
Clones: 1843876
Environment:
Last Closed: 2020-07-13 17:42:56 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Github openshift origin pull 25061 0 None closed [release-4.5] Bug 1843462: UPSTREAM: 91008: Do not swallow NotFound error for DeletePod in dsc.manage 2020-08-12 13:42:27 UTC
Red Hat Product Errata RHBA-2020:2409 0 None None None 2020-07-13 17:43:17 UTC

Description Maciej Szulik 2020-06-03 10:55:48 UTC
+++ This bug was initially created as a clone of Bug #1843187 +++

When pod expectations are not met, status for workloads can wedge. When status for workloads wedges, operators wait indefinitely. When operators wait indefinitely, status is wrong. When status is wrong, upgrades can fail.

Picking https://github.com/kubernetes/kubernetes/pull/91008 seems like a fix.
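
To illustrate how a swallowed NotFound can wedge expectations, here is a toy model in Python. It is a sketch under stated assumptions, not the controller's actual Go code: `Expectations`, `Cluster`, `delete_pod`, `sync`, and the `swallow_not_found` flag are all hypothetical names invented for this example. The model assumes that a successful delete is later observed through a watch event, while a pod that was already removed by another actor produces no further event.

```python
class NotFound(Exception):
    """Stand-in for an API 'pod not found' error (hypothetical)."""


class Expectations:
    """Toy counter of deletions the controller is still waiting to observe."""

    def __init__(self):
        self.pending_deletes = 0

    def expect_deletions(self, n):
        self.pending_deletes = n

    def deletion_observed(self):
        self.pending_deletes = max(0, self.pending_deletes - 1)

    def satisfied(self):
        return self.pending_deletes == 0


class Cluster:
    """Toy API server: deleting an existing pod fires a watch callback."""

    def __init__(self, pods, on_delete):
        self.pods = set(pods)
        self.on_delete = on_delete

    def delete(self, name):
        if name not in self.pods:
            raise NotFound(name)
        self.pods.remove(name)
        self.on_delete(name)  # simulated watch event


def delete_pod(cluster, name, swallow_not_found):
    """Delete a pod; optionally hide NotFound from the caller (the buggy mode)."""
    try:
        cluster.delete(name)
    except NotFound:
        if not swallow_not_found:
            raise


def sync(pods, to_delete, swallow_not_found):
    """Run one toy sync and report whether expectations end up satisfied."""
    exp = Expectations()
    cluster = Cluster(pods, on_delete=lambda _name: exp.deletion_observed())
    exp.expect_deletions(len(to_delete))
    for name in to_delete:
        try:
            delete_pod(cluster, name, swallow_not_found)
        except NotFound:
            # The pod is already gone, so no watch event will arrive for it.
            exp.deletion_observed()
    return exp.satisfied()
```

In this model, `sync({"a", "b"}, ["a", "gone"], swallow_not_found=True)` leaves one deletion pending forever, while the same call with `swallow_not_found=False` lets the caller account for the already-removed pod.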

--- Additional comment from Maciej Szulik on 2020-06-03 12:54:58 CEST ---

Comment 1 Scott Dodson 2020-06-10 18:13:11 UTC
Aligning Keywords with the upstream bug.

Comment 2 Maciej Szulik 2020-06-17 08:11:28 UTC
This is currently blocked by https://bugzilla.redhat.com/show_bug.cgi?id=1845889, which is waiting for a backport of https://github.com/openshift/origin/pull/25091 to land and fix the k8s conformance tests.

Comment 8 zhou ying 2020-06-22 07:02:03 UTC
Checked with payload: 4.5.0-0.nightly-2020-06-20-194346:

Open two terminals. At the same time, delete one of the deployment's pods in the first terminal and scale down the deployment in the second. Then check the deployment; no new pod is created:

[zhouying@dhcp-140-138 ~]$ oc get po 
NAME                      READY   STATUS      RESTARTS   AGE
ruby-ex-1-build           0/1     Completed   0          2m20s
ruby-ex-76567d646-2bppr   1/1     Running     0          12s
ruby-ex-76567d646-q86k4   1/1     Running     0          12s
ruby-ex-76567d646-tw49k   1/1     Running     0          84s
[zhouying@dhcp-140-138 ~]$ oc delete po/ruby-ex-76567d646-tw49k 
pod "ruby-ex-76567d646-tw49k" deleted


[zhouying@dhcp-140-138 ~]$ oc scale deploy/ruby-ex --replicas=2
deployment.apps/ruby-ex scaled
[zhouying@dhcp-140-138 ~]$ oc get po 
NAME                      READY   STATUS        RESTARTS   AGE
ruby-ex-1-build           0/1     Completed     0          3m56s
ruby-ex-76567d646-2bppr   1/1     Running       0          108s
ruby-ex-76567d646-q86k4   1/1     Running       0          108s
ruby-ex-76567d646-tw49k   1/1     Terminating   0          3m
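
The check on the listing above can be sketched as a small Python helper that counts the Running pods of one replica set in saved `oc get po` output. This is only an illustration: `count_running` is a hypothetical helper, and it assumes the default column layout shown in this comment (NAME, READY, STATUS, ...).

```python
def count_running(table: str, prefix: str) -> int:
    """Count rows whose NAME starts with prefix and whose STATUS is Running."""
    rows = table.strip().splitlines()[1:]  # skip the header row
    count = 0
    for row in rows:
        cols = row.split()
        if len(cols) < 3:
            continue  # ignore blank or malformed lines
        name, status = cols[0], cols[2]
        if name.startswith(prefix) and status == "Running":
            count += 1
    return count
```

Applied to the second listing with the prefix `ruby-ex-76567d646-`, the count should equal the requested replica count, with the deleted pod still shown as Terminating and no replacement pod present.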

Comment 9 errata-xmlrpc 2020-07-13 17:42:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

Comment 10 W. Trevor King 2021-04-05 17:45:55 UTC
Removing UpgradeBlocker from this older bug, to remove it from the suspect queue described in [1]. If you feel this bug still needs to be a suspect, please add the keyword again.

[1]: https://github.com/openshift/enhancements/pull/475

