Bug 2026488 - openshift-controller-manager - delete event is repeating pathologically
Summary: openshift-controller-manager - delete event is repeating pathologically
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: openshift-controller-manager
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.10.0
Assignee: Adam Kaplan
QA Contact: Jitendar Singh
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-11-24 20:08 UTC by Adam Kaplan
Modified: 2022-03-10 16:30 UTC
CC List: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-03-10 16:30:34 UTC
Target Upstream Version:
Embargoed:


Links
| System | ID | Private | Priority | Status | Summary | Last Updated |
|---|---|---|---|---|---|---|
| Github | openshift origin pull 26719 | 0 | None | open | Bug 2026488: Drop Early/Late Tests for Build Suite | 2021-12-22 16:02:19 UTC |
| Red Hat Product Errata | RHSA-2022:0056 | 0 | None | None | None | 2022-03-10 16:30:48 UTC |

Internal Links: 2034984

Description Adam Kaplan 2021-11-24 20:08:15 UTC
Description of problem:

The "[sig-arch] events should not repeat pathologically" test occasionally fails with the following flake:

```
event happened 22 times, something is wrong: ns/openshift-controller-manager daemonset/controller-manager - reason/SuccessfulDelete (combined from similar events): Deleted pod: controller-manager-74rw5
```
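To make the failure message easier to reason about, here is a minimal Python sketch that pulls the repeat count and the involved object out of a flake line like the one above and compares the count against a threshold. The regular expression is fitted only to this one message format, and `REPEAT_THRESHOLD = 20` is an assumed placeholder, not a value stated in this report or confirmed from the origin test suite.

```python
import re
from dataclasses import dataclass
from typing import Optional

# ASSUMPTION: placeholder threshold; not taken from this bug report.
REPEAT_THRESHOLD = 20

# Fitted to the single message format quoted above; other event
# messages may not match.
EVENT_RE = re.compile(
    r"event happened (?P<count>\d+) times, something is wrong: "
    r"ns/(?P<namespace>\S+) (?P<kind>\w+)/(?P<name>\S+) - reason/(?P<reason>\w+)"
)


@dataclass
class RepeatedEvent:
    count: int
    namespace: str
    kind: str
    name: str
    reason: str


def parse_flake(line: str) -> Optional[RepeatedEvent]:
    """Extract the repeat count and involved object, or None if no match."""
    m = EVENT_RE.search(line)
    if m is None:
        return None
    return RepeatedEvent(
        count=int(m["count"]),
        namespace=m["namespace"],
        kind=m["kind"],
        name=m["name"],
        reason=m["reason"],
    )


def is_pathological(event: RepeatedEvent, threshold: int = REPEAT_THRESHOLD) -> bool:
    """Flag an event whose repeat count exceeds the (assumed) threshold."""
    return event.count > threshold
```

Applied to the flake above, `parse_flake` yields a count of 22 for `daemonset/controller-manager` in `openshift-controller-manager` with reason `SuccessfulDelete`, which exceeds the placeholder threshold.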


Version-Release number of selected component (if applicable): 4.10


How reproducible: Sometimes



Actual results:

The test fails: controller-manager pods are repeatedly deleted.

Expected results:

controller-manager pods remain relatively stable after cluster installation.


Additional info:

See https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_openshift-controller-manager/205/pull-ci-openshift-openshift-controller-manager-master-e2e-gcp-builds/1463500691103289344

Comment 1 W. Trevor King 2021-11-30 22:44:13 UTC
This sounds a lot like bug 2004127, which was fixed with some library-go bumps. I don't know whether it's the same root cause this time.

Comment 2 W. Trevor King 2021-11-30 22:48:43 UTC
It doesn't seem all that common, but there are a number of hits if I stretch the search back over the past 14 days:

$ w3m -dump -cols 200 'https://search.ci.openshift.org/?search=ns%2Fopenshift-controller-manager+daemonset%2Fcontroller-manager+-+reason%2FSuccessfulDelete.*Deleted+pod%3A+controller-manager&maxAge=336h&type=junit' | grep 'failures match' | sort
pull-ci-openshift-builder-master-e2e-aws-builds (all) - 6 runs, 67% failed, 25% of failures match = 17% impact
pull-ci-openshift-builder-master-openshift-e2e-aws-builds-techpreview (all) - 5 runs, 60% failed, 33% of failures match = 20% impact
pull-ci-openshift-openshift-controller-manager-master-e2e-gcp-builds (all) - 7 runs, 71% failed, 40% of failures match = 29% impact
pull-ci-openshift-openshift-controller-manager-master-openshift-e2e-aws-builds-techpreview (all) - 6 runs, 67% failed, 25% of failures match = 17% impact
pull-ci-openshift-origin-master-e2e-gcp-builds (all) - 69 runs, 52% failed, 33% of failures match = 17% impact
pull-ci-openshift-origin-release-4.9-e2e-gcp-builds (all) - 9 runs, 33% failed, 33% of failures match = 11% impact
rehearse-23377-pull-ci-openshift-origin-release-4.9-e2e-gcp-builds (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
rehearse-23961-pull-ci-openshift-origin-release-4.9-e2e-gcp-builds (all) - 3 runs, 33% failed, 100% of failures match = 33% impact
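The "impact" column in the search output above appears to be the share of all runs that hit this flake, i.e. matching runs divided by total runs (the `maxAge=336h` query parameter is the 14-day window, 336 / 24 = 14). The Python sketch below assumes that interpretation: it parses each summary line, reconstructs approximate integer run counts from the rounded percentages, and recomputes the impact. The parsing format and the formula are inferred from the output shown here, not from search.ci.openshift.org documentation.

```python
import re
from typing import NamedTuple, Optional

# Matches one line of the search summary shown above.
LINE_RE = re.compile(
    r"^(?P<job>\S+) \(all\) - (?P<runs>\d+) runs, (?P<failed>\d+)% failed, "
    r"(?P<match>\d+)% of failures match = (?P<impact>\d+)% impact$"
)


class JobStats(NamedTuple):
    job: str
    runs: int
    failed_pct: int
    match_pct: int
    impact_pct: int


def parse_line(line: str) -> Optional[JobStats]:
    """Parse one summary line, or return None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return JobStats(
        m["job"], int(m["runs"]), int(m["failed"]), int(m["match"]), int(m["impact"])
    )


def estimate_impact(stats: JobStats) -> int:
    """Recompute impact as matching runs / total runs, in whole percent."""
    # ASSUMPTION: percentages are rounded from integer run counts, so we
    # recover approximate counts before dividing.
    failed_runs = round(stats.runs * stats.failed_pct / 100)
    matching_runs = round(failed_runs * stats.match_pct / 100)
    return round(100 * matching_runs / stats.runs)
```

Working from counts rather than multiplying the rounded percentages matters: for the `e2e-gcp-builds` job with 7 runs, 71% x 40% gives 28.4%, but 5 failures with 2 matches is 2 of 7 runs, which rounds to the reported 29%.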

Comment 3 Adam Kaplan 2021-12-01 15:42:05 UTC
I suspect that logic in the operator is triggering unnecessary rollouts of the openshift-controller-manager (OCM) DaemonSet. As the failure log shows, we're hitting this in the OCP build suite. There should be only one rollout, after the internal registry publishes its hostname.
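The expectation in Comment 3 (one rollout once the internal registry publishes its hostname) can be illustrated with a change-detection pattern: hash the operator's rendered inputs and only touch the DaemonSet when that hash differs from the one last applied, since any change to a DaemonSet's pod template starts a new rollout. This is a hypothetical Python sketch; `RolloutTracker`, `config_hash`, and the `registryHostname` key are invented for illustration and are not the openshift-controller-manager operator's code. The pull request linked on this bug is titled "Drop Early/Late Tests for Build Suite", so this should not be read as the fix that shipped.

```python
import hashlib
import json
from typing import Optional


def config_hash(config: dict) -> str:
    """Stable hash of the operator's rendered inputs (hypothetical)."""
    # sort_keys makes equal configs serialize, and therefore hash, identically.
    payload = json.dumps(config, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()


class RolloutTracker:
    """Hypothetical reconcile loop that only rolls out on real changes."""

    def __init__(self) -> None:
        self.applied_hash: Optional[str] = None
        self.rollouts = 0

    def reconcile(self, config: dict) -> bool:
        """Return True if this reconcile would update the DaemonSet."""
        desired = config_hash(config)
        if desired == self.applied_hash:
            return False  # inputs unchanged: leave the DaemonSet alone
        self.applied_hash = desired
        self.rollouts += 1
        return True
```

Feeding it several reconciles before the hostname is known, then several after (for example with an illustrative value such as `image-registry.openshift-image-registry.svc:5000`), produces exactly two rollouts: the initial deployment and the one triggered by the hostname appearing. Repeated reconciles with unchanged inputs are no-ops.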

Comment 10 errata-xmlrpc 2022-03-10 16:30:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056

