Bug 2026488
| Summary: | openshift-controller-manager - delete event is repeating pathologically | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Adam Kaplan <adam.kaplan> |
| Component: | openshift-controller-manager | Assignee: | Adam Kaplan <adam.kaplan> |
| Sub component: | controller-manager | QA Contact: | Jitendar Singh <jitsingh> |
| Status: | CLOSED ERRATA | Severity: | medium |
| Priority: | unspecified | CC: | aos-bugs, gmontero, wking |
| Version: | 4.10 | Target Release: | 4.10.0 |
| Hardware: | Unspecified | OS: | Unspecified |
| Last Closed: | 2022-03-10 16:30:34 UTC | Type: | Bug |
Description
Adam Kaplan, 2021-11-24 20:08:15 UTC
Sounds a lot like bug 2004127, which was fixed with some library-go bumps. I don't know whether it's the same root cause this time. It doesn't seem all that common, but there are a number of hits if I stretch back over the past 14 days:

    $ w3m -dump -cols 200 'https://search.ci.openshift.org/?search=ns%2Fopenshift-controller-manager+daemonset%2Fcontroller-manager+-+reason%2FSuccessfulDelete.*Deleted+pod%3A+controller-manager&maxAge=336h&type=junit' | grep 'failures match' | sort
    pull-ci-openshift-builder-master-e2e-aws-builds (all) - 6 runs, 67% failed, 25% of failures match = 17% impact
    pull-ci-openshift-builder-master-openshift-e2e-aws-builds-techpreview (all) - 5 runs, 60% failed, 33% of failures match = 20% impact
    pull-ci-openshift-openshift-controller-manager-master-e2e-gcp-builds (all) - 7 runs, 71% failed, 40% of failures match = 29% impact
    pull-ci-openshift-openshift-controller-manager-master-openshift-e2e-aws-builds-techpreview (all) - 6 runs, 67% failed, 25% of failures match = 17% impact
    pull-ci-openshift-origin-master-e2e-gcp-builds (all) - 69 runs, 52% failed, 33% of failures match = 17% impact
    pull-ci-openshift-origin-release-4.9-e2e-gcp-builds (all) - 9 runs, 33% failed, 33% of failures match = 11% impact
    rehearse-23377-pull-ci-openshift-origin-release-4.9-e2e-gcp-builds (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
    rehearse-23961-pull-ci-openshift-origin-release-4.9-e2e-gcp-builds (all) - 3 runs, 33% failed, 100% of failures match = 33% impact

I suspect we have logic in the operator that triggers unnecessary rollouts of the ocm DaemonSet. As the failure log shows, we are hitting this in the OCP build suite. There should be only one rollout, triggered after the internal registry publishes its hostname.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
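The "impact" column in the search output is just the share of all runs in which the matching symptom appeared: failure rate times the fraction of failures that match. A quick sketch of that arithmetic (the function name is ours, not part of the CI search tooling), using the first job's numbers:

```go
package main

import "fmt"

// impact returns the percentage of all runs where the matching failure
// occurred: runs * failedRate gives the failure count, and matchRate of
// those failures matched the search query.
func impact(runs int, failedRate, matchRate float64) float64 {
	failures := float64(runs) * failedRate
	matching := failures * matchRate
	return matching / float64(runs) * 100
}

func main() {
	// pull-ci-openshift-builder-master-e2e-aws-builds:
	// 6 runs, 67% failed, 25% of failures match
	fmt.Printf("%.0f%% impact\n", impact(6, 0.67, 0.25)) // prints "17% impact"
}
```

Note that the run count cancels out, so impact reduces to failure rate times match rate; the search tool reports it per job only because the rates themselves come from that job's runs.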
For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0056
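A common way a controller avoids the kind of unnecessary rollout suspected in the description is to fingerprint the desired spec and skip the update when the fingerprint matches what was last applied. The following is a minimal sketch of that guard, not the actual openshift-controller-manager-operator code; the type and function names are hypothetical stand-ins for the real API objects:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// DaemonSetSpec stands in for the real workload spec; hypothetical.
type DaemonSetSpec struct {
	Image string            `json:"image"`
	Env   map[string]string `json:"env"`
}

// specHash returns a stable fingerprint of the desired spec.
// encoding/json marshals map keys in sorted order, so the hash is deterministic.
func specHash(spec DaemonSetSpec) string {
	b, _ := json.Marshal(spec)
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:])
}

// needsRollout compares the hash recorded on the live object (e.g. in an
// annotation at the last sync) with the hash of what the operator now wants.
func needsRollout(liveHash string, desired DaemonSetSpec) bool {
	return liveHash != specHash(desired)
}

func main() {
	desired := DaemonSetSpec{
		Image: "controller-manager:v4.10",
		Env:   map[string]string{"REGISTRY": "image-registry.openshift-image-registry.svc:5000"},
	}
	live := specHash(desired) // pretend the previous sync recorded this annotation

	fmt.Println(needsRollout(live, desired)) // false: no change, no pods deleted

	desired.Env["REGISTRY"] = "other-registry:5000"
	fmt.Println(needsRollout(live, desired)) // true: registry hostname changed, one rollout is expected
}
```

With a guard like this, the DaemonSet rolls exactly once when the internal registry publishes its hostname, matching the "only one rollout" expectation above; without it, every sync that blindly updates the object produces the repeated SuccessfulDelete events seen in CI.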