Description of problem:
"[sig-arch] events should not repeat pathologically" test occasionally fails with the following flake:

```
event happened 22 times, something is wrong: ns/openshift-controller-manager daemonset/controller-manager - reason/SuccessfulDelete (combined from similar events): Deleted pod: controller-manager-74rw5
```

Version-Release number of selected component (if applicable):
4.10

How reproducible:
Sometimes

Steps to Reproduce:
1.
2.
3.

Actual results:
Test fails - controller-manager pods are repeatedly being deleted

Expected results:
controller-manager pods are relatively stable on cluster install.

Additional info:
See https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_openshift-controller-manager/205/pull-ci-openshift-openshift-controller-manager-master-e2e-gcp-builds/1463500691103289344
Sounds a lot like bug 2004127, which was fixed with some library-go bumps. I dunno if it's the same root cause this time or not.
Doesn't seem all that common, but there are a number of hits if I stretch back to the past 14d:

$ w3m -dump -cols 200 'https://search.ci.openshift.org/?search=ns%2Fopenshift-controller-manager+daemonset%2Fcontroller-manager+-+reason%2FSuccessfulDelete.*Deleted+pod%3A+controller-manager&maxAge=336h&type=junit' | grep 'failures match' | sort
pull-ci-openshift-builder-master-e2e-aws-builds (all) - 6 runs, 67% failed, 25% of failures match = 17% impact
pull-ci-openshift-builder-master-openshift-e2e-aws-builds-techpreview (all) - 5 runs, 60% failed, 33% of failures match = 20% impact
pull-ci-openshift-openshift-controller-manager-master-e2e-gcp-builds (all) - 7 runs, 71% failed, 40% of failures match = 29% impact
pull-ci-openshift-openshift-controller-manager-master-openshift-e2e-aws-builds-techpreview (all) - 6 runs, 67% failed, 25% of failures match = 17% impact
pull-ci-openshift-origin-master-e2e-gcp-builds (all) - 69 runs, 52% failed, 33% of failures match = 17% impact
pull-ci-openshift-origin-release-4.9-e2e-gcp-builds (all) - 9 runs, 33% failed, 33% of failures match = 11% impact
rehearse-23377-pull-ci-openshift-origin-release-4.9-e2e-gcp-builds (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
rehearse-23961-pull-ci-openshift-origin-release-4.9-e2e-gcp-builds (all) - 3 runs, 33% failed, 100% of failures match = 33% impact
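To rank the affected jobs rather than read them in alphabetical order, the summary lines can be parsed with a short script. This is a minimal sketch that assumes the line layout stays exactly as printed in the output above; `parse_summary` and `sort_by_impact` are hypothetical helper names, not part of any CI tooling.

```python
import re

# Matches one summary line in the layout shown in the search output above.
# Assumption: job names never contain whitespace.
LINE_RE = re.compile(
    r"^(?P<job>\S+) \(all\) - (?P<runs>\d+) runs, (?P<failed>\d+)% failed, "
    r"(?P<match>\d+)% of failures match = (?P<impact>\d+)% impact$"
)


def parse_summary(line):
    """Return the fields of one summary line as a dict, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return {
        "job": m["job"],
        "runs": int(m["runs"]),
        "failed_pct": int(m["failed"]),
        "match_pct": int(m["match"]),
        "impact_pct": int(m["impact"]),
    }


def sort_by_impact(lines):
    """Parse all lines, drop non-matching ones, and sort by impact, highest first."""
    rows = [row for row in (parse_summary(line) for line in lines) if row is not None]
    return sorted(rows, key=lambda row: row["impact_pct"], reverse=True)
```

Piping the `grep 'failures match'` output into `sort_by_impact` would put the single-run rehearse job first, so the `runs` field is worth checking before drawing conclusions from a high impact figure.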
I suspect that we have logic in the operator that is triggering unnecessary rollouts of the openshift-controller-manager (ocm) DaemonSet. As you can see in the failure log, we're hitting this in the OCP build suite. We should have only one rollout after the internal registry publishes its hostname.
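One common way to avoid redundant rollouts is to update the workload only when the rendered spec actually changes. The sketch below illustrates that idea in isolation. It is not the operator's code (the operator is written in Go), and `spec_hash`, `needs_rollout`, `count_rollouts`, and the `registryHostname` field are all hypothetical names chosen for the example.

```python
import hashlib
import json


def spec_hash(spec):
    """Hash a spec dict in a key-order-independent way."""
    payload = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def needs_rollout(current_spec, desired_spec):
    """True only when the desired spec differs in content from the applied one."""
    return spec_hash(current_spec) != spec_hash(desired_spec)


def count_rollouts(desired_specs):
    """Simulate a sync loop and count how many times the spec would be applied.

    The first apply (initial creation) is included in the count.
    """
    applied = None
    rollouts = 0
    for desired in desired_specs:
        if applied is None or needs_rollout(applied, desired):
            applied = desired
            rollouts += 1
    return rollouts


# Hypothetical sync sequence: the registry hostname appears once, then nothing changes.
before = {"registryHostname": ""}
after = {"registryHostname": "registry.example:5000"}
syncs = [before, before, after, after, after]
```

With this sequence `count_rollouts(syncs)` yields two applies: the initial creation and the single rollout after the hostname is published. A loop that re-applies on every sync would instead produce five, which is the kind of churn that generates repeated `SuccessfulDelete` events.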
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0056