Bug 2068613

Summary: ClusterRoleUpdated/ClusterRoleBindingUpdated Spamming Event Logs
Product: OpenShift Container Platform
Component: Machine Config Operator
Machine Config Operator sub component: Machine Config Operator
Reporter: Kirsten Garrison <kgarriso>
Assignee: John Kyros <jkyros>
QA Contact: Sergio <sregidor>
Status: CLOSED ERRATA
Severity: low
Priority: medium
CC: aos-bugs, jkyros, mharri, mkrejci, olexander.shtepa, rioliu
Version: 4.11
Target Release: 4.11.0
Hardware: Unspecified
OS: Unspecified
Doc Type: No Doc Update
Last Closed: 2022-08-10 11:02:19 UTC
Type: Bug
Bug Blocks: 2069798

Description Kirsten Garrison 2022-03-25 19:28:04 UTC
Description of problem:


Are you certain that the root cause of the issue being reported is the MCO (Machine Config Operator)?
(Y/N/Not sure): Y

How reproducible: Look at the events in any e2e run.

ClusterRoleUpdated/ClusterRoleBindingUpdated events are filling the event log for the openshift-machine-config-operator namespace, producing 15+ pages of results. This makes it difficult to review the log and find meaningful (and actual) MCO events for debugging. These events are being fired every few _seconds_.

If the events are firing to record normal behavior, they should stop: at this frequency they flood the logs and are not helpful. Conversely, if the behavior is abnormal, the underlying changes that the events are recording should be fixed.

Examples: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.10-upgrade-from-stable-4.9-e2e-gcp-ovn-rt-upgrade/1507311742361800704/artifacts/e2e-gcp-ovn-rt-upgrade/gather-must-gather/artifacts/event-filter.html

https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_machine-config-operator/3036/pull-ci-openshift-machine-config-operator-layering-e2e-gcp-op/1507099921260482560/artifacts/e2e-gcp-op/gather-must-gather/artifacts/event-filter.html

Comment 1 John Kyros 2022-03-29 05:39:40 UTC

So library-go does log these updates automatically (it is library-go that is doing the logging/eventing under the hood, not us directly), but there are waaaay more of them happening than there should be. Waaaaay more. 

I had to mess with the logging (klog vs. glog is a whole other thing), but apparently the reason it keeps syncing those objects is that it thinks the namespace is changing:

I0329 02:53:09.930262       1 rbac.go:49] ClusterRole "machine-config-daemon-events" changes: {"metadata":{"namespace":"openshift-machine-config-operator"}}
I0329 02:53:09.973297       1 rbac.go:104] ClusterRoleBinding "machine-config-daemon" changes: {"metadata":{"namespace":"openshift-machine-config-operator"}}
I0329 02:53:11.030610       1 rbac.go:49] ClusterRole "machine-config-controller" changes: {"metadata":{"namespace":"openshift-machine-config-operator"}}
I0329 02:53:11.047133       1 rbac.go:49] ClusterRole "machine-config-controller-events" changes: {"metadata":{"namespace":"openshift-machine-config-operator"}}
I0329 02:53:11.070625       1 rbac.go:104] ClusterRoleBinding "machine-config-controller" changes: {"metadata":{"namespace":"openshift-machine-config-operator"}}
I0329 02:53:13.182776       1 rbac.go:49] ClusterRole "machine-config-server" changes: {"metadata":{"namespace":"openshift-machine-config-operator"}}
I0329 02:53:13.211074       1 rbac.go:104] ClusterRoleBinding "machine-config-server" changes: {"metadata":{"namespace":"openshift-machine-config-operator"}}

And... that makes sense, because it is changing. The only two object types spamming it are ClusterRole and ClusterRoleBinding, which are cluster-scoped and don't have namespaces.

You can supply a namespace in the manifest, but the API server will just quietly ignore it. And... we do supply one. So it ignores it, and then we keep trying to "put it back" every time we sync. We're never gonna win! :)
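The feedback loop can be illustrated with a toy Go model. This is a minimal sketch, not library-go's actual apply code: `ObjectMeta`, `persist`, `metadataChanges`, and `syncEvents` are hypothetical names, and `persist` simply assumes the API server drops `metadata.namespace` for cluster-scoped kinds, as described above. With a namespace in the required object, every sync detects a change and would emit an event; without it, the object converges and nothing is emitted.

```go
package main

import "fmt"

// ObjectMeta is a hypothetical, stripped-down stand-in for Kubernetes object
// metadata; only the fields needed for this illustration are modeled.
type ObjectMeta struct {
	Name      string
	Namespace string
}

// clusterScoped lists kinds that have no namespace (an illustrative subset).
var clusterScoped = map[string]bool{
	"ClusterRole":        true,
	"ClusterRoleBinding": true,
}

// persist mimics (by assumption) the API server storing an object: for
// cluster-scoped kinds, any namespace in the submitted object is dropped.
func persist(kind string, m ObjectMeta) ObjectMeta {
	if clusterScoped[kind] {
		m.Namespace = ""
	}
	return m
}

// metadataChanges reports which required metadata fields differ from what is
// stored, loosely mirroring the "changes:" diff in the rbac.go log lines.
func metadataChanges(required, existing ObjectMeta) map[string]string {
	changes := map[string]string{}
	if required.Name != existing.Name {
		changes["name"] = required.Name
	}
	if required.Namespace != existing.Namespace {
		changes["namespace"] = required.Namespace
	}
	return changes
}

// syncEvents runs n sync passes and counts how many of them would emit an
// "Updated" event because a difference was detected.
func syncEvents(kind string, required ObjectMeta, n int) int {
	existing := persist(kind, required) // initial create
	events := 0
	for i := 0; i < n; i++ {
		if len(metadataChanges(required, existing)) > 0 {
			// "Put it back": the update is stored, but the namespace is dropped again.
			existing = persist(kind, required)
			events++
		}
	}
	return events
}

func main() {
	withNS := ObjectMeta{Name: "machine-config-server", Namespace: "openshift-machine-config-operator"}
	withoutNS := ObjectMeta{Name: "machine-config-server"}

	fmt.Println(syncEvents("ClusterRole", withNS, 10))    // 10: every sync "updates"
	fmt.Println(syncEvents("ClusterRole", withoutNS, 10)) // 0: nothing to update
}
```

In this model the fix is on the input side: once the required object carries no namespace, it matches what is stored and the comparison finds nothing to update.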

TL;DR: Our manifests include useless namespaces for ClusterRoles and ClusterRoleBindings.
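One way to keep this from regressing would be a check over the manifests that flags cluster-scoped kinds carrying a namespace. The sketch below is hypothetical: the `checkManifest` helper and the `clusterScopedKinds` set are illustrative and not part of the MCO, and it decodes JSON rather than the YAML the real manifests use so that it only needs the Go standard library.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// manifest holds only the fields this check needs.
type manifest struct {
	Kind     string `json:"kind"`
	Metadata struct {
		Name      string `json:"name"`
		Namespace string `json:"namespace"`
	} `json:"metadata"`
}

// clusterScopedKinds is an illustrative, non-exhaustive set.
var clusterScopedKinds = map[string]bool{
	"ClusterRole":        true,
	"ClusterRoleBinding": true,
}

// checkManifest returns an error if a cluster-scoped manifest sets a
// namespace, since that field would be dropped and cause a perpetual diff.
func checkManifest(raw []byte) error {
	var m manifest
	if err := json.Unmarshal(raw, &m); err != nil {
		return fmt.Errorf("decode: %w", err)
	}
	if clusterScopedKinds[m.Kind] && m.Metadata.Namespace != "" {
		return fmt.Errorf("%s %q is cluster-scoped but sets namespace %q",
			m.Kind, m.Metadata.Name, m.Metadata.Namespace)
	}
	return nil
}

func main() {
	bad := []byte(`{"kind":"ClusterRoleBinding","metadata":{"name":"machine-config-daemon","namespace":"openshift-machine-config-operator"}}`)
	good := []byte(`{"kind":"ClusterRoleBinding","metadata":{"name":"machine-config-daemon"}}`)
	fmt.Println(checkManifest(bad))  // reports the stray namespace
	fmt.Println(checkManifest(good)) // <nil>
}
```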

Comment 3 Kirsten Garrison 2022-03-29 17:45:38 UTC
*** Bug 2063022 has been marked as a duplicate of this bug. ***

Comment 7 Sergio 2022-04-04 09:24:26 UTC
Verified using version 4.11.0-0.nightly-2022-04-01-172551:
$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-04-01-172551   True        False         70m     Cluster version is 4.11.0-0.nightly-2022-04-01-172551
$ oc get co machine-config
NAME             VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
machine-config   4.11.0-0.nightly-2022-04-01-172551   True        False         False      81m     


Before the fix, in 4.11.0-0.nightly-2022-03-29-152521, we got all these events:
$ oc get events -n openshift-machine-config-operator | grep ClusterRoleUpdated | wc -l
424
$ oc get events -n openshift-machine-config-operator | grep ClusterRoleBindingUpdated | wc -l
260


After the fix, in 4.11.0-0.nightly-2022-04-01-172551, the spam no longer occurs:
$ oc get events -n openshift-machine-config-operator | grep ClusterRoleUpdated | wc -l
0
$ oc get events -n openshift-machine-config-operator | grep ClusterRoleBindingUpdated | wc -l
0


Moved to VERIFIED status

Comment 10 errata-xmlrpc 2022-08-10 11:02:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069