2068613 – ClusterRoleUpdated/ClusterRoleBindingUpdated Spamming Event Logs

Summary: ClusterRoleUpdated/ClusterRoleBindingUpdated Spamming Event Logs

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Machine Config Operator
Sub Component:
Version:	4.11
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	low
Target Milestone:	---
Target Release:	4.11.0
Assignee:	John Kyros
QA Contact:	Sergio
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	2063022
Depends On:
Blocks:	2069798

Reported:	2022-03-25 19:28 UTC by Kirsten Garrison
Modified:	2022-10-12 07:52 UTC
CC List:	6 users
Fixed In Version:
Doc Type:	No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed:	2022-08-10 11:02:19 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift machine-config-operator pull 3040	0	None	open	Bug 2068613: ClusterRoleUpdated/ClusterRoleBindingUpdated Spamming Event Logs	2022-03-29 15:22:51 UTC
Red Hat Product Errata	RHSA-2022:5069	0	None	None	None	2022-08-10 11:02:47 UTC

Description Kirsten Garrison 2022-03-25 19:28:04 UTC

Description of problem:


Are you certain that the root cause of the issue being reported is the MCO (Machine Config Operator)?
(Y/N/Not sure):Y

How reproducible: Look at any e2e run

ClusterRoleUpdated/ClusterRoleBindingUpdated events are filling the event log for the machine-config-operator namespace, to the tune of 15+ pages of results. This makes it difficult to review the log and find meaningful (and actual) MCO events for debugging. These events are being fired every few _seconds_.

If the events are firing to note normal behavior, they should stop: at this frequency they flood the logs and are not helpful. Conversely, if the behavior is abnormal, then the underlying changes that the events are reporting should be fixed.

Examples: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.10-upgrade-from-stable-4.9-e2e-gcp-ovn-rt-upgrade/1507311742361800704/artifacts/e2e-gcp-ovn-rt-upgrade/gather-must-gather/artifacts/event-filter.html

https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_machine-config-operator/3036/pull-ci-openshift-machine-config-operator-layering-e2e-gcp-op/1507099921260482560/artifacts/e2e-gcp-op/gather-must-gather/artifacts/event-filter.html

Comment 1 John Kyros 2022-03-29 05:39:40 UTC


So library-go does log these updates automatically (it is library-go that is doing the logging/eventing under the hood, not us directly), but there are waaaay more of them happening than there should be. Waaaaay more. 

I had to mess with the logging (klog vs glog is a whole other thing), but it apparently keeps syncing those because it thinks the namespace is changing.

I0329 02:53:09.930262       1 rbac.go:49] ClusterRole "machine-config-daemon-events" changes: {"metadata":{"namespace":"openshift-machine-config-operator"}}
I0329 02:53:09.973297       1 rbac.go:104] ClusterRoleBinding "machine-config-daemon" changes: {"metadata":{"namespace":"openshift-machine-config-operator"}}
I0329 02:53:11.030610       1 rbac.go:49] ClusterRole "machine-config-controller" changes: {"metadata":{"namespace":"openshift-machine-config-operator"}}
I0329 02:53:11.047133       1 rbac.go:49] ClusterRole "machine-config-controller-events" changes: {"metadata":{"namespace":"openshift-machine-config-operator"}}
I0329 02:53:11.070625       1 rbac.go:104] ClusterRoleBinding "machine-config-controller" changes: {"metadata":{"namespace":"openshift-machine-config-operator"}}
I0329 02:53:13.182776       1 rbac.go:49] ClusterRole "machine-config-server" changes: {"metadata":{"namespace":"openshift-machine-config-operator"}}
I0329 02:53:13.211074       1 rbac.go:104] ClusterRoleBinding "machine-config-server" changes: {"metadata":{"namespace":"openshift-machine-config-operator"}}

And...that makes sense because it is changing. The only two object types that are spamming it are ClusterRole and ClusterRoleBinding, which are cluster scoped and don't have namespaces. 

You can supply a namespace in the manifest for these, but the cluster will just quietly ignore it. And...we do supply one. So it gets dropped, and then we keep trying to "put it back" every time we sync. We're never gonna win! :)
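
To make the loop concrete, here is a toy model of it. This is not library-go's code: `FakeAPIServer`, `metadata_changes`, `sync`, and `run_syncs` are hypothetical names, and the only behavior modeled is the one described above (the namespace is dropped on store, then shows up as a "change" on the next sync).

```python
import copy

# Assumption for this sketch: only the two kinds named in this bug are modeled.
CLUSTER_SCOPED_KINDS = {"ClusterRole", "ClusterRoleBinding"}


class FakeAPIServer:
    """Toy stand-in for the API server: drops metadata.namespace on cluster-scoped kinds."""

    def __init__(self):
        self.store = {}

    def apply(self, obj):
        stored = copy.deepcopy(obj)
        if stored["kind"] in CLUSTER_SCOPED_KINDS:
            stored["metadata"].pop("namespace", None)
        self.store[(stored["kind"], stored["metadata"]["name"])] = stored

    def get(self, kind, name):
        return copy.deepcopy(self.store.get((kind, name)))


def metadata_changes(existing, required):
    """Return the metadata fields in `required` that differ from `existing`."""
    return {k: v for k, v in required["metadata"].items()
            if existing["metadata"].get(k) != v}


def sync(server, required):
    """One sync pass: update (and report the change) only if something differs."""
    existing = server.get(required["kind"], required["metadata"]["name"])
    if existing is None:
        server.apply(required)
        return "created"
    changes = metadata_changes(existing, required)
    if changes:
        server.apply(required)
        return f"updated: {changes}"
    return "unchanged"


def run_syncs(manifest, passes=3):
    """Sync the same manifest repeatedly against a fresh fake server."""
    server = FakeAPIServer()
    return [sync(server, manifest) for _ in range(passes)]


with_namespace = {
    "kind": "ClusterRole",
    "metadata": {"name": "machine-config-daemon-events",
                 "namespace": "openshift-machine-config-operator"},
}
without_namespace = {
    "kind": "ClusterRole",
    "metadata": {"name": "machine-config-daemon-events"},
}

print(run_syncs(with_namespace))
print(run_syncs(without_namespace))
```

In this model the manifest that carries a namespace reports an update on every pass after the first, while the one without a namespace settles to "unchanged" right after creation.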

TL;DR Our manifests include useless namespaces for ClusterRoles and ClusterRoleBindings
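
A check along these lines could catch stray namespaces before they ship. It is only a sketch: `stray_namespaces` is a hypothetical helper, it assumes the manifests are already parsed into dicts, the sample manifests are made up, and the kind list covers just the two kinds named in this bug.

```python
# Assumption for this sketch: only the two kinds from this bug are checked.
CLUSTER_SCOPED_KINDS = frozenset({"ClusterRole", "ClusterRoleBinding"})


def stray_namespaces(manifests):
    """Yield (kind, name, namespace) for cluster-scoped manifests that set metadata.namespace."""
    for manifest in manifests:
        meta = manifest.get("metadata", {})
        if manifest.get("kind") in CLUSTER_SCOPED_KINDS and meta.get("namespace"):
            yield (manifest["kind"], meta.get("name"), meta["namespace"])


# Made-up sample manifests, for illustration only.
manifests = [
    {"kind": "ClusterRoleBinding",
     "metadata": {"name": "machine-config-daemon",
                  "namespace": "openshift-machine-config-operator"}},
    {"kind": "ClusterRole",
     "metadata": {"name": "machine-config-server"}},
    {"kind": "ServiceAccount",
     "metadata": {"name": "example-sa",
                  "namespace": "openshift-machine-config-operator"}},
]

for kind, name, namespace in stray_namespaces(manifests):
    print(f"{kind} {name!r} sets metadata.namespace={namespace!r}")
```

Only `metadata.namespace` is inspected, so the namespace on a ClusterRoleBinding's `subjects` entries, which is legitimate, is not flagged.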

Comment 3 Kirsten Garrison 2022-03-29 17:45:38 UTC

*** Bug 2063022 has been marked as a duplicate of this bug. ***

Comment 7 Sergio 2022-04-04 09:24:26 UTC

Verified using version 4.11.0-0.nightly-2022-04-01-172551:
$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-04-01-172551   True        False         70m     Cluster version is 4.11.0-0.nightly-2022-04-01-172551
$ oc get co machine-config
NAME             VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
machine-config   4.11.0-0.nightly-2022-04-01-172551   True        False         False      81m     


Before the fix, in 4.11.0-0.nightly-2022-03-29-152521, we got all these events:
$ oc get events -n openshift-machine-config-operator | grep ClusterRoleUpdated | wc -l
424
$ oc get events -n openshift-machine-config-operator | grep ClusterRoleBindingUpdated | wc -l
260


After the fix the spam does not occur anymore, in 4.11.0-0.nightly-2022-04-01-172551:
$ oc get events -n openshift-machine-config-operator | grep ClusterRoleUpdated | wc -l
0
$ oc get events -n openshift-machine-config-operator | grep ClusterRoleBindingUpdated | wc -l
0


Moved to VERIFIED status
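
The same before/after counts could also be collected in one pass from JSON output instead of one grep per reason. A minimal sketch, assuming the input is an event list as produced by `oc get events -o json`; `count_reasons` is a hypothetical helper and the sample data is made up.

```python
import json
from collections import Counter


def count_reasons(events_json):
    """Count events per `reason` in a JSON event list (top-level `items` array)."""
    items = json.loads(events_json).get("items", [])
    return Counter(item.get("reason", "") for item in items)


# Made-up sample in the shape of an event list; only the `reason` field is used.
sample = json.dumps({"items": [
    {"reason": "ClusterRoleUpdated"},
    {"reason": "ClusterRoleUpdated"},
    {"reason": "ClusterRoleBindingUpdated"},
]})

counts = count_reasons(sample)
print(counts["ClusterRoleUpdated"], counts["ClusterRoleBindingUpdated"])
```

A `Counter` returns 0 for reasons that never occur, which matches the post-fix result above without special-casing.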

Comment 10 errata-xmlrpc 2022-08-10 11:02:19 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069
