Bug 1706637

Summary:	RoleBindingRestriction being listed 4 times a second during e2e runs
Product:	OpenShift Container Platform	Reporter:	Clayton Coleman <ccoleman>
Component:	apiserver-auth	Assignee:	Sally <somalley>
Status:	CLOSED WONTFIX	QA Contact:	Chuan Yu <chuyu>
Severity:	urgent	Docs Contact:
Priority:	unspecified
Version:	4.1.0	CC:	aos-bugs, eparis, gblomqui, mkhan, nagrawal, somalley
Target Milestone:	---
Target Release:	4.2.0
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2019-06-14 14:03:20 UTC	Type:	Bug

Description Clayton Coleman 2019-05-05 20:32:17 UTC

https://www.dropbox.com/s/qlzylzfq43zciji/Screenshot%202019-05-05%2016.30.14.png?dl=0

It looks like it's not properly leveraging an informer, or the process doesn't have permission to watch the resource.

This looks new since last week, likely due to the feature change.

Comment 3 Mo 2019-05-07 17:08:22 UTC

Opened BZ1707516, BZ1707517, and BZ1707519 to track fixes to noisy components.

Based on audit logs of a cluster that has been running for a day:

$ cat * | grep '"resource":"rolebindings"' | grep '"verb":"update"' | jq -r '.user.username+"\t->\t "+.objectRef.namespace+":"+.objectRef.name' | sort | uniq -c
   3577 system:serviceaccount:openshift-machine-config-operator:default	->	 default:machine-config-daemon-events
   3577 system:serviceaccount:openshift-machine-config-operator:default	->	 openshift-machine-config-operator:machine-config-daemon-events
    320 system:serviceaccount:openshift-monitoring:cluster-monitoring-operator	->	 default:prometheus-k8s
    320 system:serviceaccount:openshift-monitoring:cluster-monitoring-operator	->	 kube-system:prometheus-k8s
    321 system:serviceaccount:openshift-monitoring:cluster-monitoring-operator	->	 kube-system:resource-metrics-auth-reader
    320 system:serviceaccount:openshift-monitoring:cluster-monitoring-operator	->	 openshift-apiserver:prometheus-k8s
    320 system:serviceaccount:openshift-monitoring:cluster-monitoring-operator	->	 openshift-cluster-version:prometheus-k8s
    320 system:serviceaccount:openshift-monitoring:cluster-monitoring-operator	->	 openshift-etcd:prometheus-k8s
    321 system:serviceaccount:openshift-monitoring:cluster-monitoring-operator	->	 openshift-kube-controller-manager:prometheus-k8s
    321 system:serviceaccount:openshift-monitoring:cluster-monitoring-operator	->	 openshift-kube-scheduler:prometheus-k8s
    320 system:serviceaccount:openshift-monitoring:cluster-monitoring-operator	->	 openshift-monitoring:prometheus-k8s
    320 system:serviceaccount:openshift-monitoring:cluster-monitoring-operator	->	 openshift-monitoring:prometheus-k8s-config
    210 system:serviceaccount:openshift-network-operator:default	->	 openshift-infra:openshift-sdn-controller-account
    210 system:serviceaccount:openshift-network-operator:default	->	 openshift-sdn:openshift-sdn-controller-leaderelection
    210 system:serviceaccount:openshift-network-operator:default	->	 openshift-sdn:prometheus-k8s

The above adds up to 10,987 updates to role bindings, i.e. over 10k writes.
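
For anyone who wants to repeat the tally without grep and jq, here is a minimal Python sketch of the same grouping. It assumes one JSON audit event per line and reads only the fields the pipeline above already uses (`verb`, `user.username`, `objectRef.namespace`, `objectRef.name`), plus `objectRef.resource` in place of the raw string match. The function name `count_rolebinding_updates` is made up for illustration. Because it filters on the parsed `objectRef` rather than on a substring of the line, its counts may differ slightly from the grep version.

```python
import json
from collections import Counter
from typing import Iterable


def count_rolebinding_updates(lines: Iterable[str]) -> Counter:
    """Count role binding updates per (username, namespace, name).

    Hypothetical helper: expects one JSON audit event per line and
    silently skips blank lines and lines that are not JSON objects.
    """
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(event, dict):
            continue
        ref = event.get("objectRef") or {}
        # Same filter as the two greps: resource "rolebindings", verb "update".
        if ref.get("resource") != "rolebindings" or event.get("verb") != "update":
            continue
        user = (event.get("user") or {}).get("username", "")
        counts[(user, ref.get("namespace", ""), ref.get("name", ""))] += 1
    return counts


def format_counts(counts: Counter) -> str:
    """Render counts in roughly the 'count user -> namespace:name' layout."""
    rows = sorted(counts.items())
    return "\n".join(
        f"{n:7d} {user}\t->\t {ns}:{name}" for (user, ns, name), n in rows
    )
```

`sum(counts.values())` then gives the total number of role binding updates seen in the logs.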

Comment 4 Sally 2019-06-14 14:03:20 UTC

https://github.com/openshift/origin/pull/22783 merged; however, we'll be reverting it because it's causing flakes. I'm opening a new BZ to track those and will close this one in favor of the 3 BZs opened above.

The admission plugin for RoleBindingRestrictions is DefaultAllow; this was deemed OK because RBRs are seldom used. When the plugin was introduced, the DefaultAllow behavior caused the fewest backward-compatibility issues. It's impossible to know whether the cache is up to date, so using an informer with RBRs will never work. As @deads2k put it, "'am I up to date with the namespace I'm asserting has no restrictions' is a question you can't answer." The best we can do for this bug is to ensure components are not continuously updating role bindings. Rather, they should update only when the role binding differs from the expected value. BZs were opened against the offending components (https://bugzilla.redhat.com/show_bug.cgi?id=1706637#c3), so this BZ can be closed.
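
To make the "only update when different" idea concrete, here is a minimal Python sketch. It is not the code of any of the operators involved. `Subject`, `RoleBinding`, `FakeAPI`, `needs_update`, and `ensure_role_binding` are all hypothetical names, and `FakeAPI` is an in-memory stand-in for the API server that only counts writes. The comparison is limited to the role reference and the set of subjects, which is an assumption about what "the expected value" covers.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Subject:
    kind: str
    name: str
    namespace: str = ""


@dataclass
class RoleBinding:
    namespace: str
    name: str
    role_ref: str
    subjects: List[Subject] = field(default_factory=list)


class FakeAPI:
    """Hypothetical in-memory stand-in for the API server; counts writes."""

    def __init__(self) -> None:
        self.store: Dict[Tuple[str, str], RoleBinding] = {}
        self.writes = 0

    def get(self, namespace: str, name: str) -> Optional[RoleBinding]:
        return self.store.get((namespace, name))

    def write(self, rb: RoleBinding) -> None:
        self.store[(rb.namespace, rb.name)] = rb
        self.writes += 1


def needs_update(existing: Optional[RoleBinding], desired: RoleBinding) -> bool:
    """True if the binding is missing or differs from the desired state."""
    if existing is None:
        return True
    # Subject order is ignored (an assumption of this sketch).
    return (existing.role_ref != desired.role_ref
            or set(existing.subjects) != set(desired.subjects))


def ensure_role_binding(api: FakeAPI, desired: RoleBinding) -> bool:
    """Write the binding only when it changed; return True if a write happened."""
    existing = api.get(desired.namespace, desired.name)
    if not needs_update(existing, desired):
        return False  # already as expected: no update, no admission call
    api.write(desired)
    return True


if __name__ == "__main__":
    api = FakeAPI()
    # Namespace and name are taken from the audit output above; the role
    # reference and subject are made up for illustration.
    desired = RoleBinding(
        "openshift-sdn", "prometheus-k8s", "Role/prometheus-k8s",
        [Subject("ServiceAccount", "prometheus-k8s", "openshift-monitoring")],
    )
    for _ in range(5):
        ensure_role_binding(api, desired)
    print(api.writes)  # 1: only the first reconcile writes
```

With a check like this in the reconcile loop, a steady-state cluster should issue no role binding updates at all, so the RoleBindingRestriction list done at admission time is no longer triggered several times a second.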

Comment 5 Sally 2019-06-14 14:07:10 UTC

https://bugzilla.redhat.com/show_bug.cgi?id=1720678 opened to track test failures introduced by https://github.com/openshift/origin/pull/22783