https://www.dropbox.com/s/qlzylzfq43zciji/Screenshot%202019-05-05%2016.30.14.png?dl=0 It looks like the component is not properly leveraging an informer, or the process doesn't have permission to watch the resource. This looks new since last week, likely due to the feature change.
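For reference, this is roughly what "properly leveraging an informer" looks like: a minimal sketch assuming a 2019-era client-go (pre-context APIs); the resync period and handler body are illustrative, not code from the component in the screenshot. Note the service account also needs RBAC list/watch on rolebindings for this to work, per the permission concern above.

// Minimal sketch: watch RoleBindings through a shared informer cache
// instead of polling the apiserver.
package main

import (
	"fmt"
	"time"

	rbacv1 "k8s.io/api/rbac/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// 10-minute resync is an arbitrary illustrative value.
	factory := informers.NewSharedInformerFactory(client, 10*time.Minute)
	informer := factory.Rbac().V1().RoleBindings().Informer()
	informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		UpdateFunc: func(oldObj, newObj interface{}) {
			rb := newObj.(*rbacv1.RoleBinding)
			fmt.Printf("role binding updated: %s/%s\n", rb.Namespace, rb.Name)
		},
	})

	stop := make(chan struct{})
	defer close(stop)
	factory.Start(stop)
	factory.WaitForCacheSync(stop)
	<-stop // block; a real controller would drive a workqueue here
}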
Opened BZ1707516, BZ1707517, and BZ1707519 to track fixes to noisy components. Based on audit logs of a cluster that has been running for a day:

$ cat * | grep '"resource":"rolebindings"' | grep '"verb":"update"' | jq -r '.user.username+"\t->\t "+.objectRef.namespace+":"+.objectRef.name' | sort | uniq -c
   3577 system:serviceaccount:openshift-machine-config-operator:default -> default:machine-config-daemon-events
   3577 system:serviceaccount:openshift-machine-config-operator:default -> openshift-machine-config-operator:machine-config-daemon-events
    320 system:serviceaccount:openshift-monitoring:cluster-monitoring-operator -> default:prometheus-k8s
    320 system:serviceaccount:openshift-monitoring:cluster-monitoring-operator -> kube-system:prometheus-k8s
    321 system:serviceaccount:openshift-monitoring:cluster-monitoring-operator -> kube-system:resource-metrics-auth-reader
    320 system:serviceaccount:openshift-monitoring:cluster-monitoring-operator -> openshift-apiserver:prometheus-k8s
    320 system:serviceaccount:openshift-monitoring:cluster-monitoring-operator -> openshift-cluster-version:prometheus-k8s
    320 system:serviceaccount:openshift-monitoring:cluster-monitoring-operator -> openshift-etcd:prometheus-k8s
    321 system:serviceaccount:openshift-monitoring:cluster-monitoring-operator -> openshift-kube-controller-manager:prometheus-k8s
    321 system:serviceaccount:openshift-monitoring:cluster-monitoring-operator -> openshift-kube-scheduler:prometheus-k8s
    320 system:serviceaccount:openshift-monitoring:cluster-monitoring-operator -> openshift-monitoring:prometheus-k8s
    320 system:serviceaccount:openshift-monitoring:cluster-monitoring-operator -> openshift-monitoring:prometheus-k8s-config
    210 system:serviceaccount:openshift-network-operator:default -> openshift-infra:openshift-sdn-controller-account
    210 system:serviceaccount:openshift-network-operator:default -> openshift-sdn:openshift-sdn-controller-leaderelection
    210 system:serviceaccount:openshift-network-operator:default -> openshift-sdn:prometheus-k8s

The above is over 10k writes to role bindings.
https://github.com/openshift/origin/pull/22783 merged; however, we'll be reverting it because it's causing flakes. Opening a new BZ to track those, and will close this in favor of the three BZs opened above.

The admission plugin for RoleBindingRestrictions is DefaultAllow. This was deemed acceptable because RBRs are seldom used, and when the plugin was introduced, DefaultAllow caused the fewest backward-compatibility issues. Since it's impossible to know whether a cache is up to date, using an informer with RBRs will never work. As @deads2k put it, '"am I up to date with the namespace I'm asserting has no restrictions" is a question you can't answer.'

The best we can do for this bug is to ensure components are not continuously updating role bindings; they should only issue an update when the existing role binding differs from the expected value. BZs were opened against the offending components (https://bugzilla.redhat.com/show_bug.cgi?id=1706637#c3), so this BZ can be closed.
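To make the compare-before-update fix concrete, here is a minimal sketch, again assuming 2019-era client-go; ensureRoleBinding and the expected object are hypothetical names, not taken from any of the offending components:

// Sketch of compare-before-update: only write the role binding when it
// differs from the expected state, so a steady-state reconcile loop
// generates no audit-log noise.
package main

import (
	"reflect"

	rbacv1 "k8s.io/api/rbac/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// ensureRoleBinding (hypothetical) creates the binding if missing and
// updates it only when the subjects actually changed.
func ensureRoleBinding(client kubernetes.Interface, expected *rbacv1.RoleBinding) error {
	existing, err := client.RbacV1().RoleBindings(expected.Namespace).Get(expected.Name, metav1.GetOptions{})
	if apierrors.IsNotFound(err) {
		_, err = client.RbacV1().RoleBindings(expected.Namespace).Create(expected)
		return err
	}
	if err != nil {
		return err
	}

	// No-op when the binding already matches: this is the fix the BZs
	// above ask for. RoleRef is immutable, so a mismatch there would
	// require delete+recreate rather than an update.
	if reflect.DeepEqual(existing.Subjects, expected.Subjects) {
		return nil
	}

	updated := existing.DeepCopy()
	updated.Subjects = expected.Subjects
	_, err = client.RbacV1().RoleBindings(expected.Namespace).Update(updated)
	return err
}

With this shape, the machine-config, monitoring, and network operators above would each write a given role binding once and then stay quiet until its subjects drift.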
https://bugzilla.redhat.com/show_bug.cgi?id=1720678 opened to track test failures introduced by https://github.com/openshift/origin/pull/22783