Bug 1706637 - RoleBindingRestriction being listed 4 times a second during e2e runs
Summary: RoleBindingRestriction being listed 4 times a second during e2e runs
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: apiserver-auth
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: 4.2.0
Assignee: Sally
QA Contact: Chuan Yu
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-05-05 20:32 UTC by Clayton Coleman
Modified: 2019-06-14 14:07 UTC
CC List: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-06-14 14:03:20 UTC
Target Upstream Version:
Embargoed:



Description Clayton Coleman 2019-05-05 20:32:17 UTC
https://www.dropbox.com/s/qlzylzfq43zciji/Screenshot%202019-05-05%2016.30.14.png?dl=0

It looks like it's not properly leveraging an informer, or the process doesn't have permission to watch the resource.
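For context on the informer remark: a client-go shared informer performs one initial LIST and then keeps a local cache current from a WATCH stream, so repeated reads do not go back to the API server. The toy sketch below contrasts that with re-listing on every read. It is a simplified model, not client-go. `CountingServer`, `PollingReader`, and `InformerCache` are hypothetical names, and the watch stream is reduced to calls to `handle_event` using Kubernetes-style event types.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CountingServer:
    """Hypothetical stand-in for the API server that counts LIST requests."""
    objects: Dict[str, dict]
    list_calls: int = 0

    def list(self) -> Dict[str, dict]:
        self.list_calls += 1
        return dict(self.objects)  # a copy, like a fresh LIST response


class PollingReader:
    """The pattern to avoid: every read issues a new LIST."""

    def __init__(self, server: CountingServer):
        self.server = server

    def get_all(self) -> Dict[str, dict]:
        return self.server.list()


class InformerCache:
    """List once, then keep a local store current from watch events."""

    def __init__(self, server: CountingServer):
        self.store = server.list()  # the only LIST this cache ever issues

    def handle_event(self, event_type: str, key: str, obj: Optional[dict] = None) -> None:
        # Event type names mirror Kubernetes watch events.
        if event_type in ("ADDED", "MODIFIED"):
            self.store[key] = obj
        elif event_type == "DELETED":
            self.store.pop(key, None)

    def get_all(self) -> Dict[str, dict]:
        return dict(self.store)  # served from memory, no API request


if __name__ == "__main__":
    server = CountingServer({"ns1/rbr-a": {}})
    poller = PollingReader(server)
    for _ in range(4):
        poller.get_all()
    print(server.list_calls)  # 4: one LIST per read

    cache = InformerCache(server)
    for _ in range(4):
        cache.get_all()
    print(server.list_calls)  # 5: only the cache's initial LIST was added
```

The trade-off is freshness. Until the ADDED event for a newly created object arrives, `get_all()` does not return it.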

This looks new since last week, likely due to the feature change.

Comment 3 Mo 2019-05-07 17:08:22 UTC
Opened BZ1707516, BZ1707517, and BZ1707519 to track fixes to noisy components.

Based on audit logs of a cluster that has been running for a day:

$ cat * | grep '"resource":"rolebindings"' | grep '"verb":"update"' | jq -r '.user.username+"\t->\t "+.objectRef.namespace+":"+.objectRef.name' | sort | uniq -c
   3577 system:serviceaccount:openshift-machine-config-operator:default	->	 default:machine-config-daemon-events
   3577 system:serviceaccount:openshift-machine-config-operator:default	->	 openshift-machine-config-operator:machine-config-daemon-events
    320 system:serviceaccount:openshift-monitoring:cluster-monitoring-operator	->	 default:prometheus-k8s
    320 system:serviceaccount:openshift-monitoring:cluster-monitoring-operator	->	 kube-system:prometheus-k8s
    321 system:serviceaccount:openshift-monitoring:cluster-monitoring-operator	->	 kube-system:resource-metrics-auth-reader
    320 system:serviceaccount:openshift-monitoring:cluster-monitoring-operator	->	 openshift-apiserver:prometheus-k8s
    320 system:serviceaccount:openshift-monitoring:cluster-monitoring-operator	->	 openshift-cluster-version:prometheus-k8s
    320 system:serviceaccount:openshift-monitoring:cluster-monitoring-operator	->	 openshift-etcd:prometheus-k8s
    321 system:serviceaccount:openshift-monitoring:cluster-monitoring-operator	->	 openshift-kube-controller-manager:prometheus-k8s
    321 system:serviceaccount:openshift-monitoring:cluster-monitoring-operator	->	 openshift-kube-scheduler:prometheus-k8s
    320 system:serviceaccount:openshift-monitoring:cluster-monitoring-operator	->	 openshift-monitoring:prometheus-k8s
    320 system:serviceaccount:openshift-monitoring:cluster-monitoring-operator	->	 openshift-monitoring:prometheus-k8s-config
    210 system:serviceaccount:openshift-network-operator:default	->	 openshift-infra:openshift-sdn-controller-account
    210 system:serviceaccount:openshift-network-operator:default	->	 openshift-sdn:openshift-sdn-controller-leaderelection
    210 system:serviceaccount:openshift-network-operator:default	->	 openshift-sdn:prometheus-k8s

The above totals 10,987 writes to role bindings.
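The grep/jq/sort/uniq pipeline above can also be written as a short Python sketch. It assumes the audit log holds one JSON event per line with the standard Kubernetes audit fields `verb`, `user.username`, and `objectRef.resource`/`namespace`/`name`. The function name is hypothetical. It also sums the per-pair counts copied from the output as a check on the total.

```python
import json
from collections import Counter
from typing import Iterable


def count_rolebinding_updates(lines: Iterable[str]) -> Counter:
    """Count rolebinding updates per (username, "namespace:name") pair.

    Roughly mirrors: grep rolebindings | grep update | jq ... | sort | uniq -c
    """
    counts: Counter = Counter()
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        event = json.loads(line)
        ref = event.get("objectRef") or {}
        if ref.get("resource") != "rolebindings" or event.get("verb") != "update":
            continue
        user = (event.get("user") or {}).get("username", "")
        target = f'{ref.get("namespace", "")}:{ref.get("name", "")}'
        counts[(user, target)] += 1
    return counts


# Per-pair counts copied from the output above.
REPORTED_COUNTS = [3577, 3577, 320, 320, 321, 320, 320, 320, 321, 321,
                   320, 320, 210, 210, 210]

if __name__ == "__main__":
    print(sum(REPORTED_COUNTS))  # 10987
```

Unlike `grep`, which matches the text anywhere in a line, this version compares parsed fields, so an unrelated event that merely mentions `"verb":"update"` in another field is not counted.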

Comment 4 Sally 2019-06-14 14:03:20 UTC
https://github.com/openshift/origin/pull/22783 merged; however, we'll be reverting it, as it's causing flakes. Opening a new BZ to track those, and will close this one in favor of the three BZs opened above (BZ1707516, BZ1707517, BZ1707519).

The admission plugin for RoleBindingRestrictions is DefaultAllow: a role binding in a namespace with no restrictions is allowed. This was deemed OK because RBRs are seldom used, and when the plugin was introduced, DefaultAllow caused the fewest backward-compatibility issues. It's impossible to know whether a cache is up to date, so using an informer with RBRs will never work. A cache that has not yet seen a newly created restriction would report that the namespace has none and allow the binding. As @deads2k put it, "'am I up to date with the namespace I'm asserting has no restrictions' is a question you can't answer."

The best we can do for this bug is to ensure components are not continuously updating role bindings. They should only update a role binding when it differs from the expected value. BZs were opened against the offending components (https://bugzilla.redhat.com/show_bug.cgi?id=1706637#c3), so this BZ can be closed.
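The requirement that components write a role binding only when it differs from the expected value amounts to a read-compare-write check. The following is a minimal sketch assuming a simplified in-memory model. `RoleBinding`, `Subject`, `RoleRef`, `FakeServer`, `needs_update`, and `ensure_role_binding` are hypothetical names, not the Kubernetes or OpenShift client API. The comparison covers only the fields the component manages (`roleRef` and `subjects`). A real comparison should likewise ignore server-populated metadata such as `resourceVersion`.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Subject:
    kind: str
    name: str
    namespace: str = ""


@dataclass(frozen=True)
class RoleRef:
    kind: str
    name: str


@dataclass(frozen=True)
class RoleBinding:
    namespace: str
    name: str
    role_ref: RoleRef
    subjects: Tuple[Subject, ...]


def needs_update(current: RoleBinding, desired: RoleBinding) -> bool:
    """True only when a field this component manages differs."""
    return current.role_ref != desired.role_ref or current.subjects != desired.subjects


@dataclass
class FakeServer:
    """Hypothetical in-memory API server that counts writes."""
    objects: Dict[Tuple[str, str], RoleBinding] = field(default_factory=dict)
    writes: int = 0

    def get(self, namespace: str, name: str) -> Optional[RoleBinding]:
        return self.objects.get((namespace, name))

    def write(self, rb: RoleBinding) -> None:
        self.writes += 1
        self.objects[(rb.namespace, rb.name)] = rb


def ensure_role_binding(server: FakeServer, desired: RoleBinding) -> bool:
    """Create or update `desired`; return True only if a write was issued."""
    current = server.get(desired.namespace, desired.name)
    if current is not None and not needs_update(current, desired):
        return False  # already as expected: skip the write entirely
    # In real Kubernetes RBAC, roleRef is immutable, so a controller would
    # have to delete and recreate the binding if roleRef changed.
    server.write(desired)
    return True


if __name__ == "__main__":
    server = FakeServer()
    desired = RoleBinding(
        "example-ns", "example-binding",  # hypothetical names
        RoleRef("Role", "example-role"),
        (Subject("ServiceAccount", "example-sa", "example-ns"),),
    )
    for _ in range(320):  # e.g. one call per sync-loop iteration
        ensure_role_binding(server, desired)
    print(server.writes)  # 1
```

With this check, each periodic sync costs a read but no write when nothing has changed. This removes the repeated role binding updates that were being counted in the audit logs.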

Comment 5 Sally 2019-06-14 14:07:10 UTC
https://bugzilla.redhat.com/show_bug.cgi?id=1720678 opened to track test failures introduced by https://github.com/openshift/origin/pull/22783.

