https://www.dropbox.com/s/qlzylzfq43zciji/Screenshot%202019-05-05%2016.30.14.png?dl=0 It looks like the component is not properly leveraging an informer, or the process doesn't have permission to watch the resource. This looks new since last week, likely due to the feature change.
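For reference, this is roughly what "properly leveraging an informer" looks like: a minimal sketch assuming a 2019-era client-go (pre-context APIs); the resync period and handler body are illustrative, not code from the component in the screenshot. Note the service account also needs RBAC list/watch on rolebindings for this to work, per the permission concern above.

// Minimal sketch: watch RoleBindings through a shared informer cache
// instead of polling the apiserver.
package main

import (
	"fmt"
	"time"

	rbacv1 "k8s.io/api/rbac/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// 10-minute resync is an arbitrary illustrative value.
	factory := informers.NewSharedInformerFactory(client, 10*time.Minute)
	informer := factory.Rbac().V1().RoleBindings().Informer()
	informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		UpdateFunc: func(oldObj, newObj interface{}) {
			rb := newObj.(*rbacv1.RoleBinding)
			fmt.Printf("role binding updated: %s/%s\n", rb.Namespace, rb.Name)
		},
	})

	stop := make(chan struct{})
	defer close(stop)
	factory.Start(stop)
	factory.WaitForCacheSync(stop)
	<-stop // block; a real controller would drive a workqueue here
}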
Opened BZ1707516, BZ1707517, and BZ1707519 to track fixes to noisy components. Based on audit logs of a cluster that has been running for a day:

$ cat * | grep '"resource":"rolebindings"' | grep '"verb":"update"' | jq -r '.user.username+"\t->\t "+.objectRef.namespace+":"+.objectRef.name' | sort | uniq -c
   3577 system:serviceaccount:openshift-machine-config-operator:default -> default:machine-config-daemon-events
   3577 system:serviceaccount:openshift-machine-config-operator:default -> openshift-machine-config-operator:machine-config-daemon-events
    320 system:serviceaccount:openshift-monitoring:cluster-monitoring-operator -> default:prometheus-k8s
    320 system:serviceaccount:openshift-monitoring:cluster-monitoring-operator -> kube-system:prometheus-k8s
    321 system:serviceaccount:openshift-monitoring:cluster-monitoring-operator -> kube-system:resource-metrics-auth-reader
    320 system:serviceaccount:openshift-monitoring:cluster-monitoring-operator -> openshift-apiserver:prometheus-k8s
    320 system:serviceaccount:openshift-monitoring:cluster-monitoring-operator -> openshift-cluster-version:prometheus-k8s
    320 system:serviceaccount:openshift-monitoring:cluster-monitoring-operator -> openshift-etcd:prometheus-k8s
    321 system:serviceaccount:openshift-monitoring:cluster-monitoring-operator -> openshift-kube-controller-manager:prometheus-k8s
    321 system:serviceaccount:openshift-monitoring:cluster-monitoring-operator -> openshift-kube-scheduler:prometheus-k8s
    320 system:serviceaccount:openshift-monitoring:cluster-monitoring-operator -> openshift-monitoring:prometheus-k8s
    320 system:serviceaccount:openshift-monitoring:cluster-monitoring-operator -> openshift-monitoring:prometheus-k8s-config
    210 system:serviceaccount:openshift-network-operator:default -> openshift-infra:openshift-sdn-controller-account
    210 system:serviceaccount:openshift-network-operator:default -> openshift-sdn:openshift-sdn-controller-leaderelection
    210 system:serviceaccount:openshift-network-operator:default -> openshift-sdn:prometheus-k8s

The above is over 10k writes to role bindings.
https://github.com/openshift/origin/pull/22783 merged; however, we'll be reverting it because it's causing flakes. Opening a new BZ to track those, and will close this in favor of the three BZs opened above.

The admission plugin for RoleBindingRestrictions is DefaultAllow. This was deemed acceptable because RBRs are seldom used, and when the plugin was introduced, DefaultAllow caused the fewest backward-compatibility issues. Since it's impossible to know whether a cache is up to date, using an informer with RBRs will never work. As @deads2k put it, '"am I up to date with the namespace I'm asserting has no restrictions" is a question you can't answer.'

The best we can do for this bug is to ensure components are not continuously updating role bindings; they should only issue an update when the existing role binding differs from the expected value. BZs were opened against the offending components (https://bugzilla.redhat.com/show_bug.cgi?id=1706637#c3), so this BZ can be closed.
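To make the compare-before-update fix concrete, here is a minimal sketch, again assuming 2019-era client-go; ensureRoleBinding and the expected object are hypothetical names, not taken from any of the offending components:

// Sketch of compare-before-update: only write the role binding when it
// differs from the expected state, so a steady-state reconcile loop
// generates no audit-log noise.
package main

import (
	"reflect"

	rbacv1 "k8s.io/api/rbac/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// ensureRoleBinding (hypothetical) creates the binding if missing and
// updates it only when the subjects actually changed.
func ensureRoleBinding(client kubernetes.Interface, expected *rbacv1.RoleBinding) error {
	existing, err := client.RbacV1().RoleBindings(expected.Namespace).Get(expected.Name, metav1.GetOptions{})
	if apierrors.IsNotFound(err) {
		_, err = client.RbacV1().RoleBindings(expected.Namespace).Create(expected)
		return err
	}
	if err != nil {
		return err
	}

	// No-op when the binding already matches: this is the fix the BZs
	// above ask for. RoleRef is immutable, so a mismatch there would
	// require delete+recreate rather than an update.
	if reflect.DeepEqual(existing.Subjects, expected.Subjects) {
		return nil
	}

	updated := existing.DeepCopy()
	updated.Subjects = expected.Subjects
	_, err = client.RbacV1().RoleBindings(expected.Namespace).Update(updated)
	return err
}

With this shape, the machine-config, monitoring, and network operators above would each write a given role binding once and then stay quiet until its subjects drift.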
https://bugzilla.redhat.com/show_bug.cgi?id=1720678 opened to track test failures introduced by https://github.com/openshift/origin/pull/22783