Bug 1822097

Summary: CMO fails to sync (cluster)rolebindings when roleRef has been changed
Product: OpenShift Container Platform
Component: Monitoring
Version: 4.5
Target Release: 4.5.0
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Type: Bug
Reporter: Simon Pasquier <spasquie>
Assignee: Sergiusz Urbaniak <surbania>
QA Contact: Junqi Zhao <juzhao>
CC: alegrand, anpicker, erooth, kakkoyun, lcosic, mloibl, pkrupa, surbania
Hardware: Unspecified
OS: Unspecified
Whiteboard: groom
Doc Type: If docs needed, set a value
Doc Text: no doc update needed.
Last Closed: 2020-07-13 17:26:08 UTC

Description Simon Pasquier 2020-04-08 09:09:06 UTC
Description of problem:
If an admin manually changes the roleRef of one of the bindings managed by CMO (the cluster-monitoring-operator), the operator fails to reconcile the resource back to the expected state.

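By design, a binding's roleRef can't be changed after creation (see https://kubernetes.io/docs/reference/access-authn-authz/rbac/#clusterrolebinding-example). To change the roleRef of an existing binding, the binding has to be deleted and recreated. CMO instead tries to update the binding in place, and the API server rejects that update with "cannot change roleRef".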
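
For illustration only, here is a minimal Go sketch of the delete-and-recreate approach described above. It is not the code in the CMO repository and not necessarily the fix that was shipped. RoleRef, Binding, fakeAPI and reconcile are hypothetical stand-ins invented for this example; a real operator would go through client-go and the rbac/v1 types. The fake API only mimics the one behaviour that matters here: an in-place update that changes roleRef is rejected.

```go
package main

import (
	"errors"
	"fmt"
)

// RoleRef and Binding are simplified, hypothetical stand-ins for the
// rbac/v1 types; they are not the real Kubernetes structs.
type RoleRef struct {
	APIGroup string
	Kind     string
	Name     string
}

type Binding struct {
	Name     string
	RoleRef  RoleRef
	Subjects []string
}

// errImmutableRoleRef mimics the "cannot change roleRef" rejection.
var errImmutableRoleRef = errors.New("cannot change roleRef")

// fakeAPI is an in-memory stand-in for the API server (assumption).
type fakeAPI struct {
	bindings map[string]Binding
}

func (a *fakeAPI) Get(name string) (Binding, bool) {
	b, ok := a.bindings[name]
	return b, ok
}

func (a *fakeAPI) Create(b Binding) { a.bindings[b.Name] = b }

func (a *fakeAPI) Delete(name string) { delete(a.bindings, name) }

// Update refuses any change to roleRef, like the real API does.
func (a *fakeAPI) Update(b Binding) error {
	cur, ok := a.bindings[b.Name]
	if !ok {
		return fmt.Errorf("binding %q not found", b.Name)
	}
	if cur.RoleRef != b.RoleRef {
		return errImmutableRoleRef
	}
	a.bindings[b.Name] = b
	return nil
}

// reconcile brings the stored binding to the desired state. When the
// roleRef differs, it deletes and recreates the binding instead of
// attempting an in-place update that would be rejected.
func reconcile(a *fakeAPI, desired Binding) error {
	cur, ok := a.Get(desired.Name)
	if !ok {
		a.Create(desired)
		return nil
	}
	if cur.RoleRef != desired.RoleRef {
		a.Delete(desired.Name)
		a.Create(desired)
		return nil
	}
	return a.Update(desired)
}

func main() {
	api := &fakeAPI{bindings: map[string]Binding{}}
	desired := Binding{
		Name: "prometheus-k8s",
		RoleRef: RoleRef{
			APIGroup: "rbac.authorization.k8s.io",
			Kind:     "ClusterRole",
			Name:     "prometheus-k8s",
		},
		Subjects: []string{"openshift-monitoring/prometheus-k8s"},
	}

	// Simulate the manual change: the stored binding points at
	// "thanos-querier" instead of "prometheus-k8s".
	tampered := desired
	tampered.RoleRef.Name = "thanos-querier"
	api.Create(tampered)

	fmt.Println("plain update:", api.Update(desired))
	fmt.Println("reconcile:", reconcile(api, desired))
	b, _ := api.Get("prometheus-k8s")
	fmt.Println("roleRef.name:", b.RoleRef.Name)
}
```

The sketch deletes before it creates, so there is a short window in which the binding does not exist. Whether that is acceptable for a given binding is a design decision this example does not address.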

Version-Release number of selected component (if applicable):
4.5

How reproducible:
Always

Steps to Reproduce:
1. cat <<EOF > custom-binding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus-k8s
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: thanos-querier
subjects:
- kind: ServiceAccount
  name: prometheus-k8s
  namespace: openshift-monitoring
EOF
2. Update the "prometheus-k8s" cluster role binding to reference "thanos-querier" instead of "prometheus-k8s".
oc auth reconcile -f custom-binding.yaml
3. Verify that the binding has been updated
oc get clusterrolebindings prometheus-k8s -o jsonpath='{.roleRef.name}'
4. Check the CMO logs.

Actual results:
CMO fails to reconcile the binding.

I0408 09:01:46.767141       1 operator.go:340] Updating ClusterOperator status to failed. Err: running task Updating Prometheus-k8s failed: reconciling Prometheus ClusterRoleBinding failed: updating ClusterRoleBinding object failed: ClusterRoleBinding.rbac.authorization.k8s.io "prometheus-k8s" is invalid: roleRef: Invalid value: rbac.RoleRef{APIGroup:"rbac.authorization.k8s.io", Kind:"ClusterRole", Name:"prometheus-k8s"}: cannot change roleRef
E0408 09:01:46.779512       1 operator.go:272] Syncing "openshift-monitoring/cluster-monitoring-config" failed
E0408 09:01:46.779647       1 operator.go:273] sync "openshift-monitoring/cluster-monitoring-config" failed: running task Updating Prometheus-k8s failed: reconciling Prometheus ClusterRoleBinding failed: updating ClusterRoleBinding object failed: ClusterRoleBinding.rbac.authorization.k8s.io "prometheus-k8s" is invalid: roleRef: Invalid value: rbac.RoleRef{APIGroup:"rbac.authorization.k8s.io", Kind:"ClusterRole", Name:"prometheus-k8s"}: cannot change roleRef

Expected results:
CMO reconciles the binding without error.

Additional info:
Relevant code in the CMO repository:
* https://github.com/openshift/cluster-monitoring-operator/blob/7a2264c8469aa8168b4f3c28d42f0982c200b538/pkg/client/client.go#L980-L1000
* https://github.com/openshift/cluster-monitoring-operator/blob/7a2264c8469aa8168b4f3c28d42f0982c200b538/pkg/client/client.go#L1032-L1045

Issue filed from https://bugzilla.redhat.com/show_bug.cgi?id=1820230#c8

Workaround:
Delete the offending (cluster)role binding and let CMO recreate it properly.

Comment 3 Junqi Zhao 2020-05-06 10:36:26 UTC
Tested with 4.5.0-0.nightly-2020-05-05-205255, following the steps in Comment 0. The role bindings are now reconciled without error after roleRef has been changed:
# oc -n openshift-monitoring logs cluster-monitoring-operator-57cb74c7ff-x8tgl  -c cluster-monitoring-operator | grep "cannot change roleRef"
no result

Comment 4 errata-xmlrpc 2020-07-13 17:26:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409