Bug 1822097 - CMO fails to sync (cluster)rolebindings when roleRef has been changed
Summary: CMO fails to sync (cluster)rolebindings when roleRef has been changed
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.5.0
Assignee: Sergiusz Urbaniak
QA Contact: Junqi Zhao
URL:
Whiteboard: groom
Depends On:
Blocks:
Reported: 2020-04-08 09:09 UTC by Simon Pasquier
Modified: 2020-07-13 17:26 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
no doc update needed.
Clone Of:
Environment:
Last Closed: 2020-07-13 17:26:08 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-monitoring-operator pull 778 | 0 | None | closed | Bug 1822097: pkg/client: delete and create clusterrolebindings instead of updating | 2021-01-21 02:00:51 UTC
Red Hat Product Errata | RHBA-2020:2409 | 0 | None | None | None | 2020-07-13 17:26:21 UTC

Description Simon Pasquier 2020-04-08 09:09:06 UTC
Description of problem:
If an admin manually updates the roleRef of one of the bindings managed by CMO, the operator fails to reconcile the resource to the expected state.

By design, a binding's roleRef can't be changed after creation (see https://kubernetes.io/docs/reference/access-authn-authz/rbac/#clusterrolebinding-example). To change the roleRef of an existing resource, it needs to be deleted and recreated.

Version-Release number of selected component (if applicable):
4.5

How reproducible:
Always

Steps to Reproduce:
1. Create a manifest for the "prometheus-k8s" cluster role binding with a modified roleRef.
cat <<EOF > custom-binding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus-k8s
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: thanos-querier
subjects:
- kind: ServiceAccount
  name: prometheus-k8s
  namespace: openshift-monitoring
EOF
2. Update the "prometheus-k8s" cluster role binding to reference "thanos-querier" instead of "prometheus-k8s".
oc auth reconcile -f custom-binding.yaml
3. Verify that the binding has been updated (the command should print "thanos-querier").
oc get clusterrolebindings prometheus-k8s -o jsonpath='{.roleRef.name}'
4. Check the CMO logs.

Actual results:
CMO fails to reconcile the binding.

I0408 09:01:46.767141       1 operator.go:340] Updating ClusterOperator status to failed. Err: running task Updating Prometheus-k8s failed: reconciling Prometheus ClusterRoleBinding failed: updating ClusterRoleBinding object failed: ClusterRoleBinding.rbac.authorization.k8s.io "prometheus-k8s" is invalid: roleRef: Invalid value: rbac.RoleRef{APIGroup:"rbac.authorization.k8s.io", Kind:"ClusterRole", Name:"prometheus-k8s"}: cannot change roleRef
E0408 09:01:46.779512       1 operator.go:272] Syncing "openshift-monitoring/cluster-monitoring-config" failed
E0408 09:01:46.779647       1 operator.go:273] sync "openshift-monitoring/cluster-monitoring-config" failed: running task Updating Prometheus-k8s failed: reconciling Prometheus ClusterRoleBinding failed: updating ClusterRoleBinding object failed: ClusterRoleBinding.rbac.authorization.k8s.io "prometheus-k8s" is invalid: roleRef: Invalid value: rbac.RoleRef{APIGroup:"rbac.authorization.k8s.io", Kind:"ClusterRole", Name:"prometheus-k8s"}: cannot change roleRef

Expected results:
CMO reconciles the binding without error.

Additional info:
Relevant code in the CMO repository:
* https://github.com/openshift/cluster-monitoring-operator/blob/7a2264c8469aa8168b4f3c28d42f0982c200b538/pkg/client/client.go#L980-L1000
* https://github.com/openshift/cluster-monitoring-operator/blob/7a2264c8469aa8168b4f3c28d42f0982c200b538/pkg/client/client.go#L1032-L1045

Issue filed from https://bugzilla.redhat.com/show_bug.cgi?id=1820230#c8

Workaround:
Delete the offending (cluster)role binding and let CMO recreate it properly.

Comment 3 Junqi Zhao 2020-05-06 10:36:26 UTC
Tested with 4.5.0-0.nightly-2020-05-05-205255 following the steps in Comment 0. The rolebindings are reconciled without error after the roleRef has been changed:
# oc -n openshift-monitoring logs cluster-monitoring-operator-57cb74c7ff-x8tgl  -c cluster-monitoring-operator | grep "cannot change roleRef"
no result

Comment 4 errata-xmlrpc 2020-07-13 17:26:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

