Description of problem (please be as detailed as possible and provide log snippets):

In a multicluster scenario, the rook-ceph operator logs the following error, which indicates that the operator is unable to create a ServiceMonitor in the `openshift-storage-extended` namespace (an external namespace in which the rook-ceph operator is not deployed):

```
2024-01-09 10:16:04.208143 E | ceph-cluster-controller: failed to enable external service monitor. service monitor could not be enabled: failed to retrieve servicemonitor. servicemonitors.monitoring.coreos.com "rook-ceph-mgr" is forbidden: User "system:serviceaccount:openshift-storage:rook-ceph-system" cannot get resource "servicemonitors" in API group "monitoring.coreos.com" in the namespace "openshift-storage-extended"
```

This ServiceMonitor is required to collect the external cluster metrics in multicluster mode. We also need to make sure the rook-ceph operator is configured correctly for monitoring the external ceph-exporter. This issue might cause metrics to be missing even with the fixes on the ceph-exporter (refer to BZ: https://bugzilla.redhat.com/show_bug.cgi?id=2257619).

Version of all relevant components (if applicable):

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
Yes, it is currently blocking BZ: https://bugzilla.redhat.com/show_bug.cgi?id=2255036

Is there any workaround available to the best of your knowledge?
We can manually add the missing permission, or manually create a ServiceMonitor in the extended namespace.

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
1

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?
NA

If this is a regression, please provide more details to justify this:
NA

Steps to Reproduce:
1. Create a multicluster setup
2. Check the rook-ceph operator log
3. We should get the above-mentioned error message

Actual results:
Error creating the needed ServiceMonitor in the 'openshift-storage-extended' namespace.

Expected results:
The rook operator should have enough permissions to manage ServiceMonitors in all managed namespaces, and we should not hit any error during ServiceMonitor creation in other namespaces.

Additional info:
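As a rough illustration of the "manually add permission" workaround mentioned above, the sketch below builds a namespaced Role and RoleBinding that would let the operator's service account work with ServiceMonitors in the external namespace. The namespace, service account, API group, and resource come from the error message; the object name `rook-ceph-servicemonitor-sketch` is hypothetical, and the verb list is an assumption (the log only shows `get` being denied). This is not necessarily how the eventual fix grants the permission.

```python
import json

# Values taken from the error message in this report.
TARGET_NAMESPACE = "openshift-storage-extended"
OPERATOR_NAMESPACE = "openshift-storage"
OPERATOR_SERVICE_ACCOUNT = "rook-ceph-system"

# Hypothetical object name, chosen only for this sketch.
ROLE_NAME = "rook-ceph-servicemonitor-sketch"

# Assumed verb set: the log only shows "get" being denied; the remaining
# verbs are a guess at what managing a ServiceMonitor requires.
VERBS = ["get", "list", "watch", "create", "update", "delete"]


def build_role():
    """Role allowing ServiceMonitor access inside the external namespace."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": ROLE_NAME, "namespace": TARGET_NAMESPACE},
        "rules": [
            {
                "apiGroups": ["monitoring.coreos.com"],
                "resources": ["servicemonitors"],
                "verbs": VERBS,
            }
        ],
    }


def build_role_binding():
    """Bind the Role to the operator's service account from its own namespace."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": ROLE_NAME, "namespace": TARGET_NAMESPACE},
        "subjects": [
            {
                "kind": "ServiceAccount",
                "name": OPERATOR_SERVICE_ACCOUNT,
                "namespace": OPERATOR_NAMESPACE,
            }
        ],
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "Role",
            "name": ROLE_NAME,
        },
    }


def build_manifest():
    """Wrap both objects in a List so they can be applied together."""
    return {
        "apiVersion": "v1",
        "kind": "List",
        "items": [build_role(), build_role_binding()],
    }


if __name__ == "__main__":
    # Emit JSON; the output can be saved to a file and applied with `oc apply -f`.
    print(json.dumps(build_manifest(), indent=2))
```

A Role plus RoleBinding scoped to the external namespace is used here rather than a ClusterRole, on the assumption that the operator only needs this access in the namespaces it manages.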
(In reply to arun kumar mohan from comment #0)
> Description of problem (please be detailed as possible and provide log
> snippests):
> ```
> 2024-01-09 10:16:04.208143 E | ceph-cluster-controller: failed to enable
> external service monitor. service monitor could not be enabled: failed to
> retrieve servicemonitor. servicemonitors.monitoring.coreos.com
> "rook-ceph-mgr" is forbidden: User
> "system:serviceaccount:openshift-storage:rook-ceph-system" cannot get
> resource "servicemonitors" in API group "monitoring.coreos.com" in the
> namespace "openshift-storage-extended"
> ```
>

I am fixing this with https://github.com/rook/rook/pull/13338 by providing all the required permissions to `system:serviceaccount:openshift-storage:rook-ceph-system`.
Thanks Santosh. Moving this to the ocs-operator component, as the permissions for rook are managed through ocs-operator. I will check on the ocs-operator side and add the needed permissions.
PR added: https://github.com/red-hat-storage/ocs-operator/pull/2392
These are all the messages regarding openshift-storage-extended:

2024-02-21 09:55:16.498756 I | ceph-cluster-controller: reconciling ceph cluster in namespace "openshift-storage-extended"
2024-02-21 09:55:20.210664 I | op-bucket-prov: ceph bucket provisioner launched watching for provisioner "openshift-storage-extended.ceph.rook.io/bucket"
I0221 09:55:20.211189 1 manager.go:135] "msg"="starting provisioner" "logger"="objectbucket.io/provisioner-manager" "name"="openshift-storage-extended.ceph.rook.io/bucket"
2024-02-21 09:55:22.809053 I | cephclient: writing config file /var/lib/rook/openshift-storage-extended/openshift-storage-extended.config
2024-02-21 09:55:22.809260 I | cephclient: generated admin config in /var/lib/rook/openshift-storage-extended
2024-02-21 09:55:26.903457 I | ceph-cluster-controller: cluster "openshift-storage-extended": version "17.2.6-170 quincy" detected for image ""
2024-02-21 09:55:31.213651 I | ceph-cluster-controller: enabling ceph mon monitoring goroutine for cluster "openshift-storage-extended"
2024-02-21 09:55:31.213687 I | ceph-cluster-controller: enabling ceph status monitoring goroutine for cluster "openshift-storage-extended"
2024-02-21 09:56:16.795942 I | op-mon: saved mon endpoints to config map map[csi-cluster-config-json:[{"clusterID":"openshift-storage-extended","monitors":["10.0.211.11:6789","10.0.211.41:6789","10.0.210.140:6789","10.0.209.48:6789","10.0.211.40:6789"],"namespace":""}] data:ceph-ci-srozen-msc4-jt9f8-as520x-node1-installer=10.0.211.11:6789,ceph-ci-srozen-msc4-jt9f8-as520x-node2=10.0.211.41:6789,ceph-ci-srozen-msc4-jt9f8-as520x-node3=10.0.210.140:6789,ceph-ci-srozen-msc4-jt9f8-as520x-node4=10.0.209.48:6789,ceph-ci-srozen-msc4-jt9f8-as520x-node7=10.0.211.40:6789 mapping:{"node":{}} maxMonId:0 outOfQuorum:]
2024-02-21 09:56:16.827327 I | cephclient: writing config file /var/lib/rook/openshift-storage-extended/openshift-storage-extended.config
2024-02-21 09:56:16.827599 I | cephclient: generated admin config in /var/lib/rook/openshift-storage-extended

The error is not found. Moving to verified.
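The verification above amounts to searching the operator log for the original "forbidden" error. A minimal sketch of such a check follows, assuming the log has been saved as text and is passed in line by line; the function name and the matching rule are illustrative and not part of any rook tooling.

```python
def find_servicemonitor_errors(lines, namespace="openshift-storage-extended"):
    """Return error-level log lines reporting a forbidden ServiceMonitor
    access in the given namespace."""
    namespace_marker = f'in the namespace "{namespace}"'
    hits = []
    for line in lines:
        if (
            " E | " in line  # rook marks error-level lines with "E |"
            and "servicemonitors" in line
            and "is forbidden" in line
            and namespace_marker in line
        ):
            hits.append(line)
    return hits


# Two sample lines copied from this report: the original error and one of
# the informational lines seen during verification.
ERROR_LINE = (
    '2024-01-09 10:16:04.208143 E | ceph-cluster-controller: failed to enable '
    'external service monitor. service monitor could not be enabled: failed to '
    'retrieve servicemonitor. servicemonitors.monitoring.coreos.com '
    '"rook-ceph-mgr" is forbidden: User '
    '"system:serviceaccount:openshift-storage:rook-ceph-system" cannot get '
    'resource "servicemonitors" in API group "monitoring.coreos.com" in the '
    'namespace "openshift-storage-extended"'
)
INFO_LINE = (
    '2024-02-21 09:55:16.498756 I | ceph-cluster-controller: reconciling ceph '
    'cluster in namespace "openshift-storage-extended"'
)

if __name__ == "__main__":
    # An empty result for the post-fix log is what "the error is not found" means.
    print(len(find_servicemonitor_errors([ERROR_LINE, INFO_LINE])))
    print(len(find_servicemonitor_errors([INFO_LINE])))
```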
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.15.0 security, enhancement, & bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2024:1383