Bug 2257674

Summary: rook-ceph operator doesn't have permissions to create ServiceMonitor in external namespaces
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation Reporter: arun kumar mohan <amohan>
Component: ocs-operator Assignee: arun kumar mohan <amohan>
Status: CLOSED ERRATA QA Contact: Shay Rozen <srozen>
Severity: high Docs Contact:
Priority: high    
Version: 4.14 CC: muagarwa, nthomas, odf-bz-bot, skatiyar, tnielsen, uchapaga
Target Milestone: ---   
Target Release: ODF 4.15.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: 4.15.0-130 Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2024-03-19 15:31:02 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2255036    

Description arun kumar mohan 2024-01-10 12:55:26 UTC
Description of problem (please be as detailed as possible and provide log
snippets):
In a multicluster scenario, the rook-ceph operator logs the following error, which indicates that it is unable to create a ServiceMonitor in the `openshift-storage-extended` namespace (an external namespace in which the rook-ceph operator is not deployed):

```
2024-01-09 10:16:04.208143 E | ceph-cluster-controller: failed to enable external service monitor. service monitor could not be enabled: failed to retrieve servicemonitor. servicemonitors.monitoring.coreos.com "rook-ceph-mgr" is forbidden: User "system:serviceaccount:openshift-storage:rook-ceph-system" cannot get resource "servicemonitors" in API group "monitoring.coreos.com" in the namespace "openshift-storage-extended"
```

This ServiceMonitor is required to collect the external cluster metrics in multicluster mode.
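
To see exactly which access is denied, the "forbidden" part of the error can be split into its fields. The snippet below is only a sketch: the regular expression is derived from the single log line quoted above, and the name `parse_forbidden` is hypothetical, not part of Rook or Kubernetes.

```python
import re

# Assumption: the pattern mirrors the wording of the one error line quoted
# in this report; other Kubernetes versions may word the message differently.
FORBIDDEN_RE = re.compile(
    r'User "(?P<user>[^"]+)" cannot (?P<verb>\w+) resource "(?P<resource>[^"]+)" '
    r'in API group "(?P<api_group>[^"]*)" in the namespace "(?P<namespace>[^"]+)"'
)


def parse_forbidden(line):
    """Return the denied access as a dict, or None if the line does not match."""
    match = FORBIDDEN_RE.search(line)
    return match.groupdict() if match else None


# Error text copied from the operator log in this report.
LOG_LINE = (
    '2024-01-09 10:16:04.208143 E | ceph-cluster-controller: failed to enable '
    'external service monitor. service monitor could not be enabled: failed to '
    'retrieve servicemonitor. servicemonitors.monitoring.coreos.com '
    '"rook-ceph-mgr" is forbidden: User '
    '"system:serviceaccount:openshift-storage:rook-ceph-system" cannot get '
    'resource "servicemonitors" in API group "monitoring.coreos.com" in the '
    'namespace "openshift-storage-extended"'
)

denied = parse_forbidden(LOG_LINE)
print(denied)
```

For this line, the denied verb is `get`, so the operator fails while looking up the existing ServiceMonitor, before it attempts any create or update.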

Further, we need to make sure the rook-ceph operator is configured correctly for monitoring the external ceph-exporter. If it is not, metrics might be missing even with the fixes on the ceph-exporter side (refer to BZ: https://bugzilla.redhat.com/show_bug.cgi?id=2257619).

Version of all relevant components (if applicable):


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
Yes, it is currently blocking BZ: https://bugzilla.redhat.com/show_bug.cgi?id=2255036

Is there any workaround available to the best of your knowledge?
We can manually add the missing permission, or manually create a ServiceMonitor in the extended namespace.
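
As a rough sketch of the first workaround, the snippet below builds a namespaced Role and RoleBinding that grant the operator's service account access to ServiceMonitors in the external namespace. The object name `rook-servicemonitor-manager`, the helper name `servicemonitor_rbac`, and the verb list are assumptions made for illustration; only the `get` verb, the resource, the API group, the service account, and the namespaces are taken from the error in this report. It is not necessarily what the eventual fix does.

```python
import json


def servicemonitor_rbac(target_ns, sa_name, sa_ns, name="rook-servicemonitor-manager"):
    """Build a Role and RoleBinding (wrapped in a List) for ServiceMonitor access."""
    role = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": name, "namespace": target_ns},
        "rules": [
            {
                "apiGroups": ["monitoring.coreos.com"],
                "resources": ["servicemonitors"],
                # Assumption: only "get" is confirmed by the log; the other
                # verbs are a guess at what managing the object requires.
                "verbs": ["get", "list", "watch", "create", "update", "delete"],
            }
        ],
    }
    binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": name, "namespace": target_ns},
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "Role",
            "name": name,
        },
        # The subject is the service account named in the "forbidden" error.
        "subjects": [
            {"kind": "ServiceAccount", "name": sa_name, "namespace": sa_ns}
        ],
    }
    return {"apiVersion": "v1", "kind": "List", "items": [role, binding]}


manifest = servicemonitor_rbac(
    target_ns="openshift-storage-extended",
    sa_name="rook-ceph-system",
    sa_ns="openshift-storage",
)
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with `oc apply -f <file>`. A Role is used here rather than a ClusterRole so that the grant stays limited to the one external namespace.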


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
1

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?
NA

If this is a regression, please provide more details to justify this:
NA

Steps to Reproduce:
1. Create a multicluster setup
2. Check the rook-ceph operator log
3. The error message shown above should appear in the log


Actual results:
Error creating the needed ServiceMonitor on 'openshift-storage-extended' namespace.


Expected results:
The Rook operator should have enough permissions to manage ServiceMonitors in all managed namespaces, and we should not hit any error while creating the ServiceMonitor in other namespaces.

Additional info:

Comment 2 Santosh Pillai 2024-01-10 14:50:00 UTC
(In reply to arun kumar mohan from comment #0)
> Description of problem (please be detailed as possible and provide log
> snippests):

> ```
> 2024-01-09 10:16:04.208143 E | ceph-cluster-controller: failed to enable
> external service monitor. service monitor could not be enabled: failed to
> retrieve servicemonitor. servicemonitors.monitoring.coreos.com
> "rook-ceph-mgr" is forbidden: User
> "system:serviceaccount:openshift-storage:rook-ceph-system" cannot get
> resource "servicemonitors" in API group "monitoring.coreos.com" in the
> namespace "openshift-storage-extended"
> ```
> 


I am fixing this with https://github.com/rook/rook/pull/13338, which provides all the required permissions to `system:serviceaccount:openshift-storage:rook-ceph-system`.

Comment 3 arun kumar mohan 2024-01-11 07:46:41 UTC
Thanks, Santosh. Moving this to the ocs-operator component, as the permissions for rook are managed through ocs-operator.
I will check on the ocs-operator side and add the needed permissions.

Comment 4 arun kumar mohan 2024-01-17 11:29:28 UTC
PR added: https://github.com/red-hat-storage/ocs-operator/pull/2392

Comment 9 Shay Rozen 2024-02-21 14:10:43 UTC
These are all the messages regarding openshift-storage-extended:
2024-02-21 09:55:16.498756 I | ceph-cluster-controller: reconciling ceph cluster in namespace "openshift-storage-extended"
2024-02-21 09:55:20.210664 I | op-bucket-prov: ceph bucket provisioner launched watching for provisioner "openshift-storage-extended.ceph.rook.io/bucket"
I0221 09:55:20.211189       1 manager.go:135] "msg"="starting provisioner" "logger"="objectbucket.io/provisioner-manager" "name"="openshift-storage-extended.ceph.rook.io/bucket"
2024-02-21 09:55:22.809053 I | cephclient: writing config file /var/lib/rook/openshift-storage-extended/openshift-storage-extended.config
2024-02-21 09:55:22.809260 I | cephclient: generated admin config in /var/lib/rook/openshift-storage-extended
2024-02-21 09:55:26.903457 I | ceph-cluster-controller: cluster "openshift-storage-extended": version "17.2.6-170 quincy" detected for image ""
2024-02-21 09:55:31.213651 I | ceph-cluster-controller: enabling ceph mon monitoring goroutine for cluster "openshift-storage-extended"
2024-02-21 09:55:31.213687 I | ceph-cluster-controller: enabling ceph status monitoring goroutine for cluster "openshift-storage-extended"
2024-02-21 09:56:16.795942 I | op-mon: saved mon endpoints to config map map[csi-cluster-config-json:[{"clusterID":"openshift-storage-extended","monitors":["10.0.211.11:6789","10.0.211.41:6789","10.0.210.140:6789","10.0.209.48:6789","10.0.211.40:6789"],"namespace":""}] data:ceph-ci-srozen-msc4-jt9f8-as520x-node1-installer=10.0.211.11:6789,ceph-ci-srozen-msc4-jt9f8-as520x-node2=10.0.211.41:6789,ceph-ci-srozen-msc4-jt9f8-as520x-node3=10.0.210.140:6789,ceph-ci-srozen-msc4-jt9f8-as520x-node4=10.0.209.48:6789,ceph-ci-srozen-msc4-jt9f8-as520x-node7=10.0.211.40:6789 mapping:{"node":{}} maxMonId:0 outOfQuorum:]
2024-02-21 09:56:16.827327 I | cephclient: writing config file /var/lib/rook/openshift-storage-extended/openshift-storage-extended.config
2024-02-21 09:56:16.827599 I | cephclient: generated admin config in /var/lib/rook/openshift-storage-extended

The error is not found. Moving to verified.
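
A check like this one can be scripted along the following lines. This is only a sketch: it assumes Rook marks error-level lines with ` E | ` after the timestamp, as in the error quoted in the description, and the helper name `error_lines_for_namespace` is hypothetical.

```python
def error_lines_for_namespace(log_lines, namespace):
    """Return error-level operator log lines that mention the given namespace."""
    # Assumption: Rook log lines carry the level as " E | " (error) or
    # " I | " (info) right after the timestamp.
    return [line for line in log_lines if " E | " in line and namespace in line]


# Two of the info-level lines from the operator log above.
SAMPLE = [
    '2024-02-21 09:55:16.498756 I | ceph-cluster-controller: reconciling ceph '
    'cluster in namespace "openshift-storage-extended"',
    '2024-02-21 09:55:22.809260 I | cephclient: generated admin config in '
    '/var/lib/rook/openshift-storage-extended',
]

errors = error_lines_for_namespace(SAMPLE, "openshift-storage-extended")
print("no errors found" if not errors else "\n".join(errors))
```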

Comment 12 errata-xmlrpc 2024-03-19 15:31:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.15.0 security, enhancement, & bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:1383