Bug 2257674 - rook-ceph operator don't have permissions to create ServiceMonitor on external namespaces
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: ocs-operator
Version: 4.14
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ODF 4.15.0
Assignee: arun kumar mohan
QA Contact: Shay Rozen
URL:
Whiteboard:
Depends On:
Blocks: 2255036
Reported: 2024-01-10 12:55 UTC by arun kumar mohan
Modified: 2024-03-19 15:31 UTC

Fixed In Version: 4.15.0-130
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2024-03-19 15:31:02 UTC
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | red-hat-storage ocs-operator pull 2392 | 0 | None | open | Adding 'ClusterRole' and 'ClusterRoleBinding' to rook-ceph-mgr | 2024-01-17 11:29:27 UTC
Github | red-hat-storage ocs-operator pull 2426 | 0 | None | open | Bug 2257674: [release-4.15] Adding 'ClusterRole' and 'ClusterRoleBinding' to rook-ceph-mgr | 2024-01-29 07:34:32 UTC
Github | red-hat-storage ocs-operator pull 2435 | 0 | None | open | Bug 2257674: [release-4.15] split rook monitoring rbac into multiple files | 2024-01-31 06:29:26 UTC
Red Hat Product Errata | RHSA-2024:1383 | 0 | None | None | None | 2024-03-19 15:31:09 UTC

Description arun kumar mohan 2024-01-10 12:55:26 UTC
Description of problem (please be as detailed as possible and provide log
snippets):
In a multicluster scenario, the rook-ceph operator logs the following error, which indicates that it cannot create a ServiceMonitor in the `openshift-storage-extended` namespace (an external namespace in which the rook-ceph operator is not deployed):

```
2024-01-09 10:16:04.208143 E | ceph-cluster-controller: failed to enable external service monitor. service monitor could not be enabled: failed to retrieve servicemonitor. servicemonitors.monitoring.coreos.com "rook-ceph-mgr" is forbidden: User "system:serviceaccount:openshift-storage:rook-ceph-system" cannot get resource "servicemonitors" in API group "monitoring.coreos.com" in the namespace "openshift-storage-extended"
```

This ServiceMonitor is required to collect the external cluster metrics in multicluster mode.

Further, we need to make sure the rook-ceph operator is configured correctly for monitoring the external ceph-exporter. If it is not, metrics may still be missing even with the fixes to the ceph-exporter (see BZ: https://bugzilla.redhat.com/show_bug.cgi?id=2257619).

Version of all relevant components (if applicable):


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
Yes, it is currently blocking BZ: https://bugzilla.redhat.com/show_bug.cgi?id=2255036

Is there any workaround available to the best of your knowledge?
We can manually add the missing permission, or manually create the ServiceMonitor, in the extended namespace.
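
As a rough illustration of the "manually add permission" option, the sketch below builds a namespaced Role and RoleBinding and prints them as JSON. The service account, its namespace, the API group, and the target namespace are taken from the error message above. The object name `rook-ceph-servicemonitor-access` and the verb list are assumptions made for this example, and this is not the change made in the linked pull requests, which add a ClusterRole and ClusterRoleBinding.

```python
import json

# Values taken from the error message in this report.
OPERATOR_NAMESPACE = "openshift-storage"
OPERATOR_SERVICE_ACCOUNT = "rook-ceph-system"
EXTERNAL_NAMESPACE = "openshift-storage-extended"

# Hypothetical object name, chosen only for this illustration.
ROLE_NAME = "rook-ceph-servicemonitor-access"


def build_rbac_manifest():
    """Return a v1 List holding a Role and a RoleBinding for ServiceMonitors."""
    role = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": ROLE_NAME, "namespace": EXTERNAL_NAMESPACE},
        "rules": [
            {
                "apiGroups": ["monitoring.coreos.com"],
                "resources": ["servicemonitors"],
                # Assumed verb set; the log only proves that "get" is denied.
                "verbs": ["get", "list", "watch", "create", "update", "delete"],
            }
        ],
    }
    binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": ROLE_NAME, "namespace": EXTERNAL_NAMESPACE},
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "Role",
            "name": ROLE_NAME,
        },
        # The subject lives in the operator namespace, not the external one.
        "subjects": [
            {
                "kind": "ServiceAccount",
                "name": OPERATOR_SERVICE_ACCOUNT,
                "namespace": OPERATOR_NAMESPACE,
            }
        ],
    }
    return {"apiVersion": "v1", "kind": "List", "items": [role, binding]}


if __name__ == "__main__":
    print(json.dumps(build_rbac_manifest(), indent=2))
```

The printed JSON could be piped to `oc apply -f -`. Afterwards, `oc auth can-i get servicemonitors.monitoring.coreos.com -n openshift-storage-extended --as=system:serviceaccount:openshift-storage:rook-ceph-system` should report whether the permission is in effect.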


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
1

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?
NA

If this is a regression, please provide more details to justify this:
NA

Steps to Reproduce:
1. Create a multicluster setup
2. Check the rook-ceph operator log
3. The error message shown above should appear in the log
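
The log check in step 2 can be scripted. The following is a minimal sketch that scans log lines for Kubernetes "is forbidden" messages of the shape shown in the description and extracts the user, verb, resource, API group, and namespace. The function name `find_forbidden` and the regular expression are written for this example and only cover that message shape.

```python
import re
import sys

# Matches the tail of a Kubernetes authorization error such as the one in
# this report. Only this message shape is handled.
FORBIDDEN_RE = re.compile(
    r'is forbidden: User "(?P<user>[^"]+)" cannot (?P<verb>\S+) '
    r'resource "(?P<resource>[^"]+)" in API group "(?P<group>[^"]*)" '
    r'in the namespace "(?P<namespace>[^"]+)"'
)


def find_forbidden(lines):
    """Return one dict per log line that contains a forbidden error."""
    hits = []
    for line in lines:
        match = FORBIDDEN_RE.search(line)
        if match:
            hits.append(match.groupdict())
    return hits


if __name__ == "__main__":
    # Read the operator log from standard input and print each denial.
    for hit in find_forbidden(sys.stdin):
        print(
            f'{hit["user"]} cannot {hit["verb"]} '
            f'{hit["resource"]}.{hit["group"]} in {hit["namespace"]}'
        )
```

Assuming the script is saved as `find_forbidden.py` (a hypothetical file name) and the operator runs as the `rook-ceph-operator` deployment in `openshift-storage`, it could be fed with `oc logs deploy/rook-ceph-operator -n openshift-storage | python find_forbidden.py`. No output means no matching error was found.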


Actual results:
Error creating the needed ServiceMonitor on 'openshift-storage-extended' namespace.


Expected results:
The Rook operator should have sufficient permissions to manage ServiceMonitors in all managed namespaces, and ServiceMonitor creation in other namespaces should not fail.

Additional info:

Comment 2 Santosh Pillai 2024-01-10 14:50:00 UTC
(In reply to arun kumar mohan from comment #0)
> Description of problem (please be as detailed as possible and provide log
> snippets):

> ```
> 2024-01-09 10:16:04.208143 E | ceph-cluster-controller: failed to enable
> external service monitor. service monitor could not be enabled: failed to
> retrieve servicemonitor. servicemonitors.monitoring.coreos.com
> "rook-ceph-mgr" is forbidden: User
> "system:serviceaccount:openshift-storage:rook-ceph-system" cannot get
> resource "servicemonitors" in API group "monitoring.coreos.com" in the
> namespace "openshift-storage-extended"
> ```
> 


I am addressing this with https://github.com/rook/rook/pull/13338, which provides all the required permissions to `system:serviceaccount:openshift-storage:rook-ceph-system`.

Comment 3 arun kumar mohan 2024-01-11 07:46:41 UTC
Thanks, Santosh. Moving this to the ocs-operator component, as the permissions for rook are managed through ocs-operator.
I will check on the ocs-operator side to add the needed permissions.

Comment 4 arun kumar mohan 2024-01-17 11:29:28 UTC
PR added: https://github.com/red-hat-storage/ocs-operator/pull/2392

Comment 9 Shay Rozen 2024-02-21 14:10:43 UTC
These are all the messages regarding openshift-storage-extended:
2024-02-21 09:55:16.498756 I | ceph-cluster-controller: reconciling ceph cluster in namespace "openshift-storage-extended"
2024-02-21 09:55:20.210664 I | op-bucket-prov: ceph bucket provisioner launched watching for provisioner "openshift-storage-extended.ceph.rook.io/bucket"
I0221 09:55:20.211189       1 manager.go:135] "msg"="starting provisioner" "logger"="objectbucket.io/provisioner-manager" "name"="openshift-storage-extended.ceph.rook.io/bucket"
2024-02-21 09:55:22.809053 I | cephclient: writing config file /var/lib/rook/openshift-storage-extended/openshift-storage-extended.config
2024-02-21 09:55:22.809260 I | cephclient: generated admin config in /var/lib/rook/openshift-storage-extended
2024-02-21 09:55:26.903457 I | ceph-cluster-controller: cluster "openshift-storage-extended": version "17.2.6-170 quincy" detected for image ""
2024-02-21 09:55:31.213651 I | ceph-cluster-controller: enabling ceph mon monitoring goroutine for cluster "openshift-storage-extended"
2024-02-21 09:55:31.213687 I | ceph-cluster-controller: enabling ceph status monitoring goroutine for cluster "openshift-storage-extended"
2024-02-21 09:56:16.795942 I | op-mon: saved mon endpoints to config map map[csi-cluster-config-json:[{"clusterID":"openshift-storage-extended","monitors":["10.0.211.11:6789","10.0.211.41:6789","10.0.210.140:6789","10.0.209.48:6789","10.0.211.40:6789"],"namespace":""}] data:ceph-ci-srozen-msc4-jt9f8-as520x-node1-installer=10.0.211.11:6789,ceph-ci-srozen-msc4-jt9f8-as520x-node2=10.0.211.41:6789,ceph-ci-srozen-msc4-jt9f8-as520x-node3=10.0.210.140:6789,ceph-ci-srozen-msc4-jt9f8-as520x-node4=10.0.209.48:6789,ceph-ci-srozen-msc4-jt9f8-as520x-node7=10.0.211.40:6789 mapping:{"node":{}} maxMonId:0 outOfQuorum:]
2024-02-21 09:56:16.827327 I | cephclient: writing config file /var/lib/rook/openshift-storage-extended/openshift-storage-extended.config
2024-02-21 09:56:16.827599 I | cephclient: generated admin config in /var/lib/rook/openshift-storage-extended

The error is not found. Moving to verified.

Comment 12 errata-xmlrpc 2024-03-19 15:31:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.15.0 security, enhancement, & bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:1383

