Bug 2188053
| Summary: | ocs-metrics-exporter cannot list/watch StorageCluster, StorageClass, CephBlockPool and other resources | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | suchita <sgatfane> |
| Component: | ceph-monitoring | Assignee: | arun kumar mohan <amohan> |
| Status: | CLOSED ERRATA | QA Contact: | suchita <sgatfane> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.13 | CC: | amohan, dbindra, muagarwa, nberry, nigoyal, ocs-bugs, odf-bz-bot, resoni, rohgupta, uchapaga |
| Target Milestone: | --- | | |
| Target Release: | ODF 4.13.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 2188228 (view as bug list) | Environment: | |
| Last Closed: | 2023-06-21 15:25:08 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 2188228 | | |
Description
suchita
2023-04-19 15:33:52 UTC
We are facing the same issue with ocs-operator.v4.13.0-168.stable. Tested it on the openshift-storage namespace with ocs-operator.v4.13.0-168.stable as well:

--------
I0420 09:09:21.025427 1 pv.go:92] Skipping non Ceph CSI RBD volume pvc-b92bc493-1986-47fe-b670-4365981cdd9f
W0420 09:09:25.129800 1 reflector.go:424] /remote-source/app/metrics/internal/collectors/cluster-advance-feature-use.go:166: failed to list *v1.StorageClass: forbidden: User "system:serviceaccount:openshift-storage:ocs-metrics-exporter" cannot get path "/storageclasses"
E0420 09:09:25.129833 1 reflector.go:140] /remote-source/app/metrics/internal/collectors/cluster-advance-feature-use.go:166: Failed to watch *v1.StorageClass: failed to list *v1.StorageClass: forbidden: User "system:serviceaccount:openshift-storage:ocs-metrics-exporter" cannot get path "/storageclasses"
I0420 09:09:48.608998 1 rbd-mirror.go:282] RBD mirror store resync started at 2023-04-20 09:09:48.608988206 +0000 UTC m=+270.034122660
I0420 09:09:48.609039 1 rbd-mirror.go:307] RBD mirror store resync ended at 2023-04-20 09:09:48.609036093 +0000 UTC m=+270.034170558
W0420 09:09:51.323572 1 reflector.go:424] /remote-source/app/metrics/internal/collectors/storage-cluster.go:42: failed to list *v1.StorageCluster: the server could not find the requested resource (get storageclusters.ocs.openshift.io)
E0420 09:09:51.323601 1 reflector.go:140] /remote-source/app/metrics/internal/collectors/storage-cluster.go:42: Failed to watch *v1.StorageCluster: failed to list *v1.StorageCluster: the server could not find the requested resource (get storageclusters.ocs.openshift.io)
------------------

On the fusion-storage namespace we have one more error, looking for the rook-ceph-mon secret:

----
I0420 08:46:00.537007 1 rbd-mirror.go:307] RBD mirror store resync ended at 2023-04-20 08:46:00.53700132 +0000 UTC m=+270.036117282
W0420 08:46:04.303569 1 reflector.go:424] /remote-source/app/metrics/internal/collectors/storage-cluster.go:42: failed to list *v1.StorageCluster: the server could not find the requested resource (get storageclusters.ocs.openshift.io)
E0420 08:46:04.303600 1 reflector.go:140] /remote-source/app/metrics/internal/collectors/storage-cluster.go:42: Failed to watch *v1.StorageCluster: failed to list *v1.StorageCluster: the server could not find the requested resource (get storageclusters.ocs.openshift.io)
W0420 08:46:05.777352 1 reflector.go:424] /remote-source/app/metrics/internal/collectors/cluster-advance-feature-use.go:166: failed to list *v1.StorageClass: forbidden: User "system:serviceaccount:fusion-storage:ocs-metrics-exporter" cannot get path "/storageclasses"
E0420 08:46:05.777381 1 reflector.go:140] /remote-source/app/metrics/internal/collectors/cluster-advance-feature-use.go:166: Failed to watch *v1.StorageClass: failed to list *v1.StorageClass: forbidden: User "system:serviceaccount:fusion-storage:ocs-metrics-exporter" cannot get path "/storageclasses"
I0420 08:46:15.856277 1 ceph-blocklist.go:103] Blocklist store sync started 2023-04-20 08:46:15.856258405 +0000 UTC m=+285.355374366
W0420 08:46:15.858938 1 reflector.go:347] /remote-source/app/metrics/internal/collectors/registry.go:86: watch of *v1.CephBlockPool ended with: failed to initialize ceph: failed to get secret in namespace "openshift-storage": secrets "rook-ceph-mon" not found
I0420 08:46:30.537497 1 rbd-mirror.go:282] RBD mirror store resync started at 2023-04-20 08:46:30.537483786 +0000 UTC m=+300.036599737
I0420 08:46:30.537525 1 rbd-mirror.go:307] RBD mirror store resync ended at 2023-04-20 08:46:30.537522018 +0000 UTC m=+300.036637980
W0420 08:46:47.621098 1 reflector.go:424] /remote-source/app/metrics/internal/collectors/cluster-advance-feature-use.go:166: failed to list *v1.StorageClass: forbidden: User "system:serviceaccount:fusion-storage:ocs-metrics-exporter" cannot get path "/storageclasses"
E0420 08:46:47.621141 1 reflector.go:140] /remote-source/app/metrics/internal/collectors/cluster-advance-feature-use.go:166: Failed to watch *v1.StorageClass: failed to list *v1.StorageClass: forbidden: User "system:serviceaccount:fusion-storage:ocs-metrics-exporter" cannot get path "/storageclasses"
W0420 08:45:08.654920 1 reflector.go:347] /remote-source/app/metrics/internal/collectors/registry.go:86: watch of *v1.CephBlockPool ended with: failed to initialize ceph: failed to get secret in namespace "openshift-storage": secrets "rook-ceph-mon" not found
---

Faced this issue with OCS v4.12.2 in the `fusion-storage` and `openshift-storage` namespaces.

A quick update. We have two different issues here:
i) StorageCluster and CephBlockPool issue, where the (different) namespace passed is not taken; the code still assumes the default 'openshift-storage' namespace.
ii) StorageClass issue: we had a fix which was backported to 4.12 (as well). Need to check why the added permissions did not work for the StorageClass resource. Tracking this issue under BZ: https://bugzilla.redhat.com/show_bug.cgi?id=2150752
Working on the first issue (i) now...

PR raised to fix the Ceph RBDMirror / BlockPool issue: https://github.com/red-hat-storage/ocs-operator/pull/2031

For the StorageCluster issue, it might be a genuine condition (not a bug), since the metrics exporter could not find the 'StorageCluster' resource (it might be a case where the StorageSystem is still being created/installed and the exporter started first). Verifying the case.

The "unable to list 'StorageCluster'" issue still persists; verified it. Triaging further and making a fix.

The PR for the StorageCluster and StorageClass issues is also up, by Umanga: https://github.com/red-hat-storage/ocs-operator/pull/2032

So all PRs are now up (for all the above issues).
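For context on issue (i): when ODF is installed in a namespace other than the default (here fusion-storage), any lookup that hardcodes openshift-storage, such as the rook-ceph-mon secret fetch seen in the CephBlockPool collector above, fails. The following Go sketch only illustrates the general pattern of resolving the namespace at runtime from a downward-API environment variable instead of a constant; it is not the actual change made in PR #2031, and the POD_NAMESPACE variable name and the watchNamespace helper are assumptions for illustration.

```go
// Illustrative sketch, not ocs-operator code: resolve the namespace for
// namespaced lookups (e.g. the rook-ceph-mon secret) at runtime instead of
// hardcoding "openshift-storage". Assumes POD_NAMESPACE is injected into the
// exporter pod via the downward API.
package main

import (
	"fmt"
	"os"
)

const defaultNamespace = "openshift-storage"

// watchNamespace is a hypothetical helper returning the namespace the
// exporter should operate in.
func watchNamespace() string {
	if ns := os.Getenv("POD_NAMESPACE"); ns != "" {
		return ns
	}
	return defaultNamespace
}

func main() {
	fmt.Printf("looking up rook-ceph-mon in namespace %q\n", watchNamespace())
}
```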
---------------------------------------------------------------------------------------------------------------------------------------------
$ oc get pods| grep ocs-metrics-exporter
ocs-metrics-exporter-7bc87df98-q7pch 1/1 Running 0 3h33m
$ oc logs ocs-metrics-exporter-7bc87df98-q7pch
I0510 14:15:06.354028 1 main.go:29] using options: &{Apiserver: KubeconfigPath: Host:0.0.0.0 Port:8080 ExporterHost:0.0.0.0 ExporterPort:8081 Help:false AllowedNamespaces:[fusion-storage] flags:0xc00021a900 StopCh:<nil> Kubeconfig:<nil>}
W0510 14:15:06.354153 1 client_config.go:618] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
W0510 14:15:06.355082 1 client_config.go:618] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
W0510 14:15:06.356220 1 client_config.go:618] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0510 14:15:06.452126 1 main.go:73] Running metrics server on 0.0.0.0:8080
I0510 14:15:06.452145 1 main.go:74] Running telemetry server on 0.0.0.0:8081
I0510 14:15:06.854407 1 pv.go:102] PV store addition started at 2023-05-10 14:15:06.854397476 +0000 UTC m=+0.898865988 for PV pvc-198af0f3-d156-4bba-8c18-29ce9dcf7309
I0510 14:15:06.854444 1 pv.go:106] Skipping non Ceph CSI RBD volume pvc-198af0f3-d156-4bba-8c18-29ce9dcf7309
I0510 14:15:06.854449 1 pv.go:102] PV store addition started at 2023-05-10 14:15:06.854447296 +0000 UTC m=+0.898915767 for PV pvc-de717d45-ee26-49d2-baa2-56201bd620c2
I0510 14:15:06.854461 1 pv.go:106] Skipping non Ceph CSI RBD volume pvc-de717d45-ee26-49d2-baa2-56201bd620c2
I0510 14:15:06.854466 1 pv.go:102] PV store addition started at 2023-05-10 14:15:06.854463521 +0000 UTC m=+0.898931989 for PV pvc-eab19b91-4c90-40cd-9888-23a0e208d442
I0510 14:15:06.854475 1 pv.go:106] Skipping non Ceph CSI RBD volume pvc-eab19b91-4c90-40cd-9888-23a0e208d442
I0510 14:15:06.854479 1 pv.go:102] PV store addition started at 2023-05-10 14:15:06.854477429 +0000 UTC m=+0.898945900 for PV pvc-f5cc8875-3e93-47c4-b986-0e7fbd99ad35
I0510 14:15:06.854499 1 pv.go:106] Skipping non Ceph CSI RBD volume pvc-f5cc8875-3e93-47c4-b986-0e7fbd99ad35
I0510 14:15:34.048279 1 pv.go:102] PV store addition started at 2023-05-10 14:15:34.048269579 +0000 UTC m=+28.092738035 for PV pvc-b2d1def2-2ea3-47d8-9679-2c8c478d7a0e
I0510 14:15:34.048330 1 pv.go:106] Skipping non Ceph CSI RBD volume pvc-b2d1def2-2ea3-47d8-9679-2c8c478d7a0e
I0510 14:15:34.055333 1 pv.go:102] PV store addition started at 2023-05-10 14:15:34.05532566 +0000 UTC m=+28.099794120 for PV pvc-b2d1def2-2ea3-47d8-9679-2c8c478d7a0e
I0510 14:15:34.055353 1 pv.go:106] Skipping non Ceph CSI RBD volume pvc-b2d1def2-2ea3-47d8-9679-2c8c478d7a0e
I0510 14:15:34.383742 1 pv.go:102] PV store addition started at 2023-05-10 14:15:34.383729741 +0000 UTC m=+28.428198209 for PV pvc-c68d4e36-e479-4170-af79-bc41e6b1a660
I0510 14:15:34.383772 1 pv.go:106] Skipping non Ceph CSI RBD volume pvc-c68d4e36-e479-4170-af79-bc41e6b1a660
I0510 14:15:34.399455 1 pv.go:102] PV store addition started at 2023-05-10 14:15:34.399448904 +0000 UTC m=+28.443917360 for PV pvc-c68d4e36-e479-4170-af79-bc41e6b1a660
I0510 14:15:34.399476 1 pv.go:106] Skipping non Ceph CSI RBD volume pvc-c68d4e36-e479-4170-af79-bc41e6b1a660
I0510 14:15:34.777919 1 pv.go:102] PV store addition started at 2023-05-10 14:15:34.777908034 +0000 UTC m=+28.822376492 for PV pvc-b2d1def2-2ea3-47d8-9679-2c8c478d7a0e
I0510 14:15:34.777943 1 pv.go:106] Skipping non Ceph CSI RBD volume pvc-b2d1def2-2ea3-47d8-9679-2c8c478d7a0e
I0510 14:15:34.794492 1 pv.go:102] PV store addition started at 2023-05-10 14:15:34.794486035 +0000 UTC m=+28.838954492 for PV pvc-4edd0e09-caf2-471e-8c8d-ce1ef428abb3
I0510 14:15:34.794511 1 pv.go:106] Skipping non Ceph CSI RBD volume pvc-4edd0e09-caf2-471e-8c8d-ce1ef428abb3
I0510 14:15:34.801072 1 pv.go:102] PV store addition started at 2023-05-10 14:15:34.801067649 +0000 UTC m=+28.845536112 for PV pvc-4edd0e09-caf2-471e-8c8d-ce1ef428abb3
I0510 14:15:34.801090 1 pv.go:106] Skipping non Ceph CSI RBD volume pvc-4edd0e09-caf2-471e-8c8d-ce1ef428abb3
I0510 14:15:35.170277 1 pv.go:102] PV store addition started at 2023-05-10 14:15:35.170268103 +0000 UTC m=+29.214736561 for PV pvc-c68d4e36-e479-4170-af79-bc41e6b1a660
I0510 14:15:35.170298 1 pv.go:106] Skipping non Ceph CSI RBD volume pvc-c68d4e36-e479-4170-af79-bc41e6b1a660
I0510 14:15:35.573557 1 pv.go:102] PV store addition started at 2023-05-10 14:15:35.573546932 +0000 UTC m=+29.618015390 for PV pvc-4edd0e09-caf2-471e-8c8d-ce1ef428abb3
I0510 14:15:35.573579 1 pv.go:106] Skipping non Ceph CSI RBD volume pvc-4edd0e09-caf2-471e-8c8d-ce1ef428abb3
I0510 14:15:36.853108 1 rbd-mirror.go:296] RBD mirror store resync started at 2023-05-10 14:15:36.853095008 +0000 UTC m=+30.897563565
I0510 14:15:36.853168 1 rbd-mirror.go:321] RBD mirror store resync ended at 2023-05-10 14:15:36.853153016 +0000 UTC m=+30.897621486
I0510 14:15:36.855430 1 ceph-blocklist.go:105] Blocklist store sync started 2023-05-10 14:15:36.855420604 +0000 UTC m=+30.899889136
I0510 14:16:06.853978 1 rbd-mirror.go:296] RBD mirror store resync started at 2023-05-10 14:16:06.85396773 +0000 UTC m=+60.898436270
I0510 14:16:06.854011 1 rbd-mirror.go:321] RBD mirror store resync ended at 2023-05-10 14:16:06.854006883 +0000 UTC m=+60.898475343
I0510 14:16:36.854059 1 rbd-mirror.go:296] RBD mirror store resync started at 2023-05-10 14:16:36.854048962 +0000 UTC m=+90.898517491
I0510 14:16:36.854093 1 rbd-mirror.go:321] RBD mirror store resync ended at 2023-05-10 14:16:36.854089479 +0000 UTC m=+90.898557945
I0510 14:17:06.854611 1 rbd-mirror.go:296] RBD mirror store resync started at 2023-05-10 14:17:06.854594098 +0000 UTC m=+120.899062554
I0510 14:17:06.854644 1 rbd-mirror.go:321] RBD mirror store resync ended at 2023-05-10 14:17:06.854640686 +0000 UTC m=+120.899109153
I0510 14:17:06.855680 1 pv.go:244] PV store Resync started at 2023-05-10 14:17:06.855672843 +0000 UTC m=+120.900141367
I0510 14:17:06.951467 1 pv.go:255] now processing: pvc-198af0f3-d156-4bba-8c18-29ce9dcf7309
W0510 14:17:06.951576 1 reflector.go:347] /remote-source/app/metrics/internal/collectors/registry.go:63: watch of *v1.PersistentVolume ended with: failed to process PV: pvc-198af0f3-d156-4bba-8c18-29ce9dcf7309 err: unexpected object of type v1.PersistentVolume
I0510 14:17:08.394244 1 pv.go:102] PV store addition started at 2023-05-10 14:17:08.394237223 +0000 UTC m=+122.438705679 for PV pvc-f5cc8875-3e93-47c4-b986-0e7fbd99ad35
I0510 14:17:08.394268 1 pv.go:106] Skipping non Ceph CSI RBD volume pvc-f5cc8875-3e93-47c4-b986-0e7fbd99ad35
I0510 14:17:08.394273 1 pv.go:102] PV store addition started at 2023-05-10 14:17:08.394271089 +0000 UTC m=+122.438739557 for PV pvc-b2d1def2-2ea3-47d8-9679-2c8c478d7a0e
I0510 14:17:08.394279 1 pv.go:106] Skipping non Ceph CSI RBD volume pvc-b2d1def2-2ea3-47d8-9679-2c8c478d7a0e
I0510 14:17:08.394282 1 pv.go:102] PV store addition started at 2023-05-10 14:17:08.3942808 +0000 UTC m=+122.438749266 for PV pvc-c68d4e36-e479-4170-af79-bc41e6b1a660
I0510 14:17:08.394287 1 pv.go:106] Skipping non Ceph CSI RBD volume pvc-c68d4e36-e479-4170-af79-bc41e6b1a660
I0510 14:17:08.394290 1 pv.go:102] PV store addition started at 2023-05-10 14:17:08.394289253 +0000 UTC m=+122.438757718 for PV pvc-4edd0e09-caf2-471e-8c8d-ce1ef428abb3
I0510 14:17:08.394295 1 pv.go:106] Skipping non Ceph CSI RBD volume pvc-4edd0e09-caf2-471e-8c8d-ce1ef428abb3
I0510 14:17:08.394298 1 pv.go:102] PV store addition started at 2023-05-10 14:17:08.394297091 +0000 UTC m=+122.438765557 for PV pvc-198af0f3-d156-4bba-8c18-29ce9dcf7309
I0510 14:17:08.394321 1 pv.go:106] Skipping non Ceph CSI RBD volume pvc-198af0f3-d156-4bba-8c18-29ce9dcf7309
I0510 14:17:08.394326 1 pv.go:102] PV store addition started at 2023-05-10 14:17:08.394324258 +0000 UTC m=+122.438792732 for PV pvc-de717d45-ee26-49d2-baa2-56201bd620c2
I0510 14:17:08.394335 1 pv.go:106] Skipping non Ceph CSI RBD volume pvc-de717d45-ee26-49d2-baa2-56201bd620c2
I0510 14:17:08.394339 1 pv.go:102] PV store addition started at 2023-05-10 14:17:08.394337665 +0000 UTC m=+122.438806132 for PV pvc-eab19b91-4c90-40cd-9888-23a0e208d442
---------------------------------------------------------------------------------------------------------------------------------------
$ oc get clusterrolebindings -o custom-columns='KIND:kind,NAMESPACE:metadata.namespace,NAME:metadata.name,SERVICE_ACCOUNTS:subjects[?(@.kind=="ServiceAccount")].name' |grep "ocs-metrics-exporter"
ClusterRoleBinding <none> ocs-operator.v4.13.0-186.stable-844794cb44 ocs-metrics-exporter
ClusterRoleBinding <none> ocs-operator.v4.13.0-186.stable-f55465d46 ocs-metrics-exporter
$ oc get clusterrole ocs-operator.v4.13.0-186.stable-844794cb44 -o yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  creationTimestamp: "2023-05-10T14:14:29Z"
  labels:
    olm.owner: ocs-operator.v4.13.0-186.stable
    olm.owner.kind: ClusterServiceVersion
    olm.owner.namespace: fusion-storage
    operators.coreos.com/ocs-operator.fusion-storage: ""
  name: ocs-operator.v4.13.0-186.stable-844794cb44
  resourceVersion: "100955"
  uid: 2af27687-e16e-48a4-b3e6-1ecf0118a701
rules:
- apiGroups:
  - ceph.rook.io
  resources:
  - cephobjectstores
  - cephblockpools
  - cephclusters
  - cephrbdmirrors
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - quota.openshift.io
  resources:
  - clusterresourcequotas
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - objectbucket.io
  resources:
  - objectbuckets
  verbs:
  - get
  - list
- apiGroups:
  - ""
  resources:
  - configmaps
  - secrets
  verbs:
  - get
  - list
- apiGroups:
  - ""
  resources:
  - persistentvolumes
  - persistentvolumeclaims
  - pods
  - nodes
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - storage.k8s.io
  resources:
  - storageclasses
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ocs.openshift.io
  resources:
  - storageconsumers
  - storageclusters
  verbs:
  - get
  - list
  - watch
$ oc get clusterrole ocs-operator.v4.13.0-186.stable-f55465d46 -o yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  creationTimestamp: "2023-05-10T14:14:29Z"
  labels:
    olm.owner: ocs-operator.v4.13.0-186.stable
    olm.owner.kind: ClusterServiceVersion
    olm.owner.namespace: fusion-storage
    operators.coreos.com/ocs-operator.fusion-storage: ""
  name: ocs-operator.v4.13.0-186.stable-f55465d46
  resourceVersion: "100986"
  uid: 52defeb5-cf0d-4923-8abb-9993f6f1cbb3
rules:
- apiGroups:
  - monitoring.coreos.com
  resources:
  - '*'
  verbs:
  - '*'
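One way to confirm that these rules actually grant the exporter service account the list/watch access it needs (independently of restarting the pod) is to ask the API server directly. The snippet below is a standalone client-go sketch, not part of ocs-operator, that issues a SubjectAccessReview for the ocs-metrics-exporter service account in the fusion-storage namespace; it assumes it is run with a kubeconfig whose user may create SubjectAccessReviews.

```go
// Standalone verification sketch: ask the API server whether the exporter's
// service account is allowed to list StorageClasses.
package main

import (
	"context"
	"fmt"

	authv1 "k8s.io/api/authorization/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load the default kubeconfig (~/.kube/config); adjust as needed.
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	sar := &authv1.SubjectAccessReview{
		Spec: authv1.SubjectAccessReviewSpec{
			User: "system:serviceaccount:fusion-storage:ocs-metrics-exporter",
			ResourceAttributes: &authv1.ResourceAttributes{
				Group:    "storage.k8s.io",
				Resource: "storageclasses",
				Verb:     "list",
			},
		},
	}
	res, err := client.AuthorizationV1().SubjectAccessReviews().Create(
		context.TODO(), sar, metav1.CreateOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Printf("allowed=%v reason=%q\n", res.Status.Allowed, res.Status.Reason)
}
```

The same check can be done from the CLI with `oc auth can-i list storageclasses --as=system:serviceaccount:fusion-storage:ocs-metrics-exporter`.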
---------------------------------------------------------------------------------------------------------------------------------------
Verified on :
$ oc get csv
NAME DISPLAY VERSION REPLACES PHASE
managed-fusion-agent.v2.0.11 Managed Fusion Agent 2.0.11 Succeeded
observability-operator.v0.0.20 Observability Operator 0.0.20 observability-operator.v0.0.19 Succeeded
ocs-operator.v4.13.0-186.stable OpenShift Container Storage 4.13.0-186.stable Succeeded
ose-prometheus-operator.4.10.0 Prometheus Operator 4.10.0 Succeeded
route-monitor-operator.v0.1.500-6152b76 Route Monitor Operator 0.1.500-6152b76 route-monitor-operator.v0.1.498-e33e391 Succeeded
$ oc get csv ocs-operator.v4.13.0-186.stable -o yaml | grep full_version
full_version: 4.13.0-186
$ oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.12.15 True False 4h12m Error while reconciling 4.12.15: the cluster operator monitoring is not available
Conclusion: ocs-metrics-exporter can now list/watch StorageCluster, StorageClass, CephBlockPool, and the other resources, with the exception of PersistentVolume, which still reports:
"W0510 14:17:06.951576 1 reflector.go:347] /remote-source/app/metrics/internal/collectors/registry.go:63: watch of *v1.PersistentVolume ended with: failed to process PV: pvc-198af0f3-d156-4bba-8c18-29ce9dcf7309 err: unexpected object of type v1.PersistentVolume"
As per comment 17, opened a separate bug, https://bugzilla.redhat.com/show_bug.cgi?id=2208302, to track the PersistentVolume issue. Marking this BZ as verified, referring to comment 17 and comment 15.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenShift Data Foundation 4.13.0 enhancement and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:3742