Description of problem (please be as detailed as possible and provide log snippets):

In an internal Ceph cluster without RBD mirroring, the ocs-metrics-exporter shows the following logs:

E0112 05:44:59.705009 1 ceph-block-pool.go:137] Invalid image health for pool ocs-storagecluster-cephblockpool. Must be OK, UNKNOWN, WARNING or ERROR
I0112 05:45:07.332607 1 rbd-mirror.go:296] RBD mirror store resync started at 2024-01-12 05:45:07.332593909 +0000 UTC m=+2061519.616778751
I0112 05:45:07.332637 1 rbd-mirror.go:321] RBD mirror store resync ended at 2024-01-12 05:45:07.332633306 +0000 UTC m=+2061519.616818150
E0112 05:45:18.347842 1 rbd-mirror.go:371] command rbd timedout in 30 seconds
I0112 05:45:18.347892 1 trace.go:236] Trace[1389586998]: "Reflector ListAndWatch" name:/remote-source/app/metrics/internal/collectors/registry.go:63 (12-Jan-2024 05:44:48.338) (total time: 30008ms):
Trace[1389586998]: [30.008962884s] [30.008962884s] END
E0112 05:45:18.347913 1 reflector.go:147] /remote-source/app/metrics/internal/collectors/registry.go:63: Failed to watch *v1.PersistentVolume: unable to sync list result: failed to get image status failed with output : , err: context deadline exceeded
E0112 05:45:26.159054 1 ceph-block-pool.go:137] Invalid image health for pool ocs-storagecluster-cephblockpool. Must be OK, UNKNOWN, WARNING or ERROR

Looking into the cluster, we can double-check from the ceph tools pod that mirroring is not configured on this pool:

sh-5.1$ rbd mirror pool status ocs-storagecluster-cephblockpool
rbd: mirroring not enabled on the pool

The relevant code is https://github.com/red-hat-storage/ocs-operator/blob/main/metrics/internal/collectors/ceph-block-pool.go#L107-L139

Version of all relevant components (if applicable):
OCP 4.14.7, ODF 4.14.3

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
Prometheus is randomly losing some metrics.

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
1

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?
N/A

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. Install an ODF cluster without RBD mirroring
2. Check the logs from the ocs-metrics-exporter

Actual results:
Metrics from the ocs-metrics-exporter are sometimes missing.

Expected results:
Metrics are scraped and no errors appear in the container logs.

Additional info:
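The error at ceph-block-pool.go:137 suggests the collector validates the mirror image health string even for pools where mirroring is disabled, so an empty status is reported as invalid. Below is a minimal Go sketch of the kind of guard that would avoid this; the types and names (poolMirrorStatus, validImageHealth, collectImageHealth) are hypothetical and are not the actual ocs-operator code, only an illustration of the idea.

package main

import (
	"fmt"
	"log"
)

// poolMirrorStatus is an illustrative stand-in for whatever mirroring
// status the collector receives for a pool; not the real ocs-operator type.
type poolMirrorStatus struct {
	MirroringEnabled bool
	ImageHealth      string // expected: "OK", "UNKNOWN", "WARNING", "ERROR"
}

// validImageHealth reflects the kind of validation the exporter performs
// before recording the image-health metric.
func validImageHealth(h string) bool {
	switch h {
	case "OK", "UNKNOWN", "WARNING", "ERROR":
		return true
	}
	return false
}

// collectImageHealth sketches the guard: skip pools that do not have
// mirroring enabled instead of logging an "Invalid image health" error.
func collectImageHealth(pool string, st poolMirrorStatus) {
	if !st.MirroringEnabled {
		// No mirroring means there is no image health to report;
		// return quietly instead of emitting an error.
		return
	}
	if !validImageHealth(st.ImageHealth) {
		log.Printf("Invalid image health for pool %s. Must be OK, UNKNOWN, WARNING or ERROR", pool)
		return
	}
	fmt.Printf("pool %s image health: %s\n", pool, st.ImageHealth)
}

func main() {
	// Pool without mirroring: nothing is logged.
	collectImageHealth("ocs-storagecluster-cephblockpool", poolMirrorStatus{MirroringEnabled: false})
	// Pool with mirroring and a valid health string.
	collectImageHealth("mirrored-pool", poolMirrorStatus{MirroringEnabled: true, ImageHealth: "OK"})
}

With a guard of this shape, pools that never had mirroring enabled are skipped silently instead of producing the error on every scrape.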
I believe this has been fixed in the latest builds. I'll test it on the latest build and confirm whether the fix works.
Tested on 4.15; the logs no longer show any RBD-related errors. Closing this; feel free to reopen if you encounter it again.