Bug 2257694 - multicluster mode ocs-metrics-exporter issues: unable to collect PV data or to get CSI config
Summary: multicluster mode ocs-metrics-exporter issues: unable to collect PV data or t...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: ceph-monitoring
Version: 4.14
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ODF 4.15.0
Assignee: umanga
QA Contact: Shay Rozen
URL:
Whiteboard:
Depends On:
Blocks: 2255036
Reported: 2024-01-10 14:03 UTC by arun kumar mohan
Modified: 2024-03-19 15:31 UTC
CC List: 6 users

Fixed In Version: 4.15.0-130
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2024-03-19 15:31:04 UTC
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | red-hat-storage ocs-operator pull 2407 | 0 | None | open | [WIP] Enable ceph client and collectors for multicluster case | 2024-01-22 13:43:43 UTC
Github | red-hat-storage ocs-operator pull 2431 | 0 | None | open | Bug 2257694: [release-4.15] Enable ceph client and collectors for multicluster case | 2024-01-30 11:58:05 UTC
Red Hat Product Errata | RHSA-2024:1383 | 0 | None | None | None | 2024-03-19 15:31:12 UTC

Description arun kumar mohan 2024-01-10 14:03:24 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

This BZ is raised to address the following ocs-metrics-exporter issues (as noted by Umanga in BZ#2255036):

1. ocs-metrics-exporter deployed in the `openshift-storage-extended` namespace does not have the access it needs to collect PV data, so some PV metrics are missing.
```
{"level":"error","ts":1704795344.3449461,"caller":"cache/reflector.go:147","msg":"/remote-source/app/metrics/internal/collectors/registry.go:72: Failed to watch *v1.PersistentVolume: unable to sync list result: failed to initialize ceph: failed to get client key from secret in namespace \"openshift-storage-extended\""}
```

2. ocs-metrics-exporter deployed in the `openshift-storage-extended` namespace cannot find the CSI config required to connect to Ceph and retrieve data, so metrics that depend on Ceph commands are missing.
   We might need to disable this exporter for external mode, since we cannot assume access to external clusters to execute Ceph commands.
```
{"level":"info","ts":1704795373.87813,"caller":"cache/reflector.go:458","msg":"/remote-source/app/metrics/internal/collectors/registry.go:95: watch of *v1.CephBlockPool ended with: failed to initialize ceph: expected 1 or more CSI cluster config but found 0 from configmap in namespace \"openshift-storage-extended\""}
```

3. Some Prometheus query functions are not working as expected. There are multiple jobs exporting metrics. These should be considered when updating queries.
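To illustrate why item 3 matters, here is a minimal sketch of what can happen when more than one job exports the same series. The job names, PV names, and values are all hypothetical and are not taken from this bug. The sketch assumes the affected queries aggregate across series; under that assumption, summing without accounting for the `job` label counts the same PV once per job.

```python
# Hypothetical samples of one metric as seen by Prometheus when two jobs
# export it. The job names, PV names, and values are invented for illustration.
SAMPLES = [
    ({"job": "exporter-a", "persistentvolume": "pv-1"}, 1.0),
    ({"job": "exporter-b", "persistentvolume": "pv-1"}, 1.0),  # same PV, second job
    ({"job": "exporter-a", "persistentvolume": "pv-2"}, 1.0),
]


def naive_sum(samples):
    """Sum every series as-is, so a PV exported by two jobs is counted twice."""
    return sum(value for _, value in samples)


def sum_ignoring_job(samples):
    """Collapse series that differ only in `job`, then sum.

    Roughly what `sum(max without (job) (<metric>))` would do in PromQL.
    """
    per_series = {}
    for labels, value in samples:
        key = tuple(sorted((k, v) for k, v in labels.items() if k != "job"))
        per_series[key] = max(value, per_series.get(key, value))
    return sum(per_series.values())


if __name__ == "__main__":
    print("naive sum:", naive_sum(SAMPLES))
    print("sum ignoring job:", sum_ignoring_job(SAMPLES))
```

Whether the real queries need this kind of deduplication, or a filter on a specific job instead, depends on which jobs export which metrics in the multicluster setup.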

The errors above are among the issues encountered while populating the `openshift-storage-extended` namespace, and they therefore block BZ: https://bugzilla.redhat.com/show_bug.cgi?id=2255036
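For triage, the two log lines quoted in this description can be parsed to pull out the watched resource, the namespace, and the underlying cause. The following is a minimal sketch. The `triage` helper and its regular expressions are illustrative assumptions based only on the two lines shown here, and they are not part of ocs-metrics-exporter.

```python
import json
import re

# The two exporter log lines quoted in this report, copied verbatim.
LOG_LINES = [
    r'{"level":"error","ts":1704795344.3449461,"caller":"cache/reflector.go:147","msg":"/remote-source/app/metrics/internal/collectors/registry.go:72: Failed to watch *v1.PersistentVolume: unable to sync list result: failed to initialize ceph: failed to get client key from secret in namespace \"openshift-storage-extended\""}',
    r'{"level":"info","ts":1704795373.87813,"caller":"cache/reflector.go:458","msg":"/remote-source/app/metrics/internal/collectors/registry.go:95: watch of *v1.CephBlockPool ended with: failed to initialize ceph: expected 1 or more CSI cluster config but found 0 from configmap in namespace \"openshift-storage-extended\""}',
]

# Assumed message shapes, inferred only from the two lines above.
MARKER = "failed to initialize ceph: "
NAMESPACE_RE = re.compile(r'in namespace "([^"]+)"')
RESOURCE_RE = re.compile(r"\*v1\.(\w+)")


def triage(line):
    """Return a summary dict for a 'failed to initialize ceph' line, else None."""
    entry = json.loads(line)
    msg = entry["msg"]
    if MARKER not in msg:
        return None
    cause = msg.split(MARKER, 1)[1]
    namespace = NAMESPACE_RE.search(cause)
    resource = RESOURCE_RE.search(msg)
    return {
        "level": entry["level"],
        "resource": resource.group(1) if resource else None,
        "namespace": namespace.group(1) if namespace else None,
        "cause": NAMESPACE_RE.sub("", cause).strip(),
    }


if __name__ == "__main__":
    for line in LOG_LINES:
        print(triage(line))
```

Both lines share the "failed to initialize ceph" prefix but differ in cause: a missing client key in a secret for the PersistentVolume watch, and zero CSI cluster configs in a configmap for the CephBlockPool watch.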

Version of all relevant components (if applicable):


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?


Is there any workaround available to the best of your knowledge?


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?


Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?
NA

If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1.
2.
3.


Actual results:
ocs-metrics-exporter currently logs the errors described above.

Expected results:
ocs-metrics-exporter should run without these failures, and there should not be any error messages in the operator logs.

Additional info:

Comment 6 Shay Rozen 2024-02-21 14:47:02 UTC
Verified on odf-operator.v4.15.0-147.stable

Comment 8 errata-xmlrpc 2024-03-19 15:31:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.15.0 security, enhancement, & bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:1383

