Bug 2203795
Summary: | ODF Monitoring is missing some of the ceph_* metric values | ||
---|---|---|---|
Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | Vishakha Kathole <vkathole> |
Component: | rook | Assignee: | avan <athakkar> |
Status: | CLOSED ERRATA | QA Contact: | Filip Balák <fbalak> |
Severity: | high | Docs Contact: | |
Priority: | unspecified | ||
Version: | 4.13 | CC: | akandath, athakkar, ebenahar, fbalak, hnallurv, jolmomar, muagarwa, nthomas, ocs-bugs, odf-bz-bot, paarora, sbalusu, tdesala, tnielsen |
Target Milestone: | --- | Keywords: | Automation, Regression |
Target Release: | ODF 4.13.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | 4.13.0-214 | Doc Type: | No Doc Update |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2023-06-21 15:25:37 UTC | Type: | Bug |
Description
Vishakha Kathole
2023-05-15 09:32:17 UTC
Not a 4.13 blocker.

Nishanth, please assign it to someone.

Do we have the ODF cluster or must-gather logs for this?

After the fix, the following metrics are still missing: 'ceph_bluestore_submit_lat_sum', 'ceph_bluestore_submit_lat_count', 'ceph_bluestore_throttle_lat_count', 'ceph_bluestore_commit_lat_sum', 'ceph_bluestore_throttle_lat_sum', 'ceph_rocksdb_get', 'ceph_bluestore_commit_lat_count'. Is this expected? Tested with ODF 4.13.0-214.

Hi @Filip, I investigated the missing metrics you reported. It seems there is a discrepancy in the metric names on the ocs-ci end. The metrics exported by Ceph are actually named, for example, ceph_bluestore_txc_submit_lat_count, and similarly for the other metrics. https://github.com/ceph/ceph/blame/v17.2.6/src/os/bluestore/BlueStore.cc#L5076
I see that this was updated ~2 years ago in Ceph, while metrics.py in ocs-ci was last updated ~3 years ago, so ocs-ci must adopt the metric names coming from Ceph: https://github.com/red-hat-storage/ocs-ci/blob/master/ocs_ci/ocs/metrics.py#L116
The same goes for `ceph_rocksdb_get`: I don't see any metric exported by Ceph with that name, so it must be removed from the file. Hope this helps. Thanks

Also observing this bug as part of the following ocs-ci test execution on IBM Z:
tests/manage/monitoring/prometheusmetrics/test_monitoring_negative.py::test_ceph_metrics_presence_when_osd_down

This is verified based on the discussion in thread https://chat.google.com/room/AAAAREGEba8/KoCb6Izr65o. A note in the release notes will be needed.
New metric names:
ceph_bluestore_submit_lat_sum -> ceph_bluestore_txc_submit_lat_sum
ceph_bluestore_submit_lat_count -> ceph_bluestore_txc_submit_lat_count
ceph_bluestore_throttle_lat_count -> ceph_bluestore_txc_throttle_lat_count
ceph_bluestore_commit_lat_sum -> ceph_bluestore_txc_commit_lat_sum
ceph_bluestore_throttle_lat_sum -> ceph_bluestore_txc_throttle_lat_sum
ceph_bluestore_commit_lat_count -> ceph_bluestore_txc_commit_lat_count

Metric ceph_rocksdb_get was removed because it was redundant and its data can be accessed from metrics ceph_rocksdb_get_latency_sum and ceph_rocksdb_get_latency_count.

--> VERIFIED

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenShift Data Foundation 4.13.0 enhancement and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2023:3742
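For anyone maintaining a list of expected metric names, the renames from the verification comment could be applied with a small lookup table. This is only a sketch: the mapping is copied from this bug, but `update_expected_metrics` and `missing_metrics` are hypothetical helper names, not functions from ocs-ci, and `ceph_osd_up` in the usage lines is just a placeholder for a metric that was not renamed.

```python
# Old -> new metric names, copied from the verification comment in this bug.
RENAMED_METRICS = {
    "ceph_bluestore_submit_lat_sum": "ceph_bluestore_txc_submit_lat_sum",
    "ceph_bluestore_submit_lat_count": "ceph_bluestore_txc_submit_lat_count",
    "ceph_bluestore_throttle_lat_count": "ceph_bluestore_txc_throttle_lat_count",
    "ceph_bluestore_commit_lat_sum": "ceph_bluestore_txc_commit_lat_sum",
    "ceph_bluestore_throttle_lat_sum": "ceph_bluestore_txc_throttle_lat_sum",
    "ceph_bluestore_commit_lat_count": "ceph_bluestore_txc_commit_lat_count",
}

# Metrics that were dropped rather than renamed.
REMOVED_METRICS = {"ceph_rocksdb_get"}


def update_expected_metrics(names):
    """Apply the renames and drop removed metrics, preserving input order.

    Hypothetical helper for illustration; not part of ocs-ci.
    """
    updated = []
    for name in names:
        if name in REMOVED_METRICS:
            continue
        updated.append(RENAMED_METRICS.get(name, name))
    return updated


def missing_metrics(expected, exported):
    """Return the expected names that are absent from the exported names."""
    exported_set = set(exported)
    return [name for name in expected if name not in exported_set]


# Usage: an outdated expectation list is translated before it is compared
# with what the cluster actually exports.
old_expected = ["ceph_bluestore_submit_lat_sum", "ceph_rocksdb_get", "ceph_osd_up"]
new_expected = update_expected_metrics(old_expected)
exported = ["ceph_bluestore_txc_submit_lat_sum", "ceph_osd_up"]
print(missing_metrics(new_expected, exported))
```

Keeping the renames in one table means the expectation list and any release-note text can be derived from the same data.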