Bug 1904302 - [GSS] ceph_daemon label includes references to a replaced OSD that cause a prometheus ruleset to fail
Summary: [GSS] ceph_daemon label includes references to a replaced OSD that cause a pr...
Alias: None
Product: Red Hat OpenShift Container Storage
Classification: Red Hat Storage
Component: rook
Version: 4.5
Hardware: All
OS: All
Target Milestone: ---
Target Release: OCS 4.7.0
Assignee: Anmol Sachan
QA Contact: Martin Bukatovic
Depends On:
Blocks: 1938134
Reported: 2020-12-04 03:07 UTC by Jay Samson
Modified: 2024-03-25 17:23 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
.Errors in `must gather` due to failed rule evaluation
Previously, the recording rule `cluster:ceph_disk_latency:join_ceph_node_disk_irate1m` was not evaluated because Prometheus does not allow *many-to-many* matching. This failed rule evaluation caused errors in the `must gather` output and in the deployment. With this release, the query for the recording rule has been updated to eliminate the *many-to-many* match scenarios, so Prometheus rule evaluations should no longer fail and these errors should no longer appear in the deployment.
Clone Of:
Last Closed: 2021-05-19 09:16:33 UTC
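
The many-to-many failure described in the Doc Text can be illustrated with a small sketch. This is not the query from the linked pull requests. It is a simplified Python model of the uniqueness check Prometheus applies when joining two sets of series on shared labels. The function name `join_many_to_one`, the label values, and the sample data are all hypothetical. The assumption, taken from the bug title, is that a replaced OSD leaves a stale `ceph_daemon` entry behind, so two metadata series end up sharing the same join key.

```python
from collections import defaultdict


def join_many_to_one(many_side, one_side, on):
    """Join series on the labels in `on`, rejecting duplicate keys on the "one" side.

    Each series is a (labels_dict, value) pair. This only mimics the uniqueness
    check of a many-to-one vector match; it is not Prometheus code.
    """
    index = defaultdict(list)
    for labels, value in one_side:
        key = tuple(labels.get(name, "") for name in on)
        index[key].append((labels, value))

    results = []
    for labels, value in many_side:
        key = tuple(labels.get(name, "") for name in on)
        matches = index.get(key, [])
        if len(matches) > 1:
            # Two series on the "one" side share the join key: ambiguous match.
            raise ValueError(
                f"many-to-many matching not allowed for {dict(zip(on, key))}"
            )
        if matches:
            results.append((labels, value, matches[0][0]))
    return results


# Hypothetical sample data: one latency series per disk.
latency = [({"device": "sdb", "instance": "node1"}, 0.004)]

# Healthy metadata: exactly one OSD is recorded for the disk.
occupation_ok = [
    ({"device": "sdb", "instance": "node1", "ceph_daemon": "osd.3"}, 1),
]

# After a replacement (assumed scenario): the old OSD is still listed.
occupation_stale = occupation_ok + [
    ({"device": "sdb", "instance": "node1", "ceph_daemon": "osd.0"}, 1),
]

if __name__ == "__main__":
    print(join_many_to_one(latency, occupation_ok, ["device", "instance"]))
    try:
        join_many_to_one(latency, occupation_stale, ["device", "instance"])
    except ValueError as err:
        print("rule evaluation would fail:", err)
```

In this model, the join succeeds while each disk maps to a single OSD and fails as soon as a stale entry introduces a second series with the same `device` and `instance` labels. A fix along the lines described in the Doc Text would make the metadata side unique per join key before matching.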


System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift rook pull 191 | 0 | None | closed | Bug 1904302: ceph: fixed prometheus query to avoid many-to-many match error | 2021-03-15 05:19:59 UTC
Github | rook rook pull 7273 | 0 | None | open | ceph: fixed promethues query to avoid many-to-many match error | 2021-02-19 13:13:59 UTC
Red Hat Product Errata | RHSA-2021:2041 | 0 | None | None | None | 2021-05-19 09:17:26 UTC

Comment 2 Mudit Agarwal 2020-12-04 04:30:05 UTC
Not a 4.6 blocker.

Comment 7 Nishanth Thomas 2021-02-03 07:56:14 UTC
Moving out to 4.8

Comment 21 Michael Adam 2021-03-15 08:22:34 UTC
fixing up acks

Comment 27 Mudit Agarwal 2021-05-10 07:59:33 UTC
Hi Disha,

Doc text looks good to me, please go ahead with this.


Comment 29 Martin Bukatovic 2021-05-13 10:42:59 UTC
Verification via regression testing only: In our CI results, I see no issues that seem to be caused
by this change, and I also don't see any PrometheusRuleFailures alerts there.

Comment 31 errata-xmlrpc 2021-05-19 09:16:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat OpenShift Container Storage 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

