Bug 2262943
| Summary: | PrometheusRule evaluation failing for pool-quota.rules | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | umanga <uchapaga> |
| Component: | ceph-monitoring | Assignee: | arun kumar mohan <amohan> |
| Status: | CLOSED ERRATA | QA Contact: | Filip Balák <fbalak> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.15 | CC: | amohan, fbalak, kbg, muagarwa, nthomas, odf-bz-bot |
| Target Milestone: | --- | | |
| Target Release: | ODF 4.16.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | 4.16.0-102 | Doc Type: | Bug Fix |
| Doc Text: |
.PrometheusRule evaluation failing for pool-quota rules
Previously, none of the Ceph pool quota alerts were displayed because the `PrometheusRuleFailures` alert fired for the `pool-quota` rules in a multi-cluster setup. The queries in the `pool-quota` section could not distinguish which cluster a result came from when more than one cluster was monitored.
With this fix, a `managedBy` label is added to all the queries in the `pool-quota` section so that each cluster produces unique results (a rough rule sketch follows this table). As a result, the `PrometheusRuleFailures` alert is no longer seen and all the alerts in `pool-quota` work as expected.
|
| Story Points: | --- | | |
| Clone Of: | | Environment: | |
| Last Closed: | 2024-07-17 13:13:18 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 2260844, 2266316 | | |
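As a rough illustration of the change described in the Doc Text, here is a minimal sketch of a `pool-quota` style rule with `managedBy` added to the vector-match keys. The alert name, metric names (`ceph_pool_stored_raw`, `ceph_pool_quota_bytes`), threshold, and labels are assumptions for the sketch, not the exact expression from the merged PR:

```yaml
# A minimal sketch, not the merged change itself: alert name, metrics, and
# threshold are assumptions based on typical Ceph pool-quota alerting.
groups:
  - name: pool-quota.rules
    rules:
      - alert: CephPoolQuotaBytesNearExhaustion
        # Matching on managedBy in addition to pool_id keeps series from
        # different clusters apart, so the join yields one result per cluster
        # instead of failing on duplicate matches.
        expr: |
          (ceph_pool_stored_raw * on (pool_id, managedBy) group_left() (ceph_pool_quota_bytes > 0))
            / on (pool_id, managedBy) group_left() ceph_pool_quota_bytes > 0.70
        for: 1m
        labels:
          severity: warning
        annotations:
          description: "Ceph pool quota usage has crossed 70% on one managed cluster."
```

Without `managedBy` in the `on (...)` clause, the same `pool_id` can appear once per cluster in a multi-cluster setup, the vector match then hits duplicate series, and the failed evaluation is what surfaces as the `PrometheusRuleFailures` alert.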
Description
umanga
2024-02-06 07:23:31 UTC
The PR, https://github.com/red-hat-storage/ocs-operator/pull/2596, is merged now... Adding RDT details, please take a look.

The alert is still present with: AWS IPI KMS THALES 1AZ RHCOS 3M 3W Cluster (https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster-prod/11930/). For more info see https://bugzilla.redhat.com/show_bug.cgi?id=2266316. Tested with ODF 4.16.0-108.

Hi Filip, I'm unable to reproduce this on a normal AWS cluster (without any KMS THALES configuration). Since both related BZs (BZ#2262943, this one, and BZ#2266316) are happening on this specific KMS Thales configuration, can we open a new BZ for that particular combination and close these BZs? Please let me know what you think.

With the new regression test results it looks like there is no progress on the issue. The alert is present for most IPI deployments. We can close some of those BZs to get some fix in (it looks to help in one instance, but this might be a coincidence), but the issue is still present: https://docs.google.com/spreadsheets/d/1akrwspvWglSs905x2JcydJNH08WO6Ptri-hrkZ2VO80/edit#gid=40270420&range=F705

Arun/Filip, what are the next steps for this BZ and BZ#2266316? Can we please take a decision?

Hi Mudit, I had a chat with Filip and (most probably) will get a setup by tomorrow. Will update with more details (on the fix).

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.16.0 security, enhancement & bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:4591
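To spot-check a cluster running a build with the fix, one generic approach (not a step recorded in this BZ) is to run the following queries from the Prometheus or Thanos query UI:

```promql
# Should return no results once the pool-quota rules evaluate cleanly.
ALERTS{alertname="PrometheusRuleFailures", alertstate="firing"}

# Assuming the managedBy label is present on the pool metrics (which the fix
# relies on), this should return one series per managed cluster.
count by (managedBy) (ceph_pool_quota_bytes)
```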