Bug 2014167
| Summary: | MGR related alerts are not triggered when Ceph Manager is down | |||
|---|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Container Storage | Reporter: | Filip Balák <fbalak> | |
| Component: | ceph-monitoring | Assignee: | arun kumar mohan <amohan> | |
| Status: | CLOSED ERRATA | QA Contact: | Filip Balák <fbalak> | |
| Severity: | high | Docs Contact: | ||
| Priority: | unspecified | |||
| Version: | 4.8 | CC: | amohan, muagarwa, ocs-bugs, ratamir, shan, tnielsen | |
| Target Milestone: | --- | Keywords: | Regression, TestBlocker, ZStream | |
| Target Release: | OCS 4.8.3 | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | 4.8.3-14 | Doc Type: | No Doc Update | |
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 2014391 (view as bug list) | Environment: | ||
| Last Closed: | 2021-10-18 12:16:24 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 2014391 | |||
Description
Filip Balák
2021-10-14 14:44:13 UTC
We **MUST** always open a new bug against ODF and the current release (currently 4.9). It does not matter where the bug was found; where the fix goes will be decided later, but we must always start with the current release.

Added the following PR: https://github.com/rook/rook/pull/8985
Sebastian, Travis, please take a look.

Adding RCA (same as the commit message):
CephMgrIsAbsent
----------------
This alert initially used the following query:
absent(up{job="rook-ceph-mgr"})
It fires when the 'up' series is not present, but it had two flaws:
a. it does not fire if 'up' returns a result with the value ZERO
b. it does not carry any labels from the metric, so 'namespace' was missing
When the above query was replaced with the following,
up{job="rook-ceph-mgr"} == 0
the new query had the following shortcoming:
a. whenever the mgr pod is completely down (for example, 'replicas' set to ZERO
and 'mgr' not coming up), the 'up' query does not return any result.
Thus we had to combine both queries to get results in both scenarios.
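As a hedged sketch, the two expressions above can be joined with PromQL's `or` operator so that the alert fires both when the series reports 0 and when it is missing entirely. The `label_replace` wrapper and the `openshift-storage` namespace value are assumptions added here to show one way of restoring the missing 'namespace' label; the exact expression merged in the PR may differ.

```promql
# Fires when the mgr target is scraped but down (== 0),
# or when no 'up' series exists for the job at all (absent).
# label_replace sets 'namespace' on whichever branch matched;
# the hard-coded "openshift-storage" value is an assumption for illustration.
label_replace(
  (up{job="rook-ceph-mgr"} == 0 or absent(up{job="rook-ceph-mgr"})),
  "namespace", "openshift-storage", "", ""
)
```

Because `absent()` only carries labels taken from equality matchers in its selector, a label such as 'namespace' has to be attached explicitly if the alert rule depends on it.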
CephMgrIsMissingReplicas
------------------------
This query was previously
sum(up{job="rook-ceph-mgr"}) < 1
which had the same structure as the above (Absent) query, but its
intention was to check the number of 'replicas' for the Ceph mgr.
It is now changed to a kube query that checks the replica count.
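One possible shape for such a kube query is sketched below. It assumes that kube-state-metrics is being scraped (it provides `kube_deployment_spec_replicas`) and that the mgr deployments are named with a `rook-ceph-mgr-` prefix; neither detail is confirmed by this report, and the expression in the PR may differ.

```promql
# Total desired replicas across the mgr deployments, grouped per namespace.
# Fires when the mgr deployments in a namespace are scaled down to zero.
# The deployment name pattern is an assumption for illustration.
sum(kube_deployment_spec_replicas{deployment=~"rook-ceph-mgr-.*"}) by (namespace) < 1
```

Unlike the 'up'-based query, this keeps the 'namespace' label and reflects the configured replica count rather than scrape health. It returns nothing if the deployment itself does not exist.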
The fix looks ok. -> VERIFIED

Tested with:
OCS 4.8.3-14
OCP 4.8.0-0.nightly-2021-10-16-024756

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OCS 4.8.3 bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3881