Bug 2114835
| Summary: | prometheus reports an error during evaluation of CephPoolGrowthWarning alert rule | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | Martin Bukatovic <mbukatov> |
| Component: | Ceph-Dashboard | Assignee: | Aashish sharma <aasharma> |
| Status: | CLOSED ERRATA | QA Contact: | Sayalee <saraut> |
| Severity: | low | Docs Contact: | Akash Raj <akraj> |
| Priority: | unspecified | | |
| Version: | 5.2 | CC: | aasharma, akraj, ceph-eng-bugs, cephqe-warriors, kdreyer, rmandyam, saraut, vereddy |
| Target Milestone: | --- | Keywords: | Rebase |
| Target Release: | 6.1 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | .No CephPoolGrowthWarning alerts are fired on the dashboard. Previously, an incorrect query for the CephPoolGrowthWarning alert caused “Evaluating rule failed” errors to repeat indefinitely in the Prometheus logs of a stretch cluster. With this release, the query is fixed and no errors are observed. | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2023-06-15 09:15:36 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 2192813 | | |
Description
Martin Bukatovic
2022-08-03 11:43:22 UTC
The query in question is:
```
(predict_linear(ceph_pool_percent_used[2d], 3600 * 24 * 5) * on(pool_id) group_right() ceph_pool_metadata) >= 95
```
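With `* on(pool_id) group_right()`, the left-hand side of the multiplication is the "one" side of a one-to-many match, so Prometheus requires at most one series per `pool_id` there. In a stretch cluster, where more than one ceph-mgr exporter is scraped, that assumption does not hold and rule evaluation fails. A minimal diagnostic query (a suggestion for illustration, not part of the original report) to surface the duplicate match groups:
```
# Hypothetical diagnostic (not from the bug report): count how many series on
# the left-hand side of the join share each pool_id; any value greater than 1
# means the one-to-many match in the alert rule cannot be resolved.
count by (pool_id) (
  predict_linear(ceph_pool_percent_used[2d], 3600 * 24 * 5)
) > 1
```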
The values of the ceph_pool_metadata metric (retrieved via a Prometheus query) look fine:
```
ceph_pool_metadata{compression_mode="none", description="replica:4", instance="10.1.161.69:9283", job="ceph", name=".rgw.root", pool_id="5", type="replicated"} 1
ceph_pool_metadata{compression_mode="none", description="replica:4", instance="10.1.161.69:9283", job="ceph", name="cephfs.cephfs.data", pool_id="4", type="replicated"} 1
ceph_pool_metadata{compression_mode="none", description="replica:4", instance="10.1.161.69:9283", job="ceph", name="cephfs.cephfs.meta", pool_id="3", type="replicated"} 1
ceph_pool_metadata{compression_mode="none", description="replica:4", instance="10.1.161.69:9283", job="ceph", name="default.rgw.buckets.data", pool_id="10", type="replicated"} 1
ceph_pool_metadata{compression_mode="none", description="replica:4", instance="10.1.161.69:9283", job="ceph", name="default.rgw.buckets.index", pool_id="9", type="replicated"} 1
ceph_pool_metadata{compression_mode="none", description="replica:4", instance="10.1.161.69:9283", job="ceph", name="default.rgw.control", pool_id="7", type="replicated"} 1
ceph_pool_metadata{compression_mode="none", description="replica:4", instance="10.1.161.69:9283", job="ceph", name="default.rgw.log", pool_id="6", type="replicated"} 1
ceph_pool_metadata{compression_mode="none", description="replica:4", instance="10.1.161.69:9283", job="ceph", name="default.rgw.meta", pool_id="8", type="replicated"} 1
ceph_pool_metadata{compression_mode="none", description="replica:4", instance="10.1.161.69:9283", job="ceph", name="device_health_metrics", pool_id="1", type="replicated"} 1
ceph_pool_metadata{compression_mode="none", description="replica:4", instance="10.1.161.69:9283", job="ceph", name="rbdpool", pool_id="2", type="replicated"} 1
```
But the `predict_linear(ceph_pool_percent_used[2d], 3600 * 24 * 5)` expression yields duplicated pool IDs, one series per exporter instance:
```
{instance="10.1.161.69:9283", job="ceph", pool_id="1"} 0.00002235054085654088
{instance="10.1.161.69:9283", job="ceph", pool_id="10"} 0.0000005170602257106434
{instance="10.1.161.69:9283", job="ceph", pool_id="2"} 0.14350355292439385
{instance="10.1.161.69:9283", job="ceph", pool_id="3"} 0.000003241211518389691
{instance="10.1.161.69:9283", job="ceph", pool_id="4"} 0
{instance="10.1.161.69:9283", job="ceph", pool_id="5"} 0.0000009754056280110012
{instance="10.1.161.69:9283", job="ceph", pool_id="6"} 0.00000830233856533691
{instance="10.1.161.69:9283", job="ceph", pool_id="7"} 0
{instance="10.1.161.69:9283", job="ceph", pool_id="8"} 0.00001201077839360858
{instance="10.1.161.69:9283", job="ceph", pool_id="9"} 0
{instance="10.1.161.89:9283", job="ceph", pool_id="1"} 0
{instance="10.1.161.89:9283", job="ceph", pool_id="2"} 0
{instance="10.1.161.89:9283", job="ceph", pool_id="3"} 0.000001370114318888227
{instance="10.1.161.89:9283", job="ceph", pool_id="4"} 0
{instance="10.1.161.89:9283", job="ceph", pool_id="5"} 0.0000006850576141914644
{instance="10.1.161.89:9283", job="ceph", pool_id="6"} 0.000005822959792567417
{instance="10.1.161.89:9283", job="ceph", pool_id="7"} 0
{instance="10.1.161.89:9283", job="ceph", pool_id="8"} 0.0000003425289207825699
```
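One way to make the match unambiguous, assuming both metrics carry an `instance` label, is to include `instance` in the join labels so that each match group on the left contains a single series. This is only a sketch for illustration; the change that was actually merged is in the pull request referenced below.
```
# Sketch only, assuming both metrics expose an instance label; the merged fix
# may differ (see the linked pull request).
(
  predict_linear(ceph_pool_percent_used[2d], 3600 * 24 * 5)
    * on(pool_id, instance) group_right() ceph_pool_metadata
) >= 95
```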
Merged to quincy upstream in https://github.com/ceph/ceph/pull/49475. The fix will be included in v17.2.6.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Red Hat Ceph Storage 6.1 security and bug fix update) and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:3623