Bug 2240132

Summary: monitoring: grafana mons out of quorum should be count - sum
Product: [Red Hat Storage] Red Hat Ceph Storage Reporter: Justin <jherron>
Component: Ceph-Dashboard Assignee: Aashish sharma <aasharma>
Status: CLOSED ERRATA QA Contact: Vinayak Papnoi <vpapnoi>
Severity: medium Docs Contact: Disha Walvekar <dwalveka>
Priority: unspecified    
Version: 6.1 CC: aasharma, ceph-eng-bugs, cephqe-warriors, dwalveka, nia, pegonzal, tserlin
Target Milestone: ---   
Target Release: 6.1z4   
Hardware: Unspecified   
OS: Linux   
Whiteboard:
Fixed In Version: ceph-17.2.6-193.el9cp; grafana-container-6-85 Doc Type: Bug Fix
Doc Text:
Cause: No thresholds were set in the Ceph Cluster dashboard's OSDs panel to show OSDs that are out or down in a danger state.
Consequence: The Ceph Cluster dashboard's OSDs panel did not show out or down OSDs in a danger state.
Fix: Thresholds were added to the OSDs panel.
Result: The OSDs panel shows the correct panel colors when OSDs are down or out.
Story Points: ---
Clone Of: Environment:
Last Closed: 2024-02-08 18:13:17 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
Embargoed:
Bug Depends On:    
Bug Blocks: 2261930    

Description Justin 2023-09-21 22:34:48 UTC
Description of problem:


[Grafana Ceph Cluster Dashboard]

[1] The default panel configurations are somewhat off in the image registry.redhat.io/rhceph/rhceph-6-dashboard-rhel9 (confirmed in the latest tags: 6-77 and 6-75).

The "Monitors" panel shows wrong information: when 5/5 mons are in quorum, the "Out of Quorum" box shows 1.

This is due to the division operation in the third expression below, taken from the JSON model
of the default Ceph Cluster dashboard.
```
-> # cat Ceph_-_Cluster-1695304085550.json | jq  '.panels[] | select( .title | contains("Monitors")) | .targets[] | .expr'
"sum(ceph_mon_quorum_status)"
"count(ceph_mon_quorum_status)"
"count(ceph_mon_quorum_status) / sum(ceph_mon_quorum_status)" <<< If 5 out of 5 monitors are up, this op returns 1, which would show one monitor as out of quorum.
``` 
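To make the arithmetic concrete, here is a minimal Python sketch that mimics the two aggregations on a plain list. It assumes that ceph_mon_quorum_status is 1 for a mon in quorum and 0 for a mon out of quorum; the function names are hypothetical and only mirror the PromQL expressions, they are not part of the dashboard.

```python
def out_of_quorum_ratio(statuses):
    """Mimics the shipped expression: count(...) / sum(...)."""
    return len(statuses) / sum(statuses)


def out_of_quorum_diff(statuses):
    """Mimics the expression from the bug summary: count(...) - sum(...)."""
    return len(statuses) - sum(statuses)


# Assumption: 1 = mon in quorum, 0 = mon out of quorum.
all_in = [1, 1, 1, 1, 1]
one_out = [1, 1, 1, 1, 0]

print(out_of_quorum_ratio(all_in))   # 1.0 -> panel shows 1 although no mon is out
print(out_of_quorum_diff(all_in))    # 0
print(out_of_quorum_ratio(one_out))  # 1.25
print(out_of_quorum_diff(one_out))   # 1
```

With the division, a healthy 5/5 cluster yields 1, while the subtraction yields the number of mons that are actually out of quorum.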

Upstream pull request accepted.
monitoring: grafana mons out of quorum should be count
https://github.com/ceph/ceph/pull/52150/files

[2] Health is "OK" but the box is red

This is due to the default values in the panel configuration shown below. The value for red should be null and
the value for green should be 80 (correct me if I am wrong here).

```
 $ >> cat Ceph_-_Cluster-1695304085550.json | jq  '.panels[] | select( .title | contains("OSDs")) | .fieldConfig' 
{
  "defaults": {
    "mappings": [],
    "thresholds": {
      "mode": "absolute",
      "steps": [
        {
          "color": "green",
          "value": null
        },
        {
          "color": "red",
          "value": 80
        }
      ]
    }
  },
  "overrides": []
}
```
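As a rough illustration of how such "absolute" threshold steps pick a color, here is a small Python sketch. It assumes the usual Grafana behavior as I understand it: steps are listed in ascending order, a null value marks the base step, and the last step whose value is less than or equal to the metric wins. The helper color_for and the swapped step list are hypothetical and only serve the comparison; this does not establish which mapping is right for this panel.

```python
import math


def color_for(value, steps):
    """Return the color of the last step whose floor is <= value.

    Assumes steps are sorted ascending and that a None value is the base step.
    """
    chosen = None
    for step in steps:
        floor = -math.inf if step["value"] is None else step["value"]
        if value >= floor:
            chosen = step["color"]
    return chosen


# Steps as found in the dashboard JSON above.
shipped = [{"color": "green", "value": None}, {"color": "red", "value": 80}]
# Steps as suggested in this report (red as base, green from 80).
swapped = [{"color": "red", "value": None}, {"color": "green", "value": 80}]

for value in (0, 100):
    print(value, color_for(value, shipped), color_for(value, swapped))
```

Under these assumptions, the shipped steps turn the box red only for values of 80 or more, and the swapped steps do the opposite, so the right choice depends on what value the panel's query actually returns.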

How reproducible:
Always, out of the box with the default configuration, for both item [1] and item [2].


Additional info:

Comment 2 Justin 2023-09-22 10:48:12 UTC
Nizamudeen, 

Thank you for the info

Comment 4 Aashish sharma 2023-11-23 08:06:45 UTC
*** Bug 2239980 has been marked as a duplicate of this bug. ***

Comment 15 errata-xmlrpc 2024-02-08 18:13:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 6.1 Bug Fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2024:0747