Bug 2218874 - CephClusterWarningState alert doesn't disappear from UI when storage cluster recovers
Summary: CephClusterWarningState alert doesn't disappear from UI when storage cluster recovers
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: ceph-monitoring
Version: 4.13
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: arun kumar mohan
QA Contact: Harish NV Rao
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2023-06-30 11:28 UTC by Aman Agrawal
Modified: 2023-08-09 16:37 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-07-24 06:19:32 UTC
Embargoed:



Comment 3 arun kumar mohan 2023-07-19 07:31:28 UTC
Hi,
The alert `CephClusterWarningState` depends on the query `ceph_health_status == 1` (a value of 1 means the Ceph cluster's health is in a warning state).
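For reference, the underlying metric can be checked outside the Console by querying the monitoring stack directly. This is only a sketch: it assumes the default thanos-querier route in openshift-monitoring and a logged-in user with cluster-monitoring-view (or cluster-admin) permissions.

# Query the raw ceph_health_status metric (0 = HEALTH_OK, 1 = HEALTH_WARN, 2 = HEALTH_ERR)
HOST=$(oc get route thanos-querier -n openshift-monitoring -o jsonpath='{.spec.host}')
curl -sk -H "Authorization: Bearer $(oc whoami -t)" \
  "https://$HOST/api/v1/query?query=ceph_health_status"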

In the above scenario, the cluster was being tested for BZ#2218593 (StorageCluster goes to an error state on its own). The StorageCluster was recovered using the workaround of restarting the OCS operator, but `ceph_health_status` is still returning 1 (I believe, Aman, you were referring to some MDR crash). That is, the Ceph cluster health is still in a warning state, so the alert remains, as expected.

@Aman, can you please try to reproduce the issue where we still see the `CephClusterWarningState` alert on a cluster whose Ceph cluster is in HEALTH_OK state?
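A rough sketch of how such a reproduction could be captured is below; the alertmanager-main route and the rook-ceph-tools deployment name are assumptions (the tools deployment only exists if it has been enabled on the cluster).

# Check whether the alert is still registered in Alertmanager...
AM_HOST=$(oc get route alertmanager-main -n openshift-monitoring -o jsonpath='{.spec.host}')
curl -sk -H "Authorization: Bearer $(oc whoami -t)" "https://$AM_HOST/api/v2/alerts" \
  | jq '.[] | select(.labels.alertname == "CephClusterWarningState")'
# ...while Ceph itself reports HEALTH_OK
oc rsh -n openshift-storage deploy/rook-ceph-tools ceph health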

Comment 5 Filip Balák 2023-07-21 13:59:18 UTC
I was not able to reproduce the bug. The CephClusterWarningState alert was correctly cleared when the Ceph health state was restored to HEALTH_OK. I recommend checking the Ceph status directly from the tools pod when this issue is encountered in the future:
$ oc rsh -n openshift-storage $(oc get pods -n openshift-storage|grep tool|awk '{print$1}') ceph -s

It is possible that Ceph was in a state that prevented it from returning to HEALTH_OK.
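If that was the case, the specific warnings keeping the cluster out of HEALTH_OK can be listed from the same tools pod, e.g.:
$ oc rsh -n openshift-storage $(oc get pods -n openshift-storage | grep tool | awk '{print $1}') ceph health detail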

Tested with:
ODF 4.13.1-9
OCP 4.13.0-0.nightly-2023-07-20-222544

