Hi, the alert `CephClusterWarningState` is based on the query `ceph_health_status == 1` (a value of 1 means the Ceph cluster's health is in WARNING state). In the scenario above, where the cluster was being tested for BZ#2218593 (StorageCluster goes to an error state on its own) and the StorageCluster was recovered using the workaround of restarting the OCS Operator, `ceph_health_status` was still returning 1 (I believe, Aman, you were referring to some MDR crash). That is, the Ceph cluster health was still in WARNING state, so the alert remaining active is expected behavior. @Aman, can you please try to reproduce a case where the `CephClusterWarningState` alert is still firing while the Ceph cluster is in HEALTH_OK state?
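For reference, one way to confirm the value the alert is actually evaluating is to query the metric directly from in-cluster monitoring. This is a minimal sketch, assuming the default thanos-querier route in openshift-monitoring and a token with monitoring read access:

$ TOKEN=$(oc whoami -t)
$ HOST=$(oc get route thanos-querier -n openshift-monitoring -o jsonpath='{.spec.host}')
$ curl -sk -H "Authorization: Bearer $TOKEN" "https://$HOST/api/v1/query?query=ceph_health_status"
(expected values: 0 = HEALTH_OK, 1 = HEALTH_WARN, 2 = HEALTH_ERR)

Comparing this value against `ceph -s` from the tools pod tells us whether the alert is stale or the exporter is genuinely still reporting a warning.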
I was not able to reproduce the bug. The CephClusterWarningState alert was correctly cleared once the Ceph health state was restored to HEALTH_OK. I recommend checking the Ceph status directly via the tools pod when this issue is encountered in the future:

$ oc rsh -n openshift-storage $(oc get pods -n openshift-storage | grep tool | awk '{print $1}') ceph -s

It is possible that Ceph was in a state that prevented it from returning to HEALTH_OK.

Tested with:
ODF 4.13.1-9
OCP 4.13.0-0.nightly-2023-07-20-222544
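If the alert keeps firing, `ceph health detail` from the same tools pod shows which health check is holding the cluster in HEALTH_WARN. As a sketch (and, given the crash mentioned above, it may be worth listing recorded crashes as well, assuming the warning is crash-related):

$ oc rsh -n openshift-storage $(oc get pods -n openshift-storage | grep tool | awk '{print $1}') ceph health detail
$ oc rsh -n openshift-storage $(oc get pods -n openshift-storage | grep tool | awk '{print $1}') ceph crash ls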