Description of problem (please be as detailed as possible and provide log snippets):
CephMonLowNumber alert is raised only when the metrics exporter pod is restarted. On a VMware cluster where more than five failure domains are added, the CephMonLowNumber alert is not shown automatically; the metrics exporter pod needs to be restarted to get the alert.

Version of all relevant components (if applicable):
OCP 4.15 and ODF 4.15.0-122

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
NA

Is there any workaround available to the best of your knowledge?
Restart the metrics exporter pod to get the alert.

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
1

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?
Yes

If this is a regression, please provide more details to justify this:
NA

Steps to Reproduce:
1. Install a cluster with 3 master nodes and 6 worker nodes.
2. Label the worker nodes into different failure domains using the command "oc label node compute-2 failure-domain.beta.kubernetes.io/zone=rack-2 --overwrite=true".
3. Wait for the "CephMonLowNumber" alert to be raised.

Actual results:
The alert is raised only if the metrics exporter pod is restarted.

Expected results:
Whenever the failure domain count reaches above 5, the alert should be raised automatically.

Additional info:
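Step 2 has to be repeated once per worker node. A minimal sketch of how that could be scripted is below. The node names compute-0 through compute-5 and the rack-N zone names are assumptions extrapolated from the single command quoted in step 2, and print_label_cmds is a hypothetical helper, not part of oc. It only prints the commands so they can be reviewed before anything is run.

```shell
#!/bin/sh
# Print one "oc label node" command per node, assigning each node to its own
# failure domain (rack-0, rack-1, ...) in the order the nodes are given.
# Node and rack names are illustrative assumptions, not taken from the cluster.
print_label_cmds() {
  i=0
  for node in "$@"; do
    echo "oc label node $node failure-domain.beta.kubernetes.io/zone=rack-$i --overwrite=true"
    i=$((i + 1))
  done
}

# Six worker nodes, six distinct failure domains.
print_label_cmds compute-0 compute-1 compute-2 compute-3 compute-4 compute-5
```

To apply the labels, pipe the printed commands to sh on a machine that is logged in to the cluster.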
Verified with OCP 4.15.0-0.nightly-2024-02-05-224816 and ODF 4.15.0-134. If the cluster has five or more failure domains, the CephMonLowNumber alert is raised automatically, and we are able to update the Ceph Mon count to five through the configure modal. Hence closing the bug as verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.15.0 security, enhancement, & bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2024:1383