Bug 2076670

Summary: CephMonQuorumLost is not triggered when 2 of 3 monitors are down
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation Reporter: Filip Balák <fbalak>
Component: odf-managed-service Assignee: Dhruv Bindra <dbindra>
Status: CLOSED CURRENTRELEASE QA Contact: Filip Balák <fbalak>
Severity: high Docs Contact:
Priority: high    
Version: 4.10 CC: aeyal, muagarwa, ocs-bugs, odf-bz-bot, rcyriac
Target Milestone: --- Keywords: AutomationBlocker
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: 2.0.3-2 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-09-28 06:49:06 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Filip Balák 2022-04-19 15:40:39 UTC
Description of problem:
CephMonQuorumLost is not triggered when 2 of 3 monitors are down.

Version-Release number of selected component (if applicable):
ocs-operator.v4.10.0
OCP 4.10.8

How reproducible:
2/2

Steps to Reproduce:
1. Get the list of monitor deployments.
2. Scale all of them except one to 0 replicas.
3. Check alerting in PagerDuty.
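
Steps 1 and 2 could be scripted roughly as follows. This is only a sketch: the namespace is taken from the port-forward command in this report, the label selector ``app=rook-ceph-mon`` is an assumption based on the usual Rook labelling, and the function names are hypothetical helpers, not part of any tool.

```shell
#!/usr/bin/env bash
# Sketch of steps 1-2. Assumptions: monitors live in openshift-storage and
# their deployments carry the label app=rook-ceph-mon.
NAMESPACE="${NAMESPACE:-openshift-storage}"

# Step 1: print monitor deployments, one per line
# (for example deployment.apps/rook-ceph-mon-a).
list_mon_deployments() {
  oc get deployments -n "$NAMESPACE" -l app=rook-ceph-mon -o name
}

# Step 2: keep the first listed monitor running, scale all others to 0 replicas.
scale_down_all_but_one() {
  list_mon_deployments | tail -n +2 | while read -r deploy; do
    oc scale "$deploy" -n "$NAMESPACE" --replicas=0
  done
}

# Usage (requires a logged-in oc session):
#   scale_down_all_but_one
```

With 3 monitors this leaves a single monitor running, which is the 2-of-3-down state described in the summary.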

Actual results:
The CephMonQuorumLost alert is not triggered in time.
When ``$ oc port-forward svc/prometheus-operated 9090 -n openshift-storage`` is executed during testing, http://localhost:9090/alerts shows that no Ceph alert is triggered.
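
The same check can be done from the command line while the port-forward is running, by querying the Prometheus HTTP API instead of opening the alerts page. A minimal sketch, assuming ``curl`` and ``grep`` are available locally; ``PROM_URL``, ``alert_present`` and ``check_alert`` are hypothetical names introduced here for illustration.

```shell
#!/usr/bin/env bash
# Assumes the port-forward from this report is active on localhost:9090.
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Exit 0 if the alert named in $1 appears in the JSON read from stdin.
# Plain text match on the alertname label; no JSON parser is required.
alert_present() {
  grep -Eq "\"alertname\"[[:space:]]*:[[:space:]]*\"$1\""
}

# /api/v1/alerts lists the currently active (pending or firing) alerts.
check_alert() {
  curl -s "$PROM_URL/api/v1/alerts" | alert_present "$1"
}

# Usage:
#   check_alert CephMonQuorumLost && echo "alert active" || echo "alert missing"
```

In the failing state described here, ``check_alert CephMonQuorumLost`` would be expected to report the alert as missing.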

Expected results:
Alert CephMonQuorumLost should be triggered.

Test run:
https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/11915

Comment 1 Filip Balák 2022-07-14 07:53:36 UTC
Verified as part of https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/14601

Tested with:
ocs-osd-deployer:2.0.3-2