Bug 2076670 - CephMonQuorumLost is not triggered when 2 of 3 monitors are down
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: odf-managed-service
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Dhruv Bindra
QA Contact: Filip Balák
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-04-19 15:40 UTC by Filip Balák
Modified: 2023-08-09 17:00 UTC
CC List: 5 users

Fixed In Version: 2.0.3-2
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-09-28 06:49:06 UTC
Embargoed:




Links
System ID: Github red-hat-storage/ocs-osd-deployer pull 176
Status: open
Summary: Added new metrics to be federated according to new alertRules
Last Updated: 2022-04-25 06:28:59 UTC

Description Filip Balák 2022-04-19 15:40:39 UTC
Description of problem:
CephMonQuorumLost is not triggered when 2 of 3 monitors are down.

Version-Release number of selected component (if applicable):
ocs-operator.v4.10.0
OCP 4.10.8

How reproducible:
2/2

Steps to Reproduce:
1. Get the list of monitor deployments.
2. Scale all of them to 0 except for one (see the sketch below).
3. Check alerting in PagerDuty.
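
A minimal shell sketch of these steps (the rook-ceph-mon-* deployment names and the app=rook-ceph-mon label are assumptions based on a default Rook/ODF deployment):

# List the monitor deployments in the storage namespace:
$ oc get deployments -n openshift-storage -l app=rook-ceph-mon
# Scale down two of the three monitors, leaving one running (names assumed):
$ oc scale deployment rook-ceph-mon-b -n openshift-storage --replicas=0
$ oc scale deployment rook-ceph-mon-c -n openshift-storage --replicas=0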

Actual results:
The CephMonQuorumLost alert is not triggered in time.
When ``$ oc port-forward svc/prometheus-operated 9090 -n openshift-storage`` is executed during testing, http://localhost:9090/alerts shows that no Ceph alert is triggered.
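
With the port-forward active, the same check can be scripted against the Prometheus HTTP API (a sketch; jq is assumed to be available):

# List active alerts and their states via the Prometheus API:
$ curl -s http://localhost:9090/api/v1/alerts | jq '.data.alerts[] | {alert: .labels.alertname, state: .state}'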

Expected results:
Alert CephMonQuorumLost should be triggered.
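
For reference, a quorum-lost condition of this shape could be checked with a PromQL query along the following lines (an illustrative sketch only, not the exact rule shipped with ocs-osd-deployer; the metric name, labels, and threshold are assumptions):

# Count running mon pods and compare against the quorum threshold for 3 mons
# (hypothetical expression; returns a result only when fewer than 2 are running):
$ curl -s --data-urlencode 'query=count(kube_pod_status_phase{namespace="openshift-storage", pod=~"rook-ceph-mon-.*", phase="Running"} == 1) < 2' http://localhost:9090/api/v1/query

The linked pull request's summary suggests the fix was to federate additional metrics so that rules of this kind have data to evaluate against.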

Test run:
https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/11915

Comment 1 Filip Balák 2022-07-14 07:53:36 UTC
Verified as part of https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/14601

Tested with:
ocs-osd-deployer:2.0.3-2

