Description of problem:
When the CephMonQuorumAtRisk alert is triggered in Prometheus, it is not propagated to PagerDuty.

Version-Release number of selected component (if applicable):
ocs-operator.v4.8.1
ocs-osd-deployer-qe.v1.1.0

How reproducible:
1/1

Steps to Reproduce:
1. Drain all nodes in one rack that contains a Ceph monitor. Make sure that the monitor is not rescheduled elsewhere and that the number of Ceph monitors is even.
2. Check in Prometheus that the CephMonQuorumAtRisk alert is Pending.
3. Wait 15 minutes.
4. Check that the alert is propagated to PagerDuty.

Actual results:
The alert is not propagated to PagerDuty.

Expected results:
The alert is propagated to PagerDuty.

Additional info:
To check Prometheus, forward a port:
$ oc port-forward svc/prometheus-operated 9090 -n openshift-storage
Then open http://localhost:9090/alerts in a browser to see the managed alerts.
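Steps 2 and 4 can also be checked from the command line instead of the browser. The following is a minimal sketch, not part of the original report, assuming the port-forward above is active and that `curl` and `jq` are installed. It reads the Prometheus HTTP API endpoint `/api/v1/alerts`, which lists active (pending or firing) alerts. `PROM_URL`, `filter_alert_state`, and `alert_state` are hypothetical names introduced for this sketch.

```shell
#!/bin/sh
# Sketch: report the state of a named alert via the Prometheus HTTP API.
# Assumes `oc port-forward svc/prometheus-operated 9090 -n openshift-storage`
# is running and that curl and jq are installed.
# PROM_URL is a hypothetical variable; override it if you forward a different port.
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Read an /api/v1/alerts JSON response on stdin and print the state
# ("pending" or "firing") of each active instance of the alert named $1.
# Prints nothing if the alert is not active.
filter_alert_state() {
  jq -r --arg name "$1" \
    '.data.alerts[] | select(.labels.alertname == $name) | .state'
}

# Query the live Prometheus instance for the alert named $1.
alert_state() {
  curl -s "$PROM_URL/api/v1/alerts" | filter_alert_state "$1"
}
```

Running `alert_state CephMonQuorumAtRisk` should print `pending` at step 2 and `firing` once the alert's pending period has elapsed; only a firing alert is expected to be sent on to PagerDuty.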
The alert is propagated correctly, and when the nodes are uncordoned again, the alert is cleared correctly. --> VERIFIED

Tested with:
ocs-operator.v4.8.5
ocs-osd-deployer-qe.v1.1.2
OCP 4.9.9