Description of problem:
When the alert CephDataRecoveryTakingTooLong is triggered in Prometheus, it is not propagated to PagerDuty.

Version-Release number of selected component (if applicable):
ocs-operator.v4.8.1
ocs-osd-deployer-qe.v1.1.0

How reproducible:
1/1

Steps to Reproduce:
1. Drain all nodes in one rack that contains OSDs.
2. Check in Prometheus that the alert CephDataRecoveryTakingTooLong is Pending.
3. Wait 2 hours.
4. Check whether the alert is propagated to PagerDuty.

Actual results:
The alert is not propagated to PagerDuty.

Expected results:
The alert is propagated to PagerDuty.

Additional info:
To check Prometheus, forward the service port first:
$ oc port-forward svc/prometheus-operated 9090 -n openshift-storage
Then open http://localhost:9090/alerts in a browser to see the managed alerts.
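The Prometheus check in step 2 can also be scripted against the HTTP API instead of browsing /alerts. A minimal sketch, assuming the port-forward above is already running and using the standard Prometheus v1 alerts endpoint (the helper names here are illustrative, not part of any product tooling):

```python
import json
from urllib.request import urlopen

# Reachable via `oc port-forward svc/prometheus-operated 9090 -n openshift-storage`
PROM_URL = "http://localhost:9090"

def alert_states(alerts, name):
    """Return the states (e.g. "pending"/"firing") of alerts matching `name`."""
    return [a["state"] for a in alerts if a["labels"].get("alertname") == name]

def fetch_alerts(base_url=PROM_URL):
    """Fetch the active alerts from the Prometheus v1 API."""
    with urlopen(base_url + "/api/v1/alerts") as resp:
        return json.load(resp)["data"]["alerts"]

if __name__ == "__main__":
    states = alert_states(fetch_alerts(), "CephDataRecoveryTakingTooLong")
    print(states if states else "alert not active")
```

During step 2 this should report "pending", and after the 2-hour wait in step 3 it should report "firing", at which point the alert is expected to reach PagerDuty.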
The alert is propagated correctly after 2 hours, and when the nodes are uncordoned again, the alert is cleared correctly. --> VERIFIED

Tested with:
ocs-operator.v4.8.5
ocs-osd-deployer-qe.v1.1.2
OCP 4.9.9