Description of problem:
When the kubelet process is stopped on a node and CephNodeDown is triggered in Prometheus, the alert is not propagated to PagerDuty.

Version-Release number of selected component (if applicable):
ocs-operator.v4.8.1
ocs-osd-deployer-qe.v1.1.0

How reproducible:
2/2

Steps to Reproduce:
1. Identify one of the worker nodes:
   $ oc get nodes
2. Log into the node:
   $ oc debug node/<some worker node>
   $ chroot /host
3. Stop kubelet:
   $ systemctl stop kubelet
4. Wait for the node to enter the NotReady state:
   $ watch oc get nodes
5. Check PagerDuty while the node is in the NotReady state.

Actual results:
The CephNodeDown alert does not appear in PagerDuty. No alert is sent to indicate that there is any problem with the cluster.

Expected results:
Appropriate alerts are raised in PagerDuty.

Additional info:
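
When triaging a case like the one above, it can help to confirm whether the alert reached Alertmanager at all and which receiver it was routed to, before looking at PagerDuty itself. The following is only a minimal sketch, assuming the JSON returned by Alertmanager's v2 alerts endpoint (GET /api/v2/alerts) has already been fetched. The sample payload, the node name, and the receiver name "pagerduty" are hypothetical and are not taken from this cluster.

```python
import json


def find_alerts(alerts, name):
    """Return the alerts whose 'alertname' label equals `name`."""
    return [a for a in alerts if a.get("labels", {}).get("alertname") == name]


def routed_to(alert, receiver):
    """Return True if `receiver` is listed among the alert's receivers."""
    return any(r.get("name") == receiver for r in alert.get("receivers", []))


# Hypothetical sample shaped like an Alertmanager v2 alerts response.
SAMPLE = json.loads("""
[
  {
    "labels": {"alertname": "CephNodeDown", "node": "worker-1"},
    "receivers": [{"name": "default"}],
    "status": {"state": "active"}
  },
  {
    "labels": {"alertname": "Watchdog"},
    "receivers": [{"name": "pagerduty"}],
    "status": {"state": "active"}
  }
]
""")

for alert in find_alerts(SAMPLE, "CephNodeDown"):
    # Report the alert state and whether the assumed receiver is attached.
    print(alert["status"]["state"], routed_to(alert, "pagerduty"))
```

If the alert is present and active but the expected receiver is missing, the routing configuration is a more likely culprit than the alert rule. If the alert is absent altogether, the problem lies earlier, between Prometheus and Alertmanager.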
Filip Balak ran all PagerDuty tests with ocs-operator.v4.8.5, ocs-osd-deployer-qe.v1.1.2, and OCP 4.9.9. Alerts are working correctly.