Bug 2009397 - Alert CephMonQuorumAtRisk is not propagated to PagerDuty
Summary: Alert CephMonQuorumAtRisk is not propagated to PagerDuty
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat OpenShift Container Storage
Classification: Red Hat Storage
Component: odf-managed-service
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Dhruv Bindra
QA Contact: Filip Balák
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-09-30 14:27 UTC by Filip Balák
Modified: 2022-06-08 12:21 UTC
CC List: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-12-16 19:50:20 UTC
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift ocs-osd-deployer pull 94 | 0 | None | open | AlertRelabelConfigSecret added to apply namespace label to alerts | 2021-10-04 10:27:48 UTC

Description Filip Balák 2021-09-30 14:27:54 UTC
Description of problem:
When the CephMonQuorumAtRisk alert is triggered in Prometheus, it is not propagated to PagerDuty.

Version-Release number of selected component (if applicable):
ocs-operator.v4.8.1
ocs-osd-deployer-qe.v1.1.0

How reproducible:
1/1

Steps to Reproduce:
1. Drain all nodes in one rack that contains one Ceph monitor. Make sure that the monitor is not rescheduled elsewhere and that the number of Ceph monitors is even.
2. Check in Prometheus that the CephMonQuorumAtRisk alert is in the Pending state.
3. Wait 15 minutes.
4. Check that the alert is propagated to PagerDuty.
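The steps above rely on Ceph monitors needing a majority to keep quorum. The following is a simplified model of that arithmetic, assuming the cluster runs three monitors and treating "at risk" as "quorum still holds but has no spare monitor". The helper names `quorum_size` and `quorum_status` are hypothetical, and this is not the expression used by the actual CephMonQuorumAtRisk rule.

```python
# Simplified model of majority quorum for Ceph monitors.
# Assumption: "at risk" means quorum holds with zero spare monitors; this is
# an illustration, not the rule expression shipped with ocs-operator.


def quorum_size(total_mons: int) -> int:
    """Smallest number of monitors that forms a strict majority."""
    return total_mons // 2 + 1


def quorum_status(total_mons: int, mons_in_quorum: int) -> str:
    """Classify quorum health for a given number of monitors still in quorum."""
    needed = quorum_size(total_mons)
    if mons_in_quorum < needed:
        return "lost"  # no majority: the monitors cannot form quorum
    if mons_in_quorum == needed:
        return "at risk"  # one more monitor failure would break quorum
    return "healthy"  # at least one monitor can fail without losing quorum


if __name__ == "__main__":
    # Three monitors, one drained together with its rack: 2 of 3 remain.
    print(quorum_status(3, 2))
    # All three monitors running.
    print(quorum_status(3, 3))
```

With three monitors and one rack drained, two monitors remain, which is exactly the majority, so a single further monitor failure would break quorum.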

Actual results:
The alert is not propagated to PagerDuty.

Expected results:
The alert is propagated to PagerDuty.

Additional info:
To check Prometheus, the user needs to forward a port:
 $ oc port-forward svc/prometheus-operated 9090 -n openshift-storage
Then the user can open http://localhost:9090/alerts in a browser to see the managed alerts.
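Besides the browser view, the same information is available from the Prometheus HTTP API endpoint `/api/v1/alerts` through the forwarded port. Prometheus reports an alert as `pending` until its condition has held for the rule's `for` duration and as `firing` afterwards; only firing alerts are sent on to Alertmanager and from there to receivers such as PagerDuty. A minimal sketch, assuming the `oc port-forward` command above is running on localhost:9090; `fetch_alerts` and `alert_states` are hypothetical helper names.

```python
import json
import urllib.request

# Assumes `oc port-forward svc/prometheus-operated 9090 -n openshift-storage` is running.
PROMETHEUS_URL = "http://localhost:9090"


def fetch_alerts(base_url: str = PROMETHEUS_URL) -> dict:
    """Fetch active alerts from the Prometheus HTTP API (/api/v1/alerts)."""
    with urllib.request.urlopen(f"{base_url}/api/v1/alerts", timeout=10) as resp:
        return json.load(resp)


def alert_states(payload: dict, alert_name: str) -> list:
    """Return the state ("pending" or "firing") of each active alert named alert_name."""
    alerts = payload.get("data", {}).get("alerts", [])
    return [
        alert.get("state", "")
        for alert in alerts
        if alert.get("labels", {}).get("alertname") == alert_name
    ]
```

With the port-forward running, `alert_states(fetch_alerts(), "CephMonQuorumAtRisk")` is expected to contain `"pending"` during step 2 and `"firing"` once the rule's `for` duration has elapsed; an empty list means the alert is not active.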

Comment 2 Filip Balák 2021-12-14 15:12:53 UTC
The alert is propagated correctly, and when the nodes are uncordoned again, the alert is cleared correctly. --> VERIFIED

Tested with:
ocs-operator.v4.8.5
ocs-osd-deployer-qe.v1.1.2
ocp 4.9.9

