2009397 – Alert CephMonQuorumAtRisk is not propagated to PagerDuty

Summary: Alert CephMonQuorumAtRisk is not propagated to PagerDuty

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat OpenShift Container Storage
Classification:	Red Hat Storage
Component:	odf-managed-service
Sub Component:
Version:	4.8
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	---
Assignee:	Dhruv Bindra
QA Contact:	Filip Balák
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2021-09-30 14:27 UTC by Filip Balák
Modified:	2022-06-08 12:21 UTC
CC List:	5 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2021-12-16 19:50:20 UTC
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift ocs-osd-deployer pull 94	0	None	open	AlertRelabelConfigSecret added to apply namespace label to alerts	2021-10-04 10:27:48 UTC
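The linked pull request's title says the fix adds an alert relabel config secret so that alerts carry a namespace label. The deployer's actual change is not reproduced here. The sketch below only illustrates the general Prometheus mechanism, assuming the Prometheus instance is managed by prometheus-operator, whose Prometheus resource can reference a secret key through `spec.additionalAlertRelabelConfigs`. The secret name `alert-relabel-config`, the key `alertrelabelconfig.yaml`, the fixed value `openshift-storage`, and the helper names are all hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: names below are illustrative, not taken from PR 94.
NS=openshift-storage

# Emit an alert_relabel_configs entry that sets namespace=$NS on every alert.
# With no source_labels, the default regex (.*) matches, so the label is
# always written (overwriting any existing namespace label).
relabel_config() {
  cat <<EOF
- target_label: namespace
  replacement: ${NS}
  action: replace
EOF
}

# Store the config in a secret that a Prometheus resource could reference:
#   spec:
#     additionalAlertRelabelConfigs:
#       name: alert-relabel-config
#       key: alertrelabelconfig.yaml
create_relabel_secret() {
  local tmp
  tmp=$(mktemp)
  relabel_config > "$tmp"
  oc create secret generic alert-relabel-config \
    --from-file=alertrelabelconfig.yaml="$tmp" -n "$NS"
  rm -f "$tmp"
}
```

Overwriting unconditionally is the simplest choice for an illustration; a real deployment might only add the label when it is missing.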

Description Filip Balák 2021-09-30 14:27:54 UTC

Description of problem:
When the CephMonQuorumAtRisk alert is triggered in Prometheus, it is not propagated to PagerDuty.

Version-Release number of selected component (if applicable):
ocs-operator.v4.8.1
ocs-osd-deployer-qe.v1.1.0

How reproducible:
1/1

Steps to Reproduce:
1. Drain all nodes in one rack that hosts a Ceph monitor. Make sure that the monitor is not rescheduled elsewhere and that the number of Ceph monitors is even.
2. Check in Prometheus that the CephMonQuorumAtRisk alert is Pending.
3. Wait 15 minutes.
4. Check that the alert is propagated to PagerDuty.
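Step 1 and the later cleanup can be scripted roughly as follows. This is a sketch, not the reporter's exact procedure: it assumes the rack is identified by the `topology.rook.io/rack` node label and that monitor pods carry the `app=rook-ceph-mon` label. The helper names `drain_rack` and `restore_rack` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of step 1. Assumed labels: topology.rook.io/rack on nodes,
# app=rook-ceph-mon on monitor pods.
NS=openshift-storage

# Cordon and evict pods from every node in the given rack, then list
# monitor placement so you can confirm the drained mon was not rescheduled.
drain_rack() {
  local rack="$1"
  oc adm drain -l "topology.rook.io/rack=${rack}" \
    --ignore-daemonsets --delete-emptydir-data
  oc get pods -n "$NS" -l app=rook-ceph-mon -o wide
}

# Make the rack schedulable again once the test is finished.
restore_rack() {
  oc adm uncordon -l "topology.rook.io/rack=$1"
}

# Usage: drain_rack rack0   ...   restore_rack rack0
```

Draining can still stall on PodDisruptionBudgets, so check the `oc adm drain` output before starting the 15-minute wait.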

Actual results:
The alert is not propagated to PagerDuty.

Expected results:
The alert is propagated to PagerDuty.

Additional info:
To check Prometheus, forward a local port to the Prometheus service:
 $ oc port-forward svc/prometheus-operated 9090 -n openshift-storage
Then open http://localhost:9090/alerts in a browser to see the managed alerts.
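With the port-forward from the command above running, the alert state can also be read from the Prometheus HTTP API instead of the browser. This minimal sketch assumes the standard `/api/v1/query` endpoint and Prometheus's built-in `ALERTS` series. Its `alertstate` label is `pending` until the rule's `for` duration elapses and `firing` afterwards. `PROM_URL`, `alert_state`, and `query_alert` are hypothetical names.

```shell
#!/usr/bin/env bash
# Assumes Prometheus is reachable through the port-forward on localhost:9090.
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Read a /api/v1/query JSON response on stdin and print the first
# alertstate value found ("pending" or "firing"); prints nothing if absent.
alert_state() {
  grep -o '"alertstate":"[a-z]*"' | head -n 1 | sed 's/.*:"\([a-z]*\)"/\1/'
}

# Query the ALERTS series for the named alert and print its state.
query_alert() {
  curl -sG "${PROM_URL}/api/v1/query" \
    --data-urlencode "query=ALERTS{alertname=\"$1\"}" | alert_state
}

# Usage: query_alert CephMonQuorumAtRisk
```

Prometheus sends only firing alerts to Alertmanager. The state is therefore expected to change from `pending` to `firing` after the wait in step 3, and only then would a PagerDuty notification be expected.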

Comment 2 Filip Balák 2021-12-14 15:12:53 UTC

The alert is propagated correctly, and when the nodes are uncordoned again, the alert is cleared correctly. --> VERIFIED

Tested with:
ocs-operator.v4.8.5
ocs-osd-deployer-qe.v1.1.2
ocp 4.9.9

Note You need to log in before you can comment on or make changes to this bug.