2006222 – CephNodeDown alert is not propagated into PagerDuty


Summary: CephNodeDown alert is not propagated into PagerDuty

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat OpenShift Container Storage
Classification:	Red Hat Storage
Component:	odf-managed-service
Sub Component:
Version:	4.8
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	---
Assignee:	Dhruv Bindra
QA Contact:	Elena Bondarenko
Docs Contact:
URL:
Whiteboard:
Depends On:	2006323
Blocks:

Reported:	2021-09-21 08:51 UTC by Filip Balák
Modified:	2022-06-08 12:16 UTC
CC List:	7 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2021-12-16 19:49:42 UTC
Embargoed:
Dependent Products:


Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift ocs-osd-deployer pull 94	0	None	open	AlertRelabelConfigSecret added to apply namespace label to alerts	2021-10-04 10:25:01 UTC

Description Filip Balák 2021-09-21 08:51:58 UTC

Description of problem:
When the kubelet process is stopped on a node and the CephNodeDown alert is triggered in Prometheus, the alert is not propagated to PagerDuty.

Version-Release number of selected component (if applicable):
ocs-operator.v4.8.1
ocs-osd-deployer-qe.v1.1.0

How reproducible:
2/2

Steps to Reproduce:
1. Identify one of the worker nodes:
 $ oc get nodes
2. Log in to that node:
 $ oc debug node/<some worker node>
 $ chroot /host
3. Stop kubelet:
 $ systemctl stop kubelet
4. Watch until the node reaches the NotReady state:
 $ watch oc get nodes
5. While the node is in the NotReady state, check PagerDuty for alerts.
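
The manual steps above could be scripted roughly as follows. This is a minimal, non-authoritative sketch: it assumes `oc` is logged in with cluster-admin rights and that worker nodes carry the standard `node-role.kubernetes.io/worker` label. The function names are hypothetical, and unlike step 2 the command is run through `oc debug` non-interactively.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that script the reproduction steps.
# Assumes `oc` is logged in with cluster-admin rights.

# Step 1: print the name of the first node labelled as a worker.
first_worker_node() {
  oc get nodes -l node-role.kubernetes.io/worker \
    -o jsonpath='{.items[0].metadata.name}'
}

# Steps 2-3: stop kubelet on the given node from a debug pod.
# The debug session is relayed through the kubelet, so it may
# disconnect once kubelet stops.
stop_kubelet() {
  oc debug "node/$1" -- chroot /host systemctl stop kubelet
}

# Read `oc get nodes --no-headers` output on stdin and succeed only if
# the named node's STATUS column starts with NotReady
# (this also covers "NotReady,SchedulingDisabled").
node_is_notready() {
  awk -v n="$1" '$1 == n && $2 ~ /^NotReady/ { found = 1 } END { exit !found }'
}

# Step 4: poll every 10 seconds until the node reports NotReady.
wait_for_notready() {
  until oc get nodes --no-headers | node_is_notready "$1"; do
    sleep 10
  done
}
```

For example, `node=$(first_worker_node); stop_kubelet "$node"; wait_for_notready "$node"`, after which PagerDuty is checked as in step 5. Keeping the status parsing in `node_is_notready` separate from the `oc` call makes it possible to check it against captured `oc get nodes` output.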

Actual results:
The CephNodeDown alert does not appear in PagerDuty. No alert is sent indicating that there is any problem with the cluster.
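
The description states that CephNodeDown fires in Prometheus. One way to confirm this while reproducing, and so narrow the problem to alert forwarding rather than alert evaluation, is to query Prometheus for the `ALERTS` series. This is a minimal sketch under stated assumptions: it assumes the rule is evaluated by the cluster-monitoring Prometheus pod `prometheus-k8s-0` in `openshift-monitoring`, listening on `localhost:9090`, and that its `prometheus` container includes `curl`. In the managed-service deployment a different Prometheus instance may evaluate the rule, so `PROM_NS` and `PROM_POD` can be overridden. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Assumed location of the Prometheus that evaluates the CephNodeDown rule;
# override via the environment if the alert is evaluated elsewhere.
PROM_NS="${PROM_NS:-openshift-monitoring}"
PROM_POD="${PROM_POD:-prometheus-k8s-0}"

# Hypothetical helper: run an instant query for the given alert in the
# firing state via the Prometheus HTTP API (/api/v1/query, form-encoded POST).
query_firing_alert() {
  oc -n "$PROM_NS" exec -c prometheus "$PROM_POD" -- \
    curl -s --data-urlencode \
    "query=ALERTS{alertname=\"$1\",alertstate=\"firing\"}" \
    http://localhost:9090/api/v1/query
}

# Read a /api/v1/query JSON response on stdin and succeed only if the
# result vector is non-empty (whitespace is stripped before matching).
has_results() {
  tr -d '[:space:]' | grep -q '"result":\[{'
}
```

For example, `query_firing_alert CephNodeDown | has_results && echo firing`. Checking only for a non-empty result vector is sufficient here because the query itself already filters on `alertname` and `alertstate`.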

Expected results:
Appropriate alerts are raised in PagerDuty.

Additional info:

Comment 1 Elena Bondarenko 2021-12-14 15:39:40 UTC

Filip Balák ran all PagerDuty tests with:
ocs-operator.v4.8.5
ocs-osd-deployer-qe.v1.1.2
ocp 4.9.9

Alerts are working correctly.
