Bug 2308347 - CephMgrIsAbsent is not firing when scaling down one mgr deployment
Summary: CephMgrIsAbsent is not firing when scaling down one mgr deployment
Keywords:
Status: NEW
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: ceph-monitoring
Version: 4.16
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Divyansh Kamboj
QA Contact: Harish NV Rao
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2024-08-28 15:22 UTC by Daniel Osypenko
Modified: 2024-09-17 10:05 UTC
2 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker OCSBZM-8883 0 None None None 2024-08-28 15:24:46 UTC

Description Daniel Osypenko 2024-08-28 15:22:58 UTC
Description of problem (please be as detailed as possible and provide log snippets):

This bug was created as a consequence of the change covered by epic https://issues.redhat.com/browse/RHSTOR-4139 (ODF 4.15).

After scaling down one of the two MGR pods, the CephMgrIsAbsent alert does not fire.

The alert appears only after scaling down all mgr deployments.
We may want to notify the user with a Warning-level CephMgrIsAbsent alert.

Scale down one mgr deployment and check CephMgrIsAbsent:
oc get deployment | grep mgr
rook-ceph-mgr-a                                               0/0     0            0           7h27m
rook-ceph-mgr-b                                               1/1     1            1           7h27m
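
To restore the environment after the check, scale the deployment back up (a sketch only; the namespace follows step 1 of the reproduction steps below, adjust it if ODF runs in openshift-storage):

oc -n odf-storage scale --replicas=1 deployment/rook-ceph-mgr-a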

Expression: 
label_replace((up{job="rook-ceph-mgr"} == 0 or absent(up{job="rook-ceph-mgr"})), "namespace", "odf-storage", "", "")
* the namespace is dynamic; the alert does not fire with either the default openshift-storage namespace or a custom odf-storage namespace.
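
A likely reason the expression misses this case (my reading, not confirmed): once rook-ceph-mgr-a is scaled to 0 its scrape target disappears, so up{job="rook-ceph-mgr"} == 0 matches nothing, and absent(up{job="rook-ceph-mgr"}) stays false while mgr-b still reports. A sketch of an expression that would also fire when only one mgr is missing, assuming two mgr daemons are expected (the count of 2 is an assumption based on the two-mgr setup from RHSTOR-4139, and the label_replace wrapper of the shipped rule is omitted here):

(count(up{job="rook-ceph-mgr"} == 1) or vector(0)) < 2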


Version of all relevant components (if applicable):

Issue confirmed on ODF 4.15 and on ODF 4.16 (vSphere and ROSA HCP deployments)


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
no

Is there any workaround available to the best of your knowledge?
no

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
2

Is this issue reproducible?
yes

Can this issue be reproduced from the UI?
yes

If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. oc -n odf-storage scale --replicas=0 deployment/rook-ceph-mgr-a
2. wait 5 min
3. check the alert via the management console (Observe / Alerts) or using
 curl -k -X GET "<route>/api/v1/alerts?silenced=False&inhibited=False" -H "Authorization: Bearer <token>" | jq '.data.alerts[] | select(.labels.alertname == "CephMgrIsAbsent")'
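
For reference, the rule expression can also be evaluated directly against the Prometheus (or Thanos querier) API to confirm it returns no result while only mgr-a is scaled down; <prometheus-route> and <token> are placeholders, as in step 3:

curl -k -H "Authorization: Bearer <token>" "<prometheus-route>/api/v1/query" --data-urlencode 'query=up{job="rook-ceph-mgr"} == 0 or absent(up{job="rook-ceph-mgr"})'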


Actual results:
CephMgrIsAbsent does not fire

Expected results:
CephMgrIsAbsent fires to notify the user of the risk of losing all mgr pods

Additional info:
The latest change regarding CephMgrIsAbsent that I found: https://github.com/rook/rook/issues/12249

Comment 3 Sunil Kumar Acharya 2024-09-17 10:05:02 UTC
Moving the non-blocker BZs out of ODF-4.17.0 as part of Development Freeze.

