Bug 2297097 - Alert 'CephClusterCriticallyFull' not triggered when ceph filled 85% of its capacity. [NEEDINFO]
Summary: Alert 'CephClusterCriticallyFull' not triggered when ceph filled 85% of its capacity.
Keywords:
Status: ASSIGNED
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: ceph-monitoring
Version: 4.16
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: arun kumar mohan
QA Contact: Harish NV Rao
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2024-07-10 13:58 UTC by Nagendra Reddy
Modified: 2024-09-17 10:05 UTC (History)
CC List: 2 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Embargoed:
amohan: needinfo? (nagreddy)


Attachments
Pools 100% utilised when raw 85% used (42.09 KB, image/png)
2024-07-10 13:58 UTC, Nagendra Reddy


Links
Red Hat Issue Tracker OCSBZM-8668 (last updated 2024-07-23 07:37:57 UTC)

Description Nagendra Reddy 2024-07-10 13:58:37 UTC
Created attachment 2039415 [details]
Pools 100% utilised when raw 85% used

Description of problem (please be as detailed as possible and provide log snippets):

I filled 85% of the cluster capacity on 100Gi OSDs using benchmark-operator io
and observed that the 'CephClusterCriticallyFull' alert was not triggered in the cluster.
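
For reference, below is a minimal Python sketch for checking the same two things by hand against the cluster's Prometheus: the RAW used ratio the alert is presumably evaluating, and whether CephClusterCriticallyFull is actually firing. The route host and bearer token are placeholders, and the shipped rule's exact expression and threshold should be read from the ocs-operator PrometheusRule rather than from this sketch.

# Hedged sketch: query in-cluster Prometheus for the RAW used ratio and for
# currently firing alerts. PROM_URL and TOKEN are placeholders (assumptions),
# and the ratio below is only assumed to match what the shipped rule evaluates.
import requests

PROM_URL = "https://<prometheus-k8s-route-host>"   # placeholder
TOKEN = "<bearer-token>"                           # placeholder
HEADERS = {"Authorization": f"Bearer {TOKEN}"}

ratio_query = "ceph_cluster_total_used_raw_bytes / ceph_cluster_total_bytes"
r = requests.get(f"{PROM_URL}/api/v1/query",
                 params={"query": ratio_query},
                 headers=HEADERS, verify=False)
for sample in r.json()["data"]["result"]:
    print("RAW used ratio:", sample["value"][1])

alerts = requests.get(f"{PROM_URL}/api/v1/alerts",
                      headers=HEADERS, verify=False).json()
firing = {a["labels"].get("alertname") for a in alerts["data"]["alerts"]
          if a["state"] == "firing"}
print("CephClusterCriticallyFull firing:", "CephClusterCriticallyFull" in firing)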


Version of all relevant components (if applicable):
ocp: 4.16.0-0.nightly-2024-07-09-093958
odf: 4.16.0-rhodf

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
Y

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
2

Is this issue reproducible?
Intermittent

Can this issue be reproduced from the UI?


If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Fill the cluster to 85% of its capacity using benchmark-operator io (see the polling sketch below this section).
2. Observe that no CephClusterCriticallyFull alert is raised.
3. This was tried on an IBM Cloud cluster with 100Gi OSDs.

There is a similar automated test [tests/cross_functional/system_test/test_cluster_full_and_recovery.py] which may help in reproducing this issue.
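
To make step 1 easier to follow while the fill workload runs, the small helper below polls the RAW utilisation from the rook-ceph-tools pod until the cluster crosses 85%. The tools-pod label and the ceph df JSON field names are assumptions based on a typical Rook/Ceph deployment, not taken from this bug.

# Hedged helper: poll RAW utilisation while benchmark-operator fills the cluster.
# The tools-pod label and JSON field names are assumptions; adjust for your setup.
import json
import subprocess
import time

def raw_used_percent():
    pod = subprocess.check_output(
        ["oc", "-n", "openshift-storage", "get", "pod",
         "-l", "app=rook-ceph-tools",
         "-o", "jsonpath={.items[0].metadata.name}"],
        text=True).strip()
    out = subprocess.check_output(
        ["oc", "-n", "openshift-storage", "rsh", pod,
         "ceph", "df", "--format", "json"], text=True)
    stats = json.loads(out)["stats"]
    return 100.0 * stats["total_used_raw_bytes"] / stats["total_bytes"]

while True:
    pct = raw_used_percent()
    print(f"RAW used: {pct:.1f}%")
    if pct >= 85.0:
        break
    time.sleep(60)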


Actual results:
 Alert 'CephClusterCriticallyFull' not triggered.

Expected results:
The 'CephClusterCriticallyFull' alert should be triggered when the cluster is 85% full.

Additional info:

I observed that the pools were 100% utilised when the RAW used capacity was 85% (see the attached screenshot). I am not sure whether this is the reason the alert was not triggered.
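
One way to narrow this down is to compare the RAW utilisation with the per-pool utilisation reported by the ceph-mgr Prometheus module. The sketch below does that using the standard metric names (ceph_cluster_total_used_raw_bytes, ceph_cluster_total_bytes, ceph_pool_stored, ceph_pool_max_avail); PROM_URL and TOKEN are placeholders as in the earlier sketch.

# Hedged sketch: compare RAW utilisation against per-pool utilisation to see
# whether pools report ~100% while RAW used is ~85%, as described above.
# PROM_URL and TOKEN are placeholders (assumptions).
import requests

PROM_URL = "https://<prometheus-k8s-route-host>"   # placeholder
TOKEN = "<bearer-token>"                           # placeholder
HEADERS = {"Authorization": f"Bearer {TOKEN}"}

def query(expr):
    r = requests.get(f"{PROM_URL}/api/v1/query",
                     params={"query": expr}, headers=HEADERS, verify=False)
    return r.json()["data"]["result"]

raw = query("ceph_cluster_total_used_raw_bytes / ceph_cluster_total_bytes")
print("RAW used ratio:", raw[0]["value"][1] if raw else "n/a")

pools = query("ceph_pool_stored / (ceph_pool_stored + ceph_pool_max_avail)")
for p in pools:
    print("pool", p["metric"].get("pool_id", "?"), "used ratio:", p["value"][1])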

Comment 6 Sunil Kumar Acharya 2024-09-17 10:05:02 UTC
Moving the non-blocker BZs out of ODF-4.17.0 as part of Development Freeze.

