Bug 2297097

Summary: Alert 'CephClusterCriticallyFull' not triggered when Ceph reaches 85% of its capacity.
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Component: ceph-monitoring
Version: 4.16
Type: Bug
Reporter: Nagendra Reddy <nagreddy>
Assignee: arun kumar mohan <amohan>
QA Contact: Harish NV Rao <hnallurv>
CC: amohan, odf-bz-bot
Status: ASSIGNED
Severity: medium
Priority: unspecified
Flags: amohan: needinfo? (nagreddy)
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Attachments:
  Pools 100% utilised when raw 85% used (flags: none)

Description Nagendra Reddy 2024-07-10 13:58:37 UTC
Created attachment 2039415 [details]
Pools 100% utilised when raw 85% used

Description of problem (please be as detailed as possible and provide log
snippets):

I filled 85% of the cluster capacity on 100Gi OSDs using benchmark-operator
io and observed that the 'CephClusterCriticallyFull' alert was not triggered.
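
For reference, my understanding is that this alert is shipped as a
PrometheusRule by ocs-operator and keys off cluster raw utilisation, roughly
like the expression below. This is an illustrative sketch from memory, not
the exact shipped rule; the metric names and threshold in 4.16 may differ:

  # Hypothetical sketch of the alert expression (verify against the
  # PrometheusRule objects actually deployed in openshift-storage):
  ceph_cluster_total_used_raw_bytes / ceph_cluster_total_bytes > 0.85

If the shipped expression looks like this, either a different threshold or a
gap in scraping these ceph-mgr exporter metrics could explain the missing alert.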


Version of all relevant components (if applicable):
ocp: 4.16.0-0.nightly-2024-07-09-093958
odf: 4.16.0-rhodf

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
Yes

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
2

Is this issue reproducible?
Intermittent

Can this issue be reproduced from the UI?


If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Fill the cluster to 85% of its capacity using benchmark-operator io.
2. Observe that no CephClusterCriticallyFull alert fires (commands to
   cross-check utilisation and the rule definition are sketched after this section).
3. Tested on an IBM Cloud cluster with 100Gi OSDs.

A similar automated test [tests/cross_functional/system_test/test_cluster_full_and_recovery.py] exists and may help in reproducing this issue.
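
While reproducing, something like the following should help cross-check
utilisation and the alert definition (treat as a sketch: the rook-ceph-tools
deployment must be enabled, and the PrometheusRule object name can vary by
version):

  # Compare cluster RAW USED vs. per-pool %USED / MAX AVAIL from the toolbox pod:
  oc -n openshift-storage rsh deploy/rook-ceph-tools ceph df detail

  # Locate the shipped definition of the alert to confirm its actual threshold:
  oc -n openshift-storage get prometheusrules -o yaml | grep -B2 -A8 CephClusterCriticallyFull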


Actual results:
 Alert 'CephClusterCriticallyFull' not triggered.

Expected results:
Alert 'CephClusterCriticallyFull' should be triggered when the cluster is 85% full.

Additional info:

I observed that the pools showed 100% utilised while RAW used capacity was 85%. Is this the reason the alert did not trigger? Not sure.
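
One possible explanation, assuming `ceph df` computes pool %USED as
stored / (stored + max_avail) and that ODF lowers the Ceph full ratio to 0.85
(both from memory, worth confirming): MAX AVAIL is the headroom left before
the full ratio, so it reaches zero exactly when raw utilisation hits 85%.
Rough arithmetic for three 100Gi OSDs with 3x replication:

  raw capacity           = 3 * 100Gi = 300Gi
  raw used at 85%        = 255Gi  (~85Gi of user data at 3x replication)
  raw usable before full = 0.85 * 300Gi = 255Gi -> max_avail ~ 0
  pool %USED             = 85 / (85 + 0) = 100%

If that holds, pools reading 100% at raw 85% is expected behaviour rather than
the cause of the missing alert, since the alert (as sketched above) appears to
key off raw utilisation, not pool utilisation.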

Comment 6 Sunil Kumar Acharya 2024-09-17 10:05:02 UTC
Moving the non-blocker BZs out of ODF-4.17.0 as part of Development Freeze.