Bug 2297097
| Summary: | Alert 'CephClusterCriticallyFull' not triggered when Ceph filled 85% of its capacity. | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | Nagendra Reddy <nagreddy> |
| Component: | ceph-monitoring | Assignee: | arun kumar mohan <amohan> |
| Status: | ASSIGNED | QA Contact: | Harish NV Rao <hnallurv> |
| Severity: | medium | Priority: | unspecified |
| Version: | 4.16 | CC: | amohan, odf-bz-bot |
| Target Milestone: | --- | Target Release: | --- |
| Hardware: | Unspecified | OS: | Unspecified |
| Flags: | amohan: needinfo? (nagreddy) | Type: | Bug |
| Doc Type: | If docs needed, set a value | Attachments: | attachment 2039415: Pools 100% utilised when raw 85% used |
Moving the non-blocker BZs out of ODF-4.17.0 as part of Development Freeze.
Created attachment 2039415 [details]
Pools 100% utilised when raw 85% used

Description of problem (please be detailed as possible and provide log snippets):
I filled 85% of the cluster's raw capacity on 100Gi OSDs using benchmark-operator io, and observed that the 'CephClusterCriticallyFull' alert was not triggered in the cluster.

Version of all relevant components (if applicable):
ocp: 4.16.0-0.nightly-2024-07-09-093958
odf: 4.16.0-rhodf

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
Yes

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
2

Can this issue be reproduced?
Intermittent

Can this issue be reproduced from the UI?

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. Fill the cluster to 85% of raw capacity using benchmark-operator io.
2. Observe that no 'CephClusterCriticallyFull' alert fires.
3. Tested on an IBM Cloud cluster with 100Gi OSDs.

A similar automated test [tests/cross_functional/system_test/test_cluster_full_and_recovery.py] exists and may help in reproducing this issue.

Actual results:
The 'CephClusterCriticallyFull' alert is not triggered.

Expected results:
The 'CephClusterCriticallyFull' alert should be triggered when the cluster is 85% full. (A sketch for inspecting the shipped alert expression is below.)

Additional info:
I observed that the pools showed 100% utilisation while RAW used capacity was at 85%. Is this the reason the alert did not trigger? Not sure. (See the second sketch below for comparing raw vs. per-pool utilisation.)
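For triage, it may help to confirm what expression the shipped alert actually evaluates and what the underlying metrics report on the affected cluster. The exact PrometheusRule and threshold shipped with ODF may differ from what is sketched here; the `openshift-storage` and `openshift-monitoring` namespaces, the `thanos-querier` route, and the `ceph_cluster_total_used_raw_bytes` / `ceph_cluster_total_bytes` exporter metrics are the usual defaults and are assumed for illustration:

```sh
# Dump the shipped alert expression (namespace assumed: openshift-storage)
oc -n openshift-storage get prometheusrules -o yaml | grep -B2 -A6 CephClusterCriticallyFull

# Evaluate the raw-capacity ratio such rules are typically built on;
# route/token handling is environment specific, illustrative only
TOKEN=$(oc whoami -t)
HOST=$(oc -n openshift-monitoring get route thanos-querier -o jsonpath='{.spec.host}')
curl -sk -H "Authorization: Bearer $TOKEN" \
  "https://$HOST/api/v1/query" \
  --data-urlencode 'query=ceph_cluster_total_used_raw_bytes / ceph_cluster_total_bytes'
```

If the ratio returned here is at or above the rule's threshold while the alert is absent, the problem is in rule evaluation or delivery rather than in the metrics themselves.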
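On the pools-at-100%-while-raw-at-85% observation: Ceph derives a pool's MAX AVAIL from the fullest OSD it maps to (scaled by the pool's replication), so on an unevenly balanced cluster per-pool utilisation can reach 100% before the global raw-used ratio crosses the alert threshold. A minimal sketch for comparing the two views, assuming the rook-ceph toolbox is enabled under its default deployment name:

```sh
# Global RAW USED vs per-pool USED/MAX AVAIL (toolbox name assumed: rook-ceph-tools)
oc -n openshift-storage rsh deploy/rook-ceph-tools ceph df detail

# Per-OSD fill levels; pool MAX AVAIL is bounded by the most-full OSD
oc -n openshift-storage rsh deploy/rook-ceph-tools ceph osd df tree
```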