Bug 2084014
| Summary: | Capacity utilization alerts on provider are not raised for large clusters | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | Filip Balák <fbalak> |
| Component: | odf-managed-service | Assignee: | Pranshu Srivastava <prasriva> |
| Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | Neha Berry <nberry> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.10 | CC: | aeyal, mbukatov, nthomas, ocs-bugs, odf-bz-bot, pcuzner, prasriva, sgatfane, tnielsen |
| Target Milestone: | --- | Keywords: | AutomationBlocker |
| Target Release: | --- | Flags: | prasriva: needinfo? (fbalak) |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-10-12 10:11:35 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 2084534, 2084541 | | |
Description
Filip Balák
2022-05-11 08:20:34 UTC
I also tested the scenario with a 4 TiB cluster spanning 3 availability zones and was still unable to get any cluster-level SendGrid notification. Only after fully utilizing the 4 TiB cluster with 3 availability zones did I receive the following SendGrid notification:
Your storage cluster utilization has crossed 80% and will become read-only at 85% utilized! Please free up some space or if possible expand the storage cluster immediately to prevent any service access issues. It is common to also be alerted to OSD devices entering near-full or full states prior to this alert.
After full utilization of the cluster, its capacity looks like this:
$ oc rsh -n openshift-storage $(oc get pods -n openshift-storage|grep tool|awk '{print$1}') ceph df
--- RAW STORAGE ---
CLASS SIZE AVAIL USED RAW USED %RAW USED
ssd 12 TiB 1.8 TiB 10 TiB 10 TiB 85.01
TOTAL 12 TiB 1.8 TiB 10 TiB 10 TiB 85.01
--- POOLS ---
POOL ID PGS STORED OBJECTS USED %USED MAX AVAIL
device_health_metrics 1 1 15 KiB 6 46 KiB 100.00 0 B
ocs-storagecluster-cephblockpool 2 32 19 B 1 12 KiB 100.00 0 B
ocs-storagecluster-cephfilesystem-metadata 3 32 18 KiB 22 138 KiB 100.00 0 B
ocs-storagecluster-cephfilesystem-data0 4 32 0 B 0 0 B 0 0 B
cephblockpool-storageconsumer-1318e613-2b6e-45e1-81e2-b25f67221e47 5 32 3.4 TiB 892.69k 10 TiB 100.00 0 B
This is not achievable with larger clusters, as mentioned in the description of the bug.
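For context, below is a minimal sketch of the kind of cluster-level utilization rule that a notification like the one above is typically derived from. The metric names (ceph_cluster_total_used_bytes, ceph_cluster_total_bytes) are exported by the Ceph mgr Prometheus module, but the rule name, thresholds, and wording here are illustrative assumptions, not the exact rules shipped with the managed service:

```yaml
# Illustrative sketch only -- not the rule shipped with ODF managed service.
# Assumes ceph_cluster_total_used_bytes and ceph_cluster_total_bytes are
# exported by the Ceph mgr Prometheus module.
groups:
  - name: cluster-utilization-sketch
    rules:
      - alert: ClusterUtilizationNearFull
        # Fires once raw cluster usage stays above 80% for 5 minutes.
        expr: ceph_cluster_total_used_bytes / ceph_cluster_total_bytes > 0.80
        for: 5m
        labels:
          severity: warning
        annotations:
          description: >
            Storage cluster utilization has crossed 80% and will become
            read-only at 85%. Free up space or expand the cluster.
```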
@fbalak Could you confirm whether the ask here is to include the existing pool alerts, namely CephPoolQuotaBytesNearExhaustion and CephPoolQuotaBytesCriticallyExhausted, as defined here [1], which would let the user know when the pools exceed the threshold limit?

[1]: https://github.com/ceph/ceph-mixins/blob/master/alerts/pool-quota.libsonnet#L7-L38

I don't think that would solve the issue. AFAIK there is no pool quota set for the pools used in the default Ceph storage classes. New pool capacity alerts (not quota alerts) that are clearly communicated to users could help here, but the RFE that was created for this was closed as not needed and confusing: https://bugzilla.redhat.com/show_bug.cgi?id=1870083
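As a rough illustration of the "pool capacity alert (not quota alert)" idea mentioned above, such a rule could be keyed on stored bytes versus remaining free space per pool rather than on ceph_pool_quota_bytes. The expression below is a sketch that assumes the ceph_pool_stored, ceph_pool_max_avail, and ceph_pool_metadata metrics from the Ceph exporter; it is not an existing ceph-mixins alert:

```yaml
# Hypothetical per-pool capacity (not quota) alert -- not part of ceph-mixins.
# Joins on pool_id to attach the pool name to the alert labels.
- alert: CephPoolNearFull
  expr: |
    (ceph_pool_stored / (ceph_pool_stored + ceph_pool_max_avail))
      * on (pool_id) group_left (name) ceph_pool_metadata > 0.80
  for: 15m
  labels:
    severity: warning
  annotations:
    description: "Pool {{ $labels.name }} is more than 80% full."
```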