+++ This bug was initially created as a clone of Bug #2084541 +++

Description of problem:
There is no CephClusterReadOnly alert triggered when the cluster reaches 85% of its utilization.

Version-Release number of selected component (if applicable):
ocs-operator.v4.10.0
OCP 4.10.8

How reproducible:
1/1

Steps to Reproduce:
1. Deploy provider and consumer with a 4 TiB cluster on ROSA (this is not reproducible on larger clusters: https://bugzilla.redhat.com/show_bug.cgi?id=2084014)
2. Set notification emails during deployment.
3. Fully utilize cluster capacity.
4. Check email.

Actual results:
The following email was received on reaching 80% of utilized capacity:

"Your storage cluster utilization has crossed 80% and will become read-only at 85% utilized! Please free up some space or if possible expand the storage cluster immediately to prevent any service access issues. It is common to also be alerted to OSD devices entering near-full or full states prior to this alert."

There is no notification that the cluster is read-only after reaching 85%.

Expected results:
User should be notified that the cluster is read-only.

Additional info:
Output of the ceph df command on the fully utilized cluster:

$ oc rsh -n openshift-storage $(oc get pods -n openshift-storage|grep tool|awk '{print$1}') ceph df
--- RAW STORAGE ---
CLASS  SIZE    AVAIL    USED    RAW USED  %RAW USED
ssd    12 TiB  1.8 TiB  10 TiB  10 TiB    85.01
TOTAL  12 TiB  1.8 TiB  10 TiB  10 TiB    85.01

--- POOLS ---
POOL                                                                ID  PGS  STORED   OBJECTS  USED     %USED   MAX AVAIL
device_health_metrics                                                1    1   15 KiB        6   46 KiB  100.00        0 B
ocs-storagecluster-cephblockpool                                     2   32     19 B        1   12 KiB  100.00        0 B
ocs-storagecluster-cephfilesystem-metadata                           3   32   18 KiB       22  138 KiB  100.00        0 B
ocs-storagecluster-cephfilesystem-data0                              4   32      0 B        0      0 B       0        0 B
cephblockpool-storageconsumer-1318e613-2b6e-45e1-81e2-b25f67221e47   5   32  3.4 TiB  892.69k   10 TiB  100.00        0 B

--- Additional comment from Red Hat Bugzilla on 2022-08-05 19:09:06 UTC ---

remove performed by PnT Account Manager <pnt-expunge>

--- Additional comment from Red Hat Bugzilla on 2022-08-05 19:09:34 UTC ---

remove performed by PnT Account Manager <pnt-expunge>

--- Additional comment from Red Hat Bugzilla on 2022-12-31 19:29:35 UTC ---

remove performed by PnT Account Manager <pnt-expunge>

--- Additional comment from Red Hat Bugzilla on 2022-12-31 19:49:58 UTC ---

remove performed by PnT Account Manager <pnt-expunge>

--- Additional comment from Red Hat Bugzilla on 2022-12-31 19:50:01 UTC ---

remove performed by PnT Account Manager <pnt-expunge>

--- Additional comment from Red Hat Bugzilla on 2022-12-31 22:31:45 UTC ---

remove performed by PnT Account Manager <pnt-expunge>

--- Additional comment from Red Hat Bugzilla on 2022-12-31 23:27:25 UTC ---

remove performed by PnT Account Manager <pnt-expunge>

--- Additional comment from Dhruv Bindra on 2023-01-20 09:49:53 UTC ---

Try it on the latest build.

--- Additional comment from Filip Balák on 2023-03-02 14:16:15 UTC ---

Notifications for cluster utilization, including CephClusterReadOnly, are not working:

$ oc rsh -n openshift-storage $(oc get pods -n openshift-storage|grep tool|awk '{print$1}') ceph df
--- RAW STORAGE ---
CLASS  SIZE    AVAIL    USED    RAW USED  %RAW USED
ssd    12 TiB  1.8 TiB  10 TiB  10 TiB    85.02
TOTAL  12 TiB  1.8 TiB  10 TiB  10 TiB    85.02

--- POOLS ---
POOL                                                                ID  PGS  STORED   OBJECTS  USED     %USED   MAX AVAIL
device_health_metrics                                                1    1      0 B        0      0 B       0        0 B
ocs-storagecluster-cephfilesystem-metadata                           2   32   16 KiB       22  131 KiB  100.00        0 B
ocs-storagecluster-cephfilesystem-data0                              3  256      0 B        0      0 B       0        0 B
cephblockpool-storageconsumer-fddd8f1a-09e4-42fc-be0d-7d70e5f02f79   4   64  3.4 TiB  893.22k   10 TiB  100.00        0 B

$ oc rsh -n openshift-storage $(oc get pods -n openshift-storage|grep tool|awk '{print$1}') ceph -s
  cluster:
    id:     9e2ee3a5-53ef-45f3-bbd7-2dc83b07993f
    health: HEALTH_ERR
            3 full osd(s)
            4 pool(s) full

  services:
    mon: 3 daemons, quorum a,b,c (age 5h)
    mgr: a(active, since 5h)
    mds: 1/1 daemons up, 1 hot standby
    osd: 3 osds: 3 up (since 5h), 3 in (since 5h)

  data:
    volumes: 1/1 healthy
    pools:   4 pools, 353 pgs
    objects: 893.24k objects, 3.4 TiB
    usage:   10 TiB used, 1.8 TiB / 12 TiB avail
    pgs:     353 active+clean

  io:
    client: 1.2 KiB/s rd, 2 op/s rd, 0 op/s wr

$ rosa describe addon-installation --cluster fbalak03-1-pr --addon ocs-provider-qe
Id:           ocs-provider-qe
Href:         /api/clusters_mgmt/v1/clusters/226dcb9q8ric7euo2o73oo9k3jg73rjq/addons/ocs-provider-qe
Addon state:  ready
Parameters:   "size" : "4"
              "onboarding-validation-key" : (...)
              "notification-email-1" : "fbalak"
              "notification-email-2" : "odf-ms-qe"

Tested with: ocs-osd-deployer.v2.0.11

must-gather: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/fbalak03-1-pr/fbalak03-1-pr_20230301T100351/logs/testcases_1677687913/

--- Additional comment from Rewant on 2023-07-03 11:04:29 UTC ---

@kmajumde can you please provide the latest update?

--- Additional comment from Red Hat Bugzilla on 2023-08-03 08:28:29 UTC ---

Account disabled by LDAP Audit
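To narrow down whether the rule itself is missing or merely not firing, it can help to dump the CephClusterReadOnly rule definition and query the raw utilization ratio that the CephCluster* alerts evaluate. This is a minimal sketch assuming a default OCP monitoring stack; the thanos-querier route, token-based access, and the ceph_cluster_total_* metric names are assumptions not taken from this bug and may differ on provider/consumer clusters:

$ # Dump the CephClusterReadOnly rule shipped in the ODF PrometheusRule objects (confirm expression and threshold)
$ oc get prometheusrules -n openshift-storage -o yaml | grep -B 2 -A 8 'alert: CephClusterReadOnly'
$ # Query the cluster utilization ratio through the cluster monitoring stack
$ TOKEN=$(oc whoami -t)
$ HOST=$(oc get route thanos-querier -n openshift-monitoring -o jsonpath='{.spec.host}')
$ curl -sk -H "Authorization: Bearer $TOKEN" "https://$HOST/api/v1/query" \
    --data-urlencode 'query=ceph_cluster_total_used_raw_bytes / ceph_cluster_total_bytes'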
@fbalak I tried this out on a 4.10.14 ODF cluster, filled the cluster (1.5 TB) up to 85%, and I'm getting all the required alerts: `CephClusterCriticallyFull`, `CephClusterNearFull`, and `CephClusterReadOnly`.

Can you provide a cluster where this issue is reproducible?
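If a fresh reproducer cluster is needed, one way to push utilization past the 85% threshold is to fill an RBD-backed PVC from a pod and watch ceph df as usage climbs. This is a minimal sketch only; the ocs-storagecluster-ceph-rbd storage class name, the PVC size, and the UBI image are assumptions that need to be adapted to the consumer cluster under test:

$ cat <<'EOF' | oc apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fill-test-pvc
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: ocs-storagecluster-ceph-rbd   # assumption: adjust to the cluster's RBD storage class
  resources:
    requests:
      storage: 3500Gi                             # assumption: sized to exceed 85% of a 4 TiB cluster
---
apiVersion: v1
kind: Pod
metadata:
  name: fill-test
spec:
  restartPolicy: Never
  containers:
  - name: writer
    image: registry.access.redhat.com/ubi9/ubi    # any image that provides dd
    command: ["sh", "-c", "dd if=/dev/zero of=/data/fill bs=1M; sleep infinity"]
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: fill-test-pvc
EOF

$ # Watch raw utilization approach and cross 85%
$ oc rsh -n openshift-storage $(oc get pods -n openshift-storage|grep tool|awk '{print$1}') ceph df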