Description of problem:

The alert message for cluster utilization reads:

"Your storage cluster utilization has crossed 80% and will become read-only at 85% utilized! Please free up some space or if possible expand the storage cluster immediately to prevent any service access issues. It is common to also be alerted to OSD devices entering near-full or full states prior to this alert."

The sentence `It is common to also be alerted to OSD devices entering near-full or full states prior to this alert.` is not true, because there are currently no OSD notifications available to users.

Version-Release number of selected component (if applicable):
ocs-operator.v4.10.0
OCP 4.10.8

How reproducible:
1/1

Steps to Reproduce:
1. Deploy a provider and a consumer with a 4 TiB cluster on ROSA (don't deploy a larger cluster: https://bugzilla.redhat.com/show_bug.cgi?id=2084014).
2. Set notification emails during deployment.
3. Fully utilize the cluster capacity.
4. Check the email.

Actual results:
An email arrives with the subject `OpenShift Data Foundation Managed Service notification, Action required on your managed OpenShift cluster!` and the message:

```
Hello!

This notification is for your OpenShift managed cluster running OpenShift Data Foundation.

Your storage cluster utilization has crossed 80% and will become read-only at 85% utilized! Please free up some space or if possible expand the storage cluster immediately to prevent any service access issues. It is common to also be alerted to OSD devices entering near-full or full states prior to this alert.

If you have any questions, please contact us. Review the support process for guidance on working with Red Hat support.

Thank you for choosing Red Hat OpenShift Data Foundation,
ODF SRE
```

Expected results:
There should be no mention of a notification for OSD devices, because the current release does not contain any OSD notifications for users.

Additional info:
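The thresholds quoted in the alert text (warning once utilization crosses 80%, read-only at 85%) can be expressed as a small check. This is only an illustrative sketch of the documented tiers; `cluster_alert_state` is a hypothetical function, not part of ocs-operator or the actual alerting rules:

```python
# Illustrative sketch of the utilization tiers quoted in the alert message.
# The 80% / 85% thresholds come from the alert text above; the function
# itself is hypothetical and not part of ocs-operator.

def cluster_alert_state(used_fraction: float) -> str:
    """Map a cluster utilization fraction to the alert tier described above."""
    if used_fraction >= 0.85:
        return "read-only"   # cluster moves into a read-only state at 85%
    if used_fraction >= 0.80:
        return "warning"     # "utilization has crossed 80%" notification fires
    return "ok"

print(cluster_alert_state(0.82))  # → warning
```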
I tested the BZ with an MS provider cluster (OCP 4.11, ODF 4.11) and an MS consumer cluster (OCP 4.12, ODF 4.11). I performed the following steps:

1. Utilized the MS consumer cluster to 97 percent. To achieve this, I used a built-in fixture in the ocs-ci project.
2. I got three emails during the utilization:

- First email, when it reached 75%:
```
Persistent Volume Usage is Nearly Full

The utilization of one or more of the PVs in your cluster (e20c0f51-9b43-4a28-b0bf-6fe8bb44845d) has exceeded 75%. Please free up some space or expand the PV if possible. Failure to address this issue may lead to service interruptions.

PVC Name: fio-target
Namespace: namespace-test-1f0e855b37b94c00a1ca48587
```

- Second email, when it reached 85%:
```
Persistent Volume Usage Critical

The utilization of one or more of the PVs in your cluster (e20c0f51-9b43-4a28-b0bf-6fe8bb44845d) has exceeded 85%. Please free up some space immediately or expand the PV if possible. Failure to address this issue may lead to service interruptions.

PVC Name: fio-target
Namespace: namespace-test-1f0e855b37b94c00a1ca48587
```

- Third email, again when it was at 85% or higher:
```
Ceph Cluster is Critically Full

Your storage cluster (96cf3749-ede7-453e-a652-e7e7ce6700c8) utilization has crossed 80% and will move into a read-only state at 85%! Please free up some space or if possible expand the storage cluster immediately to prevent any service access issues.
```

3. I checked the three emails above; they do not mention OSD devices or OSDs at all.
4. I also checked (as part of the ocs-ci test) that the space was reclaimed successfully.

Link to the Jenkins job: https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-odf-multicluster/1970/
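The check in step 3 above (no mention of OSD devices in any of the notification bodies) amounts to a simple text scan. A minimal sketch; `mentions_osd` is a hypothetical helper, not part of ocs-ci:

```python
import re

# Hypothetical helper mirroring step 3 above: scan a notification body for
# any standalone mention of "OSD" (case-insensitive). Not part of ocs-ci.
def mentions_osd(email_body: str) -> bool:
    return re.search(r"\bosd\b", email_body, flags=re.IGNORECASE) is not None

# The old alert text would fail this check; the corrected third email passes.
third_email = (
    "Ceph Cluster is Critically Full\n"
    "Your storage cluster utilization has crossed 80% and will move into "
    "a read-only state at 85%!"
)
print(mentions_osd(third_email))  # → False
```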
One more thing about the deployment: the OSD size was 4Ti.

```
$ oc rsh -n openshift-storage $(oc get pods -o wide -n openshift-storage | grep tool | awk '{print $1}') ceph osd status
ID  HOST                                         USED  AVAIL  WR OPS  WR DATA  RD OPS  RD DATA  STATE
 0  ip-10-206-38-19.us-east-2.compute.internal   172G  3923G      81    79.4M       0        0  exists,up
 1  ip-10-206-41-81.us-east-2.compute.internal   172G  3923G      27     105M       1      105  exists,up
 2  ip-10-206-43-103.us-east-2.compute.internal  171G  3924G      15    56.0M       0      819  exists,up

$ oc get pv
NAME                                       CAPACITY  ACCESS MODES  RECLAIM POLICY  STATUS  CLAIM                                                        STORAGECLASS  REASON  AGE
pvc-069d2401-9fc2-4b45-89c3-b35e4c8d3cd6   50Gi      RWO           Delete          Bound   openshift-storage/rook-ceph-mon-c                            gp2                   78m
pvc-0ac224a5-612e-42fe-b193-1a9982961d8b   4Ti       RWO           Delete          Bound   openshift-storage/default-2-data-0l749v                      gp2                   73m
pvc-23f208f0-e729-468a-b896-1b672ddb8ccf   50Gi      RWO           Delete          Bound   openshift-storage/rook-ceph-mon-a                            gp2                   80m
pvc-4d8dde2e-255a-4f16-881e-3a578827ae82   50Gi      RWO           Delete          Bound   openshift-storage/rook-ceph-mon-b                            gp2                   80m
pvc-505e6d7d-01bd-4543-95cb-8dc43254e68c   4Ti       RWO           Delete          Bound   openshift-storage/default-1-data-0dxgvw                      gp2                   74m
pvc-5788ee83-3dc5-4b7b-b561-a210e9acadda   10Gi      RWO           Delete          Bound   openshift-monitoring/alertmanager-data-alertmanager-main-0   gp3                   85m
pvc-91bb56f3-eeab-44b3-9071-f602b2a5ba58   10Gi      RWO           Delete          Bound   openshift-monitoring/alertmanager-data-alertmanager-main-1   gp3                   85m
pvc-a0b51a12-f426-4149-9385-ad5d400f6294   100Gi     RWO           Delete          Bound   openshift-monitoring/prometheus-data-prometheus-k8s-0        gp3                   85m
pvc-c39e951a-597d-4a40-b3ea-a341751cad0a   4Ti       RWO           Delete          Bound   openshift-storage/default-0-data-04g4gt                      gp2                   74m
pvc-f2a6e0aa-4268-4f5b-93d9-f4f428aa8ee2   100Gi     RWO           Delete          Bound   openshift-monitoring/prometheus-data-prometheus-k8s-1        gp3                   85m
```
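The per-OSD USED/AVAIL figures from the `ceph osd status` output above can be turned into an overall utilization percentage, which shows how far this deployment was from the 80% alert threshold at that point. A minimal sketch, assuming the suffixed values are GiB as printed by `ceph osd status`:

```python
# Compute cluster utilization from the per-OSD USED/AVAIL figures shown
# above (values in GiB, copied from the `ceph osd status` output).
osds = [
    # (used_gib, avail_gib)
    (172, 3923),  # osd.0
    (172, 3923),  # osd.1
    (171, 3924),  # osd.2
]

used = sum(u for u, _ in osds)
total = sum(u + a for u, a in osds)
utilization = 100 * used / total
print(f"{utilization:.1f}% used")  # roughly 4.2% used at this point
```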
The ODF Managed Service project has been sunset and is now considered obsolete.