Bug 2084534

Summary: OSD utilization notifications are mentioned in Cluster utilization alert message
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation Reporter: Filip Balák <fbalak>
Component: odf-managed-service Assignee: Kaustav Majumder <kmajumde>
Status: CLOSED EOL QA Contact: Itzhak <ikave>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 4.10 CC: aeyal, ebenahar, kmajumde, odf-bz-bot
Target Milestone: --- Keywords: UserExperience
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: 2.0.2 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2024-07-11 10:26:45 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 2084014, 2136854    
Bug Blocks:    

Description Filip Balák 2022-05-12 11:50:14 UTC
Description of problem:
Alert message for Cluster utilization looks like:

Your storage cluster utilization has crossed 80% and will become read-only at 85% utilized! Please free up some space or if possible expand the storage cluster immediately to prevent any service access issues. It is common to also be alerted to OSD devices entering near-full or full states prior to this alert. 

The sentence `It is common to also be alerted to OSD devices entering near-full or full states prior to this alert.` is inaccurate because no OSD notifications are currently available to users.

Version-Release number of selected component (if applicable):
ocs-operator.v4.10.0
OCP 4.10.8

How reproducible:
1/1

Steps to Reproduce:
1. Deploy provider and consumer with a 4 TiB cluster on ROSA (don't deploy larger: https://bugzilla.redhat.com/show_bug.cgi?id=2084014)
2. Set notification emails during deployment.
3. Fully utilize cluster capacity.
4. Check email.
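
For reference, the 80% warning and 85% read-only thresholds on the 4 TiB cluster from step 1 correspond to roughly 3.3 TiB and 3.5 TiB of data. A quick sketch of the arithmetic (binary units assumed; these figures are derived here, not taken from the job logs):

```shell
# Sketch: data needed to push a 4 TiB cluster past the 80% warning
# and 85% read-only thresholds (assuming TiB = 1024 GiB).
cap_gib=$((4 * 1024))                                              # 4 TiB in GiB
warn_gib=$(awk -v c="$cap_gib" 'BEGIN { printf "%.1f", c * 0.80 }')
ro_gib=$(awk -v c="$cap_gib" 'BEGIN { printf "%.1f", c * 0.85 }')
echo "warning at ${warn_gib} GiB, read-only at ${ro_gib} GiB"
```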

Actual results:
There will be an email with title `OpenShift Data Foundation Managed Service notification, Action required on your managed OpenShift cluster!` with message:

```
Hello!

This notification is for your OpenShift managed cluster running OpenShift Data Foundation.

Your storage cluster utilization has crossed 80% and will become read-only at 85% utilized! Please free up some space or if possible expand the storage cluster immediately to prevent any service access issues. It is common to also be alerted to OSD devices entering near-full or full states prior to this alert.

If you have any questions, please contact us. Review the support process for guidance on working with Red Hat support.

Thank you for choosing Red Hat OpenShift Data Foundation,
ODF SRE 
```

Expected results:
There should be no mention of notifications for OSD devices, because the current release doesn't provide any OSD notifications to users.

Additional info:

Comment 8 Itzhak 2023-09-27 15:06:55 UTC
I tested the BZ with MS provider OCP 4.11, ODF 4.11 cluster, and MS consumer OCP 4.12, ODF 4.11 cluster.
I performed the following steps:

1. Utilize the MS consumer cluster to 97 percent capacity.
To achieve this, I used a built-in fixture in the ocs-ci project.

2. I got three emails during the utilization:
- First email, when it reached 75%: 
"
Persistent Volume Usage is Nearly Full

The utilization of one or more of the PVs in your cluster (e20c0f51-9b43-4a28-b0bf-6fe8bb44845d) has exceeded 75%. Please free up some space or expand the PV if possible. Failure to address this issue may lead to service interruptions.

PVC Name: fio-target
Namespace: namespace-test-1f0e855b37b94c00a1ca48587
"

- Second email, when it reached 85%:
"
Persistent Volume Usage Critical

The utilization of one or more of the PVs in your cluster (e20c0f51-9b43-4a28-b0bf-6fe8bb44845d) has exceeded 85%. Please free up some space immediately or expand the PV if possible. Failure to address this issue may lead to service interruptions.

PVC Name: fio-target
Namespace: namespace-test-1f0e855b37b94c00a1ca48587
"

- Third email, also sent when utilization was at 85% or higher:
"
Ceph Cluster is Critically Full

Your storage cluster (96cf3749-ede7-453e-a652-e7e7ce6700c8) utilization has crossed 80% and will move into a read-only state at 85%! Please free up some space or if possible expand the storage cluster immediately to prevent any service access issues.
"

3. I checked the three emails above; none of them mention OSD devices or OSDs.
4. I also checked (as part of the ocs-ci test) that the space was reclaimed successfully.
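
The check in step 3 amounts to a simple text filter over each email body. A minimal sketch (`body` is a stand-in for a received message, not output captured from the job):

```shell
# Hypothetical check: flag an alert email whose body still mentions OSD devices.
body='Your storage cluster utilization has crossed 80% and will move into a read-only state at 85%!'
if printf '%s' "$body" | grep -qi 'osd'; then
  result="FAIL: message mentions OSD"
else
  result="PASS: no OSD mention"
fi
echo "$result"
```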

Link to the Jenkins job: https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-odf-multicluster/1970/.

Comment 9 Itzhak 2023-09-27 15:11:08 UTC
One more thing about the deployment: 
The OSD size was 4Ti:
$ oc rsh -n openshift-storage $(oc get pods -o wide -n openshift-storage|grep tool|awk '{print$1}') ceph osd status
ID  HOST                                          USED  AVAIL  WR OPS  WR DATA  RD OPS  RD DATA  STATE      
 0  ip-10-206-38-19.us-east-2.compute.internal    172G  3923G     81     79.4M      0        0   exists,up  
 1  ip-10-206-41-81.us-east-2.compute.internal    172G  3923G     27      105M      1      105   exists,up  
 2  ip-10-206-43-103.us-east-2.compute.internal   171G  3924G     15     56.0M      0      819   exists,up
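
From the USED/AVAIL columns above, per-OSD utilization was still low at deployment time. A rough calculation, assuming both columns are in GiB:

```shell
# Rough per-OSD utilization from the `ceph osd status` output above:
# USED=172G, AVAIL=3923G for osd.0 and osd.1 (osd.2 is within 1 GiB of this).
util=$(awk 'BEGIN { used = 172; avail = 3923; printf "%.1f", used / (used + avail) * 100 }')
echo "osd.0 utilization: ${util}%"
```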

$ oc get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                                        STORAGECLASS   REASON   AGE
pvc-069d2401-9fc2-4b45-89c3-b35e4c8d3cd6   50Gi       RWO            Delete           Bound    openshift-storage/rook-ceph-mon-c                            gp2                     78m
pvc-0ac224a5-612e-42fe-b193-1a9982961d8b   4Ti        RWO            Delete           Bound    openshift-storage/default-2-data-0l749v                      gp2                     73m
pvc-23f208f0-e729-468a-b896-1b672ddb8ccf   50Gi       RWO            Delete           Bound    openshift-storage/rook-ceph-mon-a                            gp2                     80m
pvc-4d8dde2e-255a-4f16-881e-3a578827ae82   50Gi       RWO            Delete           Bound    openshift-storage/rook-ceph-mon-b                            gp2                     80m
pvc-505e6d7d-01bd-4543-95cb-8dc43254e68c   4Ti        RWO            Delete           Bound    openshift-storage/default-1-data-0dxgvw                      gp2                     74m
pvc-5788ee83-3dc5-4b7b-b561-a210e9acadda   10Gi       RWO            Delete           Bound    openshift-monitoring/alertmanager-data-alertmanager-main-0   gp3                     85m
pvc-91bb56f3-eeab-44b3-9071-f602b2a5ba58   10Gi       RWO            Delete           Bound    openshift-monitoring/alertmanager-data-alertmanager-main-1   gp3                     85m
pvc-a0b51a12-f426-4149-9385-ad5d400f6294   100Gi      RWO            Delete           Bound    openshift-monitoring/prometheus-data-prometheus-k8s-0        gp3                     85m
pvc-c39e951a-597d-4a40-b3ea-a341751cad0a   4Ti        RWO            Delete           Bound    openshift-storage/default-0-data-04g4gt                      gp2                     74m
pvc-f2a6e0aa-4268-4f5b-93d9-f4f428aa8ee2   100Gi      RWO            Delete           Bound    openshift-monitoring/prometheus-data-prometheus-k8s-1        gp3                     85m

Comment 10 Ohad 2024-07-11 10:26:45 UTC
The ODF Managed Service project has been sunset and is now considered obsolete.