Bug 2084534 - OSD utilization notifications are mentioned in Cluster utilization alert message
Summary: OSD utilization notifications are mentioned in Cluster utilization alert message
Keywords:
Status: CLOSED EOL
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: odf-managed-service
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Kaustav Majumder
QA Contact: Itzhak
URL:
Whiteboard:
Depends On: 2084014 2136854
Blocks:
 
Reported: 2022-05-12 11:50 UTC by Filip Balák
Modified: 2024-07-11 10:26 UTC
CC List: 4 users

Fixed In Version: 2.0.2
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2024-07-11 10:26:45 UTC
Embargoed:




Links
Github red-hat-storage/ocs-osd-deployer pull 187 (Merged): [BZ:2084534] Removed OSD utilization notifications in Cluster utilization alert message (last updated 2022-05-20 05:18:47 UTC)

Description Filip Balák 2022-05-12 11:50:14 UTC
Description of problem:
Alert message for Cluster utilization looks like:

Your storage cluster utilization has crossed 80% and will become read-only at 85% utilized! Please free up some space or if possible expand the storage cluster immediately to prevent any service access issues. It is common to also be alerted to OSD devices entering near-full or full states prior to this alert. 

The sentence `It is common to also be alerted to OSD devices entering near-full or full states prior to this alert.` is not accurate because the current release does not send any OSD utilization notifications to users.
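The 80% warning and the 85% read-only behavior described in this message presumably map to Ceph's nearfull/full ratios (that mapping is an assumption, not something stated in this bug). A minimal sketch for reading the configured ratios from the rook-ceph toolbox pod:

```
# Inspect the nearfull/backfillfull/full ratios from the toolbox pod
# (pod lookup is illustrative; the ratios set by the managed service may differ).
TOOLS_POD=$(oc get pods -n openshift-storage -o name | grep tools | head -n 1)
oc rsh -n openshift-storage "$TOOLS_POD" ceph osd dump | grep full_ratio
```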

Version-Release number of selected component (if applicable):
ocs-operator.v4.10.0
OCP 4.10.8

How reproducible:
1/1

Steps to Reproduce:
1. Deploy provider and consumer with 4 TiB cluster on ROSA (don't deploy larger: https://bugzilla.redhat.com/show_bug.cgi?id=2084014)
2. Set notification emails during deployment.
3. Fully utilize cluster capacity (for example, with an fio workload against a large PVC; see the sketch after these steps).
4. Check email.
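
A minimal sketch of the fio-based fill referenced in step 3, assuming an RBD-backed PVC on the consumer cluster; the namespace, PVC size, and storage class name are illustrative, not taken from the original run:

```
# Create a large PVC to fill (the namespace must already exist; names and
# sizes are examples only).
cat <<'EOF' | oc apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fio-target
  namespace: utilization-test
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: ocs-storagecluster-ceph-rbd
  resources:
    requests:
      storage: 3500Gi
EOF
# From a pod that mounts the PVC at /mnt/target, write until the cluster
# crosses the 80%/85% thresholds:
fio --name=fill --filename=/mnt/target/fill.bin --rw=write --bs=4M \
    --size=3400G --ioengine=libaio --direct=1
```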

Actual results:
There will be an email with the subject `OpenShift Data Foundation Managed Service notification, Action required on your managed OpenShift cluster!` and the following message:

```
Hello!

This notification is for your OpenShift managed cluster running OpenShift Data Foundation.

Your storage cluster utilization has crossed 80% and will become read-only at 85% utilized! Please free up some space or if possible expand the storage cluster immediately to prevent any service access issues. It is common to also be alerted to OSD devices entering near-full or full states prior to this alert.

If you have any questions, please contact us. Review the support process for guidance on working with Red Hat support.

Thank you for choosing Red Hat OpenShift Data Foundation,
ODF SRE 
```

Expected results:
There should be no mention of OSD device notifications, because the current release does not send any OSD notifications to users.

Additional info:

Comment 8 Itzhak 2023-09-27 15:06:55 UTC
I tested the BZ with an MS provider cluster (OCP 4.11, ODF 4.11) and an MS consumer cluster (OCP 4.12, ODF 4.11).
I performed the following steps:

1. Utilized the MS consumer cluster to 97 percent capacity.
To achieve this, I used a built-in fixture in the ocs-ci project (cluster-level utilization can also be watched from the toolbox pod; see the sketch at the end of this comment).

2. I got three emails during the utilization:
- First email, when it reached 75%: 
"
Persistent Volume Usage is Nearly Full

The utilization of one or more of the PVs in your cluster (e20c0f51-9b43-4a28-b0bf-6fe8bb44845d) has exceeded 75%. Please free up some space or expand the PV if possible. Failure to address this issue may lead to service interruptions.

PVC Name: fio-target
Namespace: namespace-test-1f0e855b37b94c00a1ca48587
"

- Second email, when it reached 85%:
"
Persistent Volume Usage Critical

The utilization of one or more of the PVs in your cluster (e20c0f51-9b43-4a28-b0bf-6fe8bb44845d) has exceeded 85%. Please free up some space immediately or expand the PV if possible. Failure to address this issue may lead to service interruptions.

PVC Name: fio-target
Namespace: namespace-test-1f0e855b37b94c00a1ca48587
"

- Third email, again when utilization was at 85% or higher:
"
Ceph Cluster is Critically Full

Your storage cluster (96cf3749-ede7-453e-a652-e7e7ce6700c8) utilization has crossed 80% and will move into a read-only state at 85%! Please free up some space or if possible expand the storage cluster immediately to prevent any service access issues.
"

3. I checked the three emails above; they do not mention OSD devices or OSDs.
4. I also checked (as part of the ocs-ci test) that the space was reclaimed successfully.

Link to the Jenkins job: https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-odf-multicluster/1970/.
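
As mentioned in step 1 above, cluster-level utilization and the Ceph-side health warnings (OSD_NEARFULL/OSD_FULL, which are separate from the customer email notifications this bug is about) can be watched from the rook-ceph toolbox pod while the cluster fills; a hedged sketch:

$ TOOLS_POD=$(oc get pods -o wide -n openshift-storage | grep tool | awk '{print $1}')
$ oc rsh -n openshift-storage "$TOOLS_POD" ceph df            # raw vs. usable utilization
$ oc rsh -n openshift-storage "$TOOLS_POD" ceph health detail # lists OSD_NEARFULL/OSD_FULL if raised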

Comment 9 Itzhak 2023-09-27 15:11:08 UTC
One more thing about the deployment: 
The OSD size was 4Ti:
$ oc rsh -n openshift-storage $(oc get pods -o wide -n openshift-storage|grep tool|awk '{print$1}') ceph osd status
ID  HOST                                          USED  AVAIL  WR OPS  WR DATA  RD OPS  RD DATA  STATE      
 0  ip-10-206-38-19.us-east-2.compute.internal    172G  3923G     81     79.4M      0        0   exists,up  
 1  ip-10-206-41-81.us-east-2.compute.internal    172G  3923G     27      105M      1      105   exists,up  
 2  ip-10-206-43-103.us-east-2.compute.internal   171G  3924G     15     56.0M      0      819   exists,up

$ oc get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                                        STORAGECLASS   REASON   AGE
pvc-069d2401-9fc2-4b45-89c3-b35e4c8d3cd6   50Gi       RWO            Delete           Bound    openshift-storage/rook-ceph-mon-c                            gp2                     78m
pvc-0ac224a5-612e-42fe-b193-1a9982961d8b   4Ti        RWO            Delete           Bound    openshift-storage/default-2-data-0l749v                      gp2                     73m
pvc-23f208f0-e729-468a-b896-1b672ddb8ccf   50Gi       RWO            Delete           Bound    openshift-storage/rook-ceph-mon-a                            gp2                     80m
pvc-4d8dde2e-255a-4f16-881e-3a578827ae82   50Gi       RWO            Delete           Bound    openshift-storage/rook-ceph-mon-b                            gp2                     80m
pvc-505e6d7d-01bd-4543-95cb-8dc43254e68c   4Ti        RWO            Delete           Bound    openshift-storage/default-1-data-0dxgvw                      gp2                     74m
pvc-5788ee83-3dc5-4b7b-b561-a210e9acadda   10Gi       RWO            Delete           Bound    openshift-monitoring/alertmanager-data-alertmanager-main-0   gp3                     85m
pvc-91bb56f3-eeab-44b3-9071-f602b2a5ba58   10Gi       RWO            Delete           Bound    openshift-monitoring/alertmanager-data-alertmanager-main-1   gp3                     85m
pvc-a0b51a12-f426-4149-9385-ad5d400f6294   100Gi      RWO            Delete           Bound    openshift-monitoring/prometheus-data-prometheus-k8s-0        gp3                     85m
pvc-c39e951a-597d-4a40-b3ea-a341751cad0a   4Ti        RWO            Delete           Bound    openshift-storage/default-0-data-04g4gt                      gp2                     74m
pvc-f2a6e0aa-4268-4f5b-93d9-f4f428aa8ee2   100Gi      RWO            Delete           Bound    openshift-monitoring/prometheus-data-prometheus-k8s-1        gp3                     85m
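
For completeness: with the default 3-way replication, the three 4 TiB OSDs above provide roughly 4 TiB of usable capacity, which matches the 4 TiB cluster from the reproduction steps. Per-OSD fill levels can also be read from the same toolbox pod (same pod lookup as above; a hedged sketch):

$ oc rsh -n openshift-storage $(oc get pods -o wide -n openshift-storage|grep tool|awk '{print$1}') ceph osd df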

Comment 10 Ohad 2024-07-11 10:26:45 UTC
The ODF Managed Service Project has been sunset and is now considered obsolete.

