Bug 2266583

Summary: Number failure domain value is hardcoded in CephMonLowNumber alert
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation Reporter: Joy John Pinto <jopinto>
Component: ocs-operator Assignee: Nikhil Ladha <nladha>
Status: CLOSED ERRATA QA Contact: Joy John Pinto <jopinto>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 4.15 CC: branto, muagarwa, odf-bz-bot
Target Milestone: ---   
Target Release: ODF 4.15.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: 4.15.0-155 Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2024-03-19 15:33:15 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Joy John Pinto 2024-02-28 12:12:47 UTC
Description of problem (please be as detailed as possible and provide log
snippets):
The number of failure domains is hardcoded in the CephMonLowNumber alert

Version of all relevant components (if applicable):
OCP 4.15
ODF 4.15.0-150

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
NA

Is there any workaround available to the best of your knowledge?
NA

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
1

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?
Yes

If this is a regression, please provide more details to justify this:
NA

Steps to Reproduce:
1. Install a cluster with 6 worker nodes and label the worker nodes so that they are in 6 different racks.
2. With more than five failure domains present (six in this case), wait for the CephMonLowNumber alert.
3. Inspect the alert message: 'The number of zone failure domains available  (5) allow to increase the ceph monitors from 3 to 5 in order to improve cluster resilience'


Actual results:
The alert message contains a hardcoded value for the number of failure domains available: 'The number of zone failure domains available  (5) allow to increase the ceph monitors from 3 to 5 in order to improve cluster resilience'

Expected results:
The failure domain value in 'The number of zone failure domains available  (5) allow to increase the ceph monitors from 3 to 5 in order to improve cluster resilience' should not be hardcoded.
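
To illustrate what "not hardcoded" could mean here, below is a minimal Python sketch in which the count in the message comes from an observed value instead of a literal. This is only an illustration, not the ocs-operator alert rule and not the fix that was shipped; the helper name `mon_low_message` and its parameters are hypothetical.

```python
def mon_low_message(available_domains: int, current_mons: int = 3, target_mons: int = 5) -> str:
    """Build an alert-style message from the observed failure domain count.

    Hypothetical helper for illustration only; the real alert is defined
    in the operator's alerting rules, not in Python.
    """
    return (
        f"The number of failure domains available ({available_domains}) allows "
        f"increasing the Ceph monitors from {current_mons} to {target_mons} "
        "in order to improve cluster resilience"
    )


# With six failure domains, the message reports 6 rather than a fixed 5.
print(mon_low_message(6))
```

With the six-rack setup from the steps above, such a message would report 6 instead of 5.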

Additional info:
Please refer to mon_low_no_alert.jpg

Failure domains available:
[jopinto@jopinto ceph-csi]$ oc get storagecluster -o jsonpath='{.items[*].status.failureDomainValues}' -n openshift-storage | tr ',' '\n' | sort -u | wc -l
6
[jopinto@jopinto ceph-csi]$
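
The pipeline above counts values by splitting on commas, so it depends on how the jsonpath output is formatted. A minimal Python sketch of the same count follows, assuming the input is the JSON printed by `oc get storagecluster -n openshift-storage -o json` and that `status.failureDomainValues` (the field queried above) is a list of strings. The function name and the sample rack names are illustrative only.

```python
import json


def count_failure_domains(storagecluster_list_json: str) -> int:
    """Count distinct failure domain values across all StorageCluster items."""
    doc = json.loads(storagecluster_list_json)
    values = set()
    for item in doc.get("items", []):
        # status.failureDomainValues is the field read by the jsonpath query;
        # treat a missing or null field as an empty list.
        values.update(item.get("status", {}).get("failureDomainValues") or [])
    return len(values)


# Sample data standing in for real `oc get storagecluster -o json` output.
sample = json.dumps({
    "items": [
        {"status": {"failureDomainValues": [
            "rack0", "rack1", "rack2", "rack3", "rack4", "rack5",
        ]}}
    ]
})
print(count_failure_domains(sample))  # prints 6
```

Using a set mirrors the `sort -u | wc -l` part of the pipeline: duplicates are collapsed before counting.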

Comment 9 Joy John Pinto 2024-03-08 04:30:49 UTC
Verified with OCP 4.15 and ODF 4.15.0-157

The alert text has changed to 'The number of failure zones available allow to increase the number of Ceph monitors from 3 to 5 in order to improve cluster resilience.'. Please refer to monlow_alert_verified.png

Comment 11 errata-xmlrpc 2024-03-19 15:33:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.15.0 security, enhancement, & bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:1383