Description of problem (please be as detailed as possible and provide log snippets):
Max unavailable value is 1 in the mon PDB even for a five-mon setup.

Version of all relevant components (if applicable):
OCP 4.15.0 and ODF 4.15.0-14

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?

Is there any workaround available to the best of your knowledge?
NA

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
1

Can this issue be reproduced?
Yes

Can this issue be reproduced from the UI?
NA

If this is a regression, please provide more details to justify this:
Yes

Steps to Reproduce:
1. Install OCP 4.15 and ODF 4.15
2. Create a cluster with five or more nodes and a rack/host based failure domain
3. Update monCount to 5 in the configure modal and verify that five mons are running
4. Check the mon PDB's max unavailable and allowed disruptions values; currently max unavailable is set to 1 and allowed disruptions is set to 0

Actual results:
In the mon PDB, max unavailable is currently set to 1 and allowed disruptions is set to 0.

Expected results:
Max unavailable should be greater than 1.

Additional info:
[jopinto@jopinto 5mon]$ oc get pdb rook-ceph-mon-pdb -n openshift-storage
NAME                MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
rook-ceph-mon-pdb   N/A             1                 0                     68m

$ oc get pods | grep mon
rook-ceph-mon-a-df55985cc-7dscj    2/2   Running   0   30m   10.131.0.14   compute-2   <none>   <none>
rook-ceph-mon-b-588d9677bf-lzgp4   2/2   Running   0   30m   10.128.4.16   compute-0   <none>   <none>
rook-ceph-mon-c-54668c6b99-mzj75   2/2   Running   0   29m   10.131.2.20   compute-1   <none>   <none>
rook-ceph-mon-d-6fc8d98555-nhfqm   2/2   Running   0   21m   10.129.2.23   compute-3   <none>   <none>
rook-ceph-mon-e-7f8bc9c68f-4xc44   2/2   Running   0   21m   10.128.2.21   compute-5   <none>   <none>
The mon PDB maxUnavailable should be increased to 2 when there are at least five mons. Since we are only now adding support for five mons, we should fix this at least in 4.15.z. It is low risk, so we could consider it for 4.15.0 if there is still time before the RC, but I wouldn't consider it a blocker.
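For reference, a minimal Go sketch of the rule described above (the function name and structure are hypothetical illustrations, not the actual Rook operator code): with n mons, quorum needs n/2 + 1, so at most (n-1)/2 mons can be down, which is 1 for three mons and 2 for five.

package main

import "fmt"

// monPDBMaxUnavailable is a hypothetical helper illustrating the proposed
// rule: keep maxUnavailable at 1 for small mon counts and raise it to 2
// once there are at least five mons, since quorum (n/2 + 1) still holds
// with two of five mons down.
func monPDBMaxUnavailable(monCount int) int {
	if monCount >= 5 {
		return 2
	}
	return 1
}

func main() {
	for _, n := range []int{3, 5} {
		fmt.Printf("monCount=%d -> maxUnavailable=%d\n", n, monPDBMaxUnavailable(n))
	}
}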
On an OCP 4.15 cluster with ODF 4.15.2-1 installed, I followed these steps:
1. Installed OCP 4.15 and ODF 4.15.2-1
2. Created a six-node rack/host based failure domain cluster
3. Updated monCount to 5 in the configure modal

In the mon PodDisruptionBudget, maxUnavailable and allowed disruptions are both set to 2, which is valid.

Upon draining three of the six nodes, three mons remained up as expected, but the CephMonQuorumAtRisk critical alert was shown. I have raised https://bugzilla.redhat.com/show_bug.cgi?id=2276823 for that.
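For context, the quorum arithmetic behind that observation (a sketch using the numbers from the scenario above, not taken from the alert's actual expression): with five mons, quorum needs three; after the drain only three mons remain, so quorum holds with no margin.

package main

import "fmt"

func main() {
	monCount := 5
	quorum := monCount/2 + 1 // 3 mons required for quorum
	monsUp := 3              // mons left running after draining three of six nodes
	fmt.Printf("quorum=%d, mons up=%d, margin=%d\n", quorum, monsUp, monsUp-quorum)
	// margin == 0: quorum still holds, but losing one more mon would break it,
	// which is consistent with a quorum-at-risk alert firing.
}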
Verified; closing the bug because the mon PDB values are updated, based on https://bugzilla.redhat.com/show_bug.cgi?id=2276823#c8.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenShift Data Foundation 4.15.2 Bug Fix Update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2024:2636
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days