2265987 – The new active mds pod from mds scale up is sharing the load after 7hrs

Summary: The new active mds pod from mds scale up is sharing the load after 7hrs

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Red Hat OpenShift Data Foundation
Classification:	Red Hat Storage
Component:	rook
Sub Component:
Version:	4.15
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	---
Assignee:	Santosh Pillai
QA Contact:	Neha Berry
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2024-02-26 07:16 UTC by Nagendra Reddy
Modified:	2024-06-30 04:25 UTC
CC List:	6 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2024-03-01 04:14:32 UTC
Embargoed:

Attachments

Description Nagendra Reddy 2024-02-26 07:16:29 UTC

Created attachment 2018862
no load share on newly Active MDS

Description of problem (please be as detailed as possible and provide log
snippets):
An MDS scale-up was performed with the patch command below.

oc patch storagecluster ocs-storagecluster -n openshift-storage --type json --patch '[{ "op": "replace", "path": "/spec/managedResources/cephFilesystems/activeMetadataServers", "value": 2 }]'

This command adds 1 active and 1 standby-replay MDS daemon, so there are 2 active and 2 standby-replay MDS daemons in total after the patch.

However, no load sharing happens on the newly added MDS pods.
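
For repeated scale-up tests, the JSON patch above can be generated for any MDS count. The following is a minimal sketch: `mds_patch` is a hypothetical helper name (it is not part of `oc` or ODF), and the only thing it does is reproduce the patch string from this report with a different value.

```shell
#!/usr/bin/env bash
# mds_patch is a hypothetical helper, not an oc or ODF command.
# It prints the JSON patch from this report for a given active MDS count.
mds_patch() {
  local count=$1
  # Reject empty input, leading zeros (including 0) and non-digits.
  case $count in
    ''|0*|*[!0-9]*)
      echo "count must be a positive integer" >&2
      return 1
      ;;
  esac
  printf '[{ "op": "replace", "path": "/spec/managedResources/cephFilesystems/activeMetadataServers", "value": %s }]\n' "$count"
}

# Example (needs cluster access, shown as a comment only):
#   oc patch storagecluster ocs-storagecluster -n openshift-storage \
#     --type json --patch "$(mds_patch 2)"
```

Keeping the path in one place avoids retyping it when the same test is run with different counts.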

Version of all relevant components (if applicable):

OCP: 4.15.0-rc.8
odf: 4.15.0-147.stable

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?


Is there any workaround available to the best of your knowledge?


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?

2
Is this issue reproducible?

yes
Can this issue be reproduced from the UI?


If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Run file I/O long enough to drive the MDS to 67% of its available CPU.
2. Perform the scale-up.
3. Observe that no load is shared with the newly added active MDS pod.


Actual results:
 The new active MDS pod from the scale-up starts sharing the load only after 7 hours.

Expected results:
 Both the active MDS pods should share the load as soon as possible.

Additional info:

Comment 3 Patrick Donnelly 2024-02-28 02:20:01 UTC

This is likely not a bug. Load sharing occurs only through consistent hashing of the subvolumes across the MDS ranks. A subvolume is authoritative on a single rank only (it is not "load shared"). Therefore, if you test with only one subvolume, you will not see any load sharing.

Please ensure you have at least 4 subvolumes for your test and then re-evaluate.
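
The effect described in this comment can be illustrated with a toy model. The sketch below is an assumption-laden stand-in, not Ceph's placement code: it maps each subvolume name to a rank with a plain checksum-modulo hash (simpler than real consistent hashing), and the function names and subvolume names are hypothetical. It only shows why a single subvolume can never occupy more than one rank, while several subvolumes may spread across ranks.

```shell
#!/usr/bin/env bash
# Toy model only: NOT Ceph's actual algorithm. Each subvolume name is
# mapped to exactly one rank using a checksum modulo the rank count.
rank_for() {
  local name=$1 ranks=$2 sum
  # cksum prints "<checksum> <byte count>"; keep the checksum.
  sum=$(printf '%s' "$name" | cksum | cut -d' ' -f1)
  echo $(( sum % ranks ))
}

# Count how many distinct ranks own at least one subvolume.
# Usage: ranks_in_use <ranks> <subvolume>...
ranks_in_use() {
  local ranks=$1 name
  shift
  for name in "$@"; do
    rank_for "$name" "$ranks"
  done | sort -u | wc -l | tr -d ' '
}

# One subvolume: exactly one rank is ever busy, however many ranks exist.
ranks_in_use 2 subvol-a
# Several subvolumes: 1 or 2 ranks busy, depending on how the names hash.
ranks_in_use 2 subvol-a subvol-b subvol-c subvol-d
```

With more subvolumes the chance that every rank owns at least one of them increases, which is consistent with the request to test with at least 4.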

Comment 6 Santosh Pillai 2024-03-01 04:14:32 UTC

Closing this BZ based on Patrick's response in comment #3. 

Nagendra, please reopen if you still see the issue after testing with 4 subvolumes as suggested.

Comment 7 Red Hat Bugzilla 2024-06-30 04:25:03 UTC

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days
