Created attachment 2018862
no load share on newly Active MDS

Description of problem (please be as detailed as possible and provide log snippets):
An MDS scale-up was performed with the patch command below:

oc patch storagecluster ocs-storagecluster -n openshift-storage --type json --patch '[{ "op": "replace", "path": "/spec/managedResources/cephFilesystems/activeMetadataServers", "value": 2 }]'

This command adds one active and one standby-replay MDS daemon, so after the patch there are two active and two standby-replay MDS daemons in total. However, no load sharing happens on the newly added MDS pods.

Version of all relevant components (if applicable):
OCP: 4.15.0-rc.8
ODF: 4.15.0-147.stable

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?

Is there any workaround available to the best of your knowledge?

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
2

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. Run file I/O for a long time so that it uses 67% of the CPU available to the MDS.
2. Perform the scale-up.
3. The newly added active MDS pod does not take any share of the load.

Actual results:
The new active MDS pod from the scale-up starts sharing the load only after 7 hours.

Expected results:
Both active MDS pods should share the load as soon as possible.

Additional info:
This is likely not a bug. Load sharing only occurs through consistent hashing of the subvolumes across the MDS ranks. A subvolume is authoritative on only a single rank (it is not "load shared"). Therefore, if you are testing with only one subvolume, you will not see any load sharing. Please make sure your test uses at least 4 subvolumes and then re-evaluate.
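To illustrate why a single subvolume cannot spread across ranks, here is a toy shell model. It assumes a made-up placement rule (the cksum CRC of the subvolume name modulo the number of active ranks). This is not the hash Ceph actually uses, and the subvolume names sv-0 to sv-3 and the helpers assign and ranks_used are hypothetical. The point it shows holds for any deterministic per-subvolume placement: one subvolume always lands on exactly one rank, while several subvolumes can spread across both.

```shell
#!/bin/sh
# Toy model only (assumption): place each subvolume on an MDS rank by
# hashing its name with cksum and taking the CRC modulo the number of
# active ranks. Ceph's real placement uses its own hash, not cksum.

ranks=2  # active MDS ranks after the scale-up (activeMetadataServers: 2)

# Print "<subvolume> -> rank <n>" for each subvolume name given.
assign() {
  for sv in "$@"; do
    # cksum reading stdin prints "<crc> <byte count>"; keep the CRC only.
    h=$(printf '%s' "$sv" | cksum | cut -d' ' -f1)
    echo "$sv -> rank $((h % ranks))"
  done
}

# Print how many distinct ranks received at least one subvolume.
ranks_used() {
  assign "$@" | sed 's/.* -> rank //' | sort -u | wc -l | tr -d ' '
}

echo "One subvolume:"
assign sv-0
echo "Ranks used: $(ranks_used sv-0)"

echo "Four subvolumes:"
assign sv-0 sv-1 sv-2 sv-3
echo "Ranks used: $(ranks_used sv-0 sv-1 sv-2 sv-3)"
```

With one subvolume the model always reports a single rank in use, so the other rank receives none of that subvolume's metadata load no matter how busy it is. With four names, whether both ranks are used depends on the hash values, which is why the test needs several subvolumes before any sharing can be observed.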
Closing this BZ based on Patrick's response in comment #3. Nagendra, please reopen if you still see the issue after testing with 4 subvolumes as suggested.
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days