Description of problem:

MDS shutdown hung until "mds_bal_interval" was changed from 0 to a non-zero value.

DRW reported this during an upgrade to RHCS 5.3z4:

~~~
Problem Statement
MDS stuck in 'stopping' state during upgrade to 5.3Z4 from 5.3Z2

Description
What are you experiencing? What are you expecting to happen?
Ceph upgrade in progress, MDS count reducing to 1 but MDS stuck in 'stopping' state for an hour.
~~~

The case was opened as a Sev 1 and, shortly thereafter, the customer (Tyler) reported this:

~~~
Set ceph config mds_bal_interval from 0 to 10 (default) and failed the MDS, it successfully exited and restarted then drained.
Investigating if we can get all the way to 1 mds, will update case.
~~~

Given that setting "mds_bal_interval" to zero is fashionable now for multi-MDS sites, I decided to open this BZ to involve RHCS Engineering.

The case was opened on 2023-07-29 @ 16:18, but Tyler provided the MDS logs only recently. The logs are in Support Shell under case #03574915.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:
That the MDS would shut down without the need for this sort of intervention.

Additional info:
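
For reference, the workaround the customer described can be sketched as the sequence of `ceph` CLI calls below. This is a reconstruction, not the customer's actual command history: the config scope (`mds`) is an assumption, the MDS identifier passed to `ceph mds fail` is a placeholder, and the helper names `workaround_commands` and `run` are hypothetical. The script only prints the commands unless `dry_run=False` is passed.

```python
import shlex
import subprocess


def workaround_commands(mds_id, interval=10):
    """Build the ceph CLI calls matching the reported workaround.

    mds_id is a placeholder for the rank or name of the MDS stuck in
    'stopping'; the case notes do not say which one it was.
    """
    return [
        # Inspect the current value (reported as 0 at the customer site).
        ["ceph", "config", "get", "mds", "mds_bal_interval"],
        # Restore a non-zero value; 10 is the default per the case notes.
        # Assumption: the option was set at the global "mds" scope.
        ["ceph", "config", "set", "mds", "mds_bal_interval", str(interval)],
        # Fail the stuck MDS so that it exits and restarts.
        ["ceph", "mds", "fail", str(mds_id)],
        # Check that the rank leaves the 'stopping' state.
        ["ceph", "fs", "status"],
    ]


def run(commands, dry_run=True):
    """Return the commands as shell strings; execute them if dry_run is False."""
    lines = []
    for argv in commands:
        lines.append(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)
    return lines


if __name__ == "__main__":
    # Dry run only: prints the commands without touching a cluster.
    for line in run(workaround_commands("<mds-id>")):
        print(line)
```

The dry-run default is deliberate, since failing an MDS on a production cluster mid-upgrade should be a conscious step.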