Description of problem (please be as detailed as possible and provide log snippets):
The standby MDS gets stuck in the 'clientreplay' state indefinitely when the active MDS is restarted and the standby is supposed to take over as the active MDS.

Version of all relevant components (if applicable):
4.15.0-126
4.15.0-0.nightly-2024-01-25-051548

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
Yes

Is there any workaround available to the best of your knowledge?
NA

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
2

Can this issue be reproduced?
Yes

Can this issue be reproduced from the UI?

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. Run heavy IO against the active MDS so that it builds up a large amount of cache (a hedged reproduction sketch is included at the end of Additional info).
2. The active MDS pod gets restarted.
3. The standby MDS is expected to become active; instead, it gets stuck in the clientreplay state forever.
4. At this point, none of the MDS pods is active.

Actual results:
The standby MDS pod is stuck in the 'clientreplay' state.

Expected results:
The standby MDS should become active when the active MDS fails or is restarted.

Additional info:

sh-5.1$ ceph -s
  cluster:
    id:     a622f0f3-09a6-412b-9b06-e651e1d75e7f
    health: HEALTH_WARN
            1 filesystem is degraded
            1 MDSs report slow requests
            1 MDSs behind on trimming

  services:
    mon: 3 daemons, quorum a,b,c (age 21h)
    mgr: a(active, since 21h), standbys: b
    mds: 1/1 daemons up, 1 standby
    osd: 3 osds: 3 up (since 21h), 3 in (since 6d)
    rgw: 1 daemon active (1 hosts, 1 zones)

  data:
    volumes: 0/1 healthy, 1 recovering
    pools:   12 pools, 169 pgs
    objects: 1.86M objects, 71 GiB
    usage:   237 GiB used, 1.3 TiB / 1.5 TiB avail
    pgs:     169 active+clean

  io:
    client:   195 KiB/s wr, 0 op/s rd, 3 op/s wr

sh-5.1$ ceph fs status
ocs-storagecluster-cephfilesystem - 5 clients
=================================
RANK     STATE                      MDS                   ACTIVITY   DNS    INOS   DIRS   CAPS
 0     clientreplay  ocs-storagecluster-cephfilesystem-a             1699k  1652k   727   13.4k
                   POOL                       TYPE     USED   AVAIL
ocs-storagecluster-cephfilesystem-metadata  metadata  9155M    356G
 ocs-storagecluster-cephfilesystem-data0      data    30.6G    356G
       STANDBY MDS
ocs-storagecluster-cephfilesystem-b
MDS version: ceph version 17.2.6-194.el9cp (d9f4aedda0fc0d99e7e0e06892a69523d2eb06dc) quincy (stable)
sh-5.1$
-----------------------------------------------------------------
oc get pods
NAME                                                              READY   STATUS      RESTARTS          AGE
csi-addons-controller-manager-8649f7f85f-z77p5                    2/2     Running     0                 50s
csi-cephfsplugin-czhph                                            2/2     Running     13 (21h ago)      6d
csi-cephfsplugin-m4cwp                                            2/2     Running     40 (22h ago)      6d
csi-cephfsplugin-provisioner-7f87d9556b-dqwl6                     6/6     Running     40 (21h ago)      6d
csi-cephfsplugin-provisioner-7f87d9556b-gdgpp                     6/6     Running     64                6d
csi-cephfsplugin-rqf2k                                            2/2     Running     12 (21h ago)      6d
csi-rbdplugin-8x6j6                                               3/3     Running     54 (22h ago)      6d
csi-rbdplugin-bt5dp                                               3/3     Running     16 (21h ago)      6d
csi-rbdplugin-provisioner-78884f6f8c-jqhlz                        6/6     Running     62                6d
csi-rbdplugin-provisioner-78884f6f8c-lq8mg                        6/6     Running     79                6d
csi-rbdplugin-snhdl                                               3/3     Running     16 (21h ago)      6d
noobaa-core-0                                                     1/1     Running     3                 2d6h
noobaa-db-pg-0                                                    1/1     Running     3                 2d6h
noobaa-endpoint-5456dd8bd-4shm8                                   1/1     Running     1                 23h
noobaa-operator-54d5fc85b8-qsr5l                                  2/2     Running     74 (10h ago)      2d4h
ocs-metrics-exporter-b94d575ff-pjd6c                              1/1     Running     3                 6d
ocs-operator-d57b464dd-4szrv                                      1/1     Running     232 (8m25s ago)   6d
odf-console-6d664888c8-tbnqw                                      1/1     Running     3                 6d
odf-operator-controller-manager-67ff86cb69-2fwjx                  2/2     Running     207 (8m27s ago)   6d
rook-ceph-crashcollector-compute-0-5776bbfc8d-ll4gh               1/1     Running     0                 22h
rook-ceph-crashcollector-compute-1-7bb5565597-4pktq               1/1     Running     0                 22h
rook-ceph-crashcollector-compute-2-c4d75658b-l9frn                1/1     Running     0                 21h
rook-ceph-exporter-compute-0-d79bbf9b8-gmqs4                      1/1     Running     1 (21h ago)       22h
rook-ceph-exporter-compute-1-75fff6dcbf-tmrdc                     1/1     Running     0                 22h
rook-ceph-exporter-compute-2-5d7ffc454-767tc                      1/1     Running     0                 21h
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-56c5cd89s6f9x   2/2     Running     1 (66m ago)       150m
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-775ddcf88tv94   2/2     Running     1 (38m ago)       149m
rook-ceph-mgr-a-59dcf4bbd9-6ccvn                                  3/3     Running     4 (21h ago)       22h
rook-ceph-mgr-b-855b9c966b-gk57d                                  3/3     Running     1 (21h ago)       21h
rook-ceph-mon-a-6d8d6595bf-rdv6m                                  2/2     Running     0                 22h
rook-ceph-mon-b-7f7775b869-bc68t                                  2/2     Running     0                 22h
rook-ceph-mon-c-6cc496dfd9-kbg42                                  2/2     Running     0                 21h
rook-ceph-operator-5b5c5d9b76-qwdkp                               1/1     Running     9                 6d
rook-ceph-osd-0-76cc86458c-bmz6l                                  2/2     Running     0                 22h
rook-ceph-osd-1-6c469b7c87-krdc9                                  2/2     Running     0                 22h
rook-ceph-osd-2-6754d6657d-ws9vt                                  2/2     Running     0                 21h
rook-ceph-osd-prepare-ocs-deviceset-0-data-08bmpn-h4fbb           0/1     Completed   0                 6d
rook-ceph-osd-prepare-ocs-deviceset-1-data-0xztrp-srq9j           0/1     Completed   0                 6d
rook-ceph-osd-prepare-ocs-deviceset-2-data-0v4rs4-vplsv           0/1     Completed   0                 6d
rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-c5b686bfgp7t   2/2     Running     0                 21h
rook-ceph-tools-6c854d5d84-jmv7m                                  1/1     Running     3                 6d
ux-backend-server-7d5f748f7c-6mwb7                                2/2     Running     6                 6d
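-----------------------------------------------------------------
Reproduction sketch (for reference only):
The commands below are a minimal sketch of the Steps to Reproduce, not the exact procedure used for this report. The openshift-storage namespace, the client pod name "fio-client", its CephFS mount path /mnt/cephfs, and the app=rook-ceph-tools toolbox label are assumptions; the active MDS pod is picked by the name pattern visible in the oc get pods output above.

# Hedged reproduction sketch; names marked as assumptions may need adjusting.
NS=openshift-storage

# 1. Drive metadata-heavy IO from a client pod that has a CephFS-backed PVC
#    mounted at /mnt/cephfs (pod name and mount path are hypothetical), so
#    the active MDS builds up a large cache of dentries/inodes:
oc -n "$NS" rsh pod/fio-client sh -c '
  for d in $(seq 1 200); do
    mkdir -p /mnt/cephfs/load/dir$d
    for f in $(seq 1 5000); do : > /mnt/cephfs/load/dir$d/file$f; done
  done'

# 2. Restart the active MDS pod (mds "a" was the active one in this cluster):
oc -n "$NS" delete $(oc -n "$NS" get pods -o name | grep rook-ceph-mds-ocs-storagecluster-cephfilesystem-a)

# 3./4. Watch the failover from the rook-ceph-tools pod; the standby should
#       become active, but rank 0 stays in clientreplay instead:
TOOLS=$(oc -n "$NS" get pod -l app=rook-ceph-tools -o name | head -n 1)
oc -n "$NS" rsh "$TOOLS" ceph fs status
oc -n "$NS" rsh "$TOOLS" ceph health detail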