Description of problem:

MDS failover on the rank-0 MDS returns an EPERM error as below, but Ceph health is OK.

[root@rhel94client1 ~]# ceph health detail
HEALTH_OK

[root@rhel94client1 ~]# ceph mds fail cephfs_1.magna022.xclusw
Error EPERM: MDS has one of two health warnings which could extend recovery: MDS_TRIM or MDS_CACHE_OVERSIZED. MDS failover is not recommended since it might cause unexpected file system unavailability. If you wish to proceed, pass --yes-i-really-mean-it

Version-Release number of selected component (if applicable):
19.2.0-53.el9cp

How reproducible:

Steps to Reproduce:
1. Configure CephFS MDS as 2/2 daemons up, 2 standby, 2 hot standby (standby-replay)
2. Upgrade to the latest Squid build while IO is in progress
3. A few minutes after the upgrade, attempt an MDS failover of the rank-0 MDS

Actual results:
MDS failover returns an EPERM error as below, but Ceph health is OK.

[root@rhel94client1 ~]# ceph health detail
HEALTH_OK

[root@rhel94client1 ~]# ceph mds fail cephfs_1.magna022.xclusw
Error EPERM: MDS has one of two health warnings which could extend recovery: MDS_TRIM or MDS_CACHE_OVERSIZED. MDS failover is not recommended since it might cause unexpected file system unavailability. If you wish to proceed, pass --yes-i-really-mean-it

[root@rhel94client1 ~]# ceph -s
  cluster:
    id:     38f0f738-95d9-11ef-a651-002590fc2a2e
    health: HEALTH_OK

Expected results:
If the rank-0 MDS had one of the two warnings (MDS_TRIM or MDS_CACHE_OVERSIZED), it should have been reported in ceph -s or ceph health detail.
OR
If Ceph health is OK, then MDS failover should be allowed without the warning prompt shown above.

Additional info:

[root@rhel94client1 ~]# ceph -s
  cluster:
    id:     38f0f738-95d9-11ef-a651-002590fc2a2e
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum magna021,magna023,magna022 (age 68m)
    mgr: magna022.ilqiwl(active, since 73m), standbys: magna021.uhexkq
    mds: 2/2 daemons up, 2 standby, 2 hot standby
    osd: 21 osds: 21 up (since 30m), 21 in (since 3w)

  data:
    volumes: 1/1 healthy
    pools:   4 pools, 561 pgs
    objects: 576.65k objects, 623 GiB
    usage:   1.8 TiB used, 17 TiB / 19 TiB avail
    pgs:     471 active+clean+snaptrim_wait
             49  active+clean
             41  active+clean+snaptrim

  io:
    client:   5.2 MiB/s rd, 94 MiB/s wr, 321 op/s rd, 215 op/s wr

[root@rhel94client1 ~]# ceph fs status
cephfs_1 - 47 clients
========
RANK  STATE           MDS                       ACTIVITY     DNS    INOS   DIRS   CAPS
 0    active          cephfs_1.magna022.xclusw  Reqs: 109 /s 42.8k  26.1k  1651   8691
 1    active          cephfs_1.magna028.egaara  Reqs:  37 /s 33.0k  20.2k  1553   7226
0-s   standby-replay  cephfs_1.magna024.xsvsbn  Evts: 178 /s 57.3k  22.4k  1946      0
1-s   standby-replay  cephfs_1.magna027.liffty  Evts: 144 /s 44.4k  17.2k  1555      0
        POOL            TYPE      USED   AVAIL
cephfs.cephfs_1.meta   metadata   3838M  5580G
cephfs.cephfs_1.data   data       1742G  5580G
      STANDBY MDS
cephfs_1.magna023.bpteaq
cephfs_1.magna026.kmtgip
MDS version: ceph version 19.2.0-53.el9cp (677d8728b1c91c14d54eedf276ac61de636606f8) squid (stable)

[root@rhel94client1 ~]# ceph health detail
HEALTH_OK

[root@rhel94client1 ~]# ceph mds fail cephfs_1.magna022.xclusw
Error EPERM: MDS has one of two health warnings which could extend recovery: MDS_TRIM or MDS_CACHE_OVERSIZED. MDS failover is not recommended since it might cause unexpected file system unavailability. If you wish to proceed, pass --yes-i-really-mean-it
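The journal and cache state that this EPERM check refers to can be inspected directly on the rank-0 MDS even though ceph health reports nothing. A minimal sketch, assuming the admin keyring is available on the client node (perf-counter section names and warning thresholds may differ by release):

# Cache usage on rank 0 vs. the configured limit (the MDS_CACHE_OVERSIZED condition)
ceph tell mds.cephfs_1.magna022.xclusw cache status
ceph config get mds mds_cache_memory_limit

# Journal state on rank 0 (the MDS_TRIM condition); see the mds_log section of the dump
ceph tell mds.cephfs_1.magna022.xclusw perf dump
ceph config get mds mds_log_max_segments

MDS_CACHE_OVERSIZED is raised when cache usage stays well above mds_cache_memory_limit, and MDS_TRIM when the journal accumulates far more segments than mds_log_max_segments, so comparing these values shows whether the gating condition is actually present.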
[root@rhel94client1 ~]# ceph orch ps --refresh | grep mds
mds.cephfs_1.magna022.xclusw  magna022  running (19m)  10m ago  28h   115M   -  19.2.0-53.el9cp  e4177168bc51  6de21f253d2d
mds.cephfs_1.magna023.bpteaq  magna023  running (18m)  10m ago  28h  19.1M   -  19.2.0-53.el9cp  e4177168bc51  56c00f330ba8
mds.cephfs_1.magna024.xsvsbn  magna024  running (17m)  10m ago  28h  59.4M   -  19.2.0-53.el9cp  e4177168bc51  a01dd90042a1
mds.cephfs_1.magna026.kmtgip  magna026  running (17m)  10m ago  28h  15.3M   -  19.2.0-53.el9cp  e4177168bc51  c92e5c4b4e05
mds.cephfs_1.magna027.liffty  magna027  running (16m)   9m ago  28h  41.4M   -  19.2.0-53.el9cp  e4177168bc51  ddbd9aeebeba
mds.cephfs_1.magna028.egaara  magna028  running (15m)  10m ago  28h  69.4M   -  19.2.0-53.el9cp  e4177168bc51  d5d9c87fc79f

I will add the perf dump and MDS debug logs to the BZ directory and share the link.
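If the failover has to go ahead anyway, the override flag named in the EPERM message can be passed; a minimal sketch of the forced failover:

# Force the failover despite the gating check; the flag comes from the error message above
ceph mds fail cephfs_1.magna022.xclusw --yes-i-really-mean-it

As the message itself notes, forcing the failover while an MDS_TRIM or MDS_CACHE_OVERSIZED condition is genuinely present could extend recovery and cause unexpected file system unavailability.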
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: Red Hat Ceph Storage 8.1 security, bug fix, and enhancement updates), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2025:9775