Description of problem (please be as detailed as possible and provide log snippets):

- The MDS is behind on trimming and the filesystem is degraded.

--------------------------------------
HEALTH_WARN 1 filesystem is degraded; 2 clients failing to respond to cache pressure; 1 MDSs behind on trimming
[WRN] FS_DEGRADED: 1 filesystem is degraded
    fs ocs-storagecluster-cephfilesystem is degraded
[WRN] MDS_CLIENT_RECALL: 2 clients failing to respond to cache pressure
    mds.ocs-storagecluster-cephfilesystem-a(mds.0): Client pqcn01w3354.isl.belastingdienst.nl:csi-cephfs-node failing to respond to cache pressure client_id: 19984715
    mds.ocs-storagecluster-cephfilesystem-a(mds.0): Client pwcn01w3359.isl.belastingdienst.nl:csi-cephfs-node failing to respond to cache pressure client_id: 19995583
[WRN] MDS_TRIM: 1 MDSs behind on trimming
    mds.ocs-storagecluster-cephfilesystem-a(mds.0): Behind on trimming (514/256) max_segments: 256, num_segments: 514
--------------------------------------

- The customer recently upgraded the cluster from v4.8.11 to v4.9.7; the MDS has been having these issues since the upgrade.
- The ceph status was healthy before the upgrade; the current status is:

--------------------------------------
  cluster:
    id:     6e9995b1-8e3f-4bfe-b883-a92d1dfeb68d
    health: HEALTH_WARN
            1 filesystem is degraded
            2 clients failing to respond to cache pressure

  services:
    mon: 3 daemons, quorum b,f,i (age 25h)
    mgr: a(active, since 22h)
    mds: 1/1 daemons up, 1 standby
    osd: 3 osds: 3 up (since 25h), 3 in (since 9M)

  data:
    volumes: 0/1 healthy, 1 recovering
    pools:   11 pools, 273 pgs
    objects: 6.27M objects, 315 GiB
    usage:   1.9 TiB used, 10 TiB / 12 TiB avail
    pgs:     273 active+clean
--------------------------------------

Version of all relevant components (if applicable):
v4.9.7

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
- PVC provisioning from CephFS is not working.
- Some pods are failing to mount CephFS volumes.

Is there any workaround available to the best of your knowledge?
N/A

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
N/A

Is this issue reproducible?
N/A

Can this issue be reproduced from the UI?
N/A

If this is a regression, please provide more details to justify this:
N/A

Steps to Reproduce:
N/A

Actual results:
- The MDS is behind on trimming and the filesystem is degraded.

Expected results:
- The MDS should be healthy and trimming its journal normally.

Additional info:
In the next comments
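For reference, a minimal sketch of how this state could be inspected from the rook-ceph-tools pod (the openshift-storage namespace and app=rook-ceph-tools label are assumed from a default OCS deployment; adjust to the environment):

--------------------------------------
# Open a shell in the rook-ceph-tools pod (assumes the default
# openshift-storage namespace and the app=rook-ceph-tools label)
oc rsh -n openshift-storage \
  $(oc get pod -n openshift-storage -l app=rook-ceph-tools -o name)

# Confirm the filesystem, MDS, and warning state
ceph fs status ocs-storagecluster-cephfilesystem
ceph health detail

# List sessions on the active MDS to find the clients named in
# the MDS_CLIENT_RECALL warning
ceph tell mds.ocs-storagecluster-cephfilesystem-a session ls
--------------------------------------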
Latest ceph health detail output:

--------------------------------------
ceph health detail
HEALTH_WARN 1 filesystem is degraded; 2 clients failing to respond to cache pressure; 1 MDSs behind on trimming
[WRN] FS_DEGRADED: 1 filesystem is degraded
    fs ocs-storagecluster-cephfilesystem is degraded
[WRN] MDS_CLIENT_RECALL: 2 clients failing to respond to cache pressure
    mds.ocs-storagecluster-cephfilesystem-a(mds.0): Client pqcn01w3354.isl.belastingdienst.nl:csi-cephfs-node failing to respond to cache pressure client_id: 19984715
    mds.ocs-storagecluster-cephfilesystem-a(mds.0): Client pwcn01w3359.isl.belastingdienst.nl:csi-cephfs-node failing to respond to cache pressure client_id: 19995583
[WRN] MDS_TRIM: 1 MDSs behind on trimming
    mds.ocs-storagecluster-cephfilesystem-a(mds.0): Behind on trimming (515/256) max_segments: 256, num_segments: 515
--------------------------------------
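Not a confirmed fix, but a sketch of two commonly discussed mitigations for MDS_CLIENT_RECALL / MDS_TRIM, using the client ids from the health detail above. Note that evicting a client forcibly drops its caps and will disrupt any pod still using that CephFS mount, and 512 is only an illustrative value for mds_log_max_segments:

--------------------------------------
# Evict the two clients failing to respond to cache pressure
# (ids 19984715 / 19995583 come from the health detail above).
# NOTE: eviction forcibly drops the client's caps and disrupts
# any workload still using that mount.
ceph tell mds.ocs-storagecluster-cephfilesystem-a client evict id=19984715
ceph tell mds.ocs-storagecluster-cephfilesystem-a client evict id=19995583

# Optionally raise the journal segment ceiling so MDS_TRIM clears
# while the MDS catches up (512 is illustrative, not a recommendation)
ceph config set mds mds_log_max_segments 512
--------------------------------------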
Team, the customer is eagerly looking for a fix on priority owing to downtime since last Tuesday. Can you please help us with a tentative update?
Clearing my needinfo as per https://bugzilla.redhat.com/show_bug.cgi?id=2107110#c41
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days