Description of problem (please be detailed as possible and provide log
snippets):
- The MDS is behind on trimming and the filesystem is degraded.
--------------------------------------
HEALTH_WARN 1 filesystem is degraded; 2 clients failing to respond to cache pressure; 1 MDSs behind on trimming
[WRN] FS_DEGRADED: 1 filesystem is degraded
fs ocs-storagecluster-cephfilesystem is degraded
[WRN] MDS_CLIENT_RECALL: 2 clients failing to respond to cache pressure
mds.ocs-storagecluster-cephfilesystem-a(mds.0): Client pqcn01w3354.isl.belastingdienst.nl:csi-cephfs-node failing to respond to cache pressure client_id: 19984715
mds.ocs-storagecluster-cephfilesystem-a(mds.0): Client pwcn01w3359.isl.belastingdienst.nl:csi-cephfs-node failing to respond to cache pressure client_id: 19995583
[WRN] MDS_TRIM: 1 MDSs behind on trimming
mds.ocs-storagecluster-cephfilesystem-a(mds.0): Behind on trimming (514/256) max_segments: 256, num_segments: 514
--------------------------------------
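- For reference, a few standard Ceph commands (run from the rook-ceph toolbox; the filesystem and daemon names are taken from the output above) can be used to inspect the trim backlog and MDS state:
--------------------------------------
# MDS ranks, states and the standby daemon for the affected filesystem
ceph fs status ocs-storagecluster-cephfilesystem

# Quick summary of the MDS cluster state
ceph mds stat

# Journal segment limit the MDS_TRIM warning is compared against (256 here)
ceph config get mds mds_log_max_segments
--------------------------------------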
- The customer recently upgraded the cluster from v4.8.11 to v4.9.7; the MDS has been having issues since the upgrade.
- The Ceph status was healthy before the upgrade; the current status is HEALTH_WARN:
--------------------------------------
  cluster:
    id:     6e9995b1-8e3f-4bfe-b883-a92d1dfeb68d
    health: HEALTH_WARN
            1 filesystem is degraded
            2 clients failing to respond to cache pressure

  services:
    mon: 3 daemons, quorum b,f,i (age 25h)
    mgr: a(active, since 22h)
    mds: 1/1 daemons up, 1 standby
    osd: 3 osds: 3 up (since 25h), 3 in (since 9M)

  data:
    volumes: 0/1 healthy, 1 recovering
    pools:   11 pools, 273 pgs
    objects: 6.27M objects, 315 GiB
    usage:   1.9 TiB used, 10 TiB / 12 TiB avail
    pgs:     273 active+clean
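- A hedged way to see which state the active MDS rank is stuck in (replay, rejoin, etc.) while the volume reports as recovering:
--------------------------------------
# Dump the filesystem map; the rank state explains why the fs is flagged degraded
ceph fs dump | grep -E 'ocs-storagecluster-cephfilesystem|state'
--------------------------------------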
--------------------------------------
Version of all relevant components (if applicable):
v4.9.7
Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
- PVC provisioning from CephFS is not working.
- Some pods are failing to mount CephFS volumes (see the checks below).
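- A minimal sketch of how the impact can be confirmed from the OpenShift side (assuming the default openshift-storage namespace and the csi-cephfsplugin-provisioner labels used by OCS/ODF):
--------------------------------------
# CephFS PVCs stuck in Pending across all namespaces
oc get pvc -A | grep -v Bound

# Logs from the CephFS CSI provisioner pods
oc logs -n openshift-storage -l app=csi-cephfsplugin-provisioner --all-containers --tail=100
--------------------------------------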
Is there any workaround available to the best of your knowledge?
N/A
Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
N/A
Can this issue be reproduced?
N/A
Can this issue be reproduced from the UI?
N/A
If this is a regression, please provide more details to justify this:
N/A
Steps to Reproduce:
N/A
Actual results:
- The MDS is behind on trimming and the filesystem is degraded.
Expected results:
- The MDS should keep up with journal trimming and the filesystem should report healthy.
Additional info:
In the next comments