This bug was initially created as a copy of Bug #2272099

I am copying this bug because: commit lost that was backported to 7.0 and not in 7.1

This bug was initially created as a copy of Bug #2243105

I am copying this bug because: 7.0z2 backport

Description of problem:

MDS: "1 MDSs behind on trimming" and "2 clients failing to respond to cache pressure".

The site is having the issue detailed below. I had them gather some MDS data based on what I've learned in the past, and it's extremely strange that there are no ops in flight, only completed ops. I'm not sure what that indicates. I also scanned every file in the must-gather for anything indicating SELinux relabeling was at play; it appears that is NOT the case.

The data is loaded in SS under case 03632353.

====
drwxrwxrwx+ 3 yank yank 59 Oct 10 13:15 0040-odf-must-gather-2.tar.gz
drwxrwxrwx+ 3 yank yank 26 Oct 10 15:29 0050-ceph-debug-logs.tar.gz
====

Attachment 0050 [details] is a tar.gz with "ops in flight", "session ls", and "perf dump" output captured every few seconds. Again, I see no ops in flight, but certain clients with many completed ops.
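For reference, a capture like the one in attachment 0050 could be gathered with a loop along these lines. This is only a hedged sketch (the actual collection method is not recorded in the case data): the MDS name is taken from the health output in this report, while the iteration count, interval, and output directory are assumptions.

```shell
#!/bin/bash
# Sketch of a periodic MDS debug capture; the daemon name below is from the
# cluster in this case, everything else (paths, interval, count) is assumed.
MDS=ocs-storagecluster-cephfilesystem-b
OUT=/tmp/mds-debug
mkdir -p "$OUT"

for i in 1 2 3; do
    # "ops" lists in-flight operations, "session ls" lists client sessions,
    # "perf dump" dumps the MDS perf counters.
    ceph tell mds."$MDS" ops        > "$OUT/ops.$i"     2>&1
    ceph tell mds."$MDS" session ls > "$OUT/session.$i" 2>&1
    ceph tell mds."$MDS" perf dump  > "$OUT/perf.$i"    2>&1
    sleep 1
done
```

In a real capture you would raise the iteration count and interval ("every few seconds", per the attachment description) and timestamp the filenames.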
I'll apologize in advance; I feel like I've missed something obvious.

BR
Manny

-bash 5.1 $ cat ceph_health_detail
HEALTH_WARN 1 filesystem is degraded; 2 clients failing to respond to cache pressure; 1 MDSs behind on trimming
[WRN] FS_DEGRADED: 1 filesystem is degraded
    fs ocs-storagecluster-cephfilesystem is degraded
[WRN] MDS_CLIENT_RECALL: 2 clients failing to respond to cache pressure
    mds.ocs-storagecluster-cephfilesystem-b(mds.0): Client ip-10-2-103-131:csi-cephfs-node failing to respond to cache pressure client_id: 31330220
    mds.ocs-storagecluster-cephfilesystem-b(mds.0): Client ip-10-2-114-50:csi-cephfs-node failing to respond to cache pressure client_id: 34512838
[WRN] MDS_TRIM: 1 MDSs behind on trimming
    mds.ocs-storagecluster-cephfilesystem-b(mds.0): Behind on trimming (2026/256) max_segments: 256, num_segments: 2026

-bash 5.1 $ cat ceph_versions
{
    "mon": {
        "ceph version 16.2.10-187.el8cp (5d6355e2bccd18b5c6457a34cb666d773f21823d) pacific (stable)": 3
    },
    "mgr": {
        "ceph version 16.2.10-187.el8cp (5d6355e2bccd18b5c6457a34cb666d773f21823d) pacific (stable)": 1
    },
    "osd": {
        "ceph version 16.2.10-187.el8cp (5d6355e2bccd18b5c6457a34cb666d773f21823d) pacific (stable)": 3
    },
    "mds": {
        "ceph version 16.2.10-187.el8cp (5d6355e2bccd18b5c6457a34cb666d773f21823d) pacific (stable)": 2
    },
    "overall": {
        "ceph version 16.2.10-187.el8cp (5d6355e2bccd18b5c6457a34cb666d773f21823d) pacific (stable)": 9
    }
}

Searching for set extended attributes (while not zero, these counts are extremely low):

-bash 5.1 $ find ./ -type f -exec zgrep -ic setxatt {} \; | grep -v ^0
456 332 362 45 10 4893 6601 1 5644 558 474 894 256 504 672 515 1127 619 940
1102 302 751 680 359 6617 406 470 908 545 664 282 928 520 644 534 312 912 630

Version-Release number of selected component (if applicable):
RHCS 5.3z4

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
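As background for the MDS_TRIM warning above: the "(2026/256)" figure is num_segments versus the mds_log_max_segments threshold. A hedged sketch of inspecting that threshold, and (as a temporary workaround only, not a recommendation from this case) raising it while the journal backlog drains; the value 512 is purely illustrative:

```shell
# Inspect the current journal-trim threshold (256 in the health output above)
ceph config get mds mds_log_max_segments

# Temporarily raising it can clear MDS_TRIM while trimming catches up;
# 512 is an illustrative value, not a recommendation from this case
ceph config set mds mds_log_max_segments 512
```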
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 7.1 security and bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2024:5080