.Buildup of log segments is now resolved
Previously, a bug in the MDS log trimming code caused log segments to accumulate when many directories were exported between MDSs, which triggered the MDS_TRIM ("behind on trimming") health warning.
With this fix, log segments are trimmed as expected, preventing the buildup and eliminating the trim warning.
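For operators checking whether a cluster shows this symptom, a minimal sketch of how to compare the reported segment count against the trim threshold (standard ceph CLI commands; no cluster-specific names assumed):

# The MDS_TRIM warning reports num_segments versus max_segments
ceph health detail

# The configured trim threshold (defaults to 128)
ceph config get mds mds_log_max_segments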
Description of problem:
Upgrading the Ceph cluster from 7 to 8, with only one of the two clients upgraded from 7 to 8, causes a "1 MDSs behind on trimming" issue.
[root@ceph-sumar-automation-xbo3gq-node8 ~]# ceph status
  cluster:
    id:     8aa47112-719a-11ef-a440-fa163eddfd8b
    health: HEALTH_WARN
            1 MDSs behind on trimming

  services:
    mon: 3 daemons, quorum ceph-sumar-automation-xbo3gq-node1-installer,ceph-sumar-automation-xbo3gq-node3,ceph-sumar-automation-xbo3gq-node2 (age 2d)
    mgr: ceph-sumar-automation-xbo3gq-node2.taoxmj(active, since 2d), standbys: ceph-sumar-automation-xbo3gq-node1-installer.ekktqw
    mds: 3/3 daemons up, 2 standby
    osd: 16 osds: 16 up (since 2d), 16 in (since 2d)

  data:
    volumes: 2/2 healthy
    pools:   6 pools, 689 pgs
    objects: 13.44k objects, 4.5 GiB
    usage:   36 GiB used, 204 GiB / 240 GiB avail
    pgs:     689 active+clean

  io:
    client: 102 B/s rd, 0 op/s rd, 0 op/s wr
[root@ceph-sumar-automation-xbo3gq-node8 ~]# ceph health detail
HEALTH_WARN 1 MDSs behind on trimming
[WRN] MDS_TRIM: 1 MDSs behind on trimming
mds.cephfs.ceph-sumar-automation-xbo3gq-node3.ddrtvq(mds.0): Behind on trimming (543/128) max_segments: 128, num_segments: 543
[root@ceph-sumar-automation-xbo3gq-node8 ~]#
[root@ceph-sumar-automation-xbo3gq-node8 ~]# ceph fs status
cephfs - 11 clients
======
RANK STATE MDS ACTIVITY DNS INOS DIRS CAPS
0 active cephfs.ceph-sumar-automation-xbo3gq-node3.ddrtvq Reqs: 0 /s 12.3k 12.2k 1672 60
1 active cephfs.ceph-sumar-automation-xbo3gq-node5.jhageh Reqs: 0 /s 14 17 15 5
POOL TYPE USED AVAIL
cephfs.cephfs.meta metadata 6591M 61.8G
cephfs.cephfs.data data 5817M 61.8G
cephfs-ec - 6 clients
=========
RANK STATE MDS ACTIVITY DNS INOS DIRS CAPS
0 active cephfs.ceph-sumar-automation-xbo3gq-node4.vfobhj Reqs: 0 /s 2687 156 132 24
POOL TYPE USED AVAIL
cephfs-metadata metadata 118M 61.8G
cephfs-data-ec data 1000M 92.8G
STANDBY MDS
cephfs.ceph-sumar-automation-xbo3gq-node6.beekyl
cephfs.ceph-sumar-automation-xbo3gq-node7.glidzj
MDS version: ceph version 19.1.1-38.el9cp (2aa959bc6d7a4c9e601a74e7b8c1d79b5c467ae8) squid (rc)
[root@ceph-sumar-automation-xbo3gq-node8 ~]#
[root@ceph-sumar-automation-xbo3gq-node8 ~]# ceph orch host ls
HOST ADDR LABELS STATUS
ceph-sumar-automation-xbo3gq-node1-installer 10.0.208.233 _admin,mgr,mon,installer
ceph-sumar-automation-xbo3gq-node2 10.0.209.246 mgr,mon
ceph-sumar-automation-xbo3gq-node3 10.0.209.48 mds,mon
ceph-sumar-automation-xbo3gq-node4 10.0.210.87 osd,mds
ceph-sumar-automation-xbo3gq-node5 10.0.209.145 osd,mds
ceph-sumar-automation-xbo3gq-node6 10.0.211.176 nfs,osd,mds
ceph-sumar-automation-xbo3gq-node7 10.0.210.55 nfs,mds,grafana,osd
Version-Release number of selected component (if applicable):
How reproducible: consistent
Steps to Reproduce:
1. Create the setup on a Ceph 7 cluster as below (an illustrative command sketch follows these steps):
   Two filesystems - cephfs, cephfs-ec
   Subvolumes in non-default groups on each filesystem, mounted across kernel, FUSE, and NFS clients
   Snap schedules, manual snapshots, clones from snapshots, and directory pinning configuration
2. Perform the cluster upgrade to Ceph 8, and upgrade 1 of the 2 clients to 8.
3. Verify that the cluster upgrade completes.
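A minimal sketch of what part of step 1 could look like for the non-EC filesystem; the names (svg_1, sv_0, sv_0_clone, /mnt/cephfs/pinned_dir) and the 1h schedule are illustrative assumptions, not the exact values from the test run:

ceph fs volume create cephfs

# Subvolume in a non-default group
ceph fs subvolumegroup create cephfs svg_1
ceph fs subvolume create cephfs sv_0 --group_name svg_1

# Snap schedule (requires the snap_schedule mgr module), a manual snapshot, and a clone from it
# (when multiple filesystems exist, the target filesystem may also need to be specified)
ceph mgr module enable snap_schedule
ceph fs snap-schedule add /volumes/svg_1/sv_0 1h
ceph fs subvolume snapshot create cephfs sv_0 snap1 --group_name svg_1
ceph fs subvolume snapshot clone cephfs sv_0 snap1 sv_0_clone --group_name svg_1

# Directory pinning: pin a directory (on a mounted client path) to MDS rank 1
setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/pinned_dir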
Actual results: The cluster upgrade completes, but ceph status reports "1 MDSs behind on trimming"
Ceph health detail reports,
[root@ceph-sumar-automation-xbo3gq-node8 ~]# ceph health detail
HEALTH_WARN 1 MDSs behind on trimming
[WRN] MDS_TRIM: 1 MDSs behind on trimming
mds.cephfs.ceph-sumar-automation-xbo3gq-node3.ddrtvq(mds.0): Behind on trimming (559/128) max_segments: 128, num_segments: 559
This ceph status remains in this state for more than 2 days and is not auto-resolved.
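Since the warning does not clear on its own, one possible interim mitigation is sketched below; this is only an assumption about how an operator might suppress the warning (it raises the threshold rather than fixing the underlying segment buildup addressed by the patch):

# Temporarily raise the trim threshold above the reported num_segments
ceph config set mds mds_log_max_segments 1024

# Once the fix is applied and trimming recovers, revert to the default
ceph config rm mds mds_log_max_segments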
Expected results: MDS trimming should succeed.
Additional info:
Existing mount points on the clients (ceph tools version 8 on client1 and version 7 on client2) remain accessible in this state.
[root@ceph-sumar-automation-xbo3gq-node8 ~]# yum info ceph-fuse ceph-common
Updating Subscription Management repositories.
Last metadata expiration check: 2:45:49 ago on Sun Sep 15 23:55:41 2024.
Installed Packages
Name : ceph-common
Epoch : 2
Version : 19.1.1
Release : 38.el9cp
Architecture : x86_64
Size : 83 M
Source : ceph-19.1.1-38.el9cp.src.rpm
Repository : @System
From repo : download-01.beak-001.prod.iad2.dc.redhat.com_rhel-9_composes_auto_ceph-8.0-rhel-9_RHCEPH-8.0-RHEL-9-20240912.ci.3_compose_Tools_x86_64_os_
Summary : Ceph Common
URL : http://ceph.com/
License : LGPL-2.1 and LGPL-3.0 and CC-BY-SA-3.0 and GPL-2.0 and BSL-1.0 and BSD-3-Clause and MIT
Description : Common utilities to mount and interact with a ceph storage cluster.
: Comprised of files that are common to Ceph clients and servers.
Name : ceph-fuse
Epoch : 2
Version : 19.1.1
Release : 38.el9cp
Architecture : x86_64
Size : 2.6 M
Source : ceph-19.1.1-38.el9cp.src.rpm
Repository : @System
From repo : download-01.beak-001.prod.iad2.dc.redhat.com_rhel-9_composes_auto_ceph-8.0-rhel-9_RHCEPH-8.0-RHEL-9-20240912.ci.3_compose_Tools_x86_64_os_
Summary : Ceph fuse-based client
URL : http://ceph.com/
License : LGPL-2.1 and LGPL-3.0 and CC-BY-SA-3.0 and GPL-2.0 and BSL-1.0 and BSD-3-Clause and MIT
Description : FUSE based client for Ceph distributed network file system
[root@ceph-sumar-automation-xbo3gq-node8 ~]# ls /mnt/cephfs_fusesji0p2gz9j_cephfs_upgrade_svg_1_upgrade_sv_0
file_dstdir file_srcdir network_shared
[root@ceph-sumar-automation-xbo3gq-node8 ~]# ceph version
ceph version 19.1.1-38.el9cp (2aa959bc6d7a4c9e601a74e7b8c1d79b5c467ae8) squid (rc)
Updating Subscription Management repositories.
Last metadata expiration check: 2:00:14 ago on Mon Sep 16 00:40:15 2024.
Installed Packages
Name : ceph-common
Epoch : 2
Version : 18.2.1
Release : 229.el9cp
Architecture : x86_64
Size : 74 M
Source : ceph-18.2.1-229.el9cp.src.rpm
Repository : @System
From repo : rhceph-7-tools-for-rhel-9-x86_64-rpms
Summary : Ceph Common
URL : http://ceph.com/
License : LGPL-2.1 and LGPL-3.0 and CC-BY-SA-3.0 and GPL-2.0 and BSL-1.0 and BSD-3-Clause and MIT
Description : Common utilities to mount and interact with a ceph storage cluster.
: Comprised of files that are common to Ceph clients and servers.
Name : ceph-fuse
Epoch : 2
Version : 18.2.1
Release : 229.el9cp
Architecture : x86_64
Size : 2.5 M
Source : ceph-18.2.1-229.el9cp.src.rpm
Repository : @System
From repo : rhceph-7-tools-for-rhel-9-x86_64-rpms
Summary : Ceph fuse-based client
URL : http://ceph.com/
License : LGPL-2.1 and LGPL-3.0 and CC-BY-SA-3.0 and GPL-2.0 and BSL-1.0 and BSD-3-Clause and MIT
Description : FUSE based client for Ceph distributed network file system
[root@ceph-sumar-automation-xbo3gq-node9 ~]# ls /mnt/cephfs_fusesji0p2gz9j_cephfs_upgrade_svg_0_upgrade_sv_1
file_dstdir file_srcdir network_shared
[root@ceph-sumar-automation-xbo3gq-node9 ~]# ceph version
ceph version 19.1.1-38.el9cp (2aa959bc6d7a4c9e601a74e7b8c1d79b5c467ae8) squid (rc)
[root@ceph-sumar-automation-xbo3gq-node8 ~]# touch /mnt/cephfs_fusesji0p2gz9j_cephfs_upgrade_svg_1_upgrade_sv_0/file;echo "testing" > /mnt/cephfs_fusesji0p2gz9j_cephfs_upgrade_svg_1_upgrade_sv_0/file;cat /mnt/cephfs_fusesji0p2gz9j_cephfs_upgrade_svg_1_upgrade_sv_0/file
testing
[root@ceph-sumar-automation-xbo3gq-node8 ~]#
[root@ceph-sumar-automation-xbo3gq-node9 ~]# touch /mnt/cephfs_fusesji0p2gz9j_cephfs_upgrade_svg_0_upgrade_sv_1/file;echo "testing" /mnt/cephfs_fusesji0p2gz9j_cephfs_upgrade_svg_0_upgrade_sv_1/file;cat /mnt/cephfs_fusesji0p2gz9j_cephfs_upgrade_svg_0_upgrade_sv_1/file
testing /mnt/cephfs_fusesji0p2gz9j_cephfs_upgrade_svg_0_upgrade_sv_1/file
System debug logs will be copied to the magna002 server.
Automation logs for the upgrade test: http://magna002.ceph.redhat.com/cephci-jenkins/cephci-run-XBO3GQ
Neha requested the PR link. Here it is: https://github.com/ceph/ceph/pull/60381
(Also available from the Ceph project bug tracker; the Redmine ticket references the PR ID.)
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory (Red Hat Ceph Storage 8.0 security, bug fixes, and enhancement updates), and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2025:2457