
Bug 2312513

Summary: [CephFS - MDS] MDS behind trimming warning after ceph 7 to 8 upgrade on cluster and single client upgrade
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: sumr
Component: CephFS
Assignee: Venky Shankar <vshankar>
Status: CLOSED ERRATA
QA Contact: sumr
Severity: high
Docs Contact:
Priority: unspecified
Version: 8.0
CC: bhkaur, ceph-eng-bugs, cephqe-warriors, gfarnum, hyelloji, ngangadh, sbaldwin, tserlin, vshankar
Target Milestone: ---
Keywords: Automation, Upgrades
Target Release: 8.0z2
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: ceph-19.2.0-88.el9cp
Doc Type: Bug Fix
Doc Text:
.Buildup of log segments is now resolved
Previously, a bug in the MDS log trimming code caused a buildup of log segments when many directories were exported between MDSs. As a result, the buildup of log segments in the MDS triggered trim warnings. With this fix, the log trimming issue is resolved, preventing the buildup of log segments and eliminating the trim warnings.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2025-03-06 14:22:18 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description sumr 2024-09-16 06:45:58 UTC
Description of problem:
Ceph cluster upgrade from 7 to 8, with a single (1 of 2) client upgraded from 7 to 8, causes a "1 MDSs behind on trimming" warning.


[root@ceph-sumar-automation-xbo3gq-node8 ~]# ceph status
  cluster:
    id:     8aa47112-719a-11ef-a440-fa163eddfd8b
    health: HEALTH_WARN
            1 MDSs behind on trimming
 
  services:
    mon: 3 daemons, quorum ceph-sumar-automation-xbo3gq-node1-installer,ceph-sumar-automation-xbo3gq-node3,ceph-sumar-automation-xbo3gq-node2 (age 2d)
    mgr: ceph-sumar-automation-xbo3gq-node2.taoxmj(active, since 2d), standbys: ceph-sumar-automation-xbo3gq-node1-installer.ekktqw
    mds: 3/3 daemons up, 2 standby
    osd: 16 osds: 16 up (since 2d), 16 in (since 2d)
 
  data:
    volumes: 2/2 healthy
    pools:   6 pools, 689 pgs
    objects: 13.44k objects, 4.5 GiB
    usage:   36 GiB used, 204 GiB / 240 GiB avail
    pgs:     689 active+clean
 
  io:
    client:   102 B/s rd, 0 op/s rd, 0 op/s wr

[root@ceph-sumar-automation-xbo3gq-node8 ~]# ceph health detail
HEALTH_WARN 1 MDSs behind on trimming
[WRN] MDS_TRIM: 1 MDSs behind on trimming
    mds.cephfs.ceph-sumar-automation-xbo3gq-node3.ddrtvq(mds.0): Behind on trimming (543/128) max_segments: 128, num_segments: 543
[root@ceph-sumar-automation-xbo3gq-node8 ~]# 
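The warning compares num_segments against mds_log_max_segments (543/128 above). A minimal sketch of how these counts can be inspected, assuming the commands are run with an admin keyring and that the mds_log perf counter section (with its "seg" field) is where the live segment count is reported:

# on node3, where the affected rank 0 daemon runs
ceph daemon mds.cephfs.ceph-sumar-automation-xbo3gq-node3.ddrtvq perf dump mds_log   # "seg" = current segments, "segtrm" = segments trimmed

# trimming thresholds currently in effect
ceph config get mds mds_log_max_segments
ceph config get mds mds_log_max_events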

[root@ceph-sumar-automation-xbo3gq-node8 ~]# ceph fs status
cephfs - 11 clients
======
RANK  STATE                         MDS                            ACTIVITY     DNS    INOS   DIRS   CAPS  
 0    active  cephfs.ceph-sumar-automation-xbo3gq-node3.ddrtvq  Reqs:    0 /s  12.3k  12.2k  1672     60   
 1    active  cephfs.ceph-sumar-automation-xbo3gq-node5.jhageh  Reqs:    0 /s    14     17     15      5   
       POOL           TYPE     USED  AVAIL  
cephfs.cephfs.meta  metadata  6591M  61.8G  
cephfs.cephfs.data    data    5817M  61.8G  
cephfs-ec - 6 clients
=========
RANK  STATE                         MDS                            ACTIVITY     DNS    INOS   DIRS   CAPS  
 0    active  cephfs.ceph-sumar-automation-xbo3gq-node4.vfobhj  Reqs:    0 /s  2687    156    132     24   
      POOL         TYPE     USED  AVAIL  
cephfs-metadata  metadata   118M  61.8G  
 cephfs-data-ec    data    1000M  92.8G  
                  STANDBY MDS                     
cephfs.ceph-sumar-automation-xbo3gq-node6.beekyl  
cephfs.ceph-sumar-automation-xbo3gq-node7.glidzj  
MDS version: ceph version 19.1.1-38.el9cp (2aa959bc6d7a4c9e601a74e7b8c1d79b5c467ae8) squid (rc)
[root@ceph-sumar-automation-xbo3gq-node8 ~]# 

[root@ceph-sumar-automation-xbo3gq-node8 ~]# ceph orch host ls
HOST                                          ADDR          LABELS                    STATUS  
ceph-sumar-automation-xbo3gq-node1-installer  10.0.208.233  _admin,mgr,mon,installer          
ceph-sumar-automation-xbo3gq-node2            10.0.209.246  mgr,mon                           
ceph-sumar-automation-xbo3gq-node3            10.0.209.48   mds,mon                           
ceph-sumar-automation-xbo3gq-node4            10.0.210.87   osd,mds                           
ceph-sumar-automation-xbo3gq-node5            10.0.209.145  osd,mds                           
ceph-sumar-automation-xbo3gq-node6            10.0.211.176  nfs,osd,mds                       
ceph-sumar-automation-xbo3gq-node7            10.0.210.55   nfs,mds,grafana,osd  


Version-Release number of selected component (if applicable):


How reproducible: consistent


Steps to Reproduce:
1. Create the setup on a Ceph 7 cluster as below (illustrative commands follow this list):
   - Two filesystems: cephfs, cephfs-ec
   - Subvolumes in non-default groups on each filesystem, mounted across kernel, fuse and NFS
   - Snap schedule, manual snapshots, clones from snapshots, and dir pinning configuration
2. Perform cluster upgrade to Ceph 8, and upgrade 1 of 2 clients to 8
3. Verify the cluster upgrade completes
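For reference, the setup in step 1 can be approximated with commands along these lines (an illustrative sketch only; group, subvolume, snapshot and schedule names are made up here and are not the exact ones used by the automation, and pool creation for the EC filesystem is omitted):

ceph fs volume create cephfs
ceph fs subvolumegroup create cephfs upgrade_svg_1
ceph fs subvolume create cephfs upgrade_sv_0 --group_name upgrade_svg_1
ceph fs subvolume snapshot create cephfs upgrade_sv_0 snap_1 --group_name upgrade_svg_1
ceph fs subvolume snapshot clone cephfs upgrade_sv_0 snap_1 clone_1 --group_name upgrade_svg_1
ceph fs snap-schedule add /volumes/upgrade_svg_1/upgrade_sv_0 1h    # path is illustrative; use "ceph fs subvolume getpath" for the real one
setfattr -n ceph.dir.pin -v 1 /mnt/<kernel_or_fuse_mount>/<dir>     # pin a directory to MDS rank 1

The subvolumes are then mounted on the two clients over kernel, ceph-fuse and NFS.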

Actual results: Cluster upgrade completes, but ceph status reports "1 MDSs behind on trimming".
ceph health detail reports:
[root@ceph-sumar-automation-xbo3gq-node8 ~]# ceph health detail
HEALTH_WARN 1 MDSs behind on trimming
[WRN] MDS_TRIM: 1 MDSs behind on trimming
    mds.cephfs.ceph-sumar-automation-xbo3gq-node3.ddrtvq(mds.0): Behind on trimming (559/128) max_segments: 128, num_segments: 559

This ceph health state has persisted for more than 2 days; it is not auto-resolved.
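A possible way to nudge the affected MDS while the warning is present (a hedged suggestion only; it is not confirmed in this report to clear the warning, and it is not the actual fix): temporarily raise the trim threshold, or fail the affected rank so a standby replays and trims the log.

ceph config set mds mds_log_max_segments 256    # temporary; revert after the warning clears
ceph mds fail cephfs:0                          # rank 0 of the cephfs filesystem fails over to a standby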


Expected results: MDS trimming should succeed.


Additional info:
Existing mount points on the clients (ceph tools version 8 on client1 and 7 on client2) remain accessible in this state.

[root@ceph-sumar-automation-xbo3gq-node8 ~]# yum info ceph-fuse ceph-common
Updating Subscription Management repositories.
Last metadata expiration check: 2:45:49 ago on Sun Sep 15 23:55:41 2024.
Installed Packages
Name         : ceph-common
Epoch        : 2
Version      : 19.1.1
Release      : 38.el9cp
Architecture : x86_64
Size         : 83 M
Source       : ceph-19.1.1-38.el9cp.src.rpm
Repository   : @System
From repo    : download-01.beak-001.prod.iad2.dc.redhat.com_rhel-9_composes_auto_ceph-8.0-rhel-9_RHCEPH-8.0-RHEL-9-20240912.ci.3_compose_Tools_x86_64_os_
Summary      : Ceph Common
URL          : http://ceph.com/
License      : LGPL-2.1 and LGPL-3.0 and CC-BY-SA-3.0 and GPL-2.0 and BSL-1.0 and BSD-3-Clause and MIT
Description  : Common utilities to mount and interact with a ceph storage cluster.
             : Comprised of files that are common to Ceph clients and servers.

Name         : ceph-fuse
Epoch        : 2
Version      : 19.1.1
Release      : 38.el9cp
Architecture : x86_64
Size         : 2.6 M
Source       : ceph-19.1.1-38.el9cp.src.rpm
Repository   : @System
From repo    : download-01.beak-001.prod.iad2.dc.redhat.com_rhel-9_composes_auto_ceph-8.0-rhel-9_RHCEPH-8.0-RHEL-9-20240912.ci.3_compose_Tools_x86_64_os_
Summary      : Ceph fuse-based client
URL          : http://ceph.com/
License      : LGPL-2.1 and LGPL-3.0 and CC-BY-SA-3.0 and GPL-2.0 and BSL-1.0 and BSD-3-Clause and MIT
Description  : FUSE based client for Ceph distributed network file system

[root@ceph-sumar-automation-xbo3gq-node8 ~]# ls /mnt/cephfs_fusesji0p2gz9j_cephfs_upgrade_svg_1_upgrade_sv_0
file_dstdir  file_srcdir  network_shared
[root@ceph-sumar-automation-xbo3gq-node8 ~]# ceph version
ceph version 19.1.1-38.el9cp (2aa959bc6d7a4c9e601a74e7b8c1d79b5c467ae8) squid (rc)


Updating Subscription Management repositories.
Last metadata expiration check: 2:00:14 ago on Mon Sep 16 00:40:15 2024.
Installed Packages
Name         : ceph-common
Epoch        : 2
Version      : 18.2.1
Release      : 229.el9cp
Architecture : x86_64
Size         : 74 M
Source       : ceph-18.2.1-229.el9cp.src.rpm
Repository   : @System
From repo    : rhceph-7-tools-for-rhel-9-x86_64-rpms
Summary      : Ceph Common
URL          : http://ceph.com/
License      : LGPL-2.1 and LGPL-3.0 and CC-BY-SA-3.0 and GPL-2.0 and BSL-1.0 and BSD-3-Clause and MIT
Description  : Common utilities to mount and interact with a ceph storage cluster.
             : Comprised of files that are common to Ceph clients and servers.

Name         : ceph-fuse
Epoch        : 2
Version      : 18.2.1
Release      : 229.el9cp
Architecture : x86_64
Size         : 2.5 M
Source       : ceph-18.2.1-229.el9cp.src.rpm
Repository   : @System
From repo    : rhceph-7-tools-for-rhel-9-x86_64-rpms
Summary      : Ceph fuse-based client
URL          : http://ceph.com/
License      : LGPL-2.1 and LGPL-3.0 and CC-BY-SA-3.0 and GPL-2.0 and BSL-1.0 and BSD-3-Clause and MIT
Description  : FUSE based client for Ceph distributed network file system

[root@ceph-sumar-automation-xbo3gq-node9 ~]# ls /mnt/cephfs_fusesji0p2gz9j_cephfs_upgrade_svg_0_upgrade_sv_1
file_dstdir  file_srcdir  network_shared
[root@ceph-sumar-automation-xbo3gq-node9 ~]# ceph version
ceph version 19.1.1-38.el9cp (2aa959bc6d7a4c9e601a74e7b8c1d79b5c467ae8) squid (rc)

[root@ceph-sumar-automation-xbo3gq-node8 ~]# touch /mnt/cephfs_fusesji0p2gz9j_cephfs_upgrade_svg_1_upgrade_sv_0/file;echo "testing" > /mnt/cephfs_fusesji0p2gz9j_cephfs_upgrade_svg_1_upgrade_sv_0/file;cat /mnt/cephfs_fusesji0p2gz9j_cephfs_upgrade_svg_1_upgrade_sv_0/file
testing
[root@ceph-sumar-automation-xbo3gq-node8 ~]#

[root@ceph-sumar-automation-xbo3gq-node9 ~]# touch /mnt/cephfs_fusesji0p2gz9j_cephfs_upgrade_svg_0_upgrade_sv_1/file;echo "testing" /mnt/cephfs_fusesji0p2gz9j_cephfs_upgrade_svg_0_upgrade_sv_1/file;cat /mnt/cephfs_fusesji0p2gz9j_cephfs_upgrade_svg_0_upgrade_sv_1/file
testing /mnt/cephfs_fusesji0p2gz9j_cephfs_upgrade_svg_0_upgrade_sv_1/file

System Debug logs will be copied to magna002 server.

Automation logs for Upgrade test : http://magna002.ceph.redhat.com/cephci-jenkins/cephci-run-XBO3GQ

Comment 14 Venky Shankar 2024-10-22 07:59:48 UTC
Neha requested the PR link. Here it is: https://github.com/ceph/ceph/pull/60381

(It is also available from the ceph project bug tracker; the redmine ticket has the PR id.)

Comment 28 errata-xmlrpc 2025-03-06 14:22:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 8.0 security, bug fixes, and enhancement updates), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2025:2457

Comment 29 Red Hat Bugzilla 2025-07-05 04:25:09 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days