Bug 2304292

Summary: [cephfs] MDS stuck in clientreplay state
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Amarnath <amk>
Component: CephFS
Assignee: Venky Shankar <vshankar>
Status: CLOSED DUPLICATE
QA Contact: Hemanth Kumar <hyelloji>
Severity: high
Docs Contact:
Priority: unspecified
Version: 8.0
CC: ceph-eng-bugs, cephqe-warriors
Target Milestone: ---
Target Release: 8.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2024-08-22 04:37:31 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Amarnath 2024-08-13 10:07:47 UTC
Description of problem:
We have a 4-node cluster with the roles below:
[root@mero014 ~]# ceph orch host ls
HOST     ADDR          LABELS                                                 STATUS  
mero017  10.8.129.237  _admin,osd,mon,mgr,rgw,installer                               
mero018  10.8.129.238  osd,_admin,mon,mgr,rgw                                         
mero019  10.8.129.239  osd-bak,mgr,mon,mds                                            
mero020  10.8.129.240  node-exporter,alertmanager,osd,mds,grafana,prometheus          
4 hosts in cluster
[root@mero014 ~]#

Created the filesystem and set max_mds to 2.
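For reference, a minimal sketch of that setup step, assuming the standard fs volume workflow (the exact commands run by the suite were not captured in this report):

# create the filesystem and allow two active MDS ranks
ceph fs volume create cephfs
ceph fs set cephfs max_mds 2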

[root@mero014 ~]# ceph fs status
cephfs - 27 clients
======
RANK     STATE               MDS              ACTIVITY     DNS    INOS   DIRS   CAPS  
 0       active     cephfs.mero020.rfboyy  Reqs:    0 /s   125     28     23     32   
 1    clientreplay  cephfs.mero020.pudtvz                   10     13     11      0   
       POOL           TYPE     USED  AVAIL  
cephfs.cephfs.meta  metadata   537M  96.0T  
cephfs.cephfs.data    data    30.0G  96.0T  
cephfs-ec - 87 clients
=========
RANK  STATE             MDS                ACTIVITY     DNS    INOS   DIRS   CAPS  
 0    active  cephfs-ec.mero018.icqakl  Reqs:    0 /s    14     17     12     76   
 1    active  cephfs-ec.mero019.oisevx  Reqs:    0 /s    10     13     12     28   
         POOL            TYPE     USED  AVAIL  
cephfs.cephfs-ec.meta  metadata   816k  96.0T  
cephfs.cephfs-ec.data    data    3000M  96.0T  
cephfs_1 - 1 clients
========
RANK  STATE             MDS               ACTIVITY     DNS    INOS   DIRS   CAPS  
 0    active  cephfs_1.mero020.rwomqq  Reqs:    0 /s    10     13     12      1   
        POOL            TYPE     USED  AVAIL  
cephfs.cephfs_1.meta  metadata  96.0k  96.0T  
cephfs.cephfs_1.data    data       0   96.0T  
cephfs_io - 1 clients
=========
RANK  STATE             MDS                ACTIVITY     DNS    INOS   DIRS   CAPS  
 0    active  cephfs_io.mero017.cyawhn  Reqs:    0 /s    10     13     12      1   
         POOL            TYPE     USED  AVAIL  
cephfs.cephfs_io.meta  metadata   101k  96.0T  
cephfs.cephfs_io.data    data       0   96.0T  
      STANDBY MDS         
cephfs_1.mero017.vcsaum   
 cephfs.mero018.kkgluj    
 cephfs.mero020.paudfu    
 cephfs.mero019.ifxmuk    
cephfs-ec.mero020.ymclas  
cephfs-ec.mero019.urydro  
cephfs_io.mero019.mbyyis  
MDS version: ceph version 19.1.0-22.el9cp (e5b7dfedb7d8a66d166eb0f98361f71bdb7905ad) squid (rc)
[root@mero014 ~]# 


Both active MDS daemons for the cephfs filesystem have been deployed on the same node, mero020.

The cluster reached this state after running the baremetal test suites.
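Regarding the co-located active MDS daemons noted above, the cephadm placement spec for the MDS service was not captured in this report; a hedged sketch of the commands that would show (and, if needed, spread) the placement:

# inspect how the MDS daemons for each service are currently placed
ceph orch ls mds
ceph orch ps --daemon-type mds

# example only: pin the cephfs MDS daemons to two separate hosts
ceph orch apply mds cephfs --placement="2 mero019 mero020"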

MDS logs: http://magna002.ceph.redhat.com/ceph-qe-logs/amk_1/mds_clientreply/
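Additional MDS-side diagnostics that are usually useful for a rank stuck in clientreplay were not collected for this report; a sketch of what would be gathered from the stuck rank (daemon name taken from the fs status output above):

# dump in-flight operations and client sessions on the clientreplay rank
ceph tell mds.cephfs.mero020.pudtvz dump_ops_in_flight
ceph tell mds.cephfs.mero020.pudtvz session ls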


Version-Release number of selected component (if applicable):
[root@mero014 ~]# ceph versions
{
    "mon": {
        "ceph version 19.1.0-22.el9cp (e5b7dfedb7d8a66d166eb0f98361f71bdb7905ad) squid (rc)": 3
    },
    "mgr": {
        "ceph version 19.1.0-22.el9cp (e5b7dfedb7d8a66d166eb0f98361f71bdb7905ad) squid (rc)": 3
    },
    "osd": {
        "ceph version 19.1.0-22.el9cp (e5b7dfedb7d8a66d166eb0f98361f71bdb7905ad) squid (rc)": 44
    },
    "mds": {
        "ceph version 19.1.0-22.el9cp (e5b7dfedb7d8a66d166eb0f98361f71bdb7905ad) squid (rc)": 13
    },
    "rgw": {
        "ceph version 19.1.0-22.el9cp (e5b7dfedb7d8a66d166eb0f98361f71bdb7905ad) squid (rc)": 2
    },
    "overall": {
        "ceph version 19.1.0-22.el9cp (e5b7dfedb7d8a66d166eb0f98361f71bdb7905ad) squid (rc)": 65
    }
}



How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Storage PM bot 2024-08-13 10:07:59 UTC
Please specify the severity of this bug. Severity is defined here:
https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity.