Bug 2304292 - [cephfs] mds stuck in clientreplay state
Summary: [cephfs] mds stuck in clientreplay state
Keywords:
Status: CLOSED DUPLICATE of bug 2282097
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: CephFS
Version: 8.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 8.0
Assignee: Venky Shankar
QA Contact: Hemanth Kumar
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2024-08-13 10:07 UTC by Amarnath
Modified: 2024-08-22 04:39 UTC
CC List: 2 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2024-08-22 04:37:31 UTC
Embargoed:




Links
Red Hat Issue Tracker RHCEPH-9499 (last updated 2024-08-22 04:39:46 UTC)

Description Amarnath 2024-08-13 10:07:47 UTC
Description of problem:
We have a 4-node cluster with the roles below:
[root@mero014 ~]# ceph orch host ls
HOST     ADDR          LABELS                                                 STATUS  
mero017  10.8.129.237  _admin,osd,mon,mgr,rgw,installer                               
mero018  10.8.129.238  osd,_admin,mon,mgr,rgw                                         
mero019  10.8.129.239  osd-bak,mgr,mon,mds                                            
mero020  10.8.129.240  node-exporter,alertmanager,osd,mds,grafana,prometheus          
4 hosts in cluster
[root@mero014 ~]#

Created the filesystem and set max_mds to 2.
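The exact commands were not captured in this report; a minimal sketch of the equivalent setup, assuming the fs volume interface was used to create the filesystem named cephfs:

ceph fs volume create cephfs      # creates the filesystem, its data/metadata pools, and MDS daemons via the orchestrator
ceph fs set cephfs max_mds 2      # allow two active ranks
ceph fs status cephfs             # verify rank 1 comes up active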

[root@mero014 ~]# ceph fs status
cephfs - 27 clients
======
RANK     STATE               MDS              ACTIVITY     DNS    INOS   DIRS   CAPS  
 0       active     cephfs.mero020.rfboyy  Reqs:    0 /s   125     28     23     32   
 1    clientreplay  cephfs.mero020.pudtvz                   10     13     11      0   
       POOL           TYPE     USED  AVAIL  
cephfs.cephfs.meta  metadata   537M  96.0T  
cephfs.cephfs.data    data    30.0G  96.0T  
cephfs-ec - 87 clients
=========
RANK  STATE             MDS                ACTIVITY     DNS    INOS   DIRS   CAPS  
 0    active  cephfs-ec.mero018.icqakl  Reqs:    0 /s    14     17     12     76   
 1    active  cephfs-ec.mero019.oisevx  Reqs:    0 /s    10     13     12     28   
         POOL            TYPE     USED  AVAIL  
cephfs.cephfs-ec.meta  metadata   816k  96.0T  
cephfs.cephfs-ec.data    data    3000M  96.0T  
cephfs_1 - 1 clients
========
RANK  STATE             MDS               ACTIVITY     DNS    INOS   DIRS   CAPS  
 0    active  cephfs_1.mero020.rwomqq  Reqs:    0 /s    10     13     12      1   
        POOL            TYPE     USED  AVAIL  
cephfs.cephfs_1.meta  metadata  96.0k  96.0T  
cephfs.cephfs_1.data    data       0   96.0T  
cephfs_io - 1 clients
=========
RANK  STATE             MDS                ACTIVITY     DNS    INOS   DIRS   CAPS  
 0    active  cephfs_io.mero017.cyawhn  Reqs:    0 /s    10     13     12      1   
         POOL            TYPE     USED  AVAIL  
cephfs.cephfs_io.meta  metadata   101k  96.0T  
cephfs.cephfs_io.data    data       0   96.0T  
      STANDBY MDS         
cephfs_1.mero017.vcsaum   
 cephfs.mero018.kkgluj    
 cephfs.mero020.paudfu    
 cephfs.mero019.ifxmuk    
cephfs-ec.mero020.ymclas  
cephfs-ec.mero019.urydro  
cephfs_io.mero019.mbyyis  
MDS version: ceph version 19.1.0-22.el9cp (e5b7dfedb7d8a66d166eb0f98361f71bdb7905ad) squid (rc)
[root@mero014 ~]# 


Both active MDS daemons of the cephfs filesystem have been deployed on the same node, mero020.
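One way to confirm the MDS placement (a sketch; this output was not captured):

ceph orch ps --daemon-type mds    # lists every MDS daemon and the host it is scheduled on
ceph fs status cephfs             # shows which of those daemons hold the active ranks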

The cluster reached this state after running the baremetal test suites.

MDS logs: http://magna002.ceph.redhat.com/ceph-qe-logs/amk_1/mds_clientreply/
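For triage, the stuck rank can be inspected with commands along these lines (daemon name taken from the fs status output above; results are not part of this report):

ceph health detail                                        # any MDS slow-request / client-recall warnings
ceph tell mds.cephfs.mero020.pudtvz dump_ops_in_flight    # operations the clientreplay rank is still working through
ceph tell mds.cephfs.mero020.pudtvz session ls            # client sessions the rank is waiting to replay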


Version-Release number of selected component (if applicable):
[root@mero014 ~]# ceph versions
{
    "mon": {
        "ceph version 19.1.0-22.el9cp (e5b7dfedb7d8a66d166eb0f98361f71bdb7905ad) squid (rc)": 3
    },
    "mgr": {
        "ceph version 19.1.0-22.el9cp (e5b7dfedb7d8a66d166eb0f98361f71bdb7905ad) squid (rc)": 3
    },
    "osd": {
        "ceph version 19.1.0-22.el9cp (e5b7dfedb7d8a66d166eb0f98361f71bdb7905ad) squid (rc)": 44
    },
    "mds": {
        "ceph version 19.1.0-22.el9cp (e5b7dfedb7d8a66d166eb0f98361f71bdb7905ad) squid (rc)": 13
    },
    "rgw": {
        "ceph version 19.1.0-22.el9cp (e5b7dfedb7d8a66d166eb0f98361f71bdb7905ad) squid (rc)": 2
    },
    "overall": {
        "ceph version 19.1.0-22.el9cp (e5b7dfedb7d8a66d166eb0f98361f71bdb7905ad) squid (rc)": 65
    }
}



How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Storage PM bot 2024-08-13 10:07:59 UTC
Please specify the severity of this bug. Severity is defined here:
https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity.

