Bug 2282097 - [IBM Support][ODF][4.11] Active MDS stuck in up:clientreplay state [NEEDINFO]
Summary: [IBM Support][ODF][4.11] Active MDS stuck in up:clientreplay state
Keywords:
Status: ASSIGNED
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: ceph
Version: 4.11
Hardware: All
OS: All
Priority: unspecified
Severity: low
Target Milestone: ---
Target Release: ---
Assignee: Dhairya Parmar
QA Contact: Elad
URL:
Whiteboard:
Duplicates: 2264031 2304292
Depends On:
Blocks: 2311741 2311743 2345561
Reported: 2024-05-21 08:22 UTC by Kritik Sachdeva
Modified: 2025-04-15 08:28 UTC
CC List: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 2311741 2311743
Environment:
Last Closed:
Embargoed:
ksachdev: needinfo-
ksachdev: needinfo-
dparmar: needinfo? (muagarwa)



Description Kritik Sachdeva 2024-05-21 08:22:30 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

We have a case where an active MDS is stuck in the up:clientreplay state, and the MDS logs show unhandled messages related to client_metrics.
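As an illustration of how the stuck state could be spotted programmatically, here is a minimal Python sketch that scans the JSON produced by `ceph fs dump --format json` for MDS daemons reporting a given state. It assumes the usual layout of that output (`filesystems[].mdsmap.info.<gid>.state`); the sample input is hypothetical and heavily trimmed, and the file system and daemon names in it are made up.

```python
from typing import Any, Dict, List, Tuple


def mds_in_state(
    fs_dump: Dict[str, Any], state: str = "up:clientreplay"
) -> List[Tuple[str, str, int]]:
    """Return (fs_name, mds_name, rank) for every MDS reporting `state`.

    `fs_dump` is the parsed JSON of `ceph fs dump --format json`.
    The key layout used here is an assumption about that output.
    """
    matches = []
    for fs in fs_dump.get("filesystems", []):
        mdsmap = fs.get("mdsmap", {})
        for info in mdsmap.get("info", {}).values():
            if info.get("state") == state:
                matches.append(
                    (
                        mdsmap.get("fs_name", ""),
                        info.get("name", ""),
                        info.get("rank", -1),
                    )
                )
    return matches


# Hypothetical, trimmed input; names and gids are invented for illustration.
sample = {
    "filesystems": [
        {
            "mdsmap": {
                "fs_name": "example-fs",
                "info": {
                    "gid_1": {"name": "example-fs-a", "rank": 0, "state": "up:clientreplay"},
                    "gid_2": {"name": "example-fs-b", "rank": 0, "state": "up:standby-replay"},
                },
            }
        }
    ]
}

print(mds_in_state(sample))  # [('example-fs', 'example-fs-a', 0)]
```

In practice the dictionary would come from `json.loads()` applied to the command output, for example captured from the rook-ceph toolbox pod.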

We did find a few upstream trackers describing an MDS stuck in the up:clientreplay state, but no diagnosis is available for the one below.
  - Additionally, there is no downstream BZ for this tracker.

https://tracker.ceph.com/issues/56577

We collected debug_mds 20 and debug_ms 1 logs from the active MDS for an RCA, and need help confirming whether we are hitting the issue described in the tracker.
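For reference, a minimal Python sketch of how the debug levels mentioned above could be raised and later reverted with `ceph config set` and `ceph config rm`. It only builds the command lines and does not run them. The helper name `debug_commands` and the choice of `mds` as the config target (all MDS daemons rather than a single one) are assumptions for illustration, not the exact procedure used on the customer cluster.

```python
from typing import List

# Debug levels taken from the report: debug_mds 20 and debug_ms 1.
DEBUG_LEVELS = {"debug_mds": "20", "debug_ms": "1"}


def debug_commands(who: str = "mds", revert: bool = False) -> List[List[str]]:
    """Build `ceph config` argv lists to raise (or revert) MDS debug logging.

    `who` is the config target. "mds" (an assumption) covers every MDS
    daemon; a single daemon can be targeted as "mds.<name>".
    """
    commands = []
    for option, level in DEBUG_LEVELS.items():
        if revert:
            # `ceph config rm` removes the override so the default applies again.
            commands.append(["ceph", "config", "rm", who, option])
        else:
            commands.append(["ceph", "config", "set", who, option, level])
    return commands
```

Each returned list can be passed to `subprocess.run()` on a node with the `ceph` CLI, or joined with spaces and pasted into the toolbox shell. Calling `debug_commands(revert=True)` once the logs are captured yields the commands that drop the overrides, which matters because debug_mds 20 is very verbose.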

Version of all relevant components (if applicable): ODF 4.11 + Ceph version 5.3z5


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)? No


Is there any workaround available to the best of your knowledge? For now, the issue has been resolved by restarting the active MDS pod.


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)? 1


Actual results:


Expected results:


Additional info:

We found one more tracker with similar symptoms, but it is already fixed in the version the customer is using, i.e. 5.3z5:
- https://tracker.ceph.com/issues/61523

