Bug 1713527 - [GSS] clients not able to mount cephfs and mds stuck in up:replay
Summary: [GSS] clients not able to mount cephfs and mds stuck in up:replay
Keywords:
Status: CLOSED DUPLICATE of bug 1714810
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: CephFS
Version: 3.1
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: 3.3
Assignee: Patrick Donnelly
QA Contact: ceph-qe-bugs
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-05-24 01:26 UTC by Prashant Dhange
Modified: 2021-08-27 22:38 UTC
CC List: 11 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-06-14 18:03:57 UTC
Embargoed:
pdhange: automate_bug?




Links:
Red Hat Issue Tracker RHCEPH-1100 (Private: 0, Priority: None, Status: None, Summary: None, Last Updated: 2021-08-27 22:38:34 UTC)

Internal Links: 1714810 1714812 1714814

Comment 3 Yan, Zheng 2019-05-24 02:38:04 UTC
"Behind on trimming (105235/128)max_segments" can explain this. There are lots of log segments, replaying them require long time.

Ask the customer not to ignore the 'MDSs behind on trimming' health warning next time.
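
A quick way to catch this condition early (a sketch; the exact warning text varies by release):

    # A backlogged MDS journal surfaces as a "behind on trimming" warning:
    ceph health detail

    # Check filesystem and MDS state:
    ceph fs status
    ceph status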

Comment 10 Yan, Zheng 2019-05-24 07:45:38 UTC
Connected clients do affect mds journal replay (it's unlikely that they do I/O on the metadata pool). The best solution for now is to wait until journal replay finishes, because resetting the journal and scanning the whole filesystem may also take a very long time.

Disabling all MDS debug logging can speed up journal replay.
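
For example, a sketch using the admin socket on the MDS host (the daemon name is taken from this case's logs; the admin socket should respond even while the daemon is still replaying):

    # Turn off MDS and messenger debug logging to reduce overhead during replay.
    ceph daemon mds.storageM3-STG-NGN1 config set debug_mds 0/0
    ceph daemon mds.storageM3-STG-NGN1 config set debug_ms 0/0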

Comment 11 Yan, Zheng 2019-05-24 08:01:21 UTC
Sorry, I meant "Connected clients do not affect mds journal replay".

Comment 13 Yan, Zheng 2019-05-24 08:31:45 UTC
For /cases/02388834/ceph-mds.storageM3-STG-NGN1.log.tgz/ceph-mds.storageM3-STG-NGN1.log

The recovering MDS had "heartbeat map not healthy" warnings while it was in the rejoin stage. It is likely the MDS was iterating over all inodes. To prevent the MDS from being replaced by the monitor, set the monitor's mds_beacon_grace config to 300 or more.
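
A sketch of applying this (injectargs changes the running monitors at runtime; the ceph.conf entry makes it persistent across restarts):

    # Raise the beacon grace period on all monitors at runtime:
    ceph tell mon.* injectargs '--mds_beacon_grace=300'

    # Persist it in ceph.conf on the monitor hosts:
    [mon]
    mds_beacon_grace = 300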

Comment 18 Yan, Zheng 2019-05-25 01:35:54 UTC
The mds_log_max_segments default is 128. Decrease it by 100 every 10 seconds until it reaches 128.


There are lots of log segments in this case. When the MDS becomes active, it tries to trim all of them, which creates lots of OSD requests.
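
A minimal sketch of that ramp-down (the starting value and daemon name are placeholders for this case):

    #!/bin/sh
    # Gradually lower mds_log_max_segments back to the default of 128,
    # stepping down by 100 every 10 seconds so trimming stays gradual.
    mds=storageM3-STG-NGN1    # example daemon name from this case's logs
    val=105000                # assumed starting point near the segment count above
    while [ "$val" -gt 128 ]; do
        ceph tell mds."$mds" injectargs "--mds_log_max_segments=$val"
        sleep 10
        val=$((val - 100))
    done
    ceph tell mds."$mds" injectargs '--mds_log_max_segments=128'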

Comment 32 Yan, Zheng 2019-05-27 13:25:24 UTC
No new discoveries from the log. It still looks like http://tracker.ceph.com/issues/40028.

