Bug 1507629

Summary: mds gets significantly behind on trimming while creating millions of files
Product: [Red Hat Storage] Red Hat Ceph Storage Reporter: Patrick Donnelly <pdonnell>
Component: CephFS    Assignee: Yan, Zheng <zyan>
Status: CLOSED ERRATA QA Contact: Ramakrishnan Periyasamy <rperiyas>
Severity: medium Docs Contact: Erin Donnelly <edonnell>
Priority: high    
Version: 3.0    CC: ceph-eng-bugs, ceph-qe-bugs, edonnell, hnallurv, john.spray, kdreyer, pdonnell, zyan
Target Milestone: z2   
Target Release: 3.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: RHEL: ceph-12.2.4-5.el7cp, Ubuntu: ceph_12.2.1-6redhat1xenial    Doc Type: Bug Fix
Doc Text:
Previously, Metadata Server (MDS) daemons could get behind on trimming for large metadata workloads in larger clusters. With this update, the MDS no longer gets behind on trimming for large metadata workloads.
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-04-26 17:38:39 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1548067    
Bug Blocks: 1557269    
Attachments:
Description                                                           Flags
ceph status logs                                                      none
ceph status after upgrading cluster to ceph version 12.2.4-6.el7cp    none

Comment 5 Yan, Zheng 2017-11-21 13:59:33 UTC
looks good

Comment 12 Ramakrishnan Periyasamy 2018-03-19 10:48:57 UTC
Provided qa_ack, clearing needinfo tag.

Comment 17 Ramakrishnan Periyasamy 2018-04-03 06:28:52 UTC
Created attachment 1416603 [details]
ceph status logs

Comment 19 Yan, Zheng 2018-04-04 02:28:45 UTC
It seems that there were lots of slow OSD requests. I don't think bumping mds_log_max_segments will help in this case.
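
For reference, a minimal sketch of how those warnings can be inspected on a 12.2.x cluster (the OSD id is a placeholder, not taken from these logs):

# Cluster-wide health; slow OSD requests and "behind on trimming" warnings show up here
ceph health detail

# Requests currently stuck on a suspect OSD (run on the host carrying that OSD; osd.3 is an example id)
ceph daemon osd.3 dump_ops_in_flight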

Comment 20 Ramakrishnan Periyasamy 2018-04-04 05:19:47 UTC
Moving this bug to assigned state, since MDS journal trimming is slow.

Comment 21 Yan, Zheng 2018-04-04 07:29:49 UTC
Sorry, I spoke too early. I guess setting mds_log_max_segments to around 100 may silence the warning.
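
For reference, a sketch of one way to apply that suggestion (the MDS name and the value 128 come from later comments in this bug; an injected value is runtime-only and would also need to be set under [mds] in ceph.conf to survive restarts):

# Change the live value through the MDS admin socket on the MDS host
ceph --admin-daemon /var/run/ceph/ceph-mds.magna113.asok config set mds_log_max_segments 128

# Or from an admin node, per MDS daemon
ceph tell mds.magna113 injectargs '--mds_log_max_segments=128'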

Comment 23 Yan, Zheng 2018-04-05 04:24:16 UTC
The test had slow OSD request warnings all along. In my opinion, having MDS behind-on-trimming warnings is somewhat expected in that situation.

Comment 24 Ramakrishnan Periyasamy 2018-04-05 08:36:51 UTC
In the existing setup I updated "mds_log_max_segments" to "128" on all MDS daemons; after running I/O for more than 5 hours, I have not seen the trim warning in the logs.
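
A quick way to double-check from the MDS host (a sketch; the segment count reported under mds_log should stay near or below mds_log_max_segments):

ceph --admin-daemon /var/run/ceph/ceph-mds.magna113.asok perf dump mds_log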

In the downstream setup we always get a lot of slow request messages. Is there any way to avoid them? Do we have an upstream bug for this?

Comment 29 Ramakrishnan Periyasamy 2018-04-09 09:13:35 UTC
Created attachment 1419159 [details]
ceph status after upgrading cluster to ceph version 12.2.4-6.el7cp

Comment 30 Ramakrishnan Periyasamy 2018-04-09 09:14:33 UTC
Moving this bug to verified state.

Trim warnings were not observed in ceph status.

[root@magna113 ceph]# ceph --admin-daemon ceph-mds.magna113.asok config get mds_log_max_segments 
{
    "mds_log_max_segments": "128"
}

Comment 34 errata-xmlrpc 2018-04-26 17:38:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1259