Bug 1656969

Summary: MDS busy handling reconnecting clients should extend the reconnect timeout
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Patrick Donnelly <pdonnell>
Component: CephFS
Assignee: Yan, Zheng <zyan>
Status: CLOSED ERRATA
QA Contact: ceph-qe-bugs <ceph-qe-bugs>
Severity: high
Docs Contact: John Brier <jbrier>
Priority: high
Version: 3.1
CC: ceph-eng-bugs, edonnell, pasik, pdonnell, rperiyas, sweil, tchandra, tserlin, vumrao, zyan
Target Milestone: z1
Keywords: CodeChange
Target Release: 3.2
Hardware: All
OS: All
Whiteboard:
Fixed In Version: RHEL: ceph-12.2.8-64.el7cp; Ubuntu: ceph_12.2.8-49redhat1
Doc Type: Bug Fix
Doc Text:
.The reconnect timeout for MDS clients has been extended
When the Metadata Server (MDS) daemon was handling a large number of reconnecting clients with a huge number of capabilities to aggregate, the reconnect timeout was reached. Consequently, the MDS rejected clients that attempted to reconnect. With this update, the reconnect timeout has been extended, and the MDS now handles reconnecting clients as expected in the described situation.
Last Closed: 2019-03-07 15:51:12 UTC
Type: Bug
Bug Blocks: 1629656

Description Patrick Donnelly 2018-12-06 18:39:11 UTC
Description of problem:

When the MDS is handling hundreds of reconnecting clients with millions of capabilities (caps) in aggregate, it has been observed that the MDS rejects clients that attempted to reconnect in time, because the reconnect timeout is reached simply due to how long the MDS takes to process the reconnect messages.

A reproducer and test case are to be written (TBW).
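For illustration, the fix can be thought of as moving from a fixed reconnect deadline to one that is pushed out each time the MDS makes progress, so the timer measures client inactivity rather than the total length of the reconnect phase. The sketch below is a minimal Python model of that pattern, not the actual MDS code; the class and method names are hypothetical, and the 45-second default merely mirrors Ceph's `mds_reconnect_timeout` option.

```python
import time

# Hypothetical default; Ceph's mds_reconnect_timeout defaults to 45 seconds.
RECONNECT_TIMEOUT = 45.0


class ReconnectPhase:
    """Model of a reconnect phase whose deadline is extended each time the
    server makes progress, so slow aggregate processing of many clients
    does not cause spurious evictions of clients that replied in time."""

    def __init__(self, clients, timeout=RECONNECT_TIMEOUT, clock=time.monotonic):
        self.pending = set(clients)   # clients we still expect to reconnect
        self.timeout = timeout
        self.clock = clock            # injectable clock, eases testing
        self.deadline = self.clock() + timeout

    def handle_reconnect(self, client):
        """Process one client's reconnect message and push the deadline out.

        Extending the deadline here is the essence of the fix: time spent
        processing earlier reconnects no longer eats into the window
        available to the remaining clients."""
        self.pending.discard(client)
        self.deadline = self.clock() + self.timeout

    def expired(self):
        """True if clients are still pending and the deadline has passed."""
        return bool(self.pending) and self.clock() > self.deadline
```

Under a fixed deadline, a client whose reconnect is not processed until, say, t = 80 would be evicted at t = 45 even though it replied promptly; with the extension, progress at t = 40 pushes the deadline to t = 85 and the remaining client survives.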

Comment 10 Yan, Zheng 2019-02-01 02:26:42 UTC
Quite hard to verify; maybe mark it as a code change.

Comment 13 Yan, Zheng 2019-02-27 14:55:54 UTC
LGTM

Comment 15 errata-xmlrpc 2019-03-07 15:51:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0475