Bug 1656969 - MDS busy handling reconnecting clients should extend the reconnect timeout
Summary: MDS busy handling reconnecting clients should extend the reconnect timeout
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: CephFS
Version: 3.1
Hardware: All
OS: All
Priority: high
Severity: high
Target Milestone: z1
Target Release: 3.2
Assignee: Yan, Zheng
QA Contact: ceph-qe-bugs
Docs Contact: John Brier
URL:
Whiteboard:
Depends On:
Blocks: 1629656
Reported: 2018-12-06 18:39 UTC by Patrick Donnelly
Modified: 2019-03-07 15:51 UTC

Fixed In Version: RHEL: ceph-12.2.8-64.el7cp Ubuntu: ceph_12.2.8-49redhat1
Doc Type: Bug Fix
Doc Text:
.The reconnect timeout for MDS clients has been extended
When the Metadata Server (MDS) daemon was handling a large number of reconnecting clients with a huge number of capabilities to aggregate, the reconnect timeout was reached. Consequently, the MDS rejected clients that attempted to reconnect. With this update, the reconnect timeout has been extended, and MDS now handles reconnecting clients as expected in the described situation.
Clone Of:
Environment:
Last Closed: 2019-03-07 15:51:12 UTC
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Ceph Project Bug Tracker | 37739 | 0 | None | None | None | 2019-01-03 22:54:06 UTC
Red Hat Product Errata | RHBA-2019:0475 | 0 | None | None | None | 2019-03-07 15:51:24 UTC

Description Patrick Donnelly 2018-12-06 18:39:11 UTC
Description of problem:

When the MDS is handling hundreds of reconnecting clients with millions of caps in aggregate, it has been observed to reject clients that attempted to reconnect in time. The reconnect timeout is reached simply because of how long the MDS takes to process the reconnects, not because the clients were late.

A reproducer and test case TBW.
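
The failure mode described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not a reproducer and not the actual Ceph change: the names `Reconnect` and `run_window` and the `extend_while_busy` flag are hypothetical, and the model assumes a single server that processes reconnect requests strictly one after another. With a fixed deadline, a request is judged by when the server gets around to it; with the extended policy, a request that arrived in time is still accepted even if the backlog pushed its handling past the original timeout.

```python
from dataclasses import dataclass


@dataclass
class Reconnect:
    client: str     # hypothetical client name
    arrival: float  # seconds after the reconnect window opened
    cost: float     # seconds the server needs to process the request


def run_window(requests, timeout, extend_while_busy):
    """Toy model: process reconnects serially, return (accepted, rejected)."""
    now = 0.0
    accepted, rejected = [], []
    for req in sorted(requests, key=lambda r: r.arrival):
        # The server starts this request when it arrives or when the
        # backlog in front of it has drained, whichever is later.
        start = max(now, req.arrival)
        if extend_while_busy:
            # Assumed policy: judge the client by when it asked.
            in_time = req.arrival <= timeout
        else:
            # Fixed deadline: judge the client by when the server got to it.
            in_time = start <= timeout
        if in_time:
            now = start + req.cost
            accepted.append(req.client)
        else:
            rejected.append(req.client)
    return accepted, rejected


# Three clients reconnect within the first 3 seconds of a 10-second window,
# but each takes 6 seconds to process.
clients = [Reconnect("a", 1, 6), Reconnect("b", 2, 6), Reconnect("c", 3, 6)]
fixed = run_window(clients, timeout=10, extend_while_busy=False)
extended = run_window(clients, timeout=10, extend_while_busy=True)
```

In this example the fixed-deadline run rejects client "c" although it asked at second 3, because the server only reaches it at second 13. The extended run accepts all three, while a client that first asks after the 10-second window is still rejected.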

Comment 10 Yan, Zheng 2019-02-01 02:26:42 UTC
Quite hard to verify. Maybe mark it as a code change.

Comment 13 Yan, Zheng 2019-02-27 14:55:54 UTC
LGTM

Comment 15 errata-xmlrpc 2019-03-07 15:51:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0475

