Bug 1656969

Summary:	MDS busy handling reconnecting clients should extend the reconnect timeout
Product:	[Red Hat Storage] Red Hat Ceph Storage	Reporter:	Patrick Donnelly <pdonnell>
Component:	CephFS	Assignee:	Yan, Zheng <zyan>
Status:	CLOSED ERRATA	QA Contact:	ceph-qe-bugs <ceph-qe-bugs>
Severity:	high	Docs Contact:	John Brier <jbrier>
Priority:	high
Version:	3.1	CC:	ceph-eng-bugs, edonnell, pasik, pdonnell, rperiyas, sweil, tchandra, tserlin, vumrao, zyan
Target Milestone:	z1	Keywords:	CodeChange
Target Release:	3.2
Hardware:	All
OS:	All
Whiteboard:
Fixed In Version:	RHEL: ceph-12.2.8-64.el7cp Ubuntu: ceph_12.2.8-49redhat1	Doc Type:	Bug Fix
Doc Text:	.The reconnect timeout for MDS clients has been extended
When the Metadata Server (MDS) daemon was handling a large number of reconnecting clients that held a very large number of capabilities in aggregate, the reconnect timeout was reached before the MDS finished processing the reconnect requests. Consequently, the MDS rejected clients that had attempted to reconnect in time. With this update, the reconnect timeout has been extended, and the MDS now handles reconnecting clients as expected in the described situation.	Story Points:	---
Clone Of:		Environment:
Last Closed:	2019-03-07 15:51:12 UTC	Type:	Bug
Bug Blocks:	1629656

Description Patrick Donnelly 2018-12-06 18:39:11 UTC

Description of problem:

When the MDS is handling hundreds of reconnecting clients with millions of caps in aggregate, it has been observed to reject clients that attempted to reconnect in time. The reconnect timeout is reached simply because of the time it takes the MDS to process the reconnects, not because the clients were late.

A reproducer and test case are still to be written (TBW).
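
To illustrate the failure mode described above, here is a minimal simulation sketch. It is not the MDS code and not the actual fix: the `ReconnectRequest` type, the `simulate_reconnect` function, the serial handling model, and the extension policy (keep the window open for any client whose request arrived before the nominal timeout) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ReconnectRequest:
    client_id: int
    arrival: float  # time the reconnect message reached the (simulated) MDS


def simulate_reconnect(requests, handle_cost, timeout, extend_when_busy):
    """Return the ids of clients rejected by a simulated reconnect window.

    Hypothetical model: requests are handled one at a time, each taking
    handle_cost seconds, and the window nominally closes at `timeout`.
    """
    deadline = timeout
    now = 0.0
    rejected = []
    for req in sorted(requests, key=lambda r: r.arrival):
        now = max(now, req.arrival)  # stay idle until the request arrives
        if now > deadline:
            if extend_when_busy and req.arrival <= timeout:
                # The client was on time; only the server backlog is late,
                # so push the deadline out far enough to handle it.
                deadline = now
            else:
                rejected.append(req.client_id)
                continue
        now += handle_cost  # time spent processing this client's caps
    return rejected


# Ten clients all reconnect immediately; each takes 1 s to handle, window is 5 s.
requests = [ReconnectRequest(i, 0.0) for i in range(10)]
print(simulate_reconnect(requests, 1.0, 5.0, extend_when_busy=False))  # [6, 7, 8, 9]
print(simulate_reconnect(requests, 1.0, 5.0, extend_when_busy=True))   # []
```

With a fixed deadline, the clients at the back of the queue are rejected even though every request arrived at time 0. With the assumed extension policy, only clients whose requests arrive after the nominal timeout are rejected.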

Comment 10 Yan, Zheng 2019-02-01 02:26:42 UTC

Quite hard to verify. Maybe mark it as a code change.

Comment 13 Yan, Zheng 2019-02-27 14:55:54 UTC

LGTM

Comment 15 errata-xmlrpc 2019-03-07 15:51:12 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0475