Bug 1718135

Summary: Multiple MDS crashing with assert(mds->sessionmap.get_version() == cmapv) in ESessions::replay while replaying journal
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Prashant Dhange <pdhange>
Component: CephFS
Assignee: Yan, Zheng <zyan>
Status: CLOSED ERRATA
QA Contact: subhash <vpoliset>
Severity: medium
Docs Contact:
Priority: medium
Version: 3.1
CC: ceph-eng-bugs, ceph-qe-bugs, edonnell, gsitlani, pdonnell, sweil, tchandra, tserlin, vumrao, zyan
Target Milestone: rc
Keywords: Reopened
Target Release: 3.3
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: RHEL: ceph-12.2.12-18.el7cp Ubuntu: ceph_12.2.12-16redhat1xenial
Doc Type: Bug Fix
Doc Text:
.Partially flushed `ESessions` log events no longer cause the MDS to fail
Previously, when a Ceph Metadata Server (MDS) had more than 1024 client sessions, the sessions in an `ESessions` log event could be flushed only partially. The journal replay code expected the sessions in an `ESessions` log event to be either all flushed or not flushed at all, so a partial flush caused the MDS to fail. With this update, the journal replay code can handle a partially flushed `ESessions` log event.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-08-21 15:11:09 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1726135
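
Illustration of the Doc Text: the failure can be sketched with a toy model. This is not Ceph code. The names `ToySessionMap`, `ToyESessions`, `replay_strict`, and `replay_tolerant` are hypothetical. The model assumes that each opened session bumps the session map version by one, and that any sessions flushed before the crash are the first ones listed in the event. Only the idea of comparing the session map version against the event's `cmapv` comes from the assert in the Summary.

```python
from dataclasses import dataclass, field


@dataclass
class ToySessionMap:
    """Toy stand-in for a persisted session table (not the Ceph SessionMap)."""
    version: int = 0
    open_clients: set = field(default_factory=set)

    def open_session(self, client: int) -> None:
        # Assumption: every opened session bumps the version by exactly one.
        self.open_clients.add(client)
        self.version += 1


@dataclass
class ToyESessions:
    """Toy journal event: the clients it opens and the expected final version."""
    clients: list
    cmapv: int


def replay_strict(smap: ToySessionMap, event: ToyESessions) -> None:
    """All-or-nothing replay: either skip the event or re-open every session."""
    if smap.version >= event.cmapv:
        return  # the whole event was already flushed
    for client in event.clients:
        smap.open_session(client)
    # Trips when some of the sessions were flushed before the crash.
    assert smap.version == event.cmapv, "partially flushed event"


def replay_tolerant(smap: ToySessionMap, event: ToyESessions) -> None:
    """Replay that re-opens only the sessions missing from the saved map."""
    if smap.version >= event.cmapv:
        return
    missing = event.cmapv - smap.version
    if missing > len(event.clients):
        raise ValueError("saved map is older than this event can explain")
    # Assumption: the already-saved sessions are the first ones in the event.
    already_saved = len(event.clients) - missing
    for client in event.clients[already_saved:]:
        smap.open_session(client)
    assert smap.version == event.cmapv
```

With a saved map at version 2 that already holds clients 10 and 11, and an event that lists clients 10 through 13 with `cmapv` 4, `replay_strict` overshoots the version and hits its assert, while `replay_tolerant` opens only clients 12 and 13 and ends at version 4.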

Comment 18 Yan, Zheng 2019-06-08 13:34:18 UTC
I was wrong.  The issue was not caused by commit 8048865b "mds: properly dirty sessions opened by journal replay". It should be caused by https://tracker.ceph.com/issues/40211

Comment 19 Vikhyat Umrao 2019-06-10 16:06:31 UTC
(In reply to Yan, Zheng from comment #18)
> I was wrong.  The issue was not caused by commit 8048865b "mds: properly
> dirty sessions opened by journal replay". It should be caused by
> https://tracker.ceph.com/issues/40211

Reopening as per comment #18.

Comment 36 Yan, Zheng 2019-06-13 16:45:34 UTC
The Luminous backport is at https://github.com/ceph/ceph/pull/28536

Comment 51 errata-xmlrpc 2019-08-21 15:11:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:2538