+++ This bug was initially created as a clone of Bug #2182564 +++

The issue was reported by an upstream community user. The cluster had two filesystems, and the active MDS of both filesystems was stuck in 'up:replay'. This was the case for around two days. Later, one of the active MDS daemons (stuck in the up:replay state) crashed with the stack trace below.

/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.2.5/rpm/el8/BUILD/ceph-17.2.5/src/mds/journal.cc: In function 'void EMetaBlob::replay(MDSRank*, LogSegment*, MDPeerUpdate*)' thread 7fccc7153700 time 2023-01-17T10:05:15.420191+0000
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.2.5/rpm/el8/BUILD/ceph-17.2.5/src/mds/journal.cc: 1625: FAILED ceph_assert(g_conf()->mds_wipe_sessions)

 ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x135) [0x7fccd759943f]
 2: /usr/lib64/ceph/libceph-common.so.2(+0x269605) [0x7fccd7599605]
 3: (EMetaBlob::replay(MDSRank*, LogSegment*, MDPeerUpdate*)+0x5e5c) [0x55fb2b98e89c]
 4: (EUpdate::replay(MDSRank*)+0x40) [0x55fb2b98f5a0]
 5: (MDLog::_replay_thread()+0x9b3) [0x55fb2b915443]
 6: (MDLog::ReplayThread::entry()+0x11) [0x55fb2b5d1e31]
 7: /lib64/libpthread.so.0(+0x81ca) [0x7fccd65891ca]
 8: clone()

The upstream Ceph tracker is https://tracker.ceph.com/issues/58489.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat Ceph Storage 5.3 Bug Fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2023:3259