Bug 2248825 - [cee/sd][cephfs] mds pods are crashing with ceph_assert(state == LOCK_XLOCK || state == LOCK_XLOCKDONE || state == LOCK_XLOCKSNAP || state == LOCK_LOCK_XLOCK || state == LOCK_LOCK || is_locallock())
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: CephFS
Version: 6.1
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 5.3z6
Assignee: Xiubo Li
QA Contact: Hemanth Kumar
Docs Contact: Ranjini M N
URL:
Whiteboard:
Duplicates: 2228251
Depends On:
Blocks: 2258797
Reported: 2023-11-09 08:10 UTC by Lijo Stephen Thomas
Modified: 2024-03-29 12:57 UTC
CC List: 18 users

Fixed In Version: ceph-16.2.10-219.el8cp
Doc Type: Bug Fix
Doc Text:
.The MDS no longer crashes when the journal logs are flushed
Previously, after the journal logs were successfully flushed, the lockers’ state could be set to `LOCK_SYNC` or `LOCK_PREXLOCK` while the `xlock` count was non-zero. The MDS did not allow this combination and crashed on the assertion. With this fix, the MDS allows the lockers’ state to be `LOCK_SYNC` or `LOCK_PREXLOCK` while the `xlock` count is non-zero, and it no longer crashes.
Clone Of:
Environment:
Last Closed: 2024-02-08 16:56:42 UTC
Embargoed:
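
Illustration of the Doc Text above: a minimal C++ sketch of the state check before and after the fix. This is not the Ceph source. The enum, the function names `allowed_before_fix` and `allowed_after_fix`, and the plain `is_locallock` flag are hypothetical stand-ins, and the sketch assumes the fix amounts to accepting `LOCK_SYNC` and `LOCK_PREXLOCK` in addition to the states listed in the failing assertion.

```cpp
#include <cassert>

// Hypothetical stand-ins for the lock states named in the bug title and Doc Text.
// The real definitions live in the Ceph MDS code and are not reproduced here.
enum LockState {
  LOCK_SYNC,
  LOCK_LOCK,
  LOCK_PREXLOCK,
  LOCK_XLOCK,
  LOCK_XLOCKDONE,
  LOCK_XLOCKSNAP,
  LOCK_LOCK_XLOCK
};

// The condition from the failing ceph_assert, with is_locallock() modelled
// as a boolean argument (an assumption made to keep the sketch standalone).
bool allowed_before_fix(LockState state, bool is_locallock) {
  return state == LOCK_XLOCK || state == LOCK_XLOCKDONE ||
         state == LOCK_XLOCKSNAP || state == LOCK_LOCK_XLOCK ||
         state == LOCK_LOCK || is_locallock;
}

// Assumed shape of the fix: the same check, widened to also accept
// LOCK_SYNC and LOCK_PREXLOCK, as the Doc Text describes.
bool allowed_after_fix(LockState state, bool is_locallock) {
  return allowed_before_fix(state, is_locallock) ||
         state == LOCK_SYNC || state == LOCK_PREXLOCK;
}
```

Under these assumptions, a lock in `LOCK_SYNC` or `LOCK_PREXLOCK` fails the first check (the crash described in this bug) and passes the second, while every state that was accepted before is still accepted.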


Links
System                             ID              Private  Priority  Status  Summary  Last Updated
Ceph Project Bug Tracker           62523           0        None      None    None     2023-11-10 05:18:42 UTC
Red Hat Issue Tracker              RHCEPH-7887     0        None      None    None     2023-11-10 05:09:55 UTC
Red Hat Knowledge Base (Solution)  7045353         0        None      None    None     2023-11-18 20:18:33 UTC
Red Hat Product Errata             RHSA-2024:0745  0        None      None    None     2024-02-08 16:56:46 UTC

Description Lijo Stephen Thomas 2023-11-09 08:10:56 UTC
Description of problem (please be as detailed as possible and provide log snippets):
------------------------------------------------------------------------------------
MDS pods are crashing frequently with ceph_assert(state == LOCK_XLOCK || state == LOCK_XLOCKDONE || state == LOCK_XLOCKSNAP || state == LOCK_LOCK_XLOCK || state == LOCK_LOCK || is_locallock()). The crashes have been observed every day since Oct 16, 2023.

This BZ was created to track the issue downstream.

Version of all relevant components (if applicable):
---------------------------------------------------
RHCS - 16.2.10-187.el8cp  / RHODF 4.11


Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
----------------------------------------------------------------------------------------------------------------------------

Is there any workaround available to the best of your knowledge?
----------------------------------------------------------------
N/A

Is this issue reproducible?
---------------------------
No. It is present in the customer environment.


Can this issue be reproduced from the UI?
-----------------------------------------
No

Additional info:
----------------
Upstream trackers: https://tracker.ceph.com/issues/44565
Backport trackers:
quincy - https://tracker.ceph.com/issues/62522
pacific - https://tracker.ceph.com/issues/62523

Comment 25 Greg Farnum 2023-12-20 04:05:01 UTC
*** Bug 2228251 has been marked as a duplicate of this bug. ***

Comment 34 errata-xmlrpc 2024-02-08 16:56:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat Ceph Storage 5.3 Security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:0745

