2158304 – mds: switch submit_mutex to fair mutex for MDLog

Summary: mds: switch submit_mutex to fair mutex for MDLog

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Ceph Storage
Classification:	Red Hat Storage
Component:	CephFS
Sub Component:
Version:	6.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	low
Target Milestone:	---
Target Release:	6.1z1
Assignee:	Xiubo Li
QA Contact:	Hemanth Kumar
Docs Contact:	Akash Raj
URL:
Whiteboard:
Depends On:
Blocks:	2221020

Reported:	2023-01-05 03:26 UTC by Xiubo Li
Modified:	2023-08-03 16:46 UTC
CC List:	6 users
Fixed In Version:	ceph-17.2.6-84.el9cp
Doc Type:	Enhancement
Doc Text:	.Switch the unfair Mutex lock to a fair mutex Previously, _Mutex_ implementations, for example, `std::mutex` in _C++_, did not guarantee fairness: they did not guarantee that threads would acquire the lock in the order in which they called `lock()`. In most cases this worked well, but under heavy load the client request handling thread and the _submit_ thread could keep reacquiring the _submit_mutex_ for long periods, causing `MDLog::trim()` to get stuck. As a result, the MDS daemons kept writing journal logs to the metadata pool but could not trim expired segments in time. With this enhancement, the unfair Mutex lock is replaced with a fair mutex, and all _submit_mutex_ waiters are woken up one by one in FIFO order.
Clone Of:
Environment:
Last Closed:	2023-08-03 16:45:09 UTC
Embargoed:
Dependent Products:

Links
System	ID	Priority	Status	Summary	Last Updated
Ceph Project Bug Tracker	58000	None	None	None	2023-01-05 03:48:16 UTC
Red Hat Issue Tracker	RHCEPH-5885	None	None	None	2023-01-05 03:31:22 UTC
Red Hat Product Errata	RHBA-2023:4473	None	None	None	2023-08-03 16:46:06 UTC

Description Xiubo Li 2023-01-05 03:26:45 UTC

Description of problem:


    The implementations of the Mutex (e.g. std::mutex in C++) do not
    guarantee fairness: they do not guarantee that the lock will be
    acquired by threads in the order in which they called lock().
    
    In most cases this works well, but under overload the client
    request handling thread and the _submit_thread could keep
    successfully acquiring the submit_mutex for a long time, which could
    make MDLog::trim() get stuck. That means the MDS daemons will keep
    writing journal logs to the metadata pool, but cannot trim the
    expired segments in time.
    
    This switches the submit_mutex to a fair mutex, which makes sure
    that all the submit_mutex waiters are served in FIFO order and
    get a chance to be executed in time.
    
    Fixes: https://tracker.ceph.com/issues/58000
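
One common way to build a FIFO-fair mutex is a ticket lock: each caller of lock() takes a numbered ticket and waits until that number is being served. The C++ sketch below illustrates that idea under this assumption; it is not the code that shipped in Ceph, and the names `fair_mutex`, `queued()` and `acquisition_order()` are hypothetical. The demo holds the lock, lets workers queue up behind it in a known order, then releases it and records the order in which the workers actually acquired the lock.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical ticket-based fair mutex (a sketch, not Ceph's code).
// Threads acquire the lock strictly in the order they called lock().
class fair_mutex {
 public:
  void lock() {
    std::unique_lock<std::mutex> guard(m_);
    // Tickets are handed out in call order.
    const unsigned long my_ticket = next_ticket_++;
    // Sleep until this ticket is the one being served.
    cv_.wait(guard, [&] { return my_ticket == now_serving_; });
  }

  bool try_lock() {
    std::lock_guard<std::mutex> guard(m_);
    // Succeed only if nobody holds the lock or is queued for it.
    if (next_ticket_ != now_serving_) return false;
    ++next_ticket_;
    return true;
  }

  void unlock() {
    {
      std::lock_guard<std::mutex> guard(m_);
      assert(next_ticket_ != now_serving_ && "unlock() without lock()");
      ++now_serving_;  // hand the lock to the next ticket holder
    }
    // Every waiter rechecks its ticket; only the next one proceeds.
    cv_.notify_all();
  }

  // Hypothetical helper: threads holding or waiting for the lock.
  unsigned long queued() {
    std::lock_guard<std::mutex> guard(m_);
    return next_ticket_ - now_serving_;
  }

 private:
  std::mutex m_;
  std::condition_variable cv_;
  unsigned long next_ticket_ = 0;
  unsigned long now_serving_ = 0;
};

// Hypothetical demo: n workers queue behind a held lock in the order
// 0..n-1; returns the order in which they actually acquired it.
std::vector<int> acquisition_order(int n) {
  fair_mutex fm;
  std::vector<int> order;
  std::vector<std::thread> workers;
  fm.lock();  // hold the lock so every worker has to queue
  for (int i = 0; i < n; ++i) {
    workers.emplace_back([&fm, &order, i] {
      fm.lock();
      order.push_back(i);  // protected by fm
      fm.unlock();
    });
    // Make sure worker i holds its ticket before starting worker i + 1.
    while (fm.queued() != static_cast<unsigned long>(i) + 2) {
      std::this_thread::yield();
    }
  }
  fm.unlock();  // release; queued workers are served in FIFO order
  for (auto& t : workers) t.join();
  return order;
}
```

Because `unlock()` uses `notify_all()`, every waiter wakes up and rechecks its ticket, but only the next ticket holder proceeds. That keeps the sketch short at the cost of extra wakeups under heavy contention. With a plain `std::mutex`, nothing guarantees that a waiting thread (such as the one running `MDLog::trim()`) is ever served ahead of threads that keep re-locking.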

Comment 1 RHEL Program Management 2023-01-05 03:26:56 UTC

Please specify the severity of this bug. Severity is defined here:
https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity.

Comment 10 errata-xmlrpc 2023-08-03 16:45:09 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 6.1 Bug Fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:4473

Note You need to log in before you can comment on or make changes to this bug.