.Switch the unfair mutex lock to a fair mutex
Previously, mutex implementations, for example, `std::mutex` in _C++_, did not guarantee fairness:
they did not guarantee that threads would acquire the lock in the order in which they called `lock()`. In most cases, this worked well, but under heavy load, the client
request handling thread and the _submit_ thread could keep acquiring the _submit_mutex_ for a long time, causing `MDLog::trim()` to get stuck. As a result, the MDS daemons kept writing journal logs to the metadata pool but could not trim the expired segments in time.
With this enhancement, the _submit_mutex_ is switched to a fair mutex, and all the _submit_mutex_ waiters are woken up one by one in FIFO order.
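The FIFO behaviour described above can be illustrated with a ticket-based lock. The following is only a minimal sketch under stated assumptions, not the code used in the MDS: the class name `FairMutexSketch`, its `outstanding()` helper, and the `fifo_demo()` function are hypothetical names introduced for illustration. Each caller of `lock()` takes the next ticket and sleeps until that ticket is being served, so waiters proceed strictly in the order in which they called `lock()`.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical ticket-based fair mutex; an illustration only, not Ceph's code.
class FairMutexSketch {
public:
  void lock() {
    std::unique_lock<std::mutex> guard(state_mutex_);
    const std::uint64_t my_ticket = next_ticket_++;  // join the end of the queue
    // Sleep until this ticket is served; the predicate also handles spurious wakeups.
    turn_changed_.wait(guard, [&] { return my_ticket == now_serving_; });
  }

  void unlock() {
    {
      std::lock_guard<std::mutex> guard(state_mutex_);
      ++now_serving_;  // hand the lock to the next ticket in line
    }
    // Every waiter wakes up, but only the one holding the matching ticket proceeds.
    turn_changed_.notify_all();
  }

  // Tickets issued but not yet released: the current holder plus all waiters.
  std::uint64_t outstanding() const {
    std::lock_guard<std::mutex> guard(state_mutex_);
    return next_ticket_ - now_serving_;
  }

private:
  mutable std::mutex state_mutex_;
  std::condition_variable turn_changed_;
  std::uint64_t next_ticket_ = 0;
  std::uint64_t now_serving_ = 0;
};

// Hypothetical demo: queue `waiters` threads behind a held lock and record
// the order in which they eventually acquire it.
std::vector<int> fifo_demo(int waiters) {
  FairMutexSketch m;
  std::vector<int> order;
  std::vector<std::thread> threads;
  m.lock();  // hold the lock so that every worker has to queue
  for (int i = 0; i < waiters; ++i) {
    threads.emplace_back([&m, &order, i] {
      std::lock_guard<FairMutexSketch> hold(m);
      order.push_back(i);  // protected by m
    });
    // Wait until worker i has taken its ticket before starting the next one.
    while (m.outstanding() < static_cast<std::uint64_t>(i) + 2) {
      std::this_thread::yield();
    }
  }
  m.unlock();  // release: workers now run one by one in ticket order
  for (auto& t : threads) t.join();
  return order;
}
```

Because `lock()` and `unlock()` are the only members a scoped guard needs, the sketch works with `std::lock_guard<FairMutexSketch>`, as `fifo_demo()` shows. The trade-off is that `notify_all()` wakes every waiter on each release, which favours simplicity and strict ordering over raw throughput.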
Description of problem:
Mutex implementations (e.g. std::mutex in C++) do not
guarantee fairness: they do not guarantee that the lock will be
acquired by threads in the order in which they called lock().
In most cases this works well, but under overload the client
request handling thread and _submit_thread could keep
acquiring the submit_mutex for a long time, which could make
MDLog::trim() get stuck. That means the MDS daemons will fill journal
logs into the metadata pool, but cannot trim the expired segments
in time.
This change switches the submit_mutex to a fair mutex, which makes
sure that all the submit_mutex waiters are served in FIFO order and
get a chance to be executed in time.
Fixes: https://tracker.ceph.com/issues/58000
Comment 1 RHEL Program Management
2023-01-05 03:26:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory (Red Hat Ceph Storage 6.1 Bug Fix update), and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2023:4473