Back to bug 2060989

Who When What Removed Added
Greg Farnum 2022-03-04 20:21:47 UTC Blocks 2042863
Depends On 2041660
Target Release 5.1 6.1
Status NEW ASSIGNED
Assignee vshankar xiubli
Red Hat One Jira (issues.redhat.com) 2022-03-04 20:26:46 UTC Link ID Red Hat Issue Tracker RHCEPH-3652
Red Hat Bugzilla 2022-05-26 08:30:39 UTC CC ceph-qe-bugs
Hemanth Kumar 2022-06-27 05:32:47 UTC QA Contact hyelloji amk
Xiubo Li 2022-07-11 07:32:36 UTC Status ASSIGNED POST
errata-xmlrpc 2022-08-15 21:13:56 UTC Status POST MODIFIED
Fixed In Version ceph-17.2.3-1.el9cp
Target Release 6.1 6.0
Status MODIFIED ON_QA
Masauso Lungu 2022-09-09 08:50:12 UTC Flags needinfo?(xiubli)
Docs Contact mlungu
CC mlungu
Xiubo Li 2022-09-09 09:01:37 UTC Doc Text Cause: In MDS daemon there has a global mds_lock mutex and has multiple threads. If one thread has a lot of work to do it will hold the mds_lock for a long time.

Consequence: The other threads will be starve and be stuck for a long time to be scheduled, and the MDS daemon may fail to report the heartbeat to monitor in time and then be kicked off from the cluster.

Fix: Reset heartbeat in each thread after each queued work.

Result: No heartbeat timedout will happen.
Flags needinfo?(xiubli)
Doc Type If docs needed, set a value Bug Fix
Masauso Lungu 2022-09-21 18:25:49 UTC Blocks 2126050
Masauso Lungu 2022-09-29 14:46:58 UTC Flags needinfo?(xiubli)
Doc Text Cause: In MDS daemon there has a global mds_lock mutex and has multiple threads. If one thread has a lot of work to do it will hold the mds_lock for a long time.

Consequence: The other threads will be starve and be stuck for a long time to be scheduled, and the MDS daemon may fail to report the heartbeat to monitor in time and then be kicked off from the cluster.

Fix: Reset heartbeat in each thread after each queued work.

Result: No heartbeat timedout will happen.
.‘MDS daemon’ now resets the heartbeat in each thread after each queued work

Previously, a thread would hold the `mds_lock` for a long time if it had a lot of work to do. This caused other threads to be starved of resources and stuck for a long time. As a result, the MDS daemon would fail to report the heartbeat to the monitor in time and be kicked out of the cluster.

With this fix, the MDS daemon resets the heartbeat in each thread after each queued work.
Venky Shankar 2022-09-29 15:49:31 UTC Flags needinfo?(xiubli)
Hemanth Kumar 2022-11-18 04:48:57 UTC Status ON_QA VERIFIED
Red Hat Bugzilla 2022-12-31 16:21:21 UTC Assignee xiubli vshankar
CC xiubli
Red Hat Bugzilla 2022-12-31 19:50:26 UTC CC hyelloji
Red Hat Bugzilla 2023-01-01 05:39:34 UTC CC tserlin
Red Hat Bugzilla 2023-01-01 08:28:28 UTC QA Contact amk
Red Hat Bugzilla 2023-01-01 08:48:05 UTC CC vereddy
Red Hat Bugzilla 2023-01-01 08:49:21 UTC CC vshankar
Assignee vshankar nobody
Alasdair Kergon 2023-01-04 04:36:31 UTC QA Contact amk
Alasdair Kergon 2023-01-04 04:55:53 UTC Assignee nobody xiubli
Alasdair Kergon 2023-01-04 04:57:17 UTC CC hyelloji
Alasdair Kergon 2023-01-04 05:57:59 UTC CC vshankar
Alasdair Kergon 2023-01-04 06:01:29 UTC CC xiubli
Alasdair Kergon 2023-01-04 06:25:53 UTC CC tserlin
Alasdair Kergon 2023-01-04 06:59:12 UTC CC vereddy
Red Hat Bugzilla 2023-01-09 08:30:44 UTC CC ceph-eng-bugs
Alasdair Kergon 2023-01-09 19:43:36 UTC CC ceph-eng-bugs
errata-xmlrpc 2023-03-20 18:37:29 UTC Status VERIFIED RELEASE_PENDING
errata-xmlrpc 2023-03-20 18:55:34 UTC Status RELEASE_PENDING CLOSED
Resolution --- ERRATA
Last Closed 2023-03-20 18:55:34 UTC
errata-xmlrpc 2023-03-20 18:56:03 UTC Link ID Red Hat Product Errata RHBA-2023:1360
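
The Doc Text recorded in this history describes resetting a heartbeat after each queued work item, so that a long batch processed under one global lock is not mistaken for a hung daemon. The following is a minimal Python sketch of that idea only. It is not the Ceph MDS code: `Heartbeat`, `FakeClock`, `drain_queue`, `big_lock`, and the 15-second grace period are hypothetical names and values chosen for illustration, and the clock is simulated so the behaviour is deterministic.

```python
import threading
from dataclasses import dataclass, field
from typing import Callable, Iterable


class FakeClock:
    """Simulated clock (assumption: used here only to keep the sketch deterministic)."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now

    def advance(self, seconds: float) -> None:
        self.now += seconds


@dataclass
class Heartbeat:
    """Hypothetical per-thread heartbeat handle, not a Ceph API."""

    grace: float                 # seconds allowed between two resets
    clock: Callable[[], float]   # time source
    last_reset: float = field(init=False)

    def __post_init__(self) -> None:
        self.last_reset = self.clock()

    def reset(self) -> None:
        self.last_reset = self.clock()

    def is_healthy(self) -> bool:
        return self.clock() - self.last_reset <= self.grace


# Stand-in for a single global lock such as the mds_lock named in the Doc Text.
big_lock = threading.Lock()


def drain_queue(
    costs: Iterable[float],
    hb: Heartbeat,
    advance: Callable[[float], None],
    reset_each_item: bool,
) -> bool:
    """Process every queued item under the lock.

    Returns True if the heartbeat stayed within its grace period throughout.
    """
    healthy = True
    with big_lock:
        for cost in costs:
            advance(cost)  # simulate the time spent on one work item
            healthy = healthy and hb.is_healthy()
            if reset_each_item:
                hb.reset()  # the behaviour described under "Fix"
    hb.reset()  # reset once the whole batch is done, in either mode
    return healthy
```

With five items that each take 4 simulated seconds and a 15-second grace period, `drain_queue(..., reset_each_item=False)` reports a lapsed heartbeat, because 16 seconds pass without a reset, while `reset_each_item=True` stays healthy, because no more than 4 seconds ever pass between resets.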
