Back to bug 2114612

Who When What Removed Added
Vikhyat Umrao 2022-08-02 23:45:54 UTC Link ID Github ceph/ceph/pull/47216
Red Hat One Jira (issues.redhat.com) 2022-08-02 23:49:57 UTC Link ID Red Hat Issue Tracker RHCEPH-4986
Vikhyat Umrao 2022-08-02 23:50:10 UTC Keywords Performance
Veera Raghava Reddy 2022-08-03 03:41:54 UTC CC vereddy
Severity unspecified high
Sridhar Seshasayee 2022-08-09 17:24:41 UTC Link ID Github 47490
Sridhar Seshasayee 2022-08-09 17:25:29 UTC Link ID Github 47490
Sridhar Seshasayee 2022-08-09 17:26:12 UTC Link ID Github ceph/ceph/pull/47490
Sridhar Seshasayee 2022-08-10 11:30:01 UTC Status ASSIGNED POST
Sunil Angadi 2022-08-11 04:45:16 UTC Status POST MODIFIED
Fixed In Version ceph-17.2.3-3.el9cp
CC tserlin
CC sangadi
errata-xmlrpc 2022-08-15 15:30:08 UTC Status MODIFIED ON_QA
Eliska 2022-09-08 07:24:50 UTC CC skanta
QA Contact pdhiran skanta
CC ekristov
Flags needinfo?(sseshasa)
Sridhar Seshasayee 2022-09-08 13:35:04 UTC Doc Type If docs needed, set a value Bug Fix
Flags needinfo?(sseshasa)
Doc Text Cause:
A worker thread with the smallest index in an OSD shard may
sometimes have to wait for future work items from the mClock
queue. In such a case, oncommit callbacks are called. But
after the callback was invoked, the thread did not resume
waiting and instead returned to the main worker loop.

Consequence:
This caused the threads with the smallest index across all
OSD shards to busy-loop, resulting in very high CPU
utilization.

Fix:
The fix involved reacquiring the appropriate lock and waiting
until notified or until the time period indicated by the
mClock scheduler lapsed.

Result:
The worker thread with the smallest thread index waits until
an item can be scheduled from the mClock queue or until
notified, and then returns to the main worker loop. This
eliminates the busy loop and solves the high CPU utilization
issue.
Eliska 2022-09-13 08:53:03 UTC Blocks 2126050
Pawan 2022-09-19 09:20:14 UTC CC pdhiran
Eliska 2022-09-21 13:06:45 UTC Flags needinfo?(sseshasa)
Doc Text
.Slow progress and high CPU utilization during backfill are resolved

Previously, the worker thread with the smallest index in an OSD shard would return to the main worker loop instead of waiting until an item could be scheduled from the mClock queue or until notified.
This resulted in a busy loop and high CPU utilization.

With this fix, the worker thread with the smallest thread index reacquires the appropriate lock and waits until notified, or until the time period indicated by the mClock scheduler lapses.
The worker thread now waits until an item can be scheduled from the mClock queue or until notified, and then returns to the main worker loop, thereby eliminating the busy loop and resolving the high CPU utilization issue.
Docs Contact ekristov
Sridhar Seshasayee 2022-09-21 13:45:33 UTC Flags needinfo?(sseshasa)
Red Hat Bugzilla 2022-12-31 19:09:52 UTC Status ON_QA VERIFIED
QA Contact skanta pdhiran
CC skanta
Red Hat Bugzilla 2022-12-31 19:13:28 UTC CC amathuri
Red Hat Bugzilla 2022-12-31 19:32:34 UTC CC pdhiran
QA Contact pdhiran
Red Hat Bugzilla 2022-12-31 20:00:00 UTC CC sseshasa
Assignee sseshasa nojha
Red Hat Bugzilla 2022-12-31 22:43:30 UTC CC rfriedma
Red Hat Bugzilla 2022-12-31 23:43:32 UTC CC rzarzyns
Red Hat Bugzilla 2022-12-31 23:45:52 UTC CC akupczyk
Red Hat Bugzilla 2023-01-01 05:35:20 UTC CC ksirivad
Red Hat Bugzilla 2023-01-01 05:39:36 UTC CC tserlin
Red Hat Bugzilla 2023-01-01 06:27:07 UTC CC lflores
Red Hat Bugzilla 2023-01-01 06:29:01 UTC CC choffman
Red Hat Bugzilla 2023-01-01 08:38:30 UTC CC nojha
Assignee nojha nobody
Red Hat Bugzilla 2023-01-01 08:39:33 UTC CC pdhange
Red Hat Bugzilla 2023-01-01 08:40:54 UTC CC sangadi
Red Hat Bugzilla 2023-01-01 08:48:15 UTC CC vereddy
Red Hat Bugzilla 2023-01-01 08:49:51 UTC CC vumrao
Alasdair Kergon 2023-01-04 04:40:45 UTC CC akupczyk
Alasdair Kergon 2023-01-04 04:43:34 UTC CC amathuri
Alasdair Kergon 2023-01-04 04:53:27 UTC Assignee nobody sseshasa
Alasdair Kergon 2023-01-04 04:55:59 UTC QA Contact skanta
Alasdair Kergon 2023-01-04 05:08:58 UTC CC ksirivad
Alasdair Kergon 2023-01-04 05:10:58 UTC CC lflores
Alasdair Kergon 2023-01-04 05:21:38 UTC CC nojha
Alasdair Kergon 2023-01-04 05:28:18 UTC CC pdhange
Alasdair Kergon 2023-01-04 05:30:13 UTC CC pdhiran
Alasdair Kergon 2023-01-04 05:34:52 UTC CC rfriedma
Alasdair Kergon 2023-01-04 05:37:37 UTC CC rzarzyns
Alasdair Kergon 2023-01-04 05:39:59 UTC CC sangadi
Alasdair Kergon 2023-01-04 05:41:45 UTC CC skanta
Alasdair Kergon 2023-01-04 05:59:30 UTC CC vumrao
Alasdair Kergon 2023-01-04 06:13:47 UTC CC choffman
Alasdair Kergon 2023-01-04 06:56:31 UTC CC sseshasa
Alasdair Kergon 2023-01-04 06:59:12 UTC CC vereddy
Red Hat Bugzilla 2023-01-09 08:28:53 UTC CC ceph-eng-bugs
Alasdair Kergon 2023-01-09 19:43:36 UTC CC ceph-eng-bugs
errata-xmlrpc 2023-03-20 18:38:12 UTC Status VERIFIED RELEASE_PENDING
errata-xmlrpc 2023-03-20 18:57:13 UTC Resolution --- ERRATA
Status RELEASE_PENDING CLOSED
Last Closed 2023-03-20 18:57:13 UTC
errata-xmlrpc 2023-03-20 18:57:56 UTC Link ID Red Hat Product Errata RHBA-2023:1360
