Bug 2350375 (CVE-2024-58057) - CVE-2024-58057 kernel: idpf: convert workqueues to unbound
Summary: CVE-2024-58057 kernel: idpf: convert workqueues to unbound
Keywords:
Status: NEW
Alias: CVE-2024-58057
Product: Security Response
Classification: Other
Component: vulnerability
Version: unspecified
Hardware: All
OS: Linux
medium
medium
Target Milestone: ---
Assignee: Product Security DevOps Team
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2025-03-06 16:02 UTC by OSIDB Bzimport
Modified: 2025-03-06 17:59 UTC (History)
4 users (show)

Fixed In Version:
Clone Of:
Environment:
Last Closed:
Embargoed:


Attachments (Terms of Use)

Description OSIDB Bzimport 2025-03-06 16:02:19 UTC
In the Linux kernel, the following vulnerability has been resolved:

idpf: convert workqueues to unbound

When a workqueue is created with `WQ_UNBOUND`, its work items are
served by special worker-pools, whose host workers are not bound to
any specific CPU. In the default configuration (i.e. when
`queue_delayed_work` and friends do not specify which CPU to run the
work item on), `WQ_UNBOUND` allows the work item to be executed on any
CPU in the same node of the CPU it was enqueued on. While this
solution potentially sacrifices locality, it avoids contention with
other processes that might dominate the CPU time of the processor the
work item was scheduled on.

This is not just a theoretical problem: in a particular scenario
misconfigured process was hogging most of the time from CPU0, leaving
less than 0.5% of its CPU time to the kworker. The IDPF workqueues
that were using the kworker on CPU0 suffered large completion delays
as a result, causing performance degradation, timeouts and eventual
system crash.


* I have also run a manual test to gauge the performance
  improvement. The test consists of an antagonist process
  (`./stress --cpu 2`) consuming as much of CPU 0 as possible. This
  process is run under `taskset 01` to bind it to CPU0, and its
  priority is changed with `chrt -pQ 9900 10000 ${pid}` and
  `renice -n -20 ${pid}` after start.

  Then, the IDPF driver is forced to prefer CPU0 by editing all calls
  to `queue_delayed_work`, `mod_delayed_work`, etc... to use CPU 0.

  Finally, `ktraces` for the workqueue events are collected.

  Without the current patch, the antagonist process can force
  arbitrary delays between `workqueue_queue_work` and
  `workqueue_execute_start`, that in my tests were as high as
  `30ms`. With the current patch applied, the workqueue can be
  migrated to another unloaded CPU in the same node, and, keeping
  everything else equal, the maximum delay I could see was `6us`.

Comment 1 Mauro Matteo Cascella 2025-03-06 17:55:03 UTC
Upstream advisory:
https://lore.kernel.org/linux-cve-announce/2025030607-CVE-2024-58057-5c91@gregkh/T


Note You need to log in before you can comment on or make changes to this bug.