2298136 – (CVE-2022-48800) CVE-2022-48800 kernel: mm: vmscan: remove deadlock due to throttling failing to make progress

Bug 2298136 (CVE-2022-48800) - CVE-2022-48800 kernel: mm: vmscan: remove deadlock due to throttling failing to make progress

Summary: CVE-2022-48800 kernel: mm: vmscan: remove deadlock due to throttling failing ...

Keywords:
Status:	NEW
Alias:	CVE-2022-48800
Product:	Security Response
Classification:	Other
Component:	vulnerability
Sub Component:
Version:	unspecified
Hardware:	All
OS:	Linux
Priority:	low
Severity:	low
Target Milestone:	---
Assignee:	Product Security DevOps Team
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2024-07-16 12:28 UTC by OSIDB Bzimport
Modified:	2024-07-30 16:57 UTC (History)
CC List:	4 users (show)
Fixed In Version:	kernel 5.16.10, kernel 5.17
Clone Of:
Environment:
Last Closed:
Embargoed:

Attachments	(Terms of Use)

Description OSIDB Bzimport 2024-07-16 12:28:12 UTC

In the Linux kernel, the following vulnerability has been resolved:

mm: vmscan: remove deadlock due to throttling failing to make progress

A soft lockup bug in kcompactd was reported in a private bugzilla with
the following visible in dmesg;

  watchdog: BUG: soft lockup - CPU#33 stuck for 26s! [kcompactd0:479]
  watchdog: BUG: soft lockup - CPU#33 stuck for 52s! [kcompactd0:479]
  watchdog: BUG: soft lockup - CPU#33 stuck for 78s! [kcompactd0:479]
  watchdog: BUG: soft lockup - CPU#33 stuck for 104s! [kcompactd0:479]

The machine had 256G of RAM with no swap and an earlier failed
allocation indicated that node 0 where kcompactd was run was potentially
unreclaimable;

  Node 0 active_anon:29355112kB inactive_anon:2913528kB active_file:0kB
    inactive_file:0kB unevictable:64kB isolated(anon):0kB isolated(file):0kB
    mapped:8kB dirty:0kB writeback:0kB shmem:26780kB shmem_thp:
    0kB shmem_pmdmapped: 0kB anon_thp: 23480320kB writeback_tmp:0kB
    kernel_stack:2272kB pagetables:24500kB all_unreclaimable? yes

Vlastimil Babka investigated a crash dump and found that a task
migrating pages was trying to drain PCP lists;

  PID: 52922  TASK: ffff969f820e5000  CPU: 19  COMMAND: "kworker/u128:3"
  Call Trace:
     __schedule
     schedule
     schedule_timeout
     wait_for_completion
     __flush_work
     __drain_all_pages
     __alloc_pages_slowpath.constprop.114
     __alloc_pages
     alloc_migration_target
     migrate_pages
     migrate_to_node
     do_migrate_pages
     cpuset_migrate_mm_workfn
     process_one_work
     worker_thread
     kthread
     ret_from_fork

This failure is specific to CONFIG_PREEMPT=n builds.  The root of the
problem is that kcompact0 is not rescheduling on a CPU while a task that
has isolated a large number of the pages from the LRU is waiting on
kcompact0 to reschedule so the pages can be released.  While
shrink_inactive_list() only loops once around too_many_isolated, reclaim
can continue without rescheduling if sc->skipped_deactivate == 1 which
could happen if there was no file LRU and the inactive anon list was not
low.

Note You need to log in before you can comment on or make changes to this bug.