Bug 2339132 (CVE-2025-21664)

Summary:	CVE-2025-21664 kernel: dm thin: make get_first_thin use rcu-safe list first function
Product:	[Other] Security Response	Reporter:	OSIDB Bzimport <bzimport>
Component:	vulnerability	Assignee:	Product Security DevOps Team <prodsec-dev>
Status:	NEW ---	QA Contact:
Severity:	medium	Docs Contact:
Priority:	medium
Version:	unspecified	CC:	dfreiber, drow, jburrell, vkumar
Target Milestone:	---	Keywords:	Security
Target Release:	---
Hardware:	All
OS:	Linux

Description OSIDB Bzimport 2025-01-21 13:02:23 UTC

In the Linux kernel, the following vulnerability has been resolved:

dm thin: make get_first_thin use rcu-safe list first function

The documentation in rculist.h explains the absence of list_empty_rcu()
and cautions programmers against relying on a list_empty() ->
list_first() sequence in RCU-safe code.  This is because each of these
functions performs its own READ_ONCE() of the list head.  As a result,
list_empty() can see a valid list entry while the subsequent
list_first() sees a different state of the list head, because the list
was modified between the two reads.
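
The two-read hazard described above can be sketched in userspace. This is only a model, not kernel code: struct node, read_next_once(), remove_first(), first_racy() and racy_lookup_returns_head() are hypothetical names, a relaxed C11 atomic load stands in for READ_ONCE(), and the concurrent writer is simulated by a callback invoked between the two reads so that the outcome is deterministic.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical circular list node; an empty list has head->next == head. */
struct node {
    _Atomic(struct node *) next;
};

/* Stand-in for READ_ONCE(n->next): one independent load per call. */
static struct node *read_next_once(struct node *n)
{
    return atomic_load_explicit(&n->next, memory_order_relaxed);
}

/* Simulated writer: unlink the first entry (assumes the list is non-empty). */
static void remove_first(struct node *head)
{
    struct node *first = read_next_once(head);
    atomic_store_explicit(&head->next, read_next_once(first),
                          memory_order_release);
}

typedef void (*writer_fn)(struct node *head);

/* The "list_empty() then list_first()" shape: two separate reads of head. */
static struct node *first_racy(struct node *head, writer_fn writer)
{
    if (read_next_once(head) == head)   /* read 1: emptiness check */
        return NULL;
    if (writer)
        writer(head);                   /* writer runs between the reads */
    return read_next_once(head);        /* read 2: may now be head itself */
}

/* Returns 1 when the racy lookup hands back the list head as an "entry". */
int racy_lookup_returns_head(void)
{
    struct node head, only;

    atomic_init(&head.next, &only);
    atomic_init(&only.next, &head);
    assert(read_next_once(&head) == &only);  /* one entry before the race */

    return first_racy(&head, remove_first) == &head;
}
```

In this model the caller receives the head itself rather than an entry. Applying a container_of()-style conversion to that pointer would yield an address inside whatever structure embeds the head, which matches the symptom reported for dm-thin.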

In the case of dm-thin, this author had a production box crash from a GP
fault in the process_deferred_bios path.  This function saw a valid list
head in get_first_thin() but when it subsequently dereferenced that and
turned it into a thin_c, it got the inside of the struct pool, since the
list was now empty and referring to itself.  The kernel on which this
occurred printed both a warning about a refcount_t being saturated, and
a UBSAN error for an out-of-bounds cpuid access in the queued spinlock,
prior to the fault itself.  When the resulting kdump was examined, it
was possible to see another thread patiently waiting in thin_dtr's
synchronize_rcu.

The thin_dtr call managed to pull the thin_c out of the active_thins
list (where it was the last remaining entry) at just the wrong moment,
which led to this crash.

Fortunately, the fix here is straightforward.  Switch the
get_first_thin() function to use list_first_or_null_rcu(), which
performs just a single READ_ONCE() and returns NULL if the list is
already empty.
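
The single-read approach can be sketched in the same userspace style. Again this is a model under stated assumptions rather than the kernel implementation: struct node, read_next_once(), remove_first(), first_or_null() and single_read_never_returns_head() are hypothetical names, a relaxed C11 atomic load stands in for READ_ONCE(), and the writer is simulated by a callback.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical circular list node; an empty list has head->next == head. */
struct node {
    _Atomic(struct node *) next;
};

/* Stand-in for READ_ONCE(n->next). */
static struct node *read_next_once(struct node *n)
{
    return atomic_load_explicit(&n->next, memory_order_relaxed);
}

/* Simulated writer: unlink the first entry (assumes the list is non-empty). */
static void remove_first(struct node *head)
{
    struct node *first = read_next_once(head);
    atomic_store_explicit(&head->next, read_next_once(first),
                          memory_order_release);
}

typedef void (*writer_fn)(struct node *head);

/*
 * Single-read lookup: take one snapshot of head->next and base both the
 * emptiness decision and the returned pointer on that snapshot.
 */
static struct node *first_or_null(struct node *head, writer_fn writer)
{
    struct node *first = read_next_once(head);  /* the only read of head */

    if (writer)
        writer(head);       /* a writer after the snapshot cannot change it */
    return first != head ? first : NULL;
}

/* Returns 1 when the lookup yields the real entry or NULL, never the head. */
int single_read_never_returns_head(void)
{
    struct node head, only;
    struct node *a, *b;

    atomic_init(&head.next, &only);
    atomic_init(&only.next, &head);
    assert(read_next_once(&head) == &only);

    a = first_or_null(&head, remove_first);  /* entry removed mid-lookup */
    b = first_or_null(&head, NULL);          /* list is now empty */
    return a == &only && b == NULL;
}
```

The model does not cover object lifetime. In the kernel, an entry returned this way is only safe to use because the reader holds the RCU read lock and the remover waits for a grace period (the synchronize_rcu in thin_dtr) before freeing it.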

This was run against the devicemapper test suite's thin-provisioning
suites for delete and suspend and no regressions were observed.

Comment 1 Robb Gatica 2025-01-21 17:01:00 UTC

Upstream advisory:
https://lore.kernel.org/linux-cve-announce/2025012135-CVE-2025-21664-3744@gregkh/T