Bug 2420417 (CVE-2025-40329)

Summary:	CVE-2025-40329 kernel: drm/sched: Fix deadlock in drm_sched_entity_kill_jobs_cb
Product:	[Other] Security Response	Reporter:	OSIDB Bzimport <bzimport>
Component:	vulnerability	Assignee:	Product Security DevOps Team <prodsec-dev>
Status:	NEW ---	QA Contact:
Severity:	medium	Docs Contact:
Priority:	medium
Version:	unspecified	Keywords:	Security
Target Milestone:	---
Target Release:	---
Hardware:	All
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	---
Doc Text:	A deadlock flaw was found in the Linux kernel's DRM GPU scheduler. drm_sched_entity_kill_jobs_cb() runs as a fence signalling callback, in interrupt context with fence->lock held, and takes the xarray lock (xa_lock) of job->dependencies, while other paths take xa_lock with interrupts enabled. If CPU0 holds xa_lock when CPU1, holding fence->lock with interrupts disabled, waits for xa_lock, an interrupt on CPU0 that needs fence->lock deadlocks the system. Additionally, the callback calls dma_fence_add_callback() from within dma_fence_signal(), which deadlocks if the two fences share the same spinlock. A local user with access to GPU resources could trigger this condition, causing a system hang.	Story Points:	---

Description OSIDB Bzimport 2025-12-09 05:01:59 UTC

In the Linux kernel, the following vulnerability has been resolved:

drm/sched: Fix deadlock in drm_sched_entity_kill_jobs_cb

The Mesa issue referenced below pointed out a possible deadlock:

[ 1231.611031]  Possible interrupt unsafe locking scenario:

[ 1231.611033]        CPU0                    CPU1
[ 1231.611034]        ----                    ----
[ 1231.611035]   lock(&xa->xa_lock#17);
[ 1231.611038]                                local_irq_disable();
[ 1231.611039]                                lock(&fence->lock);
[ 1231.611041]                                lock(&xa->xa_lock#17);
[ 1231.611044]   <Interrupt>
[ 1231.611045]     lock(&fence->lock);
[ 1231.611047]
                *** DEADLOCK ***

In this example, CPU0 would be any function accessing job->dependencies
through the xa_* functions that don't disable interrupts (e.g.
drm_sched_job_add_dependency() or drm_sched_entity_kill_jobs_cb()).

CPU1 is executing drm_sched_entity_kill_jobs_cb() as a fence signalling
callback, and therefore in interrupt context. It will deadlock when trying
to grab the xa_lock, which is already held by CPU0.
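
The scenario above can be read as a wait-for cycle between three execution contexts. The following is only a userspace model of that reasoning, not kernel code: the enum names, has_wait_cycle() and scenario_deadlocks() are hypothetical, and the "interrupts disabled on CPU0" branch simply assumes that the handler cannot run while xa_lock is held.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical execution contexts taken from the lockdep report. */
enum ctx { CPU0_TASK, CPU0_IRQ, CPU1_IRQ_OFF, NCTX };
#define NOBODY (-1)

/* Each context waits for at most one other context; follow the chain
 * from every start node and report whether it leads back to the start. */
static bool has_wait_cycle(const int waits_for[NCTX])
{
    for (int start = 0; start < NCTX; start++) {
        int cur = start;

        for (int steps = 0; steps < NCTX && cur != NOBODY; steps++) {
            cur = waits_for[cur];
            assert(cur < NCTX);
            if (cur == start)
                return true;
        }
    }
    return false;
}

/* Build the wait-for edges of the reported scenario. */
static bool scenario_deadlocks(bool cpu0_irqs_disabled)
{
    int waits_for[NCTX];

    /* CPU1 holds fence->lock and spins on xa_lock, held by CPU0's task. */
    waits_for[CPU1_IRQ_OFF] = CPU0_TASK;

    if (cpu0_irqs_disabled) {
        /* Assumption: no handler can preempt the xa_lock holder. */
        waits_for[CPU0_TASK] = NOBODY;
        waits_for[CPU0_IRQ] = NOBODY;
    } else {
        /* The interrupted task cannot resume until the handler returns. */
        waits_for[CPU0_TASK] = CPU0_IRQ;
        /* The handler spins on fence->lock, held by CPU1. */
        waits_for[CPU0_IRQ] = CPU1_IRQ_OFF;
    }
    return has_wait_cycle(waits_for);
}
```

In this model, scenario_deadlocks(false) reports a cycle, while scenario_deadlocks(true) does not, because the edge from the CPU0 task to its interrupt handler disappears.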

Replacing every xa_* call with its xa_*_irq counterpart would fix
this issue, but Christian pointed out another one: dma_fence_signal()
takes fence.lock, and so does dma_fence_add_callback().

  dma_fence_signal() // locks f1.lock
  -> drm_sched_entity_kill_jobs_cb()
  -> foreach dependencies
     -> dma_fence_add_callback() // locks f2.lock

This will deadlock if f1 and f2 share the same spinlock.
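
A minimal userspace sketch of this second problem, assuming POSIX threads. It is not the kernel code: struct toy_fence and the toy_* functions are hypothetical stand-ins, and an error-checking mutex replaces the spinlock so that the recursive acquisition is reported as EDEADLK instead of hanging.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for a dma_fence: only the lock pointer matters. */
struct toy_fence {
    pthread_mutex_t *lock;
};

/* Create an error-checking mutex: relocking by the owner returns EDEADLK. */
static int toy_lock_init(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int ret = pthread_mutexattr_init(&attr);

    if (ret)
        return ret;
    ret = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (!ret)
        ret = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return ret;
}

/* Models dma_fence_add_callback(): it needs the fence's lock. */
static int toy_add_callback(struct toy_fence *f)
{
    int ret;

    assert(f && f->lock);
    ret = pthread_mutex_lock(f->lock);
    if (ret)
        return ret;
    return pthread_mutex_unlock(f->lock);
}

/* Models dma_fence_signal(f1) invoking a callback that re-arms on f2
 * while f1's lock is still held. */
static int toy_signal_and_rearm(struct toy_fence *f1, struct toy_fence *f2)
{
    int ret = pthread_mutex_lock(f1->lock);

    if (ret)
        return ret;
    ret = toy_add_callback(f2);
    pthread_mutex_unlock(f1->lock);
    return ret;
}
```

With two distinct locks, toy_signal_and_rearm() returns 0. When both fences point at the same lock, it returns EDEADLK, which corresponds to the hang a non-recursive spinlock would produce.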

To fix both issues, the code iterating on dependencies and re-arming them
is moved out to drm_sched_entity_kill_jobs_work().
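
The shape of this fix can be sketched in userspace as follows, assuming POSIX threads. This is an illustration of deferring the work, not the upstream patch: struct toy_fence, struct toy_work and the toy_* functions are hypothetical, and a plain flag stands in for a queued work item.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for a dma_fence: only the lock pointer matters. */
struct toy_fence {
    pthread_mutex_t *lock;
};

/* Hypothetical deferred work item: a pending flag plus the dependency
 * whose callback must be re-armed later. */
struct toy_work {
    int pending;
    struct toy_fence *dep;
};

/* Error-checking mutex, so a recursive lock would fail instead of hang. */
static int toy_lock_init(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int ret = pthread_mutexattr_init(&attr);

    if (ret)
        return ret;
    ret = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (!ret)
        ret = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return ret;
}

/* Callback run with the signalled fence's lock held: it takes no lock
 * and only marks the work as pending. */
static void toy_kill_jobs_cb(struct toy_work *w)
{
    w->pending = 1;
}

/* Models signalling f1: the callback runs under f1's lock. */
static int toy_signal(struct toy_fence *f1, struct toy_work *w)
{
    int ret = pthread_mutex_lock(f1->lock);

    if (ret)
        return ret;
    toy_kill_jobs_cb(w);
    return pthread_mutex_unlock(f1->lock);
}

/* Models the deferred work: it runs after the signalling path has
 * dropped its lock, so taking the dependency's lock is safe even when
 * both fences share that lock. */
static int toy_run_work(struct toy_work *w)
{
    int ret;

    if (!w->pending)
        return 0;
    w->pending = 0;
    assert(w->dep && w->dep->lock);
    ret = pthread_mutex_lock(w->dep->lock);
    if (ret)
        return ret;
    return pthread_mutex_unlock(w->dep->lock);
}
```

Because the callback no longer touches any lock, the model succeeds even when the signalled fence and the dependency share one lock.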

[phasta: commit message nits]

Comment 1 Alexander B 2025-12-09 13:26:55 UTC

Upstream advisory:
https://lore.kernel.org/linux-cve-announce/2025120910-CVE-2025-40329-1ead@gregkh/T