Bug 2420417 (CVE-2025-40329)

Summary: CVE-2025-40329 kernel: drm/sched: Fix deadlock in drm_sched_entity_kill_jobs_cb
Product: [Other] Security Response Reporter: OSIDB Bzimport <bzimport>
Component: vulnerabilityAssignee: Product Security DevOps Team <prodsec-dev>
Status: NEW --- QA Contact:
Severity: medium Docs Contact:
Priority: medium    
Version: unspecifiedKeywords: Security
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: ---
Doc Text:
A deadlock flaw was found in the Linux kernel's DRM GPU scheduler. In drm_sched_entity_kill_jobs_cb(), improper locking order between the xarray lock (xa_lock) and fence lock can cause a deadlock. When CPU0 holds xa_lock and an interrupt triggers on CPU1 that requires both fence->lock and xa_lock, the system deadlocks. Additionally, if two fences share the same spinlock, calling dma_fence_add_callback() from within dma_fence_signal() creates another deadlock scenario. A local user with access to GPU resources could trigger this condition, causing a system hang.
Story Points: ---
Clone Of: Environment:
Last Closed: Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description OSIDB Bzimport 2025-12-09 05:01:59 UTC
In the Linux kernel, the following vulnerability has been resolved:

drm/sched: Fix deadlock in drm_sched_entity_kill_jobs_cb

The Mesa issue referenced below pointed out a possible deadlock:

[ 1231.611031]  Possible interrupt unsafe locking scenario:

[ 1231.611033]        CPU0                    CPU1
[ 1231.611034]        ----                    ----
[ 1231.611035]   lock(&xa->xa_lock#17);
[ 1231.611038]                                local_irq_disable();
[ 1231.611039]                                lock(&fence->lock);
[ 1231.611041]                                lock(&xa->xa_lock#17);
[ 1231.611044]   <Interrupt>
[ 1231.611045]     lock(&fence->lock);
[ 1231.611047]
                *** DEADLOCK ***

In this example, CPU0 would be any function accessing job->dependencies
through the xa_* functions that don't disable interrupts (eg:
drm_sched_job_add_dependency(), drm_sched_entity_kill_jobs_cb()).

CPU1 is executing drm_sched_entity_kill_jobs_cb() as a fence signalling
callback so in an interrupt context. It will deadlock when trying to
grab the xa_lock which is already held by CPU0.

Replacing all xa_* usage by their xa_*_irq counterparts would fix
this issue, but Christian pointed out another issue: dma_fence_signal
takes fence.lock and so does dma_fence_add_callback.

  dma_fence_signal() // locks f1.lock
  -> drm_sched_entity_kill_jobs_cb()
  -> foreach dependencies
     -> dma_fence_add_callback() // locks f2.lock

This will deadlock if f1 and f2 share the same spinlock.

To fix both issues, the code iterating on dependencies and re-arming them
is moved out to drm_sched_entity_kill_jobs_work().

[phasta: commit message nits]