Bug 2420417 (CVE-2025-40329)
| Field | Value |
|---|---|
| Summary | CVE-2025-40329 kernel: drm/sched: Fix deadlock in drm_sched_entity_kill_jobs_cb |
| Product | [Other] Security Response |
| Reporter | OSIDB Bzimport <bzimport> |
| Component | vulnerability |
| Assignee | Product Security DevOps Team <prodsec-dev> |
| Status | NEW |
| QA Contact | |
| Severity | medium |
| Docs Contact | |
| Priority | medium |
| Version | unspecified |
| Keywords | Security |
| Target Milestone | --- |
| Target Release | --- |
| Hardware | All |
| OS | Linux |
| Whiteboard | |
| Fixed In Version | |
| Doc Type | --- |
| Story Points | --- |
| Clone Of | |
| Environment | |
| Last Closed | |
| Type | --- |
| Regression | --- |
| Mount Type | --- |
| Documentation | --- |
| CRM | |
| Verified Versions | |
| Category | --- |
| oVirt Team | --- |
| RHEL 7.3 requirements from Atomic Host | |
| Cloudforms Team | --- |
| Target Upstream Version | |
| Embargoed | |

Doc Text:

A deadlock flaw was found in the Linux kernel's DRM GPU scheduler. In drm_sched_entity_kill_jobs_cb(), improper locking order between the xarray lock (xa_lock) and the fence lock can cause a deadlock. When CPU0 holds xa_lock and an interrupt triggers on CPU1 that requires both fence->lock and xa_lock, the system deadlocks. Additionally, if two fences share the same spinlock, calling dma_fence_add_callback() from within dma_fence_signal() creates another deadlock scenario. A local user with access to GPU resources could trigger this condition, causing a system hang.
In the Linux kernel, the following vulnerability has been resolved:

drm/sched: Fix deadlock in drm_sched_entity_kill_jobs_cb

The Mesa issue referenced below pointed out a possible deadlock:

```
[ 1231.611031] Possible interrupt unsafe locking scenario:

[ 1231.611033]        CPU0                    CPU1
[ 1231.611034]        ----                    ----
[ 1231.611035]   lock(&xa->xa_lock#17);
[ 1231.611038]                          local_irq_disable();
[ 1231.611039]                          lock(&fence->lock);
[ 1231.611041]                          lock(&xa->xa_lock#17);
[ 1231.611044]   <Interrupt>
[ 1231.611045]     lock(&fence->lock);
[ 1231.611047]  *** DEADLOCK ***
```

In this example, CPU0 would be any function accessing job->dependencies through the xa_* functions that don't disable interrupts (e.g. drm_sched_job_add_dependency(), drm_sched_entity_kill_jobs_cb()). CPU1 is executing drm_sched_entity_kill_jobs_cb() as a fence signalling callback, so in an interrupt context. It will deadlock when trying to grab the xa_lock, which is already held by CPU0.

Replacing all xa_* usage by their xa_*_irq counterparts would fix this issue, but Christian pointed out another one: dma_fence_signal() takes fence.lock, and so does dma_fence_add_callback():

```
dma_fence_signal()                // locks f1.lock
-> drm_sched_entity_kill_jobs_cb()
   -> foreach dependencies
      -> dma_fence_add_callback() // locks f2.lock
```

This will deadlock if f1 and f2 share the same spinlock.

To fix both issues, the code iterating on dependencies and re-arming them is moved out to drm_sched_entity_kill_jobs_work().

[phasta: commit message nits]
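For illustration, a minimal sketch of the pre-fix shape the commit describes, assuming hypothetical simplified types (my_job and my_kill_jobs_cb are illustrative names; the real struct drm_sched_job and callback carry more state). The signalling callback itself walks the dependency xarray and re-arms itself on the next fence, which is exactly the lock combination the trace above flags:

```c
#include <linux/dma-fence.h>
#include <linux/xarray.h>

/* Hypothetical, simplified job; the real struct drm_sched_job differs. */
struct my_job {
	struct xarray dependencies;    /* fences this job still waits on */
	struct dma_fence_cb finish_cb; /* armed on one fence at a time */
};

/*
 * Pre-fix shape: this runs as a fence-signalling callback, potentially in
 * hard-irq context with the signalled fence's lock held by
 * dma_fence_signal(). Both problems live here: xa_for_each()/xa_erase()
 * take xa_lock without disabling interrupts, and dma_fence_add_callback()
 * takes the next fence's lock, which deadlocks if that is the same
 * spinlock as f->lock.
 */
static void my_kill_jobs_cb(struct dma_fence *f, struct dma_fence_cb *cb)
{
	struct my_job *job = container_of(cb, struct my_job, finish_cb);
	struct dma_fence *dep;
	unsigned long index;

	dma_fence_put(f); /* drop the reference on the signalled fence */

	xa_for_each(&job->dependencies, index, dep) { /* takes xa_lock */
		xa_erase(&job->dependencies, index);
		/* returns 0 when armed: wait for this dependency to signal */
		if (!dma_fence_add_callback(dep, &job->finish_cb,
					    my_kill_jobs_cb))
			return;
		dma_fence_put(dep); /* already signalled, move on */
	}
	/* all dependencies signalled: the job can be freed here */
}
```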
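And a sketch of the fixed shape, under the same hypothetical types: the callback is reduced to queueing a work item, and the dependency iteration moves into process context, mirroring what the commit describes moving into drm_sched_entity_kill_jobs_work():

```c
#include <linux/dma-fence.h>
#include <linux/workqueue.h>
#include <linux/xarray.h>

/* Same hypothetical job as above, extended with a work item (assumed to be
 * set up at job creation with INIT_WORK(&job->work, my_kill_jobs_work)). */
struct my_job {
	struct xarray dependencies;
	struct dma_fence_cb finish_cb;
	struct work_struct work;
};

static void my_kill_jobs_cb(struct dma_fence *f, struct dma_fence_cb *cb);

/*
 * Runs in process context with no fence lock held, so taking xa_lock via
 * xa_for_each()/xa_erase() and arming a callback on the next fence are
 * both safe here.
 */
static void my_kill_jobs_work(struct work_struct *w)
{
	struct my_job *job = container_of(w, struct my_job, work);
	struct dma_fence *dep;
	unsigned long index;

	xa_for_each(&job->dependencies, index, dep) {
		xa_erase(&job->dependencies, index);
		/* returns 0 when armed: wait for this dependency to signal */
		if (!dma_fence_add_callback(dep, &job->finish_cb,
					    my_kill_jobs_cb))
			return;
		dma_fence_put(dep); /* already signalled, move on */
	}
	/* all dependencies signalled: the job can be freed here */
}

/*
 * The signalling callback now only queues the work item; it no longer
 * takes xa_lock or any other fence's lock, so it is safe in irq context.
 */
static void my_kill_jobs_cb(struct dma_fence *f, struct dma_fence_cb *cb)
{
	struct my_job *job = container_of(cb, struct my_job, finish_cb);

	dma_fence_put(f);
	schedule_work(&job->work);
}
```

This structure addresses both scenarios at once: xa_lock is never taken from interrupt context, and dma_fence_add_callback() is never called while another fence's lock is held.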