Bug 2488725

Summary: kernel-7.0.12-200.fc44: ROCm/SDXL performance regression, 42x slower than kernel-7.0.11-200.fc44
Product: [Fedora] Fedora
Reporter: Lotte <fossanon>
Component: kernel
Assignee: Justin M. Forbes <jforbes>
Status: NEW ---
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Docs Contact:
Priority: unspecified
Version: 44
CC: acaringi, adscvr, airlied, hans, hpa, jforbes, kernel-maint, linville, masami256, mchehab, nickolasjcarr, ptalbert, steved, suraj.ghimire7
Target Milestone: ---
Keywords: Regression
Target Release: ---
Hardware: x86_64
OS: Linux
URL: https://github.com/ROCm/ROCm/issues/6358

Description Lotte 2026-06-14 06:21:36 UTC
After updating from kernel 7.0.11-200.fc44 to 7.0.12-200.fc44, SDXL model inference in ComfyUI became extremely slow, taking around 388 seconds instead of around 9 seconds. The regression is present on both 7.0.12-200.fc44 and 7.0.12-201.fc44. Booting back to 7.0.11-200 restores normal performance immediately.

Steps to Reproduce:
1. Boot into kernel 7.0.12-200.fc44 or kernel 7.0.12-201.fc44
2. Start ComfyUI with ROCm (AMD Radeon RX 6950 XT, gfx1030)
3. Run a workflow with an SD1.5 model
4. Switch to an SDXL model and run the same workflow
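
To make the timing comparison between the two kernels repeatable, a small harness like the one below could be run once per kernel. It is a minimal sketch, not part of ComfyUI or ROCm: `time_workload` is a hypothetical helper, and the workload passed to it is a placeholder that would have to be replaced with whatever triggers one SDXL generation (for example, queueing the workflow and waiting for it to finish). It tags each measurement with the running kernel release from `platform.release()` and discards warm-up runs so that one-time setup is not counted.

```python
import platform
import statistics
import time
from typing import Callable


def time_workload(run: Callable[[], None], repeats: int = 3, warmup: int = 1) -> dict:
    """Time `run` several times and tag the result with the running kernel.

    `run` is a hypothetical stand-in for one SDXL generation; it is not a
    ComfyUI or ROCm API.
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    # Warm-up calls are executed but not timed (model load, first-use setup).
    for _ in range(warmup):
        run()
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        durations.append(time.perf_counter() - start)
    return {
        "kernel": platform.release(),  # same string as `uname -r`
        "median_s": statistics.median(durations),
        "runs_s": durations,
    }


if __name__ == "__main__":
    # Placeholder workload (sleeps 0.1 s); replace with the real generation call.
    result = time_workload(lambda: time.sleep(0.1), repeats=3, warmup=1)
    print(f"{result['kernel']}: median {result['median_s']:.2f} s "
          f"over {len(result['runs_s'])} runs")
```

Running it under 7.0.11-200.fc44 and then under 7.0.12-200.fc44 with the same workload gives two medians that can be compared directly; using the median rather than a single run makes an occasional outlier less likely to distort the ratio.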

Actual Results:
SDXL inference takes 388 seconds instead of around 9 seconds.

Expected Results:
SDXL inference takes around 9 seconds, as it does on kernel 7.0.11-200.fc44.

Environment:
GPU: AMD Radeon RX 6950 XT (gfx1030)
ROCm: 7.1.1-4.fc44
PyTorch: 2.12.0+rocm7.2
ComfyUI: 0.24.0

Logs:
https://gist.github.com/VibeCoding1337/8af1355ecf29ccb7d713ecadc14ed8d4

Reproducible: Always

This issue has also been reported on the ROCm GitHub issue tracker:
https://github.com/ROCm/ROCm/issues/6358