Bug 2488725

Summary:	kernel-7.0.12-200.fc44: ROCm/SDXL performance regression, 42x slower than kernel-7.0.11-200.fc44
Product:	[Fedora] Fedora	Reporter:	Lotte <fossanon>
Component:	kernel	Assignee:	Justin M. Forbes <jforbes>
Status:	NEW ---	QA Contact:	Fedora Extras Quality Assurance <extras-qa>
Severity:	high	Docs Contact:
Priority:	unspecified
Version:	44	CC:	acaringi, adscvr, airlied, hans, hpa, jforbes, kernel-maint, linville, masami256, mchehab, nickolasjcarr, ptalbert, steved, suraj.ghimire7
Target Milestone:	---	Keywords:	Regression
Target Release:	---
Hardware:	x86_64
OS:	Linux
URL:	https://github.com/ROCm/ROCm/issues/6358

Description Lotte 2026-06-14 06:21:36 UTC

After updating from kernel 7.0.11-200.fc44 to 7.0.12-200.fc44, SDXL model inference in ComfyUI became extremely slow: it now takes around 388 seconds instead of around 9 seconds. The regression is present on both 7.0.12-200.fc44 and 7.0.12-201.fc44. Booting back into 7.0.11-200.fc44 restores normal performance immediately.

Steps to Reproduce:
1. Boot into kernel 7.0.12-200.fc44 or kernel 7.0.12-201.fc44
2. Start ComfyUI with ROCm (AMD Radeon RX 6950 XT, gfx1030)
3. Run a workflow with an SD1.5 model
4. Switch to an SDXL model and run the same workflow

Actual Results:
SDXL inference takes around 388 seconds instead of around 9 seconds.

Expected Results:
SDXL inference takes around 9 seconds, as it does on kernel 7.0.11-200.fc44.
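
For anyone re-measuring this on their own hardware, the comparison above can be sketched as a small timing helper plus a ratio. This is only an illustration: `time_call` and `slowdown_factor` are hypothetical helper names, not part of ComfyUI or ROCm, and the two numbers passed in are the rounded figures from this report, so the result lands near the 42x in the summary rather than exactly on it.

```python
import time
from typing import Callable


def time_call(fn: Callable[[], object]) -> float:
    """Return the wall-clock seconds taken by a single call to fn."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def slowdown_factor(baseline_s: float, regressed_s: float) -> float:
    """Return how many times slower the regressed run is than the baseline."""
    if baseline_s <= 0:
        raise ValueError("baseline must be positive")
    return regressed_s / baseline_s


# Rounded figures from this report: about 9 s on 7.0.11, about 388 s on 7.0.12.
reported = slowdown_factor(9.0, 388.0)
print(f"approximate slowdown: {reported:.1f}x")
```

In practice `time_call` would wrap whatever function triggers one SDXL run, executed once per kernel, with the two results passed to `slowdown_factor`.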

Environment:
GPU: AMD Radeon RX 6950 XT (gfx1030)
ROCm: 7.1.1-4.fc44
PyTorch: 2.12.0+rocm7.2
ComfyUI: 0.24.0
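
Others trying to confirm the regression may want to capture the same details from both kernels. A minimal sketch using only the Python standard library follows; `collect_environment` is a hypothetical helper, and it assumes PyTorch is installed under the distribution name `torch` in the same Python environment that ComfyUI uses. It does not query the ROCm or ComfyUI versions, which would still need to be recorded separately.

```python
import importlib.metadata
import platform


def collect_environment(packages=("torch",)):
    """Return the kernel release, CPU architecture, and installed package versions."""
    info = {
        "kernel": platform.release(),    # running kernel release string (uname -r)
        "machine": platform.machine(),   # CPU architecture, e.g. x86_64
    }
    for name in packages:
        try:
            info[name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            info[name] = "not installed"
    return info


for key, value in collect_environment().items():
    print(f"{key}: {value}")
```

Running this once under each kernel and attaching both outputs makes it easier to confirm that the kernel is the only thing that changed between the fast and slow runs.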

Logs:
https://gist.github.com/VibeCoding1337/8af1355ecf29ccb7d713ecadc14ed8d4

Reproducible: Always

This issue has also been reported on the ROCm GitHub issue tracker:
https://github.com/ROCm/ROCm/issues/6358