After updating from kernel 7.0.11-200.fc44 to 7.0.12-200.fc44, SDXL model inference in ComfyUI became extremely slow, taking around 388 seconds instead of around 9 seconds (roughly 43 times slower). The regression is present on both 7.0.12-200.fc44 and 7.0.12-201.fc44. Booting back into 7.0.11-200.fc44 restores normal performance immediately.
Steps to Reproduce:
1. Boot into kernel 7.0.12-200.fc44 or kernel 7.0.12-201.fc44
2. Start ComfyUI with ROCm (AMD Radeon RX 6950 XT, gfx1030)
3. Run a workflow with an SD1.5 model
4. Switch to an SDXL model and run the same workflow
Actual Results:
SDXL inference takes around 388 seconds instead of around 9 seconds.
Expected Results:
SDXL inference takes around 9 seconds, as it does on kernel 7.0.11-200.fc44.
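For anyone who wants to compare kernels with the same measurement, below is a minimal timing sketch. It is not how the numbers above were obtained. The helper names (time_call, slowdown, placeholder_workload) are hypothetical, and the placeholder workload is a stand-in that must be replaced with whatever triggers the SDXL run locally.

```python
import platform
import time


def time_call(workload, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` runs of `workload`."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        best = min(best, time.perf_counter() - start)
    return best


def slowdown(actual_seconds, expected_seconds):
    """Ratio of the observed time to the expected time."""
    return actual_seconds / expected_seconds


def placeholder_workload():
    # Hypothetical stand-in: replace with the call that runs the SDXL workflow.
    sum(i * i for i in range(100_000))


if __name__ == "__main__":
    elapsed = time_call(placeholder_workload)
    # platform.release() reports the running kernel, e.g. 7.0.12-200.fc44.x86_64
    print(f"kernel={platform.release()} elapsed={elapsed:.3f}s")
    # Figures from this report: about 388 s on 7.0.12 vs about 9 s on 7.0.11.
    print(f"reported slowdown: {slowdown(388, 9):.1f}x")
```

Running it once per kernel and attaching both output lines keeps the kernel version and the timing together in a single record.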
Environment:
GPU: AMD Radeon RX 6950 XT (gfx1030)
ROCm: 7.1.1-4.fc44
PyTorch: 2.12.0+rocm7.2
ComfyUI: 0.24.0
Logs:
https://gist.github.com/VibeCoding1337/8af1355ecf29ccb7d713ecadc14ed8d4
Reproducible: Always
This issue has also been reported on the ROCm GitHub issue tracker:
https://github.com/ROCm/ROCm/issues/6358